1 To the best of the author's knowledge, the fair use clause of the U.S. copyright law (see,
e.g. http://www.copyright.gov/fls/fl102.html) permits using published materials for self-study and
classroom instruction, i.e. the main objectives of this distribution, without requesting the copyright holder's
permission.
This is a preliminary ("beta") version of a series of lecture notes and problems on "Essential
Graduate Physics", consisting of the following four parts: 

CM: Classical Mechanics (for a 1-semester course),
EM: Classical Electrodynamics (2 semesters), 
QM: Quantum Mechanics (2 semesters), and 
SM: Statistical Mechanics (1 semester). 

The parts share a teaching style, structure, and (with a few exceptions) notation, and are interlinked by 
extensive cross-referencing. I believe that due to this unity, the notes may be used for teaching these 
courses not only in the (preferred) sequence shown above, but in almost any order - or in parallel. Each 
part is a two-component package consisting of: 

(i) Lecture Notes chapter texts, 2 with a list of exercise problems at the end of each chapter, and

(ii) Exercise and Test Problems with Model Solutions file (in two formats, see below). 

The series also includes two brief reference appendices, MA: Selected Mathematical Formulas (16 pp.)
and CA: Selected Physical Constants (2 pp.).

The series is a by-product of the so-called core physics courses I taught at Stony Brook 
University from 1991 to 2013. Reportedly, most physics departments require their graduate students to 
either take a set of similar courses or pass comprehensive exams based on an approximately similar body of
knowledge (or both :-). This is why I hope that my notes may be useful for both instructors and students 
of such courses, as well as for individual learners. 

The motivation for composing the lecture notes (which had to be typeset because of my horrible 
handwriting) for Stony Brook students was my desperation to find textbooks I could actually use for 
teaching. First of all, the textbooks I could find, including the brilliant Theoretical Physics series by 
Landau and Lifshitz, did not match my classes, which included experiment-oriented students, PhD 
candidates from other departments, US college graduates with poor background education in physics, 
and some advanced undergraduates. Second, given the rigid time restrictions imposed on the core physics
courses, most available textbooks are way too long, and using them would mean hopping from one topic
to another, picking up a chapter here and a section there, at a high risk of losing the necessary
background material and logical connections between course components - and students' interest with
them. On the other hand, many of the textbooks lack even brief discussions of several traditional and
modern topics that I believe are necessary parts of every professional physicist's education. 3

2 The texts are saved as separate .pdf files of each chapter, optimized for two-page viewing and double-sided
printing; merged files for each part and the series as a whole, convenient for search purposes, are also provided.

The main goal of my courses was to make students familiar with the basic notions and ideas of 
physics (hence the series' title), and my main effort was to organize the material in a logical sequence 
the students could readily follow and enjoy, at each new step understanding why exactly they need to 
swallow the next knowledge pill. As a flip side of such a minimalistic goal, I believe that my texts may
be used by advanced undergraduate physics students as well. Moreover, I hope that selected parts of the
series may be useful for graduate students of other disciplines, including astronomy, chemistry,
mechanical engineering, electrical, computer and electronic engineering, and materials science.

At least since Sophocles, i.e. for the last 2,500 years, teachers have known that students can 
master a new concept or method only if they have seen its application to at least a few particular 
problems. This is why in my notes, the list of theoretical physics methods is limited to the approaches
that are indeed necessary for solving the problems I had time to discuss, and the introduction of
every new technique is accompanied by an application example or two. Additional exercise problems
are listed at the end of each chapter of the lecture notes, and may be used for homework assignments. Individual
readers are strongly encouraged to solve as many of these problems as possible.

Detailed model solutions of the exercise problems (some with additional expansion of the lecture 
material), and several shorter problems suitable for tests (also with model solutions), are gathered in 
separate files - one for each part of the series. These files are available for both university instructors
and individual readers - free of charge, but in return for a signed commitment to avoid unlimited
distribution of the solutions, including their posting on externally searchable Web sites. For instructors'
convenience, these files are available not only in the Adobe Systems' Portable Document Format
(*.pdf), but also in the Microsoft Office 1997-2003 format (*.doc) free of macros, so that the problem
assignments and solutions may be readily grouped, edited, etc., before their distribution to students,
using either almost any version of Microsoft Word or independent software tools - e.g., the public-domain
OpenOffice.org. I hope that these materials will, at least partly, save teaching faculty the time
spent on generating neat and double-checked problem solutions for their students. My plans are to extend the
exercise and test problem sets substantially with time.

I know well that my texts are far from perfect. In particular, some sacrifices made in the topic
selection, always very subjective, were extremely painful. (Most regretfully, I could not find time for
even an introduction to general relativity. 4) Moreover, it is very probable that despite all my effort and
help from SBU students and TAs, not all typos/errors have been weeded out. This is why all remarks 
(however candid) and suggestions by the readers would be highly appreciated. All significant 
contributions will be gratefully acknowledged - both on this Web site and in future versions of these 
materials. 



3 To list just a few: statics and dynamics of elastic and fluid continua, basic notions of physical kinetics, 
turbulence and deterministic chaos, physics of reversible and quantum computation, energy relaxation and 
dephasing in open quantum systems, the Rotating-Wave Approximation (RWA) in classical and quantum
mechanics, physics of electrons and holes in semiconductors, weak-potential and tight-binding approximations of 
the energy band theory, optical fiber electrodynamics, macroscopic quantum effects in Bose-Einstein condensates, 
Bloch oscillations and Landau-Zener tunneling, cavity QED, and the Density Functional Theory (DFT). All these 
topics are discussed, if only concisely, in these notes. 

4 For an introduction to the subject, I can recommend either S. Carroll, Spacetime and Geometry, Addison-Wesley,
2003, or a longer text by A. Zee, Einstein Gravity in a Nutshell, Princeton U. Press, 2013.
Disclaimer

These open-access materials are available for everybody free of charge, so that their author can 
hardly be blamed for deceiving "customers" for his own commercial gain. Still, I would like to go a 
little bit beyond the usual litigation-avoiding claims, 5 and offer a word of caution to the potential reader, 
in order to preempt his or her possible later disappointment. So, here is what these lecture notes are, and what
they are not - as perceived by their author.

This is NOT a course of theoretical physics - at least in the contemporary sense of the term 

Despite the fact that much of the included material has been derived from some of the best
textbooks on "theoretical physics" (most notably from the famous series by L. Landau and E. Lifshitz),
this lecture note series differs from them in its emphasis on the basic concepts and ideas of physics,
their relation to experimental data, and most important applications - rather than on sophisticated
theoretical techniques. Indeed, the set of theoretical methods discussed in the notes is limited to the
minimum necessary for quantitative understanding of key notions of physics and solving a few (or rather
a few hundred :-) core problems. Moreover, because of the notes' shortness, I have not been able to cover
some key fields of physics, most notably general relativity and quantum field theory - beyond some
introductory elements of quantum electrodynamics in QM Chapter 9. If you want to work in modern
theoretical physics, you need to know much more than these lectures! 

Moreover, this is NOT a textbook 

A usual textbook tries (though most commonly fails) to cover virtually all aspects of the field. 
As a result, it is typically way too long for being fully read and understood by students during the time 
allocated for the corresponding course, so that the instructors are forced to pick selected chapters and
sections, frequently losing the narrative's logical lines - and students' interest with them. In contrast, these
notes are much shorter (about 200 pages per semester), enabling their thorough reading - perhaps with
just a few sections dropped, depending on the reader's interests. I have tried to mitigate the losses due to
this minimalistic approach by providing extensive further reading recommendations on the topics I had 
no time to cover. The reader is highly encouraged to use these sources (and/or the corresponding 
chapters of more detailed textbooks) on any topic(s) of his or her special interest. 

Then, what these notes ARE and why you may like to use them (I think) 

By tradition, graduate physics education consists of two major components of comparable 
importance: research experience and advanced physics education. Unfortunately, the latter component is 
currently under either open or clandestine attacks in many physics departments, apparently because of 
two reasons. On one hand, the average knowledge level of students entering graduate school is falling, 
so that bringing them up to the level of contemporary research becomes increasingly difficult. On the 
other hand, the research itself becomes more fragmented, so that the students frequently do not feel an 
immediate need for a broad physics knowledge base for their PhD project success. Some thesis advisors, 
trying to maximize the time they could use students as a cheap laboratory workforce, do not help. 



5 Yes Virginia, these notes represent only my personal opinions, not those of our Department of Physics and 
Astronomy of Stony Brook University, the University at large, the SUNY system, the Empire State of New York, 
the federal agencies and private companies that funded my group's research, etc. No dear, I cannot be responsible 
for any harm, either bodily or mental, their reading may (?) cause. 
I believe that this trend toward the reduction of a broad physics education in graduate school is
irresponsible. Experience shows that during his or her future research career, a typical current student
will change research fields several times. Starting from scratch in a new field is hard - terribly hard at an
advanced age. However, physics is fortunate to have a hard core of knowledge that many other sciences
lack. With this knowledge, the student will always feel at home in physics, while without it, he or she
would not be able even to understand research literature in the new field, and would risk being reduced
to auxiliary work roles - if any.

I have seen the main objective of my Stony Brook courses as giving an introduction to this hard
core of physics knowledge, while trying to convey my own excitement about the unparalleled beauty of the
concepts and ideas of this science, and the remarkable logic of their fusion into a wonderful single 
construct. Let me hope that these notes relay at least a part of this excitement. 

Versions, Corrections, and Acknowledgments 

This is a preliminary ("beta") version of the lecture note series. My plans are to publish, in a 
couple of years, the first "real" Version 1.0. In that version, the sections that I am, by that time, most 
unhappy with, will be refurbished. 

Until the release of Version 1.0, the beta will be kept virtually stable. This means, in particular, 
that I will avoid any changes in numbering of chapters, sections, formulas, and figures - though not of 
problems and footnotes. The only continuously introduced changes will be: 

- corrections of the typos (and perhaps some genuine errors) noticed by readers and myself, and 

- addition of more problems - this is why some problem numbers may be changed. 

I would deeply appreciate readers sending correction suggestions, and any other comments/remarks -
however candid - to the email address indicated below. All changes will be listed in a time-ordered log 
file (that will also include appropriate grateful acknowledgments), to be launched on the series' Web site 

http://mysbfiles.stonybrook.edu/~klikharev/EGP/ 

(and its possible future mirrors) in September 2014. 

I am sorry I have not kept such a correction log from the beginning of my lectures at Stony 
Brook University, so I cannot list all the numerous students and TAs who kindly drew my
attention to typos in preliminary versions of these notes. 

I am grateful to several faculty colleagues for their valuable remarks concerning certain sections 
of the notes: A. Abanov (QM Sec. 3.6), P. Allen (Preface, QM Sec. 8.4, and SM Chapter 5), D. Averin
(QM Secs. 7.6 and 8.5), M. Fernandez-Serra (QM Sec. 8.4), and A. Korotkov (QM Secs. 7.4-7.6 and
8.5). Evidently, they should not be held responsible for the remaining deficiencies. 

The Department of Physics and Astronomy of the SBU was very responsive to my requests for a
certain time ordering of my teaching assignments, which was beneficial for note editing.

And last but not least, I would like to thank my wife Mila for several pieces of good advice on aesthetic
aspects of note typesetting, and for all her love and patience - without them, this project would be 
impossible. 

Konstantin.Likharev@stonybrook.edu
December 2013 
Problem Solution File Request Templates

Requests should be sent to konstantin.likharev@stonybrook.edu in either of the following forms:

- an e-mail from a university address, 

- a scanned copy of a signed letter - as an e-mail attachment. 
Approximate contents: 

A. Request from a Prospective Instructor 

Dear Dr. Likharev, 

My plans are to use your lecture notes and problems of the Essential Graduate Physics series, 
part <select: CM, EM, QM, SM>, in my course <title> during <semester, year> in <department, 
university>. I would appreciate your sending me the file Exercise and Test Problems with Model Solutions of that
part of the series in the <select: .doc, .pdf, both .doc and .pdf> format(s).

I will avoid unlimited distribution of the solutions, in particular their posting on externally 
searchable Web sites. If I distribute the solutions among my students, I will ask them to adhere to the 
same restraint. 

Sincerely, <signature, full name, university position, work phone number> 

B. Request from an Individual Learner 

Dear Dr. Likharev, 

My plans are to use your lecture notes and problems of the Essential Graduate Physics series, 
part <select: CM, EM, QM, SM>, for my personal education. I would appreciate your sending me the file
Exercise and Test Problems with Model Solutions of that part of the series.

I will not share the material with anyone, and will not use it for passing official courses based on 
your series. 

Sincerely, <signature, full name, present home address, current phone number>
Abbreviations

Eq. any formula (e.g., equation) 
Fig. figure 
Sec. section 



Notation 

Fonts 

$A$ scalar variable
$\mathbf{A}$ vector variable
$\hat{A}$ scalar operator
$\hat{\mathbf{A}}$ vector operator
$\mathrm{A}$ matrix
$A_{jj'}$ matrix element



Symbols 

$\dot{A}$ time differentiation operator (d/dt)
$\nabla$ spatial differentiation vector (del)
$\approx$ approximately equal to
$\sim$ of the same order as
$\propto$ proportional to
$\equiv$ equal to, by definition
$\cdot$ scalar ("dot-") product
$\times$ vector ("cross-") product
$\bar{A}$ time averaging
$\langle\ \rangle$ statistical averaging
[ , ] commutator
{ , } anticommutator



Parts of the series 

CM: Classical Mechanics 

EM: Classical Electrodynamics 
QM: Quantum Mechanics 
SM: Statistical Mechanics 

Appendices 

MA: Selected Mathematical Formulas 
CA: Selected Physical Constants 

Formulas 

The most general and/or important formulas are highlighted with blue frames and short titles in
the margins.



Numbering 

Chapter numbers are dropped in references to formulas, figures, footnotes, and problems within 
the same chapter. 
Chapter 1. Review of Fundamentals



After elaborating a bit on the title and contents of the course, this short introductory chapter lists the
basic notions and facts of classical mechanics that are supposed to be known to the reader from
undergraduate studies. 1 For this reason, the explanations (if any) are very brief.

1.1. Mechanics and dynamics

A fairer title of this course would be Classical Mechanics and Dynamics, because the notions
of mechanics and dynamics, though much intertwined, are still somewhat different. Mechanics deals
with deriving the equations of motion (most frequently, ordinary or partial differential equations) of
point-like particles and their systems (including solids and fluids), solution of these equations, and
interpretation of the results. Dynamics is a more ambiguous term; it may mean, in particular:

(i) the part of mechanics that deals with motion (in contrast to statics);

(ii) the part of mechanics that deals with reasons for motion (in contrast to kinematics);

(iii) the part of mechanics that focuses on its last two tasks, i.e. the solution of the equations of
motion and discussion of the results.

The last definition invites (at least) two questions. First, it may look like mechanics and
dynamics are just two sequential steps of a single process; why should they be considered separate
disciplines? The main reason is that many differential equations of motion obtained in classical
mechanics also describe processes in different physical (and sometimes not only physical!) systems, so
that their analysis may reveal important features of these systems as well. For example, the famous
ordinary differential equation

$$\ddot{x} + \omega_0^2 x = 0 \qquad (1.1)$$

describes sinusoidal 1D oscillations not only of a mass on a spring, but also of an electric or magnetic
field in a resonator, and many other systems. Similarly, the well-known partial differential equation 2

$$\frac{\partial^2 f}{\partial t^2} - v^2 \nabla^2 f = 0 \qquad (1.2)$$

describes not only waves in a mechanical continuum (solid or fluid), but also electromagnetic waves in
non-dispersive media, certain chemical reactions, etc. Thus the results of analysis of the dynamics
described by these equations may be "recycled" for applications well beyond mechanics.

1 For remedial reading I could recommend, for example: G. R. Fowles and G. L.
Cassiday, Analytical Mechanics, 7th ed., Brooks Cole, 2004; K. R. Symon, Mechanics, 3rd ed., Addison-Wesley,
1971; J. B. Marion and S. T. Thornton, Classical Dynamics of Particles and Systems, 4th ed., Saunders, 1995.

2 These notes assume the reader's familiarity with basic calculus and vector algebra. The most important formulas are
listed in the Selected Mathematical Formulas appendix, referred to below as MA. In particular, a reminder of the
definition and basic properties of the Laplace operator may be found in MA Sec. 9.
The second natural question is: Definition (iii) of dynamics is suspiciously close to the part of
mathematics devoted to the differential equation analysis; what is the difference? To answer it, we have
to dip, for just a second, into the philosophy of physics. Physics may be described as an art (and a bit of
science :-) of description of Mother Nature by mathematical means; hence in many cases the approaches
of a mathematician and a physicist to a problem are very similar. The main difference is that physicists
try to express the results of their analysis in terms of the system's motion rather than function properties,
and as a result develop some sort of intuition ("gut feeling") about how other, apparently similar, 
systems may behave, even if their exact equations of motion are somewhat different or not known at all. 
The intuition so developed has an enormous heuristic power, and most discoveries in physics have been 
made through gut-feeling-based insights rather than just plugging one formula into another one. 
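The "recycling" idea can be illustrated numerically. The Python sketch below (an illustration only, not part of the original notes) integrates Eq. (1) with the standard velocity-Verlet scheme and applies the same routine to two systems: a mass m on a spring of stiffness k, with $\omega_0 = (k/m)^{1/2}$, and an LC circuit whose capacitor charge obeys the same equation with $\omega_0 = (LC)^{-1/2}$. The function name integrate_oscillator and all parameter values are hypothetical; they are chosen so that both systems have $\omega_0 = 2$ s$^{-1}$ and hence follow the same solution $x_0\cos\omega_0 t$.

```python
import math


def integrate_oscillator(omega0, x0, v0, t_end, dt=1e-4):
    """Integrate Eq. (1.1), x'' + omega0**2 * x = 0, by the velocity-Verlet scheme.

    Returns (x, dx/dt) at time t_end. Nothing in the routine refers to the
    physical nature of x, so any system obeying Eq. (1.1) can reuse it.
    """
    n_steps = round(t_end / dt)
    x, v = x0, v0
    a = -omega0**2 * x
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt**2
        a_new = -omega0**2 * x
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


# Mass on a spring: omega0 = sqrt(k/m) (illustrative SI values)
k, m = 4.0, 1.0
omega_spring = math.sqrt(k / m)

# LC resonator: the capacitor charge obeys the same equation with omega0 = 1/sqrt(L*C)
L_ind, C = 0.25, 1.0
omega_lc = 1.0 / math.sqrt(L_ind * C)

t = 1.3
for name, omega0 in (("spring", omega_spring), ("LC circuit", omega_lc)):
    x_num, _ = integrate_oscillator(omega0, x0=1.0, v0=0.0, t_end=t)
    x_exact = math.cos(omega0 * t)  # analytic solution for x(0) = 1, dx/dt(0) = 0
    print(f"{name}: numeric {x_num:.8f}, exact {x_exact:.8f}")
```

The velocity-Verlet scheme is used here because it keeps the oscillation energy bounded over long runs, but any standard ODE integrator would illustrate the same point.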



1.2. Kinematics: Basic notions 

The basic notions of kinematics may be defined in various ways, and some mathematicians pay a 
lot of attention to analyzing such systems of axioms and relations between them. In physics, we 
typically stick to less rigorous ways (in order to proceed faster to particular problems), and stop debating
a definition as soon as everybody in the room agrees that we are all speaking about the same thing. Let
me hope that the following notions used in classical mechanics do satisfy this criterion: 

(i) All the Euclidean geometry notions, including the geometric point (the mathematical 
abstraction for the position of a very small object), straight line, etc. 

(ii) Orthogonal, linear ("Cartesian") coordinates 3 $r_j$ of a geometric point in a particular reference
frame - see Fig. 1. 4



Fig. 1.1. Cartesian coordinates and radius-vector of a point/particle.



The coordinates may be used to define the point's radius-vector

$$\mathbf{r} \equiv \sum_{j=1}^{3} r_j \mathbf{n}_j, \qquad (1.3)$$

where $\mathbf{n}_1, \mathbf{n}_2, \mathbf{n}_3$ are unit vectors along the coordinate axis directions, with the Euclidean metric:

$$\left|\mathbf{r}_1 - \mathbf{r}_2\right|^2 = \sum_{j=1}^{3} \left(r_{1j} - r_{2j}\right)^2, \qquad (1.4)$$

which is independent, in particular, of the distribution of matter in space.

3 In these notes the Cartesian coordinates are denoted either as $\{r_1, r_2, r_3\}$ or $\{x, y, z\}$, depending on
convenience in the particular case. Note that axis numbering is important for operations like vector products; the
"correct" (meaning generally accepted) numbering order is such that the rotation $\mathbf{n}_1 \to \mathbf{n}_2 \to \mathbf{n}_3 \to \mathbf{n}_1 \to \ldots$ looks
counterclockwise if watched from a point with all $r_j > 0$ - see Fig. 1.

4 In references to figures, formulas, problems and sections within the same chapter of these notes, the chapter
number is dropped.

(iii) Time - as described by a continuous scalar variable (say, t), typically considered an
independent argument of various physical observables, in particular a point's radius-vector $\mathbf{r}(t)$. By
accepting Eq. (4), and an implicit assumption that time t runs similarly in all reference frames, we
subscribe to the notion of the absolute ("Newtonian") space/time, and hence abstain from a discussion
of relativistic effects. 5



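(iv) (Instant) velocity of the point,

$$\mathbf{v}(t) \equiv \frac{d\mathbf{r}}{dt} \equiv \dot{\mathbf{r}}, \qquad (1.5)$$

and its acceleration:

$$\mathbf{a}(t) \equiv \frac{d\mathbf{v}}{dt} \equiv \dot{\mathbf{v}} = \ddot{\mathbf{r}}. \qquad (1.6)$$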



Since the above definitions of vectors r, v, and a depend on the chosen reference frame (are
"reference-frame-specific"), there is a need to relate those vectors as observed in different frames.
Within the Euclidean geometry, for two reference frames with the corresponding axes parallel at the
moment of interest (Fig. 2), the relation between the radius-vectors is very simple:



$$\mathbf{r}\big|_{\text{in } O'} = \mathbf{r}\big|_{\text{in } O} + \mathbf{r}_0\big|_{\text{in } O'}. \qquad (1.7)$$

Fig. 1.2. Coordinate transfer between two reference frames.



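If the frames move relative to each other by translation only (no mutual rotation!), similar relations
are valid for velocity and acceleration as well:

$$\mathbf{v}\big|_{\text{in } O'} = \mathbf{v}\big|_{\text{in } O} + \mathbf{v}_0\big|_{\text{in } O'}, \qquad (1.8)$$

$$\mathbf{a}\big|_{\text{in } O'} = \mathbf{a}\big|_{\text{in } O} + \mathbf{a}_0\big|_{\text{in } O'}. \qquad (1.9)$$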



5 Following tradition, an introduction to special relativity is included into the Classical Electrodynamics ("EM")
part of these notes. The relativistic effects are small if all particle velocities are much lower than the speed of
light, $c \approx 3\times 10^{8}$ m/s, and all distances are much larger than the system's Schwarzschild radius $r_s = 2Gm/c^2$, where
$G \approx 7\times 10^{-11}$ SI units is the Newtonian gravity constant, and m is the system's mass. (More exact values of c, G and
some other physical constants are listed in appendix CA: Selected Physical Constants.)
In the case of mutual rotation of the reference frames, notions like $\mathbf{v}_0|_{\text{in } O'}$ are not well defined.
(Indeed, different points of a rigid body connected to frame O may have different velocities in frame
O'.) As a result, the transfer laws for velocities and accelerations are more complex than those given by
Eqs. (8) and (9). It will be more natural for me to discuss them at the end of Chapter 5, which is devoted to
rigid body motion.

(v) Particle: a localized physical object whose size is negligible, and shape unimportant for the 
given problem. Note that the last qualification is extremely important. For example, the size and shape 
of a Space Shuttle are not too important for the discussion of its orbital motion, but are paramount when 
its landing procedures are developed. Since classical mechanics neglects the quantum mechanical
uncertainties, 6 a particle's position, at any particular instant t, may be identified with a single geometric
point, i.e. one radius-vector $\mathbf{r}(t)$. Finding the laws of motion $\mathbf{r}(t)$ of all particles participating in the given
problem is frequently considered the final goal of classical mechanics.
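The kinematic relations (3)-(9) are easy to check numerically. The following Python sketch is only an illustration under stated assumptions: the two point trajectories, the offset R0, and the constant relative velocity V0 of frame O' are hypothetical, and the derivatives (5) and (6) are approximated by central finite differences. It builds radius-vectors by Eq. (3), transfers them by Eq. (7) to a frame O' translating without rotation, and shows that the distance (4) between two points is the same in both frames, that the velocities obey Eq. (8), and that the accelerations coincide, i.e. Eq. (9) with $\mathbf{a}_0|_{\text{in } O'} = 0$.

```python
import math


def radius_vector(coords):
    """Eq. (1.3): r = sum_j r_j n_j, with n_j the Cartesian unit vectors."""
    n = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    return tuple(sum(coords[j] * n[j][i] for j in range(3)) for i in range(3))


def distance(r1, r2):
    """Euclidean metric, Eq. (1.4)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))


def velocity(r, t, h=1e-5):
    """Eq. (1.5), v = dr/dt, approximated by a central difference."""
    return tuple((p - m) / (2 * h) for p, m in zip(r(t + h), r(t - h)))


def acceleration(r, t, h=1e-4):
    """Eq. (1.6), a = d^2r/dt^2, approximated by a second central difference."""
    return tuple((p - 2 * c + m) / h**2 for p, c, m in zip(r(t + h), r(t), r(t - h)))


# Hypothetical trajectories of two points, as seen in frame O
def r_a_in_O(t):
    return radius_vector((math.cos(t), math.sin(t), 0.5 * t**2))


def r_b_in_O(t):
    return radius_vector((2.0 * t, 1.0, -t))


# Frame O' translates (no rotation) relative to O, so that r0|in O' = R0 + V0*t
# and hence a0|in O' = 0
R0, V0 = (3.0, 0.0, -1.0), (1.0, -2.0, 0.0)


def to_O_prime(r_in_O):
    """Eq. (1.7): r|in O' = r|in O + r0|in O'."""
    return lambda t: tuple(x + R + V * t for x, R, V in zip(r_in_O(t), R0, V0))


r_a_in_Op = to_O_prime(r_a_in_O)
r_b_in_Op = to_O_prime(r_b_in_O)

t = 0.7
print(distance(r_a_in_O(t), r_b_in_O(t)), distance(r_a_in_Op(t), r_b_in_Op(t)))
print(velocity(r_a_in_Op, t), [v + V for v, V in zip(velocity(r_a_in_O, t), V0)])  # Eq. (1.8)
print(acceleration(r_a_in_Op, t), acceleration(r_a_in_O, t))  # Eq. (1.9) with a0 = 0
```

Replacing V0*t by a function with a nonzero second derivative would make O' accelerate relative to O, and the last two printed vectors would then differ by that acceleration, in accordance with Eq. (9).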



1.3. Dynamics: Newton laws 



Generally, the classical dynamics is fully described (in addition to the kinematic relations given 
above) by the three Newton laws. 7 In contrast to the impression some textbooks on theoretical physics try to
create, these laws are experimental in nature, and cannot be derived from purely theoretical arguments. 8 

I am confident that the reader of these notes is already familiar with the Newton laws, in one or 
another formulation. Let me note only that in some formulations the 1st Newton law looks just like a
particular case of the 2nd law - for the case of zero net force acting on a particle. In order to avoid this
duplication, the 1st law may be formulated as the following postulate:

- There exists at least one reference frame, called inertial, in which any free particle (i.e. a
particle isolated from the rest of the Universe) moves with $\mathbf{v} = \text{const}$, i.e. with $\mathbf{a} = 0$.

According to Eq. (9), this postulate immediately means that there is also an infinite number of
inertial frames, because all frames O' moving without rotation or acceleration relative to the postulated
inertial frame O (i.e. having $\mathbf{a}_0|_{\text{in } O'} = 0$) are also inertial.






On the other hand, the 2nd and 3rd Newton laws may be postulated together in the following
elegant way. Each particle, say with number k, may be characterized by a scalar constant (called mass
$m_k$), such that at any interaction of N particles (isolated from the rest of the Universe), in any inertial
system,

$$\mathbf{P} \equiv \sum_{k=1}^{N} m_k \mathbf{v}_k = \text{const}. \qquad (1.10)$$



(Each component of this sum,

$$\mathbf{p}_k \equiv m_k \mathbf{v}_k, \qquad (1.11)$$

is called the mechanical momentum of the corresponding particle, and the whole sum $\mathbf{P}$, the total
momentum of the system.)

6 This approximation is legitimate, crudely, when the product of scales of coordinate and momentum of the
particle motion is much larger than Planck's constant $\hbar \approx 1.054\times 10^{-34}$ J·s. A more exact formulation may be
found, e.g., in the Quantum Mechanics ("QM") part of these notes.

7 Due to the genius of Sir Isaac Newton, these laws were formulated as early as 1687, far ahead of the science
of that time.

8 Some laws of Nature (including the Newton laws) may be derived from certain more general postulates, such as
the Hamilton (or "least action") principle - see Sec. 10.2 below. Note, however, that such derivation is only
acceptable because all known corollaries of the postulates comply with all known experimental results.

Let us apply this postulate to just two interacting particles. Differentiating Eq. (10), written for 
this case, over time, we get 

$$\dot{\mathbf{p}}_1 = -\dot{\mathbf{p}}_2. \qquad (1.12)$$

Let us give the derivative $\dot{\mathbf{p}}_1$ (i.e., a vector) the name of force $\mathbf{F}_1$ exerted on particle 1. In our current
case, when the only possible source of force is particle 2, the force may be denoted as $\mathbf{F}_{12}$. Similarly,
$\mathbf{F}_{21} \equiv \dot{\mathbf{p}}_2$, so that we get the 3rd Newton law

$$\mathbf{F}_{12} = -\mathbf{F}_{21}. \qquad (1.13)$$



Now, returning to the general case of several interacting particles, we see that an additional (but
very natural) assumption that all partial forces $\mathbf{F}_{kk'}$ acting on particle k add up as vectors leads to the
general form of the 2nd Newton law 9

$$m_k \mathbf{a}_k = \dot{\mathbf{p}}_k = \sum_{k' \neq k} \mathbf{F}_{kk'} \equiv \mathbf{F}_k, \qquad (1.14)$$



that allows a clear interpretation of the mass as a measure of particle's inertia. 
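A short numerical experiment shows how the pairwise structure of Eq. (14), together with the 3rd law (13), keeps the total momentum (10) constant. In the Python sketch below, the spring-like pair force, its stiffness, and the particle data are hypothetical choices made only for illustration; any pair force with $\mathbf{F}_{kk'} = -\mathbf{F}_{k'k}$ would do. The momenta are advanced by $d\mathbf{p}_k = \mathbf{F}_k\,dt$ (a simple semi-implicit Euler step), so the contributions of each pair cancel in the sum $\mathbf{P}$ at every step, and $\mathbf{P}$ stays constant up to round-off errors.

```python
def pair_force(r_k, r_kp, stiffness=5.0):
    """Hypothetical pair force F_kk' on particle k from particle k' (an ideal spring).

    It satisfies F_kk' = -F_k'k, i.e. the 3rd Newton law (1.13).
    """
    return [-stiffness * (a - b) for a, b in zip(r_k, r_kp)]


def total_momentum(p):
    """Eq. (1.10): P = sum over k of p_k."""
    return [sum(pk[i] for pk in p) for i in range(len(p[0]))]


def simulate(masses, positions, velocities, dt=1e-3, n_steps=5000):
    """Advance N particles by Eq. (1.14), dp_k/dt = sum over k' != k of F_kk'."""
    n, dim = len(masses), len(positions[0])
    r = [list(pos) for pos in positions]
    p = [[m * v for v in vel] for m, vel in zip(masses, velocities)]  # Eq. (1.11)
    for _ in range(n_steps):
        forces = [[0.0] * dim for _ in range(n)]
        for k in range(n):
            for kp in range(n):
                if kp != k:
                    f = pair_force(r[k], r[kp])
                    forces[k] = [F + fi for F, fi in zip(forces[k], f)]
        for k in range(n):
            # semi-implicit Euler step: update p_k first, then r_k with the new velocity
            p[k] = [pi + Fi * dt for pi, Fi in zip(p[k], forces[k])]
            r[k] = [ri + pi / masses[k] * dt for ri, pi in zip(r[k], p[k])]
    return r, p


masses = [1.0, 2.0, 0.5]
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5)]
velocities = [(0.3, 0.0), (0.0, -0.2), (0.1, 0.4)]
P_initial = total_momentum([[m * v for v in vel] for m, vel in zip(masses, velocities)])
_, p_final = simulate(masses, positions, velocities)
print(P_initial, total_momentum(p_final))  # equal up to round-off errors
```

If pair_force were modified so that $\mathbf{F}_{kk'} \neq -\mathbf{F}_{k'k}$, the printed total momenta would drift apart, which is just another way to see that Eq. (10) and Eq. (13) go together.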

As a matter of principle, if the dependence of all pair forces $\mathbf{F}_{kk'}$ on particle positions (and
generally maybe on time as well) is known, Eq. (14) augmented with kinematic relations (5) and (6)
allows the calculation of the laws of motion $\mathbf{r}_k(t)$ of all particles of the system. For example, for one
particle the 2nd law (14) gives the ordinary differential equation of the second order,

$$m\ddot{\mathbf{r}} = \mathbf{F}(\mathbf{r}, t), \qquad (1.15)$$

that may be integrated - either analytically or numerically.

For certain cases, this is very simple. As an elementary example, for the motion of a particle in a
uniform gravitational field (say, that of the Earth near its surface), Newton's gravity law may be
approximated as

$$\mathbf{F} = m\mathbf{g}, \qquad (1.16)$$



with vector $\mathbf{g}$ being constant (e.g., directed toward the Earth's center), and mass m the same as in Eq. (14). 10
As a result, m cancels, and Eq. (15) is reduced to just $\ddot{\mathbf{r}} = \mathbf{g}$, which may be easily integrated twice:

$$\dot{\mathbf{r}}(t) \equiv \mathbf{v}(t) = \int_0^t \mathbf{g}\,dt' + \mathbf{v}(0) = \mathbf{g}t + \mathbf{v}(0), \qquad (1.17)$$

$$\mathbf{r}(t) = \int_0^t \mathbf{v}(t')\,dt' + \mathbf{r}(0) = \mathbf{g}\frac{t^2}{2} + \mathbf{v}(0)\,t + \mathbf{r}(0), \qquad (1.18)$$

thus giving the solution of all those projectile motion problems that should be so familiar to the reader.

9 Of course, for composite bodies of varying mass (e.g., rockets emitting jets, see Problem 1), momentum's
derivative may differ from $m\mathbf{a}$.

10 The last fact, the so-called weak equivalence principle, is highly nontrivial, but has been verified
experimentally to a relative accuracy of at least $10^{-13}$. Due to the conceptual significance of this fact, new space
experiments, such as MICROSCOPE (http://smsc.cnes.fr/MICROSCOPE/), are being planned for a substantial,
nearly 100-fold improvement of the accuracy.
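As a simple check, the closed-form solution (17)-(18) may be compared with a direct step-by-step integration of Eq. (15) with the force (16). In the Python sketch below, the value and direction of $\mathbf{g}$ (9.81 m/s² along $-z$) and the initial conditions are assumptions made for illustration. The update used here is exact for a constant acceleration, so the two results should coincide up to round-off errors.

```python
G_VEC = (0.0, 0.0, -9.81)  # assumed g near the Earth's surface, m/s^2, with the z-axis pointing up


def projectile_closed_form(r0, v0, t, g=G_VEC):
    """Eqs. (1.17)-(1.18): v(t) = g*t + v(0), r(t) = g*t**2/2 + v(0)*t + r(0)."""
    v = tuple(gi * t + vi for gi, vi in zip(g, v0))
    r = tuple(gi * t**2 / 2 + vi * t + ri for gi, vi, ri in zip(g, v0, r0))
    return r, v


def projectile_numeric(r0, v0, t_end, dt=1e-3, g=G_VEC):
    """Step Eq. (1.15) with F = m*g, Eq. (1.16); the mass cancels, so the acceleration is g."""
    r, v = list(r0), list(v0)
    for _ in range(round(t_end / dt)):
        r = [ri + vi * dt + 0.5 * gi * dt**2 for ri, vi, gi in zip(r, v, g)]
        v = [vi + gi * dt for vi, gi in zip(v, g)]
    return tuple(r), tuple(v)


r0, v0 = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)  # illustrative initial conditions, SI units
print(projectile_closed_form(r0, v0, 1.5))
print(projectile_numeric(r0, v0, 1.5))
```

For a coordinate-dependent force, the same stepping loop would no longer be exact, and its accuracy would have to be controlled by the step dt.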

All this looks (and indeed is) very simple, but in most other cases leads to more complex 
calculations. As an example, let us consider another simple problem: a bead of mass m sliding, without 
friction, along a round ring of radius R in a gravity field obeying Eq. (16) - see Fig. 3. 




Fig. 1.3. Bead moving on a vertical ring (labels: initial position, v = 0; intermediate position; final position, v = ?; gravity force mg).



Suppose we are only interested in the bead's velocity v at the lowest point, after it has been released
from rest at the rightmost position. If we want to solve this problem using only the Newton laws, we
have to do the following steps:

(i) consider the bead in an arbitrary intermediate position on the ring, described, for example, by 
angle $\theta$ (Fig. 3); 

(ii) draw all the forces acting on the particle - in our current case, the gravity force $m\mathbf g$ and the 
reaction force $\mathbf N$ exerted by the ring; 

(iii) write the 2nd Newton law for two nonvanishing components of the bead's acceleration, say for 
its horizontal and vertical components $a_x$ and $a_y$; 

(iv) recognize that in the absence of friction, force $\mathbf N$ should be normal to the ring, so that we can 
use two additional equations, $N_x = -N\sin\theta$ and $N_y = N\cos\theta$; 

(v) eliminate the unknown variables $N$, $N_x$, and $N_y$ from the resulting system of four equations, thus 
getting a single second-order differential equation for one variable, for example $\theta$; 

(vi) integrate this equation once to get the expression relating the angular velocity $\dot\theta$ and angle $\theta$; and, 
finally, 

(vii) using our specific initial condition ($\dot\theta = 0$ at $\theta = \pi/2$), find the final velocity as $v = R\dot\theta$ at 
$\theta = 0$. 

All this is very much doable, but please agree that the procedure is too cumbersome for such a 
simple problem. Moreover, in many other cases even writing the equations of motion along the relevant 
coordinates is very complex, and any help the general theory may provide is highly valuable. In many 
cases, such help is given by conservation laws; let us review the most general of them. 
1.4. Conservation laws 



(i) Energy conservation is arguably the most general law of physics, but in mechanics it takes a 
more humble form of mechanical energy conservation that has limited applicability. To derive it, we 
first have to define the kinetic energy of a particle as 



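$$ T \equiv \frac{m}{2}v^2, \qquad (1.19) $$

and then recast its differential as¹¹ 

$$ dT = d\left(\frac{m}{2}v^2\right) = d\left(\frac{m}{2}\mathbf v\cdot\mathbf v\right) = m\mathbf v\cdot d\mathbf v = m\frac{d\mathbf r}{dt}\cdot d\mathbf v = \frac{d\mathbf p}{dt}\cdot d\mathbf r. \qquad (1.20) $$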



Now plugging in the momentum's derivative from the 2nd Newton law, $d\mathbf p/dt = \mathbf F$, where $\mathbf F$ is the full 
force acting on the particle, we get the relation $dT = \mathbf F\cdot d\mathbf r$. Its integration along the particle's trajectory between 
some points A and B gives the relation that is sometimes called the work-energy principle: 



$$ \Delta T \equiv T(\mathbf r_B) - T(\mathbf r_A) = \int_A^B \mathbf F\cdot d\mathbf r, \qquad (1.21) $$



where the integral in the right-hand part is called the work of force F on the path from A to B. 



A further step may be made only for potential (also called "conservative") forces that may be 
presented as (minus) gradients of some scalar function $U(\mathbf r)$, called the potential energy.¹² The vector 
operator $\nabla$ (called either del or nabla) of spatial differentiation¹³ allows a very compact expression of 
this fact: 



$$ \mathbf F = -\nabla U. \qquad (1.22) $$

For example, for the uniform gravity field (16), 

$$ U = mgh + \text{const}, \qquad (1.23) $$

where $h$ is the vertical coordinate directed "up" (away from the Earth's center). 



Integrating the tangential component $F_\tau$ of the vector $\mathbf F$, given by Eq. (22), along an arbitrary 
path connecting points A and B, we get 

$$ \int_A^B F_\tau\,dr = \int_A^B \mathbf F\cdot d\mathbf r = U(\mathbf r_A) - U(\mathbf r_B), \qquad (1.24) $$



i.e. the work of potential forces may be presented as the difference of the values of function $U(\mathbf r)$ at the initial 
and final points of the path. (Note that according to Eq. (24), the work of a potential force on any closed 
trajectory, with $\mathbf r_A = \mathbf r_B$, is zero.) 
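The path independence expressed by Eq. (24) is easy to probe numerically. The minimal Python sketch below assumes a hypothetical two-dimensional "isotropic spring" force $\mathbf F = -\kappa\mathbf r$, with $U = \kappa(x^2 + y^2)/2$, and two arbitrarily chosen paths between the same end points; the value of $\kappa$, the path shapes, and the helper names are illustrative only. The work along each path should equal $U(\mathbf r_A) - U(\mathbf r_B)$, and the work around the closed loop should vanish.

```python
import math

KAPPA = 3.0  # hypothetical spring constant of the illustrative 2D force


def force(x, y):
    """Conservative force F = -grad U for U = KAPPA * (x^2 + y^2) / 2."""
    return (-KAPPA * x, -KAPPA * y)


def potential(x, y):
    return KAPPA * (x * x + y * y) / 2


def work_along(path, n=10_000):
    """Approximate the line integral of F . dr along path(s), 0 <= s <= 1,
    using the midpoint rule on n small segments."""
    total = 0.0
    x0, y0 = path(0.0)
    for i in range(1, n + 1):
        x1, y1 = path(i / n)
        fx, fy = force((x0 + x1) / 2, (y0 + y1) / 2)
        total += fx * (x1 - x0) + fy * (y1 - y0)
        x0, y0 = x1, y1
    return total


def straight(s):
    """Straight segment from A = (0, 0) to B = (2, 1)."""
    return (2 * s, s)


def detour(s):
    """A curved path with the same end points A and B."""
    return (2 * s, s + 3 * math.sin(math.pi * s))


def detour_reversed(s):
    return detour(1 - s)


A, B = straight(0.0), straight(1.0)
w_straight = work_along(straight)
w_detour = work_along(detour)
w_loop = w_straight + work_along(detour_reversed)  # closed loop A -> B -> A
print(w_straight, w_detour, potential(*A) - potential(*B), w_loop)
```

For this linear force the midpoint rule is in fact exact, since each segment contributes $-\kappa(|\mathbf r_1|^2 - |\mathbf r_0|^2)/2$; for other conservative forces the agreement would hold only up to discretization error.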



11 Symbol $\mathbf a\cdot\mathbf b$ denotes the scalar (or "dot-") product of vectors a and b - see, e.g., MA Eq. (7.1). 

12 Note that because of its definition via the gradient, the potential energy is only defined up to an arbitrary additive 
constant. 

13 Its basic properties are listed in MA Sec. 8. 






Now returning to Eq. (21) and comparing it with Eq. (24), we see that 

$$ T(\mathbf r_B) - T(\mathbf r_A) = U(\mathbf r_A) - U(\mathbf r_B), \qquad (1.25) $$

so that the total mechanical energy $E$, defined as 

$$ E \equiv T + U, \qquad (1.26) $$

is indeed conserved: 

$$ E(\mathbf r_A) = E(\mathbf r_B), \qquad (1.27) $$



but for conservative forces only. (Non-conservative forces, e.g., friction, typically transfer energy from 
the mechanical form into some other form, e.g., heat.) 

The mechanical energy conservation allows us to return for a second to the problem shown in 
Fig. 3 and solve it in one shot by writing Eq. (27) for the initial and final points: 14 



$$ 0 + mgR = \frac{m}{2}v^2 + 0. \qquad (1.28) $$



Solving Eq. (28) for $v$ immediately gives us the desired answer, $v = (2gR)^{1/2}$. Let me hope that the reader agrees that 
this way of solving the problem is much simpler, and that I have got his or her attention for a discussion of other 
conservation laws - which may be equally effective. 
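To see that the Newton-law route of steps (i)-(vii) and Eq. (28) indeed agree, here is a minimal Python sketch. It assumes the equation of motion that steps (i)-(v) lead to, $\ddot\theta = -(g/R)\sin\theta$ (with $\theta$ measured from the lowest point), arbitrary values of $g$ and $R$, and a fixed-step fourth-order Runge-Kutta integrator chosen for illustration. It releases the bead from rest at $\theta = \pi/2$ and records the speed $R|\dot\theta|$ when the bead first reaches $\theta = 0$, which should reproduce $v = (2gR)^{1/2}$.

```python
import math


def bead_speed_at_bottom(g=9.81, R=0.5, dt=1e-4, max_steps=10**6):
    """Release the bead at rest at theta = pi/2, integrate
    theta'' = -(g/R) sin(theta) with RK4, and return v = R|theta'| at theta = 0."""

    def rhs(theta, theta_dot):
        return theta_dot, -(g / R) * math.sin(theta)

    theta, theta_dot = math.pi / 2, 0.0
    for _ in range(max_steps):
        k1 = rhs(theta, theta_dot)
        k2 = rhs(theta + dt / 2 * k1[0], theta_dot + dt / 2 * k1[1])
        k3 = rhs(theta + dt / 2 * k2[0], theta_dot + dt / 2 * k2[1])
        k4 = rhs(theta + dt * k3[0], theta_dot + dt * k3[1])
        new_theta = theta + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        new_dot = theta_dot + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if new_theta <= 0.0:
            # interpolate linearly to the crossing point theta = 0
            frac = theta / (theta - new_theta)
            return R * abs(theta_dot + frac * (new_dot - theta_dot))
        theta, theta_dot = new_theta, new_dot
    raise RuntimeError("bead did not reach the lowest point")


g, R = 9.81, 0.5
print(bead_speed_at_bottom(g, R), math.sqrt(2 * g * R))  # Eq. (1.28): v = (2gR)^(1/2)
```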

(ii) Momentum. Actually, the conservation of the full momentum of any system of particles 
isolated from the rest of the world has already been discussed, and may serve as the basic postulate of 
classical dynamics - see Eq. (10). In the case of one free particle the law is reduced to the trivial result 
$\mathbf p = \text{const}$, i.e. $\mathbf v = \text{const}$. If the system of N particles is affected by external forces $\mathbf F^{(\text{ext})}$, we may write 



$$ \mathbf F_k = \sum_{k'\neq k}\mathbf F_{kk'} + \mathbf F_k^{(\text{ext})}. \qquad (1.29) $$



If we sum up the resulting Eqs. (14) for all particles of the system then, due to the 3rd Newton law (13), 
the contributions of all internal forces to this double sum in the right-hand part cancel, and we get the 
equation 



$$ \dot{\mathbf P} = \mathbf F^{(\text{ext})}, \qquad \text{where } \mathbf F^{(\text{ext})} \equiv \sum_{k=1}^{N}\mathbf F_k^{(\text{ext})}, \qquad (1.30) $$



which tells us that the translational motion of the system as a whole is similar to that of a single 
particle under the effect of the net external force $\mathbf F^{(\text{ext})}$. As a simple sanity check, if the external forces 
have a zero sum, we return to postulate (10). Just one reminder: Eq. (30), just as its precursor Eq. (14), 
is only valid in an inertial reference frame. 
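A minimal Python sketch of Eq. (30), assuming a hypothetical one-dimensional system of three particles coupled pairwise by springs (internal forces that obey the 3rd Newton law) and placed in a uniform external field; all masses, spring parameters, initial conditions, and the semi-implicit Euler time stepping are illustrative choices. However complicated the internal motion, the total momentum should change exactly as $P(t) = P(0) + F^{(\text{ext})}t$, up to roundoff.

```python
# Hypothetical 1D system: three particles with pairwise springs (internal
# forces obeying the 3rd Newton law) in a uniform external field.
MASSES = [1.0, 2.0, 0.5]
K_SPRING, REST_LENGTH = 5.0, 1.0
G = -9.81  # external field strength along the line, m/s^2


def internal_forces(x):
    """Net internal force on each particle: the sum over k' != k of F_kk'."""
    f = [0.0] * len(x)
    for k in range(len(x)):
        for kp in range(k + 1, len(x)):
            d = x[kp] - x[k]
            direction = 1.0 if d >= 0 else -1.0
            f_k_from_kp = K_SPRING * (abs(d) - REST_LENGTH) * direction
            f[k] += f_k_from_kp
            f[kp] -= f_k_from_kp  # 3rd Newton law: F_k'k = -F_kk'
    return f


def total_momentum(v):
    return sum(m * vk for m, vk in zip(MASSES, v))


def simulate(x, v, t_end=2.0, dt=1e-3):
    """Semi-implicit Euler integration; returns (P(0), P(t_end), t_end)."""
    x, v = list(x), list(v)
    p_start = total_momentum(v)
    steps = round(t_end / dt)
    for _ in range(steps):
        f_int = internal_forces(x)
        for k, m in enumerate(MASSES):
            f_ext = m * G  # external force on particle k
            v[k] += (f_int[k] + f_ext) / m * dt
        for k in range(len(x)):
            x[k] += v[k] * dt
    return p_start, total_momentum(v), steps * dt


p0, p1, t = simulate([0.0, 1.3, 2.1], [0.4, -0.2, 1.0])
f_ext_total = sum(MASSES) * G
print(p1, p0 + f_ext_total * t)  # these should agree up to roundoff
```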



(iii) Angular momentum of a particle¹⁵ is defined as the following vector: 

$$ \mathbf L \equiv \mathbf r\times\mathbf p, \qquad (1.31) $$



14 Here the arbitrary constant in Eq. (23) is chosen so that the potential energy is zero at the final point. 

15 Here we imply that the internal motions of the particle, including its rotation about its own axis, are negligible. 
(Otherwise it could not be represented by a geometrical point, as was postulated in Sec. 1.) For a body with 
substantial rotation (see Chapter 6 below), vector L retains its definition (31), but is only a part of the total 
angular momentum and is called the orbital momentum - even if the particle does not move along a closed orbit. 
where $\mathbf a\times\mathbf b$ means the vector (or "cross-") product of the vector operands.¹⁶ Now, differentiating Eq. (31) 
over time, we get 

$$ \dot{\mathbf L} = \dot{\mathbf r}\times\mathbf p + \mathbf r\times\dot{\mathbf p}. \qquad (1.32) $$



In the first product, $\dot{\mathbf r}$ is just the velocity vector $\mathbf v$, which is parallel to the particle momentum $\mathbf p = m\mathbf v$, so 
that this product vanishes, since the vector product of any two parallel vectors is zero. In the second 
product, $\dot{\mathbf p}$ equals the full force $\mathbf F$ acting on the particle, so that Eq. (32) is reduced to 



$$ \dot{\mathbf L} = \boldsymbol\tau, \qquad (1.33) $$

where vector 

$$ \boldsymbol\tau \equiv \mathbf r\times\mathbf F, \qquad (1.34) $$



is called the torque of force F. (Note that the torque is evidently reference-frame specific - and again, 
the frame has to be inertial for Eq. (33) to be valid.) For an important particular case of a central force 
F that is parallel to the radius vector r of a particle (as measured from the force source point), the torque 
vanishes, so that (in that particular reference frame only!) the angular momentum is conserved: 

$$ \mathbf L = \text{const}. \qquad (1.35) $$
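As an illustration of Eq. (35), the minimal Python sketch below follows a particle moving in a plane under a hypothetical attractive central force $\mathbf F = -k\mathbf r/r^3$ centered at the origin; the force constant, initial conditions, and the kick-drift-kick (leapfrog) integrator are assumptions made for the example. For a central force, each half-kick changes $\mathbf p$ only along $\mathbf r$ and each drift changes $\mathbf r$ only along $\mathbf v$, so neither step changes $L_z = xp_y - yp_x$: the computed angular momentum stays constant up to roundoff, even though the orbit itself is only approximate.

```python
K_FORCE, MASS = 1.0, 1.0  # hypothetical force constant k and particle mass


def acceleration(x, y):
    """Central attractive force F = -k r / r^3, divided by the mass."""
    r3 = (x * x + y * y) ** 1.5
    return -K_FORCE * x / (MASS * r3), -K_FORCE * y / (MASS * r3)


def angular_momentum_z(x, y, vx, vy):
    """z-component of L = r x p, Eq. (1.31), for motion in the x-y plane."""
    return MASS * (x * vy - y * vx)


def leapfrog_orbit(x, y, vx, vy, dt=1e-3, steps=20_000):
    """Kick-drift-kick integration; returns L_z after every step."""
    history = [angular_momentum_z(x, y, vx, vy)]
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        vx += ax * dt / 2  # half kick: changes p only along r
        vy += ay * dt / 2
        x += vx * dt       # drift: changes r only along v
        y += vy * dt
        ax, ay = acceleration(x, y)
        vx += ax * dt / 2  # second half kick
        vy += ay * dt / 2
        history.append(angular_momentum_z(x, y, vx, vy))
    return history


Lz = leapfrog_orbit(1.0, 0.0, 0.0, 0.8)
print(min(Lz), max(Lz))
```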



For a system of N particles, the total angular momentum is naturally defined as 

$$ \mathbf L \equiv \sum_{k=1}^{N}\mathbf L_k. \qquad (1.36) $$



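Differentiating this equation over time, using Eq. (33) for each $\mathbf L_k$, and again partitioning each force in 
accordance with Eq. (29), we get 

$$ \dot{\mathbf L} = \sum_{k=1}^{N}\sum_{k'\neq k}\mathbf r_k\times\mathbf F_{kk'} + \boldsymbol\tau^{(\text{ext})}, \qquad \text{where } \boldsymbol\tau^{(\text{ext})} \equiv \sum_{k=1}^{N}\mathbf r_k\times\mathbf F_k^{(\text{ext})}. \qquad (1.37) $$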



The first (double) sum may always be divided into pairs of the type $(\mathbf r_k\times\mathbf F_{kk'} + \mathbf r_{k'}\times\mathbf F_{k'k})$. Each of these 
pairs equals zero. Indeed, both components of the pair are vectors perpendicular to the plane passing 
through the positions of both particles and the reference frame origin, i.e. to the plane of drawing of Fig. 4. 
Also, due to the 3rd Newton law (13), the two forces are equal and opposite, and the magnitude of each 
term in the sum may be presented as $|\mathbf F_{kk'}|\,h_{kk'}$, with equal "lever arms" $h_{kk'} = h_{k'k}$. 




Fig. 1.4. Internal and external forces, and 
the internal torque cancellation in a system 
of two particles. 



16 See, e.g., MA Eq. (7.3). 
As a result, each pair $(\mathbf r_k\times\mathbf F_{kk'} + \mathbf r_{k'}\times\mathbf F_{k'k})$, and hence the whole double sum in Eq. (37), vanishes, 
and Eq. (37) is reduced to a very simple result, 

$$ \dot{\mathbf L} = \boldsymbol\tau^{(\text{ext})}, \qquad (1.38) $$

that is similar to Eq. (33) for a single particle, and is the angular analog of Eq. (30). In particular, Eq. 
(38) shows that if the full external torque $\boldsymbol\tau^{(\text{ext})}$ vanishes for some reason (e.g., if the system of particles is 
isolated from the rest of the Universe), the conservation law (35) is valid for the full angular momentum 
$\mathbf L$, even if its individual components $\mathbf L_k$ are not conserved due to inter-particle interactions. 

From the mathematical point of view, most conservation laws provide first integrals of 
motion, which sometimes liberate us from the necessity to integrate twice the second-order differential 
equations of motion following from the Newton laws. 






1.5. Potential energy and equilibrium 

Another important role of the potential energy U, especially for dissipative systems whose total 
mechanical energy E is not conserved because it may be drained to the environment, is finding the 
positions of equilibrium (sometimes called the fixed points of the system under analysis) and analyzing 
their stability with respect to small perturbations. For a single particle, this is very simple: force (22) 
vanishes at each extremum (minimum or maximum) of the potential energy.¹⁷ Of those fixed points, 
only the minima of $U(\mathbf r)$ are stable - see Sec. 3.2 below for a discussion of this point. 
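For 1D motion, this recipe is easy to automate. The minimal Python sketch below assumes a hypothetical double-well potential $U(x) = x^4/4 - x^2/2$ (arbitrary units), locates the zeros of the force $F = -dU/dx$ by a grid scan followed by bisection, and labels each fixed point by the sign of $d^2U/dx^2$: positive for a minimum (stable), negative for a maximum (unstable). The potential and all helper names are illustrative only.

```python
def U(x):
    """Hypothetical double-well potential energy (arbitrary units)."""
    return x**4 / 4 - x**2 / 2


def force(x):
    """F = -dU/dx, the 1D form of Eq. (1.22)."""
    return -(x**3 - x)


def d2U(x):
    """Second derivative of U, used to classify the fixed points."""
    return 3 * x**2 - 1


def fixed_points(a=-2.0, b=2.0, n=400, tol=1e-12):
    """Locate zeros of the force on [a, b]: grid scan plus bisection."""
    xs = [a + (b - a) * i / (n - 1) for i in range(n)]
    roots = []
    for x0, x1 in zip(xs, xs[1:]):
        f0, f1 = force(x0), force(x1)
        if f0 == 0.0:
            roots.append(x0)
        elif f0 * f1 < 0:
            lo, hi = x0, x1
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if force(lo) * force(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots


def classify(x):
    if d2U(x) > 0:
        return "stable (minimum of U)"
    if d2U(x) < 0:
        return "unstable (maximum of U)"
    return "marginal (higher derivatives needed)"


for x in fixed_points():
    print(f"x = {x:+.6f}: {classify(x)}")
```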

A slightly more subtle case is a particle with potential energy $U(\mathbf r)$, subjected to an additional 
external force $\mathbf F^{(\text{ext})}(\mathbf r)$. In this case, the stable equilibrium is reached at the minimum not of function 
$U(\mathbf r)$, but of the Gibbs potential energy 

$$ U_G(\mathbf r) \equiv U(\mathbf r) - \int^{\mathbf r}\mathbf F^{(\text{ext})}(\mathbf r')\cdot d\mathbf r', \qquad (1.39) $$

which is defined, just as $U(\mathbf r)$ is, up to an arbitrary constant. The proof of this statement is very simple: at an 
extremum of this function, the total force acting on the particle, 

$$ \mathbf F^{(\text{tot})} = \mathbf F + \mathbf F^{(\text{ext})} = -\nabla U + \nabla\int^{\mathbf r}\mathbf F^{(\text{ext})}(\mathbf r')\cdot d\mathbf r' = -\nabla U_G, \qquad (1.40) $$

vanishes, as it should.¹⁸ For the simplest (and very frequent) case of an applied force independent of the 
particle's position, the Gibbs potential energy is just 



17 Assuming that the additional, non-conservative forces (such as viscosity) responsible for the mechanical energy 
drain vanish at equilibrium - as they typically do. (Static friction is one counter-example.) 

18 Physically, the difference $U_G - U$ specified by Eq. (39) may be considered the r-dependent part of the potential 
energy $U^{(\text{ext})}$ of the external system responsible for force $\mathbf F^{(\text{ext})}$, so that $U_G$ is just the total potential energy $U + 
U^{(\text{ext})}$, apart from the part of $U^{(\text{ext})}$ that does not depend on r and hence is irrelevant for the fixed point analysis. 
According to the 3rd Newton law, the force exerted by the particle on the external system equals $(-\mathbf F^{(\text{ext})})$, so that 
its work (and hence the change of $U^{(\text{ext})}$ due to the change of r) is given by the second term in the right-hand part 
of Eq. (39). Thus the condition of equilibrium, $-\nabla U_G = 0$, is just the condition of an extremum of the total 
potential energy, $U + U^{(\text{ext})}$, of the two interacting systems. 
$$ U_G(\mathbf r) = U(\mathbf r) - \mathbf F^{(\text{ext})}\cdot\mathbf r + \text{const}. \qquad (1.41) $$

This is all very straightforward, but since the notion of $U_G$ is not well known to some students,¹⁹ 
let me offer a very simple example. Consider a 1D deformation of the usual elastic spring providing the 
returning force $(-\kappa x)$, where $x$ is the deviation from the spring's equilibrium. In order for the force to comply 
with Eq. (22), its potential energy should equal $U = \kappa x^2/2 + \text{const}$, so that its minimum corresponds to 
$x = 0$. This works fine until the spring comes under the effect of a nonvanishing external force $F$, say 
independent of $x$. Then the equilibrium deformation of the spring, $x_0 = F/\kappa$, evidently corresponds not to 
the minimum of $U$ but rather to that of the Gibbs potential energy (41): $U_G = U - Fx = \kappa x^2/2 - Fx + 
\text{const}$. 
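This example may be checked numerically with a minimal Python sketch; the values of $\kappa$ and $F$ and the golden-section minimizer are illustrative assumptions, not part of the text. Minimizing $U$ alone returns $x = 0$, while minimizing $U_G = \kappa x^2/2 - Fx$ returns the loaded equilibrium $x_0 = F/\kappa$.

```python
import math

KAPPA, F = 4.0, 1.2  # illustrative spring constant and constant external force


def U(x):
    """Spring potential energy, U = kappa x^2 / 2 (constant set to zero)."""
    return KAPPA * x**2 / 2


def U_gibbs(x):
    """Gibbs potential energy (1.41) for a position-independent force F."""
    return U(x) - F * x


def golden_section_min(f, a, b, tol=1e-9):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):  # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:            # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


x_U = golden_section_min(U, -1.0, 1.0)
x_G = golden_section_min(U_gibbs, -1.0, 1.0)
print(x_U, x_G, F / KAPPA)
```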



1.6. OK, we've got it - can we go home now? 

Not yet. In many cases the conservation laws discussed above provide little help, even in 
systems without dissipation. Consider, for example, a generalization of the bead-on-the-ring problem 
shown in Fig. 3, in which the ring is rotated by external forces, with a constant angular velocity $\omega$, 
about its vertical diameter (Fig. 5). 




Fig. 1.5. Bead sliding along a rotating ring. 



In this problem (to which I will repeatedly return below, using it as an analytical mechanics 
testbed), none of the three conservation laws listed in the last section holds. In particular, the bead's energy, 

$$ E = \frac{m}{2}v^2 + mgh, \qquad (1.42) $$

is not constant, because the external forces rotating the ring may change it. Of course, we can still solve 
the problem using the Newton laws, but this is even more complex than for the above case of the ring at 
rest, in particular because the force $\mathbf N$ exerted on the bead by the ring may now have three rather than 
two Cartesian components, which are not simply related. 



19 Unfortunately, in most physics teaching plans the introduction of $U_G$ is postponed until a course of statistical 
mechanics and/or thermodynamics - where it is a part of the Gibbs free energy, in contrast to $U$, which is a part of 
the Helmholtz free energy - see, e.g., Sec. 1.4 of the Statistical Mechanics ("SM") part of my notes. However, the 
reader should agree that the difference between $U_G$ and $U$, and hence that between the Gibbs and Helmholtz free 
energies, has nothing to do with statistics or thermal motion, and belongs to basic mechanics. 
One can readily see that if we could exclude in advance the so-called reaction forces, such as $\mathbf N$, that ensure 
the external constraints of the particle's motion, that would help a lot. Such an exclusion may be 
provided by analytical mechanics, in particular its Lagrangian formulation, to which we will now 
proceed.²⁰ 



1.7. Exercise problems 
(for reader's background self-check) 

1.1. Find the acceleration of a rocket due to the working jet motor, and explore the resulting 
equation of the rocket's motion. 

Hint: For the sake of simplicity, you may consider the 1D motion. 



1.2. A satellite of mass $m$ is being launched from height $H$ over the 
surface of a spherical planet with radius $R$ and mass $M \gg m$ - see Fig. on the 
right. Find the range of initial velocities $v_0$ (normal to the radius) providing 
closed orbits above the planet's surface. 



1.3. Derive the differential equations of motion for small oscillations 
of two similar pendula coupled with a spring (see Fig. on the right), within the 
vertical plane. Assume that at the vertical position of both pendula, the spring 
is not stretched ($\Delta L = 0$). 




(Figure for Problem 1.3: two pendula with bobs of mass $m$, coupled by a spring exerting the force $F = -\kappa\Delta L$.)



20 An even more important motivation for analytical mechanics is given by the dynamics of "non-mechanical" 
systems, for example, of the electromagnetic field - possibly interacting with charged particles, conducting 
bodies, etc. In many such systems, the easiest way (and sometimes the only practicable way) to find the equations 
of motion is to derive them from the Lagrangian or Hamiltonian function of the system. In particular, the 
Hamiltonian formulation of analytical mechanics (to be discussed in Chapter 10) offers a direct pathway to 
deriving the Hamiltonian operators of systems, which is the standard entry point for the analysis of their 
quantum-mechanical properties. 
Chapter 2. Lagrangian Formalism 

The goal of this chapter is to describe the Lagrangian formulation of analytical mechanics, which is 
extremely useful for obtaining the differential equations of motion (and sometimes their first integrals) 
not only for mechanical systems with holonomic constraints, but also for other dynamic systems. 

2.1. Lagrange equations 

In many cases, the constraints imposed on the 3D motion of a system of N particles may be 
described by N vector (i.e. 3N scalar) algebraic equations 

$$ \mathbf r_k = \mathbf r_k(q_1, q_2, \ldots, q_j, \ldots, q_J, t), \qquad 1 \le k \le N, \qquad (2.1) $$

where $q_j$ are certain generalized coordinates which (together with the constraints) completely define the 
system's position, and $J \le 3N$ is the number of the actual degrees of freedom. The constraints that allow 
such a description are called holonomic.¹ 

For example, for our testbed, bead-on-rotating-ring problem (see Fig. 1.5 and Fig. 1 below), $J = 
1$, because, taking into account the constraints imposed by the ring, the bead's position may be uniquely 
determined by just one generalized coordinate - for example, the polar angle $\theta$. Indeed, selecting the 
reference frame as shown in Fig. 1 and using the well-known formulas for the spherical coordinates,² we 
see that in this case Eq. (1) in Cartesian coordinates has the form 

$$ \mathbf r = \{x, y, z\} = \{R\sin\theta\cos\varphi,\ R\sin\theta\sin\varphi,\ -R\cos\theta\}, \qquad \text{where } \varphi = \omega t + \text{const}, \qquad (2.2) $$

where the constant depends on the exact selection of axes x and y and of the time origin. Since $\varphi(t)$ is a 
fixed function of time, and $R$ is a fixed constant, the position of the particle in space at any instant $t$ is 
indeed completely determined by the value of its only generalized coordinate $\theta$. Note that the 
dimensionality of a generalized coordinate may be different from that of Cartesian coordinates 
(meters)! 
Fig. 2.1. Bead on a rotating ring as an 
example of a system with just one degree 
of freedom: J = 1. 



1 Possibly, the simplest example of a non-holonomic constraint is a set of inequalities describing the hard walls 
confining the motion of particles in a closed volume. Non-holonomic constraints are better dealt with by other 
methods, e.g., by imposing proper boundary conditions on the (otherwise unconstrained) motion. 

2 See, e.g., MA Eq. (10.7). 
Now returning to the general case of J degrees of freedom, let us consider a set of small 
variations (alternatively called "virtual displacements") $\delta q_j$ allowed by the constraints. Virtual 
displacements differ from the actual small displacements (described by differentials $dq_j$ proportional to 
the time variation $dt$) in that $\delta q_j$ describes not the system's motion as such, but rather its possible variation - 
see Fig. 2. 



Fig. 2.2. Actual displacement $dq_j$ vs. the virtual one (i.e. variation) $\delta q_j$. (Curves: actual motion and possible motion.)



Generally, operations with variations are the subject of a special field of mathematics, the 
calculus of variations.³ However, the only math background necessary for our current purposes is the 
understanding that operations with variations are similar to those with the usual differentials, though we 
need to watch carefully what each variable is a function of. For example, if we consider the variation of 
the radius-vectors (1), at a fixed time t, as a function of independent variations $\delta q_j$, we may use the usual 
formula for the differentiation of a function of several arguments:⁴ 

$$ \delta\mathbf r_k = \sum_j \frac{\partial\mathbf r_k}{\partial q_j}\,\delta q_j. \qquad (2.3) $$

Now let us break the force acting upon the k-th particle into two parts: the frictionless, 
constraining part $\mathbf N_k$ of the reaction force, and the remaining part $\mathbf F_k$ - including the force components 
from other sources and possibly the friction part of the reaction force. Then the 2nd Newton law for the k-th 
particle of the system may be presented as 

$$ m_k\dot{\mathbf v}_k - \mathbf F_k = \mathbf N_k. \qquad (2.4) $$

Since any variation of the motion has to be allowed by the constraints, its 3N-dimensional vector with N 
3D-vector components $\delta\mathbf r_k$ has to be perpendicular to the 3N-dimensional vector of the constraining 
forces, also with N 3D-vector components $\mathbf N_k$. (For example, for the problem shown in Fig. 2.1, the 
virtual displacement vector $\delta\mathbf r$ may be directed only along the ring, while the constraining force $\mathbf N$, 
exerted by the ring, has to be perpendicular to that direction.) This condition may be expressed as 

$$ \sum_k \mathbf N_k\cdot\delta\mathbf r_k = 0, \qquad (2.5) $$



3 For a concise introduction to the field see, e.g., I. Gelfand and S. Fomin, Calculus of Variations, Dover, 2000 or 
L. Elsgolc, Calculus of Variations, Dover, 2007. An even shorter review may be found in Chapter 17 of Arfken 
and Weber - see MA Sec. 16. For a more detailed discussion, using many examples from physics, see R. 
Weinstock, Calculus of Variations, Dover, 2007. 

4 See, e.g., MA Eq. (4.2). In all formulas of this section, all summations over index j are from 1 to J, while those 
over the particle number k are from 1 to N. 
where the scalar product of 3N-dimensional vectors is defined exactly as that of 3D vectors, i.e. as the 
sum of the products of the corresponding components of the operands. The substitution of Eq. (4) into 
Eq. (5) results in the so-called D'Alembert principle:⁵ 

$$ \sum_k \left(m_k\dot{\mathbf v}_k - \mathbf F_k\right)\cdot\delta\mathbf r_k = 0. \qquad (2.6) $$



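Now we may plug Eq. (3) into Eq. (6) to get 

$$ \sum_j\left\{\sum_k m_k\dot{\mathbf v}_k\cdot\frac{\partial\mathbf r_k}{\partial q_j} - \mathcal F_j\right\}\delta q_j = 0, \qquad (2.7) $$

where the scalars $\mathcal F_j$, called generalized forces, are defined as follows:⁶ 

$$ \mathcal F_j \equiv \sum_k \mathbf F_k\cdot\frac{\partial\mathbf r_k}{\partial q_j}. \qquad (2.8) $$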



Now we may use the standard argument of the calculus of variations: in order for the left-hand 
part of Eq. (7) to be zero for an arbitrary selection of independent variations $\delta q_j$, the expressions in the 
curly brackets, for every j, should equal zero. This gives us a set of J equations 

$$ \sum_k m_k\dot{\mathbf v}_k\cdot\frac{\partial\mathbf r_k}{\partial q_j} - \mathcal F_j = 0; \qquad (2.9) $$



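let us present them in a more convenient form. First, using the product rule of differentiation to calculate the 
following time derivative: 

$$ \frac{d}{dt}\left(\mathbf v_k\cdot\frac{\partial\mathbf r_k}{\partial q_j}\right) = \dot{\mathbf v}_k\cdot\frac{\partial\mathbf r_k}{\partial q_j} + \mathbf v_k\cdot\frac{d}{dt}\left(\frac{\partial\mathbf r_k}{\partial q_j}\right), \qquad (2.10) $$

we may notice that the first term in the right-hand part is exactly the scalar product in the first term of 
Eq. (9). 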

Second, let us use another key fact of the calculus of variations (which is, essentially, evident 
from Fig. 3): the differentiation of a variable over time and over the generalized coordinate variation (at 
fixed time) are interchangeable operations. 




Fig. 2.3. Variation of the differential (of any 
function f) equals the differential of its 
variation: $\delta(df) = d(\delta f)$. 



5 It was spelled out in a 1743 work by J.-B. le Rond d'Alembert, though the core of this result has been 
traced to an earlier work by J. Bernoulli (1667 - 1748). 

6 Note that since the dimensionality of generalized coordinates may be arbitrary, that of generalized forces may 
also differ from the newton. 
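As a result, in the second term on the right-hand part of Eq. (10) we may write 

$$ \frac{d}{dt}\left(\frac{\partial\mathbf r_k}{\partial q_j}\right) = \frac{\partial}{\partial q_j}\left(\frac{d\mathbf r_k}{dt}\right) = \frac{\partial\mathbf v_k}{\partial q_j}. \qquad (2.11) $$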



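Finally, let us differentiate Eq. (1) over time: 

$$ \mathbf v_k \equiv \frac{d\mathbf r_k}{dt} = \sum_j\frac{\partial\mathbf r_k}{\partial q_j}\dot q_j + \frac{\partial\mathbf r_k}{\partial t}. \qquad (2.12) $$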



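This equation shows that particle velocities $\mathbf v_k$ may be considered as linear functions of the generalized 
velocities $\dot q_j$, considered as independent variables, with proportionality coefficients 

$$ \frac{\partial\mathbf v_k}{\partial\dot q_j} = \frac{\partial\mathbf r_k}{\partial q_j}. \qquad (2.13) $$

With the account of Eqs. (10), (11), and (13), Eq. (9) turns into 

$$ \frac{d}{dt}\sum_k m_k\mathbf v_k\cdot\frac{\partial\mathbf v_k}{\partial\dot q_j} - \sum_k m_k\mathbf v_k\cdot\frac{\partial\mathbf v_k}{\partial q_j} - \mathcal F_j = 0. \qquad (2.14) $$

This result may be further simplified by making, for the total kinetic energy of the system, 

$$ T = \sum_k\frac{m_k}{2}v_k^2 = \sum_k\frac{m_k}{2}\mathbf v_k\cdot\mathbf v_k, \qquad (2.15) $$

the same commitment as for $\mathbf v_k$, i.e. considering $T$ a function of not only the generalized coordinates $q_j$ 
and time $t$, but also of the generalized velocities $\dot q_j$ - as variables independent of $q_j$ and $t$. Then we may 
calculate the partial derivatives of $T$ as 

$$ \frac{\partial T}{\partial\dot q_j} = \sum_k m_k\mathbf v_k\cdot\frac{\partial\mathbf v_k}{\partial\dot q_j}, \qquad \frac{\partial T}{\partial q_j} = \sum_k m_k\mathbf v_k\cdot\frac{\partial\mathbf v_k}{\partial q_j}, \qquad (2.16) $$

and notice that they are exactly the two sums participating in Eq. (14). As a result, we get a system of J 
Lagrange equations,⁷ 

$$ \frac{d}{dt}\frac{\partial T}{\partial\dot q_j} - \frac{\partial T}{\partial q_j} - \mathcal F_j = 0, \qquad \text{for } j = 1, 2, \ldots, J. \qquad (2.17) $$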



Their big advantage over the initial Newton law equations (4) is that the Lagrange equations do not 
include the constraining forces $\mathbf N_k$. 

This is as far as we can go for arbitrary forces. However, if all the forces may be expressed in a 
form similar to, but somewhat more general than, Eq. (1.22), $\mathbf F_k = -\nabla_k U(\mathbf r_1, \mathbf r_2, \ldots, \mathbf r_N, t)$, where U is the 



7 They were derived in 1788 by J.-L. Lagrange who pioneered the whole field of analytical mechanics - not to 
mention his key contributions to number theory and celestial mechanics. 
effective potential energy of the system,⁸ and sign $\nabla_k$ denotes differentiation over the coordinates of the k-th 
particle, we may recast Eq. (8) into a simpler form: 



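$$ \mathcal F_j = \sum_k \mathbf F_k\cdot\frac{\partial\mathbf r_k}{\partial q_j} = -\sum_k\left(\frac{\partial U}{\partial x_k}\frac{\partial x_k}{\partial q_j} + \frac{\partial U}{\partial y_k}\frac{\partial y_k}{\partial q_j} + \frac{\partial U}{\partial z_k}\frac{\partial z_k}{\partial q_j}\right) = -\frac{\partial U}{\partial q_j}. \qquad (2.18) $$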



Since we assume that $U$ depends only on particle coordinates (and possibly time), but not on velocities, 
$\partial U/\partial\dot q_j = 0$; hence, with the substitution of Eq. (18), the Lagrange equation (17) may be presented in its 
canonical form 

$$ \frac{d}{dt}\frac{\partial L}{\partial\dot q_j} - \frac{\partial L}{\partial q_j} = 0, \qquad (2.19a) $$

where $L$ is called the Lagrangian function (or just the "Lagrangian"), defined as 

$$ L \equiv T - U. \qquad (2.19b) $$



It is crucial to distinguish this function from the mechanical energy (1.26), $E = T + U$. 

Using the Lagrangian formalism in practice, the reader should always remember that: 

(i) Each system has only one Lagrange function $L$, but is described by $J \ge 1$ Lagrange equations 
of motion (for j = 1, 2, ..., J). 

(ii) Differentiating $T$, we have to consider the generalized velocities $\dot q_j$ as independent variables, 
ignoring the fact that they are actually the time derivatives of $q_j$. 



2.2. Examples 

As the first, simplest example, consider a particle constrained to move along one axis (say, x): 

$$ T = \frac{m}{2}\dot x^2, \qquad U = U(x, t). \qquad (2.20) $$



In this case, it is natural to consider $x$ as the (only) generalized coordinate, and $\dot x$ as the generalized 
velocity, so that 

$$ L = T - U = \frac{m}{2}\dot x^2 - U(x, t). \qquad (2.21) $$



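Considering $\dot x$ an independent variable, we get $\partial L/\partial\dot x = m\dot x$, and $\partial L/\partial x = -\partial U/\partial x$, so that the 
Lagrange equation of motion (only one equation in this case of the single degree of freedom!) yields 

$$ \frac{d}{dt}(m\dot x) - \left(-\frac{\partial U}{\partial x}\right) = 0, \qquad (2.22) $$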



8 Note that due to the possible time dependence of U, Eq. (17) does not mean that forces $\mathbf F_k$ have to be 
conservative - see the next section for more discussion. With this understanding, I will still use for function U the 
convenient name of "potential energy". 
evidently the same result as the x-component of the 2nd Newton law with $F_x = -\partial U/\partial x$. This is a good 
sanity check, but we see that the Lagrange formalism does not provide much advantage in this 
particular case. 

This advantage is, however, evident for our testbed problem - see Fig. 1. Indeed, taking the polar 
angle $\theta$ for the (only) generalized coordinate, we see that in this case the kinetic energy depends not 
only on the generalized velocity, but also on the generalized coordinate:⁹ 

$$ T = \frac{m}{2}R^2\left(\dot\theta^2 + \omega^2\sin^2\theta\right), \qquad U = mgh + \text{const} = -mgR\cos\theta + \text{const}, $$
$$ L = T - U = \frac{m}{2}R^2\left(\dot\theta^2 + \omega^2\sin^2\theta\right) + mgR\cos\theta + \text{const}. \qquad (2.23) $$
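Footnote 9 mentions that the expression for $T$ in Eq. (23) may be obtained by formally differentiating Eq. (2) over time. A minimal Python sketch of that check, assuming arbitrary values of $m$, $R$, $\omega$ and of the instantaneous $\theta$, $\dot\theta$: it differentiates the Cartesian position (2) numerically along the trajectory with a central difference and compares $(m/2)|\dot{\mathbf r}|^2$ with $(m/2)R^2(\dot\theta^2 + \omega^2\sin^2\theta)$. The step size and function names are illustrative choices.

```python
import math

m, R, omega = 0.2, 0.5, 3.0  # illustrative mass, ring radius, rotation rate


def position(theta, t, phi0=0.0):
    """Eq. (2.2): bead position for polar angle theta and phi = omega t + phi0."""
    phi = omega * t + phi0
    return (R * math.sin(theta) * math.cos(phi),
            R * math.sin(theta) * math.sin(phi),
            -R * math.cos(theta))


def kinetic_energy_numeric(theta0, theta_dot, t=0.0, h=1e-6):
    """(m/2)|dr/dt|^2 via a central difference along theta(t') = theta0 + theta_dot (t' - t)."""
    r_plus = position(theta0 + theta_dot * h, t + h)
    r_minus = position(theta0 - theta_dot * h, t - h)
    v = [(a - b) / (2 * h) for a, b in zip(r_plus, r_minus)]
    return m / 2 * sum(c * c for c in v)


def kinetic_energy_formula(theta, theta_dot):
    """T from Eq. (2.23)."""
    return m / 2 * R**2 * (theta_dot**2 + omega**2 * math.sin(theta) ** 2)


for theta, theta_dot in [(0.3, 1.1), (1.2, -0.7), (2.5, 0.0)]:
    print(kinetic_energy_numeric(theta, theta_dot), kinetic_energy_formula(theta, theta_dot))
```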



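Here it is especially important to remember that when substituting $L$ into the Lagrange equation, $\theta$ and $\dot\theta$ have 
to be treated as independent arguments of $L$, so that 

$$ \frac{\partial L}{\partial\dot\theta} = mR^2\dot\theta, \qquad \frac{\partial L}{\partial\theta} = mR^2\omega^2\sin\theta\cos\theta - mgR\sin\theta, \qquad (2.24) $$

giving us the following equation of motion: 

$$ \frac{d}{dt}\left(mR^2\dot\theta\right) - \left(mR^2\omega^2\sin\theta\cos\theta - mgR\sin\theta\right) = 0. \qquad (2.25) $$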



As a sanity check, at $\omega = 0$, Eq. (25) is reduced to the correct equation of the usual pendulum: 

$$ \ddot\theta + \Omega^2\sin\theta = 0, \qquad \text{where } \Omega \equiv \left(\frac{g}{R}\right)^{1/2}. \qquad (2.26) $$



We will explore the full dynamic equation (25) in more detail later, but please note how simple its 
derivation was - in comparison with writing the Newton laws and then excluding the reaction force. 
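For readers who want to start exploring Eq. (25) numerically, here is a minimal Python sketch. Dividing Eq. (25) by $mR^2$ gives $\ddot\theta = (\omega^2\cos\theta - \Omega^2)\sin\theta$; the sketch integrates this equation with a fixed-step fourth-order Runge-Kutta scheme (an illustrative choice), with arbitrary values of $g$ and $R$. As a check of Eq. (26), at $\omega = 0$ and a small initial amplitude, the computed oscillation period should be close to $2\pi/\Omega$. The helper `half_period` assumes that the bead, released at rest at $\theta_0 > 0$, first moves toward $\theta = 0$, which is guaranteed for $\omega < \Omega$.

```python
import math

G_ACC, R_RING = 9.81, 0.5  # illustrative g (m/s^2) and ring radius (m)


def theta_ddot(theta, omega_ring):
    """Eq. (2.25) divided by m R^2: theta'' = (omega^2 cos(theta) - g/R) sin(theta)."""
    return (omega_ring**2 * math.cos(theta) - G_ACC / R_RING) * math.sin(theta)


def rk4_step(theta, theta_dot, dt, omega_ring):
    """One classical Runge-Kutta step for the pair (theta, theta_dot)."""
    def f(th, thd):
        return thd, theta_ddot(th, omega_ring)

    k1 = f(theta, theta_dot)
    k2 = f(theta + dt / 2 * k1[0], theta_dot + dt / 2 * k1[1])
    k3 = f(theta + dt / 2 * k2[0], theta_dot + dt / 2 * k2[1])
    k4 = f(theta + dt * k3[0], theta_dot + dt * k3[1])
    return (theta + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            theta_dot + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def half_period(theta0, omega_ring=0.0, dt=1e-4, max_steps=10**6):
    """Time from release at rest at theta0 > 0 to the next turning point
    (assumes the bead first moves toward theta = 0, e.g. for omega_ring < Omega)."""
    t, theta, theta_dot = 0.0, theta0, 0.0
    for _ in range(max_steps):
        new_theta, new_dot = rk4_step(theta, theta_dot, dt, omega_ring)
        if theta_dot < 0 <= new_dot:  # theta_dot changes sign: turning point
            return t + dt * (-theta_dot) / (new_dot - theta_dot)
        t, theta, theta_dot = t + dt, new_theta, new_dot
    raise RuntimeError("no turning point found")


Omega = math.sqrt(G_ACC / R_RING)
print(2 * half_period(0.01), 2 * math.pi / Omega)  # small-amplitude check of Eq. (2.26)
```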

Next, though the Lagrangian formalism was derived from the Newton law for mechanical 
systems, the resulting equations (19) are applicable to other dynamic systems, especially those for which 
the kinetic and potential energies may be readily expressed via some generalized coordinates. As the 
simplest example, consider the well-known connection (Fig. 4) of a capacitor with capacitance C to an 
inductive coil with self-inductance L. 10 (Electrical engineers frequently call it the LC tank circuit.) 



Fig. 2.4. LC tank circuit. (Figure labels: charge $\pm Q$ on the capacitor $C$; voltage $V$ across the circuit.)



9 This expression for $T = (m/2)(\dot x^2 + \dot y^2 + \dot z^2)$ may be readily obtained either by the formal differentiation of Eq. 
(2) over time, or just by noticing that the velocity vector has two perpendicular components: one along the ring 
(with magnitude $R\dot\theta$) and another one normal to the ring's plane (with magnitude $\omega\rho = \omega R\sin\theta$ - see Fig. 1). 

10 Let me hope that this traditional notation would not lead to the confusion between the inductance and the 
Lagrange function. 
As the reader certainly knows, at relatively low frequencies we may use the so-called lumped-circuit 
approximation, in which the total energy of the system is the sum of two components: the electric 
energy $E_C$ localized inside the capacitor, and the magnetic energy $E_L$ localized inside the inductance coil: 

$$ E_C = \frac{Q^2}{2C}, \qquad E_L = \frac{LI^2}{2}. \qquad (2.27) $$

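Since the electric current $I$ through the coil and the electric charge $Q$ on the capacitor are connected by 
the charge continuity equation $dQ/dt = I$ (evident from Fig. 4), it is natural to declare the charge a 
generalized coordinate, and the current, the generalized velocity. With this choice, the electrostatic 
energy $E_C(Q)$ may be treated as the potential energy $U$ of the system, and the magnetic energy 
$E_L(I)$, as its kinetic energy $T$. With this attribution, we get 

$$ \frac{\partial T}{\partial\dot q} \equiv \frac{\partial E_L}{\partial I} = LI \equiv L\dot Q, \qquad \frac{\partial T}{\partial q} \equiv \frac{\partial E_L}{\partial Q} = 0, \qquad \frac{\partial U}{\partial q} \equiv \frac{\partial E_C}{\partial Q} = \frac{Q}{C}, \qquad (2.28) $$

so that the Lagrange equation of motion is 

$$ \frac{d}{dt}\left(L\dot Q\right) - \left(-\frac{Q}{C}\right) = 0. \qquad (2.29) $$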



Note, however, that the above choice of the generalized coordinate and velocity is not unique. 
Instead, one can use as the generalized coordinate the magnetic flux $\Phi$ through the inductive coil, 
related to the common voltage $V$ across the circuit (Fig. 4) by Faraday's induction law $V = -d\Phi/dt$. With 
this choice, $(-V)$ becomes the generalized velocity, $E_L = \Phi^2/2L$ should be understood as the potential 
energy, and $E_C = CV^2/2$ treated as the kinetic energy. It is straightforward to verify that for this choice, 
the resulting Lagrange equation of motion is equivalent to Eq. (29). If both parameters of the circuit, $L$ 
and $C$, are constant in time, Eq. (29) is just the harmonic oscillator equation similar to Eq. (1.1), and 
describes sinusoidal oscillations with frequency 

$$ \omega_0 = \frac{1}{(LC)^{1/2}}. \qquad (2.30) $$

This is of course a very well-known result that may be derived in the more standard way by 
equating the voltage drops across the capacitor ($V = Q/C$) and the inductor ($V = -L\,dI/dt = -L\,d^2Q/dt^2$). 
However, the Lagrangian approach is much more convenient for more complex systems, for example, 
for the description of the electromagnetic field and its interaction with charged relativistic particles.¹¹ 
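A minimal Python sketch of Eq. (29) for constant $L$ and $C$, assuming arbitrary component values (1 mH and 1 µF) and the semi-implicit (symplectic) Euler scheme as an illustrative integrator, with $Q$ playing the role of the generalized coordinate and $I = \dot Q$ that of the generalized velocity. It records the upward zero crossings of $Q(t)$, estimates the oscillation frequency from them, and compares it with Eq. (30).

```python
import math

L_IND, C_CAP = 1e-3, 1e-6  # illustrative inductance (H) and capacitance (F)


def lc_zero_crossings(q0=1e-6, t_end=2e-3, dt=1e-8):
    """Integrate L Q'' = -Q/C, Eq. (2.29), with the semi-implicit Euler scheme
    and return the times of upward zero crossings of the charge Q(t)."""
    q, i = q0, 0.0  # generalized coordinate Q and generalized velocity I
    t, crossings = 0.0, []
    while t < t_end:
        i -= q / (L_IND * C_CAP) * dt  # dI/dt = -Q/(LC)
        q_new = q + i * dt             # dQ/dt = I, using the updated current
        if q < 0 <= q_new:
            crossings.append(t + dt * (-q) / (q_new - q))  # linear interpolation
        q, t = q_new, t + dt
    return crossings


times = lc_zero_crossings()
period = (times[-1] - times[0]) / (len(times) - 1)
omega_num = 2 * math.pi / period
omega_0 = 1 / math.sqrt(L_IND * C_CAP)  # Eq. (2.30)
print(omega_num, omega_0)
```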



2.3. Hamiltonian function and energy 

The canonical form (19) of the Lagrange equation has been derived using Eq. (18), which is formally similar to Eq. (1.22) for a potential force. Does this mean that the system described by Eq. (19) always conserves energy? Not necessarily, because the "potential energy" $U$ which participates in Eq. (18) may depend not only on the generalized coordinates, but on time as well. Let us start the analysis of this issue with the introduction of two new (and very important!) notions: the generalized momenta corresponding to each generalized coordinate $q_j$,

$$p_j \equiv \frac{\partial L}{\partial \dot q_j}, \tag{2.31}$$

11 See, e.g., EM Sec. 9.8.

and the Hamiltonian function 12

$$H \equiv \sum_j \frac{\partial L}{\partial \dot q_j}\dot q_j - L \equiv \sum_j p_j\dot q_j - L. \tag{2.32}$$



In order to see whether the Hamiltonian function is conserved, let us differentiate its definition 
(32) over time: 



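$$\frac{dH}{dt} = \frac{d}{dt}\left(\sum_j \frac{\partial L}{\partial \dot q_j}\dot q_j - L\right) = \sum_j\left[\left(\frac{d}{dt}\frac{\partial L}{\partial \dot q_j}\right)\dot q_j + \frac{\partial L}{\partial \dot q_j}\ddot q_j\right] - \frac{dL}{dt}. \tag{2.33}$$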



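If we want to make use of the Lagrange equation (19), the last derivative has to be calculated considering $L$ as a function of independent arguments $q_j$, $\dot q_j$, and $t$:

$$\frac{dL}{dt} = \sum_j\left(\frac{\partial L}{\partial q_j}\dot q_j + \frac{\partial L}{\partial \dot q_j}\ddot q_j\right) + \frac{\partial L}{\partial t}, \tag{2.34}$$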



where the last term is the derivative of L as an explicit function of time. We see that the last term in the 
square brackets of Eq. (33) immediately cancels with the last term in the parentheses of Eq. (34). 
Moreover, using the Lagrange equation (19) for the first term in the square brackets of Eq. (33), we see 
that it cancels with the first term in the parentheses of Eq. (34). Thus we arrive at a very simple and 
important result: 



$$\frac{dH}{dt} = -\frac{\partial L}{\partial t}. \tag{2.35}$$



The most important corollary of this formula is that if the Lagrangian function does not depend on time explicitly ($\partial L/\partial t = 0$), the Hamiltonian function is an integral of motion:

$$H = \text{const}. \tag{2.36}$$



Let us see how it works, using the first two examples discussed in the previous section. For a 1D particle, definition (31) of the generalized momentum yields

$$p \equiv \frac{\partial L}{\partial v} = mv, \qquad v \equiv \dot x, \tag{2.37}$$

so that it coincides with the usual momentum - or rather with its $x$-component. According to Eq. (32), the Hamiltonian function for this case (with just one degree of freedom) is

$$H = p\dot x - L = m\dot x^2 - \left(\frac{m}{2}\dot x^2 - U\right) = \frac{m}{2}\dot x^2 + U, \tag{2.38}$$



12 It is sometimes called just the "Hamiltonian", but it is advisable to use the full term "Hamiltonian function" in 
classical mechanics, in order to distinguish it from the Hamiltonian operator used in quantum mechanics. (Their 
relation will be discussed in Sec. 10.1.) 
and coincides with the particle's mechanical energy $E = T + U$. Since the Lagrangian does not depend on time explicitly, both $H$ and $E$ are conserved.

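However, it is not always that simple! Indeed, let us return again to our testbed problem (Fig. 1). In this case, the generalized momentum corresponding to the generalized coordinate $\theta$ is

$$p_\theta \equiv \frac{\partial L}{\partial \dot\theta} = mR^2\dot\theta, \tag{2.39}$$

and Eq. (32) yields:

$$H = p_\theta\dot\theta - L = mR^2\dot\theta^2 - \left[\frac{m}{2}R^2\left(\dot\theta^2 + \omega^2\sin^2\theta\right) + mgR\cos\theta\right] + \text{const} = \frac{m}{2}R^2\left(\dot\theta^2 - \omega^2\sin^2\theta\right) - mgR\cos\theta + \text{const}. \tag{2.40}$$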
This means that (as soon as $\omega \neq 0$), the Hamiltonian function differs from the mechanical energy

$$E = T + U = \frac{m}{2}R^2\left(\dot\theta^2 + \omega^2\sin^2\theta\right) - mgR\cos\theta + \text{const}. \tag{2.41}$$

The difference, $E - H = mR^2\omega^2\sin^2\theta$ (besides an inconsequential constant), may change as the bead moves along the ring, so that although $H$ is an integral of motion (since $\partial L/\partial t = 0$), the energy $E$ is not conserved.
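The distinction between $H$ and $E$ in Eqs. (2.40)-(2.41) is easy to see numerically. The minimal sketch below works in units where $m = R = 1$ (so that $g = \Omega^2$, with $\Omega^2 \equiv g/R$), integrates the bead's Lagrange equation $\ddot\theta = (\omega^2\cos\theta - \Omega^2)\sin\theta$ by the fourth-order Runge-Kutta method, and tracks both functions with their constants dropped. The function name `simulate_bead`, the parameter values, and the initial conditions are illustrative assumptions.

```python
import math

def simulate_bead(omega, Omega, theta0, thetadot0, dt=1e-3, n_steps=20000):
    """RK4 integration of theta'' = (omega^2 cos(theta) - Omega^2) sin(theta)
    for the bead on the rotating ring, in units with m = R = 1 (g = Omega^2).
    Returns the time histories of H (2.40) and E (2.41), constants dropped."""
    def acc(th):
        return (omega**2 * math.cos(th) - Omega**2) * math.sin(th)

    def hamiltonian(th, w):
        return 0.5 * (w**2 - omega**2 * math.sin(th)**2) - Omega**2 * math.cos(th)

    def energy(th, w):
        return 0.5 * (w**2 + omega**2 * math.sin(th)**2) - Omega**2 * math.cos(th)

    th, w = theta0, thetadot0
    H_vals, E_vals = [hamiltonian(th, w)], [energy(th, w)]
    for _ in range(n_steps):
        # each k is (d theta/dt, d thetadot/dt) at one RK4 stage
        k1 = (w, acc(th))
        k2 = (w + 0.5 * dt * k1[1], acc(th + 0.5 * dt * k1[0]))
        k3 = (w + 0.5 * dt * k2[1], acc(th + 0.5 * dt * k2[0]))
        k4 = (w + dt * k3[1], acc(th + dt * k3[0]))
        th += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        w += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        H_vals.append(hamiltonian(th, w))
        E_vals.append(energy(th, w))
    return H_vals, E_vals

H_vals, E_vals = simulate_bead(omega=2.0, Omega=1.0, theta0=0.3, thetadot0=0.0)
print("H spread:", max(H_vals) - min(H_vals))   # tiny: integration error only
print("E spread:", max(E_vals) - min(E_vals))   # finite: E is not conserved
```

The spread of $E$ here is just the variation of $\omega^2\sin^2\theta$ along the trajectory, in agreement with the expression for $E - H$ above.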

Let us find out when these two functions, $E$ and $H$, coincide. In mathematics, there is a notion of a homogeneous function $f(x_1, x_2, \ldots)$ of degree $\lambda$, defined in the following way: for an arbitrary constant $a$,

$$f(ax_1, ax_2, \ldots) = a^\lambda f(x_1, x_2, \ldots). \tag{2.42}$$

Such functions obey the following Euler theorem: 13

$$\sum_j \frac{\partial f}{\partial x_j}x_j = \lambda f, \tag{2.43}$$

which may be readily proven by differentiating both parts of Eq. (42) over $a$ and then setting this parameter to the particular value $a = 1$. Now, consider the case when the kinetic energy is a quadratic form of all generalized velocities:

$$T = \sum_{j,j'} t_{jj'}(q_1, q_2, \ldots, t)\,\dot q_j\dot q_{j'}, \tag{2.44}$$

with no other terms. It is evident that such $T$ satisfies the definition of a homogeneous function of the velocities with $\lambda = 2$, 14 so that the Euler theorem (43) gives

$$\sum_j \frac{\partial T}{\partial \dot q_j}\dot q_j = 2T. \tag{2.45}$$



13 This is just one of many theorems bearing the name of the mathematics genius L. Euler (1707-1783). 

14 Such functions are called quadratic-homogeneous. 



Chapter 2 



Page 9 of 12 



Essential Graduate Physics 



CM: Classical Mechanics 



But since $U$ is independent of the generalized velocities, $\partial L/\partial \dot q_j = \partial T/\partial \dot q_j$, and the left-hand part of Eq. (45) is exactly the first term in the definition (32) of the Hamiltonian function, so that in this case

$$H = 2T - L = 2T - (T - U) = T + U = E. \tag{2.46}$$

So, for the kinetic energy of the type (44), for example that of a free particle, considered as a function of its Cartesian velocities,

$$T = \frac{m}{2}\left(v_1^2 + v_2^2 + v_3^2\right), \tag{2.47}$$

the notions of the Hamiltonian function and mechanical energy are identical. (Indeed, some textbooks, very regretfully, do not distinguish these notions at all!) However, as we have seen from our bead-on-the-rotating-ring example, this is not always true. For that problem, the kinetic energy, in addition to the term proportional to $\dot\theta^2$, has another, velocity-independent term - see the first of Eqs. (23) - and hence is not a quadratic-homogeneous function of the angular velocity.
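Euler's theorem (2.43) and its corollary (2.45) can be spot-checked with a few lines of Python. This minimal sketch evaluates $\sum_j(\partial T/\partial\dot q_j)\dot q_j$ by central finite differences and applies it to the free-particle kinetic energy (2.47) and to the bead's kinetic energy $T = (m/2)R^2(\dot\theta^2 + \omega^2\sin^2\theta)$, with the coordinate $\theta$ held fixed. The helper name `euler_sum` and all numbers are illustrative assumptions.

```python
import math

def euler_sum(T, v, h=1e-6):
    """Return sum_j (dT/dv_j) * v_j, with the partial derivatives taken
    by central finite differences at the velocity vector v."""
    total = 0.0
    for j in range(len(v)):
        vp, vm = list(v), list(v)
        vp[j] += h
        vm[j] -= h
        total += (T(vp) - T(vm)) / (2 * h) * v[j]
    return total

m, R, omega = 1.0, 1.0, 2.0      # illustrative parameters
theta = 0.7                      # coordinate value, a fixed parameter of T here

def T_free(v):
    # Eq. (2.47): quadratic-homogeneous in the Cartesian velocities
    return 0.5 * m * (v[0]**2 + v[1]**2 + v[2]**2)

def T_bead(v):
    # bead on the rotating ring: contains a velocity-independent term
    return 0.5 * m * R**2 * (v[0]**2 + omega**2 * math.sin(theta)**2)

v3 = [0.3, -1.2, 0.5]
print(euler_sum(T_free, v3), 2 * T_free(v3))        # equal, per Eq. (2.45)
print(euler_sum(T_bead, [0.4]), 2 * T_bead([0.4]))  # differ by m R^2 omega^2 sin^2(theta)
```

For the bead, the mismatch is exactly twice the velocity-independent part of $T$, which is why $H \neq E$ in that problem.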

Thus, Eq. (36) expresses a new conservation law, generally different from that of the mechanical 
energy conservation. 



2.4. Other conservation laws 

Looking at the Lagrange equation (19), we immediately see that if $L \equiv T - U$ as a whole is independent of some generalized coordinate $q_j$, $\partial L/\partial q_j = 0$, 15 then the corresponding generalized momentum is an integral of motion:

$$p_j \equiv \frac{\partial L}{\partial \dot q_j} = \text{const}. \tag{2.48}$$

For example, for a 1D particle with Lagrangian (21), momentum $p_x$ is conserved if the potential energy is constant (the $x$-component of force is zero) - of course. As a less obvious example, let us consider a 2D motion of a particle in the field of central forces. If we use polar coordinates $r$ and $\varphi$ in the role of the generalized coordinates, the Lagrangian function, 16

$$L \equiv T - U = \frac{m}{2}\left(\dot r^2 + r^2\dot\varphi^2\right) - U(r), \tag{2.49}$$

is independent of $\varphi$, and hence the corresponding generalized momentum,

$$p_\varphi \equiv \frac{\partial L}{\partial \dot\varphi} = mr^2\dot\varphi, \tag{2.50}$$

is conserved. This is just a particular case of the angular momentum conservation - see Eq. (1.24). Indeed, for the 2D motion within the $[x, y]$ plane, the angular momentum vector,



15 Such coordinates are frequently called cyclic, because in some cases (like in the second example considered 
below) they represent periodic coordinates such as angles. However, this terminology is misleading, because 
some "cyclic" coordinates (e.g., x in our first example) have nothing to do with rotation. 

16 Note that here $\dot r^2$ is just the square of the scalar derivative $\dot r$, rather than the square of vector $\dot{\mathbf r} = \mathbf v$.
$$\mathbf L \equiv \mathbf r \times \mathbf p = \begin{vmatrix} \mathbf n_x & \mathbf n_y & \mathbf n_z \\ x & y & z \\ m\dot x & m\dot y & m\dot z \end{vmatrix}, \tag{2.51}$$

has only one nonvanishing component, perpendicular to the motion plane:

$$L_z = x(m\dot y) - y(m\dot x). \tag{2.52}$$

Differentiating the well-known relations between the polar and Cartesian coordinates,

$$x = r\cos\varphi, \qquad y = r\sin\varphi, \tag{2.53}$$

over time, and plugging the result into Eq. (52), we see that $L_z = mr^2\dot\varphi \equiv p_\varphi$.



Thus the Lagrangian formalism provides a powerful way of searching for non-evident integrals of motion. On the other hand, if such a conserved quantity is evident or known a priori, it is helpful for the selection of the most appropriate generalized coordinates, giving the simplest Lagrange equations. For example, in the last problem, if we had known in advance that $p_\varphi$ had to be conserved, this could provide a motivation for including the corresponding coordinate, angle $\varphi$, into the list of the used generalized coordinates.
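A minimal numerical sketch of the conservation of $p_\varphi$ (2.50): the code integrates planar motion in a central potential, here the arbitrarily chosen $U(r) = \kappa r^4/4$ (any $U(r)$ would do), and monitors $L_z = x(m\dot y) - y(m\dot x)$ from Eq. (2.52) alongside the Cartesian momentum component $p_x$, which is not conserved because $x$ is not a cyclic coordinate in this problem. The potential, the initial state, and the function name are assumptions made for illustration.

```python
def central_force_run(m=1.0, kappa=1.0, state=(1.0, 0.0, 0.2, 0.9),
                      dt=1e-3, n_steps=10000):
    """RK4 integration of planar motion in U(r) = kappa*r**4/4.

    state = (x, y, vx, vy). Returns the histories of L_z and p_x.
    """
    def deriv(s):
        x, y, vx, vy = s
        r2 = x * x + y * y
        # F = -(dU/dr) * (r_vec / r) = -kappa * r**2 * r_vec
        return (vx, vy, -kappa * r2 * x / m, -kappa * r2 * y / m)

    def angular_momentum(s):
        x, y, vx, vy = s
        return x * (m * vy) - y * (m * vx)   # Eq. (2.52)

    s = state
    Lz_vals, px_vals = [angular_momentum(s)], [m * s[2]]
    for _ in range(n_steps):
        k1 = deriv(s)
        k2 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
        k3 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
        k4 = deriv(tuple(a + dt * b for a, b in zip(s, k3)))
        s = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))
        Lz_vals.append(angular_momentum(s))
        px_vals.append(m * s[2])
    return Lz_vals, px_vals

Lz_vals, px_vals = central_force_run()
print("spread of L_z:", max(Lz_vals) - min(Lz_vals))   # integration error only
print("spread of p_x:", max(px_vals) - min(px_vals))   # comparable to p_x itself
```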



2.5. Exercise problems 

In each of Problems 2.1-2.7: 

(i) introduce a set of convenient generalized coordinate(s) qj of the system, 

(ii) write down Lagrangian $L$ as a function of $q_j$, $\dot q_j$, and (if appropriate) time,

(iii) write down the Lagrangian equation(s) of motion, 

(iv) calculate the Hamiltonian function H; find out whether it is conserved, 

(v) calculate energy $E$; is $E = H$? Is energy conserved?

2.1. Double pendulum - see Fig. on the right. Consider only the motion confined to a vertical plane containing the suspension point.




2.2. Stretchable pendulum (i.e. a mass on a spring that exerts force $F = -k(l - l_0)$, where $k$ and $l_0$ are positive constants), confined to a vertical plane.




2.3. Fixed-length pendulum hanging from a horizontal support whose motion law $x_0(t)$ is fixed. (No vertical plane constraint here.)
2.4. A pendulum of mass m is hung on another point mass m' that may slide,
without friction, along a straight horizontal rail (see Fig. on the right). Its motion is 
confined to the vertical plane that contains the rail. 




2.5. A block of mass m that can slide, without friction, along the
inclined plane surface of a heavy wedge with mass m'. The wedge is free to 
move, also without friction, along a horizontal surface - see Fig. on the 
right. (Both motions are within the vertical plane containing the steepest 
slope line.) 




2.6. The system of two spring-coupled pendula that was the subject of Problem 1.3 - see Fig. on the right.

[Figure: two spring-coupled pendula; spring force $F = \kappa\Delta x$.]



2.7. A system of two similar, inductively-coupled LC circuits - see Fig. on the right.




2.8. A small Josephson junction, i.e. a system of two superconductors coupled by Cooper-pair tunneling through a thin insulating layer that separates them (see Fig. on the right).

Hints:

(i) At not very high frequencies (whose quantum $\hbar\omega$ is lower than the binding energy $2\Delta$ of the Cooper pairs), the Josephson effect may be described by coupling energy

$$U(\varphi) = -E_J\cos\varphi + \text{const},$$

where constant $E_J$ describes the coupling strength, and variable $\varphi$ (called the Josephson phase difference) is related to voltage $V$ across the junction via the famous frequency-to-voltage relation

$$\frac{d\varphi}{dt} = \frac{2e}{\hbar}V,$$

where $e \approx 1.6\times10^{-19}$ C is the fundamental electric charge and $\hbar \approx 1.054\times10^{-34}$ J·s is the (reduced) Planck constant. 17

(ii) The junction (as any system of two close conductors) has a certain electric capacitance C. 




17 For more on the Josephson effect and the physical sense of variable $\varphi$, see, e.g., EM Sec. 6.4 and QM Secs. 2.3 and 2.8.






Chapter 3. A Few Simple Problems 

In this chapter, I will review the solutions of a few simple but very important problems of particle motion that may be reduced to one dimension, including the famous "planetary" problem of two particles interacting via a spherically-symmetric potential. In the process, we will discuss several methods that will be useful for the analysis of more complex systems.



3.1. One-dimensional and 1D-reducible systems

If a particle is confined to motion along a straight line (say, axis $x$), its position, of course, is completely defined by this coordinate. In this case, as we already know, the particle's Lagrangian is given by Eq. (2.21):

$$L = T(\dot x) - U(x, t), \qquad T(\dot x) = \frac{m}{2}\dot x^2, \tag{3.1}$$

so that the Lagrange equation of motion (2.22),

$$m\ddot x = -\frac{\partial U(x, t)}{\partial x}, \tag{3.2}$$

is just the $x$-component of the 2nd Newton law.

It is convenient to discuss the dynamics of such really 1D systems in the same breath with that of effectively 1D systems whose position, due to holonomic constraints and/or conservation laws, is also fully determined by one generalized coordinate $q$, and whose Lagrangians may be presented in a form similar to Eq. (1):

$$L = T_{\rm ef}(\dot q) - U_{\rm ef}(q, t), \qquad T_{\rm ef} = \frac{m_{\rm ef}}{2}\dot q^2, \tag{3.3}$$

where $m_{\rm ef}$ is some constant which may be considered as the effective mass of the system, and the function $U_{\rm ef}$ its effective potential energy. In this case the Lagrange equation (2.19) describing the system dynamics has a form similar to Eq. (2):

$$m_{\rm ef}\ddot q = -\frac{\partial U_{\rm ef}(q, t)}{\partial q}. \tag{3.4}$$

As an example, let us return again to our testbed system shown in Fig. 1.5. We have already seen that for that system, having one degree of freedom, the genuine kinetic energy $T$, expressed by the first of Eqs. (2.23), is not a quadratic-homogeneous function of the generalized velocity. However, the system's Lagrangian (2.23) still may be presented in form (3),

$$L = \frac{m}{2}R^2\dot\theta^2 + \frac{m}{2}R^2\omega^2\sin^2\theta + mgR\cos\theta + \text{const} \equiv T_{\rm ef} - U_{\rm ef}, \tag{3.5}$$

if we take

$$T_{\rm ef} = \frac{m}{2}R^2\dot\theta^2, \qquad U_{\rm ef} = -\frac{m}{2}R^2\omega^2\sin^2\theta - mgR\cos\theta + \text{const}. \tag{3.6}$$
In this new partitioning of function $L$, which is legitimate because $U_{\rm ef}$ depends only on the generalized coordinate $\theta$, but not on the corresponding generalized velocity, $T_{\rm ef}$ includes only a part of the full kinetic energy $T$ of the bead, while $U_{\rm ef}$ includes not only the real potential energy $U$ of the bead in the gravity field, but also an additional term related to ring rotation. (As we will see in Sec. 6.6, this term may be interpreted as the effective potential energy due to the inertial centrifugal "force".)

Returning to the general case of effectively 1D systems with Lagrangian (3), let us calculate their Hamiltonian function, using its definition (2.32):

$$H = \frac{\partial L}{\partial \dot q}\dot q - L = m_{\rm ef}\dot q^2 - \left(T_{\rm ef} - U_{\rm ef}\right) = T_{\rm ef} + U_{\rm ef}. \tag{3.7}$$

So, $H$ is expressed via $T_{\rm ef}$ and $U_{\rm ef}$ exactly as the mechanical energy $E$ is expressed via genuine $T$ and $U$.



3.2. Equilibrium and stability 

Autonomous systems are defined as the dynamic systems whose equations of motion do not depend on time explicitly. For 1D (and effectively 1D) systems obeying Eq. (4), this means that their function $U_{\rm ef}$, and hence the Lagrangian function (3), should not depend on time explicitly. According to Eq. (2.35), in such systems the Hamiltonian function (7), i.e. the sum $T_{\rm ef} + U_{\rm ef}$, is an integral of motion. However, be careful! This may not be true for the system's mechanical energy $E$; for example, as we already know from Sec. 2.3, for our testbed problem, with the generalized coordinate $q = \theta$ (Fig. 2.1), $H \neq E$.

According to Eq. (4), an autonomous system, at appropriate initial conditions, may stay in equilibrium at one or several stationary (alternatively called fixed) points $q_n$, corresponding to either a minimum or a maximum of the effective potential energy (see Fig. 1):

$$\frac{dU_{\rm ef}}{dq}(q_n) = 0. \tag{3.8}$$

Fig. 3.1. Effective potential energy profile near stable ($q_0$, $q_2$) and unstable ($q_1$) fixed points, and its quadratic approximation (10) near point $q_0$ - schematically.



In order to explore the stability of such fixed points, let us analyze the dynamics of small deviations

$$\tilde q(t) \equiv q(t) - q_n \tag{3.9}$$

from the equilibrium. For that, let us expand function $U_{\rm ef}(q)$ in the Taylor series at a fixed point,

$$U_{\rm ef}(q) = U_{\rm ef}(q_n) + \frac{dU_{\rm ef}}{dq}(q_n)\,\tilde q + \frac{1}{2}\frac{d^2U_{\rm ef}}{dq^2}(q_n)\,\tilde q^2 + \ldots \tag{3.10}$$
The first term in the right-hand part, $U_{\rm ef}(q_n)$, is arbitrary and does not affect motion. The next term, linear in deviation $\tilde q$, equals zero - see the fixed point definition (8). Hence the fixed point stability is determined by the next term, quadratic in $\tilde q$, more exactly by its coefficient,

$$\kappa_{\rm ef} \equiv \frac{d^2U_{\rm ef}}{dq^2}(q_n), \tag{3.11}$$

which plays the role of the effective spring constant. Indeed, neglecting the higher terms of the Taylor expansion (10), 1 we see that Eq. (4) takes the familiar form - cf. Eq. (1.1):

$$m_{\rm ef}\ddot{\tilde q} + \kappa_{\rm ef}\tilde q = 0. \tag{3.12}$$

I am confident that the reader of these notes knows everything about this equation, but since we will soon run into similar but more complex equations, let us review the formal procedure of its solution. From the mathematical standpoint, Eq. (12) is an ordinary, linear differential equation of the second order, with constant coefficients. The theory of such equations tells us that its general solution (for any initial conditions) may be presented as

$$\tilde q(t) = c_+e^{\lambda_+t} + c_-e^{\lambda_-t}, \tag{3.13}$$

where constants $c_\pm$ are determined by initial conditions, while the so-called characteristic exponents $\lambda_\pm$ are completely defined by the equation itself. In order to find the exponents, it is sufficient to plug just one partial solution, $\exp\{\lambda t\}$, into the equation. In our simple case (12), this yields the following characteristic equation:

$$m_{\rm ef}\lambda^2 + \kappa_{\rm ef} = 0. \tag{3.14}$$

If the ratio $\kappa_{\rm ef}/m_{\rm ef}$ is positive, 2 i.e. the fixed point corresponds to a minimum of potential energy (e.g., points $q_0$ and $q_2$ in Fig. 1), the characteristic equation yields

$$\lambda_\pm = \pm i\omega_0, \qquad \omega_0 \equiv \left(\frac{\kappa_{\rm ef}}{m_{\rm ef}}\right)^{1/2} \tag{3.15}$$

(where $i$ is the imaginary unity, $i^2 = -1$), so that Eq. (13) describes sinusoidal oscillations of the system,

$$\tilde q(t) = c_+e^{+i\omega_0t} + c_-e^{-i\omega_0t} = c_c\cos\omega_0t + c_s\sin\omega_0t, \tag{3.16}$$

with eigenfrequency (or "own frequency") $\omega_0$, about the fixed point, which is thereby stable. On the other hand, at a potential energy maximum ($\kappa_{\rm ef} < 0$, e.g., at point $q_1$ in Fig. 1), we get

$$\lambda_\pm = \pm\lambda, \qquad \lambda \equiv \left(\frac{|\kappa_{\rm ef}|}{m_{\rm ef}}\right)^{1/2}, \qquad \tilde q(t) = c_+e^{+\lambda t} + c_-e^{-\lambda t}. \tag{3.17}$$

Since the solution has an exponentially growing part, 3 the fixed point is unstable.
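The recipe of Eqs. (3.13)-(3.17) reduces the stability question to the sign of $\kappa_{\rm ef}/m_{\rm ef}$. Below is a minimal Python sketch, with hypothetical helper names, that solves the characteristic equation (3.14) in complex arithmetic and reports either the eigenfrequency $\omega_0$ or the growth rate $\lambda$.

```python
import cmath

def characteristic_exponents(m_ef, kappa_ef):
    """Roots lambda_+- of m_ef*lambda**2 + kappa_ef = 0, Eq. (3.14)."""
    lam = cmath.sqrt(-kappa_ef / m_ef)
    return lam, -lam

def classify_fixed_point(m_ef, kappa_ef):
    """Return ('stable', omega_0) or ('unstable', lambda), per Eqs. (3.15)-(3.17).

    Assumes m_ef > 0 (see footnote 2). The marginal case kappa_ef = 0, where
    higher Taylor terms decide (footnote 1), comes out as 'stable' with zero
    frequency here and should be examined separately.
    """
    lam_plus, lam_minus = characteristic_exponents(m_ef, kappa_ef)
    growth = max(lam_plus.real, lam_minus.real)
    if growth > 0:
        return "unstable", growth            # exponentially growing solution
    return "stable", abs(lam_plus.imag)      # purely imaginary roots: oscillations

print(classify_fixed_point(1.0, 4.0))    # ('stable', 2.0)
print(classify_fixed_point(1.0, -9.0))   # ('unstable', 3.0)
```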



1 Those terms may be important only in the very special case when $\kappa_{\rm ef}$ is exactly zero, i.e. when a fixed point is an inflection point of function $U_{\rm ef}(q)$.

2 In what follows, I will assume that the effective mass $m_{\rm ef}$ is positive, which is true in most (but not all!) dynamic systems. The changes necessary if it is negative are obvious.
Note that the quadratic expansion of function $U_{\rm ef}(q)$, given by Eq. (10), is equivalent to a linear expansion of the effective force:

$$F_{\rm ef} \equiv -\frac{dU_{\rm ef}}{dq} \approx -\frac{d^2U_{\rm ef}}{dq^2}\bigg|_{q=q_n}\tilde q \equiv -\kappa_{\rm ef}\tilde q, \tag{3.18}$$

immediately resulting in the linear equation (12). Hence, in order to analyze the stability of a fixed point $q_n$, it is sufficient to linearize the equation of motion in small deviations from that point, and study possible solutions of the resulting linear equation.

As an example, let us return to our testbed problem (Fig. 2.1) whose function $U_{\rm ef}$ we already know - see the second of Eqs. (6). With it, the equation of motion (4) becomes

$$mR^2\ddot\theta = -\frac{dU_{\rm ef}}{d\theta} = mR^2\left[\omega^2\cos\theta - \Omega^2\right]\sin\theta, \quad \text{i.e.} \quad \ddot\theta = \left[\omega^2\cos\theta - \Omega^2\right]\sin\theta, \tag{3.19}$$

where $\Omega \equiv (g/R)^{1/2}$ is the frequency of small oscillations of the system at $\omega = 0$ - see Eq. (2.26). 4 From requirement (8), we see that on any $2\pi$-long segment of angle $\theta$, 5 the system may have four fixed points:

$$\theta_0 = 0, \qquad \theta_1 = \pi, \qquad \theta_{2,3} = \pm\arccos\frac{\Omega^2}{\omega^2}. \tag{3.20}$$

The last two fixed points, corresponding to the bead rotating on either side of the ring, exist only if the angular velocity $\omega$ of ring rotation exceeds $\Omega$. (In the limit of very fast rotation, $\omega \gg \Omega$, Eq. (20) yields $\theta_{2,3} \to \pm\pi/2$, i.e. the stationary positions approach the horizontal diameter of the ring - in accordance with physical intuition.)

In order to analyze the fixed point stability, similarly to Eq. (9), we plug $\theta = \theta_n + \tilde\theta$ into Eq. (19) and Taylor-expand the trigonometric functions of $\theta$ up to the first term in $\tilde\theta$:

$$\ddot{\tilde\theta} = \left[\omega^2\left(\cos\theta_n - \sin\theta_n\,\tilde\theta\right) - \Omega^2\right]\left(\sin\theta_n + \cos\theta_n\,\tilde\theta\right). \tag{3.21}$$

Generally, this equation may be linearized further by purging its right-hand part of the term proportional to $\tilde\theta^2$; however, in this simple case, Eq. (21) is already convenient for analysis. In particular, for the fixed point $\theta_0 = 0$ (corresponding to the bead position at the bottom of the ring), we have $\cos\theta_0 = 1$ and $\sin\theta_0 = 0$, so that Eq. (21) is reduced to a linear differential equation

$$\ddot{\tilde\theta} = \left(\omega^2 - \Omega^2\right)\tilde\theta, \tag{3.22}$$

whose characteristic equation is similar to Eq. (14) and yields

$$\lambda^2 = \omega^2 - \Omega^2, \quad \text{for } \theta \approx \theta_0. \tag{3.23a}$$



3 Mathematically, the growing part vanishes at some special (exact) initial conditions which give $c_+ = 0$. However, the futility of this argument for real physical systems should be obvious for anybody who has ever tried to balance a pencil on its sharp point.

4 Note that Eq. (19) coincides with Eq. (2.25). This is a good sanity check illustrating that the procedure (5)-(6) of moving a term from the kinetic to the potential energy within the Lagrangian function is indeed legitimate.

5 For this particular problem, the values of $\theta$ that differ by a multiple of $2\pi$ are physically equivalent.
This result shows that if $\omega < \Omega$, when both roots $\lambda$ are imaginary, this fixed point is stable. However, at $\omega > \Omega$ the roots become real, $\lambda_\pm = \pm(\omega^2 - \Omega^2)^{1/2}$, with one of them positive, so that the fixed point becomes unstable beyond this threshold, i.e. as soon as fixed points $\theta_{2,3}$ exist. Absolutely similar calculations for other fixed points (for $\theta_{2,3}$, using $\cos\theta_{2,3} = \Omega^2/\omega^2$) yield

$$\lambda^2 = \Omega^2 + \omega^2 > 0, \quad \text{for } \theta \approx \theta_1, \tag{3.23b}$$

$$\lambda^2 = -\omega^2\sin^2\theta_{2,3} = \frac{\Omega^4 - \omega^4}{\omega^2}, \quad \text{for } \theta \approx \theta_{2,3}. \tag{3.23c}$$

These results show that fixed point $\theta_1$ (bead on the top of the ring) is always unstable - just as we could foresee, while the side fixed points $\theta_{2,3}$ are stable as soon as they exist (at $\omega > \Omega$).

Thus, our fixed-point analysis may be summarized in a simple way: an increase of the ring rotation speed $\omega$ beyond a certain threshold value, equal to $\Omega$ (2.26), causes the bead to move to one of the ring sides, oscillating about one of the fixed points $\theta_{2,3}$. Together with the rotation about the vertical axis, this motion yields quite a complex spatial trajectory as observed from a lab frame, so it is fascinating that we could analyze it qualitatively in such a simple way.
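The fixed-point analysis of Eqs. (3.20)-(3.23) can be cross-checked numerically. The minimal sketch below (hypothetical function names; $\omega = 2\Omega$ is an arbitrary choice above the threshold) evaluates $\lambda^2$ as the slope of the right-hand part of Eq. (3.19) at each fixed point, which is exactly the coefficient that survives when Eq. (3.21) is linearized. For the side points it prints, for comparison, $-\omega^2\sin^2\theta_{2,3}$, the value obtained by dropping the $\tilde\theta^2$ term in Eq. (3.21) with $\cos\theta_{2,3} = \Omega^2/\omega^2$.

```python
import math

def bead_fixed_points(omega, Omega):
    """Fixed points of Eq. (3.19) on one 2*pi-long segment, per Eq. (3.20)."""
    points = {"theta_0": 0.0, "theta_1": math.pi}
    if omega > Omega:                      # side points exist only above threshold
        th = math.acos(Omega**2 / omega**2)
        points["theta_2"], points["theta_3"] = th, -th
    return points

def lambda_squared(theta_n, omega, Omega, h=1e-6):
    """Slope of f(theta) = (omega^2 cos(theta) - Omega^2) sin(theta) at theta_n,
    i.e. lambda^2 of the linearized equation; central finite difference."""
    def f(th):
        return (omega**2 * math.cos(th) - Omega**2) * math.sin(th)
    return (f(theta_n + h) - f(theta_n - h)) / (2 * h)

omega, Omega = 2.0, 1.0                    # illustrative values, omega > Omega
analytic = {"theta_0": omega**2 - Omega**2,          # Eq. (3.23a)
            "theta_1": Omega**2 + omega**2}          # Eq. (3.23b)
for name, th in bead_fixed_points(omega, Omega).items():
    lam2 = lambda_squared(th, omega, Omega)
    expected = analytic.get(name, -omega**2 * math.sin(th)**2)   # side points
    verdict = "unstable" if lam2 > 0 else "stable"
    print(f"{name}: theta = {th:+.4f}, lambda^2 = {lam2:+.6f} "
          f"(expected {expected:+.6f}), {verdict}")
```

With these values the bottom and top points come out unstable and the two side points stable, matching the summary above.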

Later in this course we will repeatedly use the linearization of the equations of motion for the 
analysis of stability of more complex systems, including those with energy dissipation. 



3.3. Hamiltonian 1D systems

The autonomous systems that are described by time-independent Lagrangians are frequently called Hamiltonian, because their Hamiltonian function $H$ (again, not necessarily equal to the genuine mechanical energy $E$!) is conserved. In our current 1D case, described by Eq. (3),

$$H = \frac{m_{\rm ef}}{2}\dot q^2 + U_{\rm ef}(q) = \text{const}. \tag{3.24}$$

This is the first integral of motion. Solving Eq. (24) for $\dot q$, we get the first-order differential equation,

$$\frac{dq}{dt} = \pm\left\{\frac{2}{m_{\rm ef}}\left[H - U_{\rm ef}(q)\right]\right\}^{1/2}, \tag{3.25}$$

which may be readily integrated:

$$\pm\left(\frac{m_{\rm ef}}{2}\right)^{1/2}\int_{q(0)}^{q(t)}\frac{dq'}{\left[H - U_{\rm ef}(q')\right]^{1/2}} = t. \tag{3.26}$$



Since constant $H$ (as well as the proper sign before the integral - see below) is fixed by initial conditions, Eq. (26) gives the reciprocal form, $t = t(q)$, of the desired law of system motion, $q(t)$. Of course, for any particular problem the integral in Eq. (26) still has to be worked out, either analytically or numerically, but even the latter procedure is typically much easier than the numerical integration of the initial, second-order differential equation of motion, because in the addition of many values (to which the numerical integration is reduced 6) the rounding errors are effectively averaged out.
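Eq. (3.26) turns the law of motion into a single quadrature. As a minimal sketch (composite Simpson rule, standard library only; the function name and all parameter values are assumptions), the code below evaluates $t(q)$ for a harmonic potential $U_{\rm ef} = \kappa q^2/2$. Starting from $q(0) = 0$ with $\dot q > 0$, the exact answer is $t = \omega_0^{-1}\arcsin(q/A)$ with $A = (2H/\kappa)^{1/2}$. The end point is kept short of the turning point, where the integrand diverges.

```python
import math

def time_to_reach(q_end, U_ef, H, m_ef, q_start=0.0, n=2000):
    """t(q_end) from Eq. (3.26), taking the '+' sign (dq/dt > 0) and q(0) = q_start.

    Uses the composite Simpson rule; assumes H > U_ef(q) on the whole
    interval, i.e. q_end is short of the turning point.
    """
    if n % 2:
        n += 1                                  # Simpson's rule needs an even n
    f = lambda q: 1.0 / math.sqrt(H - U_ef(q))
    h = (q_end - q_start) / n
    s = f(q_start) + f(q_end)
    s += 4 * sum(f(q_start + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(q_start + 2 * i * h) for i in range(1, n // 2))
    return math.sqrt(m_ef / 2) * s * h / 3

# Harmonic check: U_ef = kappa*q**2/2 gives q(t) = A*sin(omega0*t)
m_ef, kappa, A = 2.0, 8.0, 1.5                  # illustrative values
omega0 = math.sqrt(kappa / m_ef)                # Eq. (3.15): omega0 = 2 here
H = kappa * A**2 / 2                            # H fixed by the amplitude
q_end = 0.6 * A
t_num = time_to_reach(q_end, lambda q: kappa * q**2 / 2, H, m_ef)
t_exact = math.asin(q_end / A) / omega0
print(t_num, t_exact)                           # should agree closely
```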



6 See, e.g., MA Eqs. (5.2) and (5.3). 
Moreover, Eqs. (24)-(25) also allow a general classification of 1D system motion. Indeed:

(i) If $H > U_{\rm ef}(q)$ in the whole range of interest, the effective kinetic energy $T_{\rm ef}$ (3) is always positive. Hence derivative $dq/dt$ cannot change sign, so that the effective velocity retains the sign it had initially. This is the unbound motion in one direction (Fig. 2a).

(ii) Now let the particle approach a classical turning point $A$ where $H = U_{\rm ef}(q)$ - see Fig. 2b. 7 According to Eqs. (25), (26), at that point the particle velocity vanishes, while its acceleration, according to Eq. (4), is still finite. Evidently, this corresponds to the particle reflection from the "potential wall", with the change of velocity sign.

(iii) If, after the reflection from point $A$, the particle runs into another classical turning point $B$ (Fig. 2c), the reflection process is repeated again and again, so that the particle is bound to a periodic motion between two turning points.




Fig. 3.2. Graphical representation of Eq. (25) for three different cases: (a) unbound motion, with the velocity sign conserved, (b) reflection from the "classical turning point", accompanied with the velocity sign change, and (c) bound, periodic motion between two turning points - schematically. (d) Effective potential energy (6) of the bead on the rotating ring (Fig. 1.5) for $\omega > \Omega$, in units of mgR.



The last case of periodic oscillations is of great practical interest, and the whole next chapter will be devoted to a detailed analysis of this phenomenon and numerous associated effects. Here I will only note that Eq. (26) immediately enables us to calculate the oscillation period:

$$\mathcal{T} = 2\left(\frac{m_{\rm ef}}{2}\right)^{1/2}\int_A^B\frac{dq}{\left[H - U_{\rm ef}(q)\right]^{1/2}}, \tag{3.27}$$



7 This terminology comes from quantum mechanics, which shows that actually a particle (or rather its wavefunction) can, to a certain extent, penetrate the "classically forbidden range" where $H < U_{\rm ef}(q)$.
where the additional upfront factor 2 accounts for two time intervals: for the motion from $B$ to $A$ and back (Fig. 2c). Indeed, according to Eq. (25), at each classically allowed point $q$ the velocity magnitude is the same for both directions of motion, so that these time intervals are equal to each other. 8

Now let us link Eq. (27) to the fixed point analysis carried out in the previous section. As Fig. 2c shows, if $H$ is reduced to approach $U_{\rm min}$, the oscillations described by Eq. (27) take place at the very bottom of the "potential well", about a stable fixed point $q_0$. Hence, if the potential energy profile is smooth enough, we may limit the Taylor expansion (10) by the quadratic term. Plugging it into Eq. (27), and using the mirror symmetry of this particular problem about the fixed point $q_0$, we get

$$\mathcal{T} = 4\left(\frac{m_{\rm ef}}{2}\right)^{1/2}\int_0^A\frac{d\tilde q}{\left\{H - \left[U_{\rm min} + \kappa_{\rm ef}\tilde q^2/2\right]\right\}^{1/2}} = \frac{4}{\omega_0}I, \qquad I \equiv \int_0^1\frac{d\xi}{(1 - \xi^2)^{1/2}}, \tag{3.28}$$

where $A \equiv (2/\kappa_{\rm ef})^{1/2}\left[H - U_{\rm min}\right]^{1/2}$ is the classical turning point, i.e. the oscillation amplitude, and $\omega_0$ is the eigenfrequency given by Eq. (15). Taking into account that the elementary integral $I$ in that equation equals $\pi/2$, 9 we finally get

$$\mathcal{T} = \frac{2\pi}{\omega_0}, \tag{3.29}$$

as it should be for harmonic oscillations (16). Note that the oscillation period does not depend on the oscillation amplitude $A$, i.e. on the difference $(H - U_{\rm min})$, while it is small.
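Eq. (3.27) can also be evaluated numerically beyond the quadratic approximation. The minimal sketch below uses the substitution $q = q_c + a\sin z$ (in the spirit of footnote 9), which removes the inverse-square-root singularities at the turning points, followed by the midpoint rule. As an assumed test case it takes the pendulum-like potential $U_{\rm ef} = -m_{\rm ef}\Omega^2\cos q$ with $m_{\rm ef} = \Omega = 1$, whose exact period is the standard result $4K(\sin(A/2))/\Omega$, with $K$ the complete elliptic integral of the first kind, computed here via the arithmetic-geometric mean. All names and numbers are illustrative.

```python
import math

def period(U_ef, H, m_ef, qA, qB, n=4000):
    """Oscillation period from Eq. (3.27) between turning points qA < qB.

    Substituting q = qc + a*sin(z), with qc and a the center and half-width
    of [qA, qB], makes the integrand finite at z = -pi/2 and z = +pi/2;
    the z-integral is then evaluated by the midpoint rule.
    """
    qc, a = (qA + qB) / 2, (qB - qA) / 2
    dz = math.pi / n
    total = 0.0
    for i in range(n):
        z = -math.pi / 2 + (i + 0.5) * dz
        q = qc + a * math.sin(z)
        total += a * math.cos(z) / math.sqrt(H - U_ef(q)) * dz
    return 2 * math.sqrt(m_ef / 2) * total

def agm(x, y):
    """Arithmetic-geometric mean, used for K(k) = pi / (2*agm(1, sqrt(1 - k^2)))."""
    while abs(x - y) > 1e-15 * x:
        x, y = (x + y) / 2, math.sqrt(x * y)
    return x

U = lambda q: -math.cos(q)                 # m_ef = Omega = 1, so omega_0 = 1
for amp in (0.01, 1.0, 2.0):
    T_num = period(U, U(amp), 1.0, -amp, amp)
    k = math.sin(amp / 2)
    T_exact = 4 * math.pi / (2 * agm(1.0, math.sqrt(1 - k * k)))
    print(f"A = {amp}: T = {T_num:.10f}, exact {T_exact:.10f}, 2*pi = {2*math.pi:.10f}")
```

At the smallest amplitude the result is close to $2\pi/\omega_0$, as in Eq. (3.29), while at larger amplitudes the period grows, showing where the amplitude independence stops.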



3.4. Planetary problems 

Leaving a more detailed study of oscillations for the next chapter, let us now discuss the so-called planetary systems 10 whose description, somewhat surprisingly, may also be reduced to an effectively 1D problem. Consider two particles that interact via a conservative, central force $\mathbf F_{21} = -\mathbf F_{12} = \mathbf n_r F(r)$, where $r$ and $\mathbf n_r$ are, respectively, the magnitude and direction of the distance vector $\mathbf r = \mathbf r_1 - \mathbf r_2$ connecting the two particles (Fig. 3).




Fig. 3.3. Vectors in the "planetary" problem. 



8 Note that the dependence of points $A$ and $B$ on the "energy" $H$ is not necessarily continuous. For example, for our testbed problem, whose effective potential energy is plotted in Fig. 2d (for a particular value of $\omega > \Omega$), a gradual increase of $H$ leads to a sudden jump, at $H = H_1$, of point $B$ to position $B'$, corresponding to a sudden switch from oscillations about one fixed point $\theta_{2,3}$ to oscillations about two adjacent fixed points (before the beginning of a persistent rotation along the ring at $H > H_2$).

9 Introducing a new variable $\zeta$ as $\xi = \sin\zeta$, we get $d\xi = \cos\zeta\,d\zeta = (1 - \xi^2)^{1/2}d\zeta$, so that the function under the integral is just $d\zeta$.

10 This name is rather conventional, because this group of problems includes, for example, charged particle scattering (see Sec. 3.7 below).
Generally, two particles moving without constraints in 3D space have 3 + 3 = 6 degrees of freedom that may be described, e.g., by their Cartesian coordinates $\{x_1, y_1, z_1, x_2, y_2, z_2\}$. However, for this particular form of interaction, the following series of tricks allows the number of essential degrees of freedom to be reduced to just one.

First, the central, conservative force of particle interaction may be described by a time-independent potential energy $U(r)$. Hence the Lagrangian of the system is

$$L = T - U(r) = \frac{m_1}{2}\dot{\mathbf r}_1^2 + \frac{m_2}{2}\dot{\mathbf r}_2^2 - U(r). \tag{3.30}$$

Let us perform the transfer from the initial six scalar coordinates of the particles to six generalized coordinates: three Cartesian components of the distance vector

$$\mathbf r \equiv \mathbf r_1 - \mathbf r_2, \tag{3.31}$$

and three components of vector

$$\mathbf R \equiv \frac{m_1\mathbf r_1 + m_2\mathbf r_2}{M}, \qquad M \equiv m_1 + m_2, \tag{3.32}$$

which defines the position of the center of mass of the system. Solving the system of two linear equations (31) and (32) for $\mathbf r_1$ and $\mathbf r_2$, we get

$$\mathbf r_1 = \mathbf R + \frac{m_2}{M}\mathbf r, \qquad \mathbf r_2 = \mathbf R - \frac{m_1}{M}\mathbf r. \tag{3.33}$$

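Plugging these relations into Eq. (30), we may reduce it to

$$L = \frac{M}{2}\dot{\mathbf R}^2 + \frac{m}{2}\dot{\mathbf r}^2 - U(r), \tag{3.34}$$

where $m$ is the so-called reduced mass:

$$m \equiv \frac{m_1m_2}{M}, \quad \text{i.e.} \quad \frac{1}{m} = \frac{1}{m_1} + \frac{1}{m_2}. \tag{3.35}$$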



Note that according to Eq. (35), the reduced mass is lower than that of the lightest component of the two-body system. If one of $m_{1,2}$ is much less than its counterpart (as it is in most star-planet or planet-satellite systems), then with good precision $m \approx \min[m_1, m_2]$.
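The change of variables (3.31)-(3.33) can be spot-checked numerically: for any velocities of the two particles, the kinetic part of Eq. (3.30) should equal the kinetic part of Eq. (3.34). The minimal sketch below, with arbitrary masses and velocities (assumptions made for illustration only), verifies this together with the inequality $m < \min[m_1, m_2]$.

```python
def reduced_mass(m1, m2):
    """Eq. (3.35): m = m1*m2/(m1 + m2)."""
    return m1 * m2 / (m1 + m2)

def kinetic_energies(m1, m2, v1, v2):
    """Return (T from Eq. (3.30), T from Eq. (3.34)) for 3D velocity lists."""
    M, m = m1 + m2, reduced_mass(m1, m2)
    V = [(m1 * a + m2 * b) / M for a, b in zip(v1, v2)]   # dR/dt, from Eq. (3.32)
    v = [a - b for a, b in zip(v1, v2)]                   # dr/dt, from Eq. (3.31)
    sq = lambda u: sum(c * c for c in u)
    T_particles = 0.5 * m1 * sq(v1) + 0.5 * m2 * sq(v2)
    T_cm_rel = 0.5 * M * sq(V) + 0.5 * m * sq(v)
    return T_particles, T_cm_rel

m1, m2 = 3.0, 1.5                                         # arbitrary masses
T_a, T_b = kinetic_energies(m1, m2, [1.0, 2.0, -1.0], [-0.5, 0.3, 2.0])
print(T_a, T_b)                                           # equal up to rounding
print(reduced_mass(m1, m2), reduced_mass(1.0, 1e-3))      # 1.0, and slightly below 1e-3
```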

Since the Lagrangian function (34) depends only on $\dot{\mathbf R}$ rather than $\mathbf R$ itself, according to our discussion in Sec. 2.4, the Cartesian components of $\mathbf R$ are cyclic coordinates, and the corresponding generalized momenta are conserved:

$$P_j \equiv \frac{\partial L}{\partial \dot R_j} = M\dot R_j = \text{const}, \qquad j = 1, 2, 3. \tag{3.36}$$



Physically, this is just the conservation law for the full momentum $\mathbf P \equiv M\dot{\mathbf R}$ of our system, due to the absence of external forces. Actually, in the axiomatics used in Sec. 1.3 this law is postulated - see Eq. (1.10) - but now we may attribute momentum $\mathbf P$ to a certain geometric point, the center of mass $\mathbf R$. In particular, since according to Eq. (36) the center moves with constant velocity in the inertial reference frame used to write Eq. (30), we may create a new inertial frame with the origin at point $\mathbf R$. In this new frame, $\mathbf R \equiv 0$, so that vector $\mathbf r$ (and hence scalar $r$) remain the same as in the old frame (because the frame transfer vector adds equally to $\mathbf r_1$ and $\mathbf r_2$, and cancels in $\mathbf r = \mathbf r_1 - \mathbf r_2$), and the Lagrangian (34) is now reduced to

$$L = \frac{m}{2}\dot{\mathbf r}^2 - U(r). \tag{3.37}$$

Thus our initial problem has been reduced to just three degrees of freedom - three scalar components of vector $\mathbf r$. Moreover, Eq. (37) shows that the dynamics of vector $\mathbf r$ of our initial, two-particle system is identical to that of the radius-vector of a single particle with the effective mass $m$, moving in the central potential field $U(r)$.



3.5. 2nd Kepler law

Two more degrees of freedom may be excluded from the planetary problem by noticing that according to Eq. (1.35), the angular momentum $\mathbf L = \mathbf r \times \mathbf p$ of our effective particle is also conserved, both in magnitude and direction. Since the direction of $\mathbf L$ is, by its definition, perpendicular to both $\mathbf r$ and $\mathbf v = \mathbf p/m$, this means that the particle's motion is confined to a plane (whose orientation in space is determined by the initial directions of vectors $\mathbf r$ and $\mathbf v$). Hence we can completely describe the particle's position by just two coordinates in that plane, for example by distance $r$ to the center, and the polar angle $\varphi$. In these coordinates, Eq. (37) takes the form identical to Eq. (2.49):

$$L = \frac{m}{2}\left(\dot r^2 + r^2\dot\varphi^2\right) - U(r). \tag{3.38}$$

Moreover, the latter coordinate, polar angle $\varphi$, may also be eliminated by using the conservation of the angular momentum's magnitude, in the form of Eq. (2.50): 11

$$L_z = mr^2\dot\varphi = \text{const}. \tag{3.39}$$

A direct corollary of this conservation is the so-called 2nd Kepler law: 12 the radius-vector $\mathbf r$ sweeps equal areas A in equal times. Indeed, in the linear approximation in $dA \ll A$, the area differential dA equals the area of a narrow right triangle with the base being the arc differential $r\,d\varphi$, and the height equal to r - see Fig. 4. As a result, according to Eq. (39), the time derivative of the area,

$$\frac{dA}{dt} = \frac{r\,(r\,d\varphi)/2}{dt} = \frac{r^2\dot\varphi}{2} = \frac{L_z}{2m}, \qquad (3.40)$$

remains constant. Integration of this equation over an arbitrary (not necessarily small!) time interval
proves the 2nd Kepler law.
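The 2nd Kepler law can be illustrated numerically for a field where, unlike in the Coulomb case, the orbit does not close. The sketch below is only an illustration. It accumulates the areas of the thin triangles swept by the radius-vector over equal time intervals and compares them with $(L_z/2m)\Delta t$ from Eq. (40). The sample potential $U(r) = -\alpha/r^{1.5}$, the initial conditions, the RK4 step, and the helper names are all assumptions made for the example.

```python
import math

def accel(x, y, m=1.0, alpha=1.0, k=1.5):
    """Acceleration in the illustrative central potential U(r) = -alpha / r**k."""
    r = math.hypot(x, y)
    a = -alpha * k / (m * r ** (k + 2))
    return a * x, a * y

def rk4(state, dt):
    """One RK4 step for state = (x, y, vx, vy)."""
    def f(s):
        ax, ay = accel(s[0], s[1])
        return (s[2], s[3], ax, ay)
    def shift(s, ds, h):
        return tuple(a + h * b for a, b in zip(s, ds))
    k1 = f(state)
    k2 = f(shift(state, k1, dt / 2))
    k3 = f(shift(state, k2, dt / 2))
    k4 = f(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def swept_areas(state=(1.0, 0.0, 0.0, 1.1), dt=5e-4, steps=1000, intervals=8):
    """Areas swept by the radius-vector over equal time intervals steps*dt."""
    areas, radii = [], []
    for _ in range(intervals):
        area = 0.0
        for _ in range(steps):
            x0, y0 = state[0], state[1]
            state = rk4(state, dt)
            # thin triangle (origin, old point, new point): area = (r_old x r_new) / 2
            area += 0.5 * (x0 * state[1] - y0 * state[0])
        areas.append(area)
        radii.append(math.hypot(state[0], state[1]))
    return areas, radii

areas, radii = swept_areas()
print(areas)     # each should be close to (L_z/2m) * steps * dt = 0.55 * 0.5
print(radii)     # while the distance r itself keeps changing
```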



11 Here index z stands for the coordinate perpendicular to the motion plane. Since other components of the angular momentum are equal to zero, the index is not really necessary, but I will still use it, just to make a clear distinction between the angular momentum $L_z$ and the Lagrangian function L.

12 One of three laws deduced almost exactly 400 years ago by J. Kepler (1571-1630) from the extremely detailed astronomical data collected by T. Brahe (1546-1601). In turn, the three Kepler laws were the main basis for Isaac Newton's discovery of the gravity law. That's how physics marched on. . .
Now note that since $\partial L/\partial t = 0$, the Hamiltonian function H is also conserved. Moreover, according to Eq. (38), the kinetic energy of the system is a quadratic-homogeneous function of the generalized velocities $\dot r$ and $\dot\varphi$, so that H = E, and the system energy E,



$$E = \frac{m}{2}\dot r^2 + \frac{m}{2}r^2\dot\varphi^2 + U(r), \qquad (3.41)$$



is also a first integral of motion. 13 But according to Eq. (39), the second term of Eq. (41) may be presented as

$$\frac{m}{2}r^2\dot\varphi^2 = \frac{L_z^2}{2mr^2}, \qquad (3.42)$$

so that energy (41) may be expressed as that of a 1D particle moving along axis r,

$$E = \frac{m}{2}\dot r^2 + U_{\text{ef}}(r), \qquad (3.43)$$

in the following effective potential energy:

$$U_{\text{ef}}(r) \equiv U(r) + \frac{L_z^2}{2mr^2}. \qquad (3.44)$$



So the planetary motion problem has been reduced to the dynamics of an effectively 1D system. 14

Now we may proceed just as we did in Sec. 3, with due respect for the very specific effective potential (44) which, in particular, diverges at $r \to 0$, except possibly for the very special case of an exactly radial motion, $L_z = 0$. In particular, we may solve Eq. (43) for $dr/dt$ to get

$$dt = \pm\left(\frac{m}{2}\right)^{1/2}\frac{dr}{[E - U_{\text{ef}}(r)]^{1/2}}. \qquad (3.45)$$



The integration of this relation allows us not only to get a direct relation between time t and distance r, 
similar to Eq. (26), 



13 One may claim that this fact should have been evident from the very beginning, because the effective particle 
of mass m moves in a potential field U(r) which conserves energy. 

14 Note that this reduction has been done in a way different from that used for our testbed problem 
(shown in Fig. 2.1) in Sec. 2 above. (The reader is encouraged to analyze this difference.) In order to 
emphasize this fact, I will keep writing E instead of H here, though for the planetary problem we are 
discussing now these two notions coincide. 
$$t = \pm\left(\frac{m}{2}\right)^{1/2}\int\frac{dr}{[E - U_{\text{ef}}(r)]^{1/2}} = \pm\left(\frac{m}{2}\right)^{1/2}\int\frac{dr}{[E - U(r) - L_z^2/2mr^2]^{1/2}}, \qquad (3.46)$$



but also do a similar calculation of angle $\varphi$. Indeed, integrating Eq. (39),

$$\varphi = \int\dot\varphi\,dt = \frac{L_z}{m}\int\frac{dt}{r^2}, \qquad (3.47)$$

and plugging dt from Eq. (45), we get an explicit expression for the particle's trajectory $\varphi(r)$:

$$\varphi = \pm\frac{L_z}{(2m)^{1/2}}\int\frac{dr}{r^2[E - U_{\text{ef}}(r)]^{1/2}} = \pm\frac{L_z}{(2m)^{1/2}}\int\frac{dr}{r^2[E - U(r) - L_z^2/2mr^2]^{1/2}}. \qquad (3.48)$$



Note that, according to Eq. (39), the derivative $d\varphi/dt$ does not change sign at the reflection from any classical turning point $r \neq 0$. Hence, in contrast to Eq. (46), the sign in the right-hand side of Eq. (48) is uniquely determined by the initial conditions and cannot change during the motion.

Let us use these results, valid for any interaction law U(r), for the classification of planetary motion. The following cases should be distinguished. (Following a good tradition, in what follows I will select the arbitrary constant in the potential energy so as to provide $U_{\text{ef}} \to 0$ at $r \to \infty$.)

If the particle interaction is attractive, and the divergence of the attractive potential at $r \to 0$ is faster than $1/r^2$, then $U_{\text{ef}}(r) \to -\infty$ at $r \to 0$. In this case, at appropriate initial conditions (E < 0), the particle may drop on the center even if $L_z \neq 0$, an event called capture. On the other hand, with U(r) either converging or diverging slower than $1/r^2$ at $r \to 0$, the effective energy profile $U_{\text{ef}}(r)$ has the shape shown schematically in Fig. 5. This is true, in particular, for the very important case



$$U(r) = -\frac{\alpha}{r}, \qquad \alpha > 0, \qquad (3.49)$$



which describes, in particular, the Coulomb (electrostatic) interaction of two particles with electric 
charges of the opposite sign, and Newton's gravity law. This case will be analyzed in the following 
section, and now let us return to the analysis of an arbitrary attractive potential U(r) < 0 leading to the 
effective potential shown in Fig. 5, when the angular-momentum term dominates at small distances r. 




Fig. 3.5. Effective potential profile of an attractive central field, and two types of motion in it.



According to the analysis of Sec. 3, such a potential profile, with a minimum at some distance $r_0$, may sustain two types of motion, depending on the energy E (which is of course determined by the initial conditions):






(i) If E > 0, there is only one classical turning point where $E = U_{\text{ef}}$, so that distance r either grows with time from the very beginning, or (if the initial value of $\dot r$ was negative) first decreases and then, after the reflection from the increasing potential $U_{\text{ef}}$, starts to grow indefinitely. The latter case, of course, describes scattering.

(ii) On the contrary, if the energy is within the range

$$U_{\text{ef}}(r_0) < E < 0, \qquad (3.50)$$

the system moves periodically between two classical turning points $r_{\min}$ and $r_{\max}$. These oscillations of distance r correspond to the bound orbital motion of our effective particle about the attracting center. 15

Let us start with the discussion of the bound motion, with energy within the range (50). If the energy has its minimal possible value,

$$E = U_{\text{ef}}(r_0) = \min[U_{\text{ef}}(r)], \qquad (3.51)$$

the distance cannot change, $r = r_0 = \text{const}$, so that the orbit is circular, with the radius $r_0$ satisfying the condition $dU_{\text{ef}}/dr = 0$. Let us see whether this result allows for an elementary explanation. Using Eq. (44), we see that the condition for $r_0$ may be written as



$$\frac{L_z^2}{mr_0^3} = \left.\frac{dU}{dr}\right|_{r=r_0}. \qquad (3.52)$$



In a circular motion, the velocity $\mathbf v$ is perpendicular to the radius vector $\mathbf r$, so $L_z$ is just $mr_0v$. Hence the left-hand side of Eq. (52) equals $mv^2/r_0$, while its right-hand side is just the magnitude of the attractive force, so that this equation expresses the well-known 2nd Newton law for circular motion. Plugging this result into Eq. (47), we get a linear law of angle change, $\varphi = \omega t + \text{const}$, with angular velocity

$$\omega = \frac{L_z}{mr_0^2} = \frac{v}{r_0}, \qquad (3.53)$$

and hence the rotation period $T_\varphi \equiv 2\pi/\omega$ obeys the elementary relation

$$T_\varphi = \frac{2\pi r_0}{v}. \qquad (3.54)$$
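For the Coulomb potential (49), condition (52) can be solved in closed form, giving $r_0 = L_z^2/m\alpha$. As a cross-check, the sketch below instead finds $r_0$ by numerically minimizing $U_{\text{ef}}(r)$ of Eq. (44) with a golden-section search, and then evaluates Eqs. (53) and (54). The helper names, the search bracket, and the parameter values are illustrative assumptions, not part of the original notes.

```python
import math

def u_ef(r, m, alpha, lz):
    """Effective potential (3.44) for the attractive Coulomb potential (3.49)."""
    return -alpha / r + lz ** 2 / (2 * m * r ** 2)

def golden_min(f, a, b, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                    # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def circular_orbit(m, alpha, lz, bracket=(1e-3, 1e3)):
    """Radius, speed, angular velocity (3.53) and period (3.54) of the circular orbit."""
    r0 = golden_min(lambda r: u_ef(r, m, alpha, lz), *bracket)
    v = lz / (m * r0)                 # L_z = m r0 v for circular motion
    omega = lz / (m * r0 ** 2)        # Eq. (3.53)
    period = 2 * math.pi * r0 / v     # Eq. (3.54)
    return r0, v, omega, period

r0, v, omega, period = circular_orbit(m=2.0, alpha=3.0, lz=1.5)
print(r0, 1.5 ** 2 / (2.0 * 3.0))     # numerical vs closed-form r0 = L_z^2 / (m alpha)
```

The numerical minimum is only as sharp as the flat bottom of $U_{\text{ef}}$ allows (roughly the square root of machine precision in relative terms). That is ample for comparing with the closed form.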

Now, let the energy be above its minimum value. Using Eq. (46) just as in Sec. 3, we see that 
distance r now oscillates with period 

$$T_r = (2m)^{1/2}\int_{r_{\min}}^{r_{\max}}\frac{dr}{[E - U(r) - L_z^2/2mr^2]^{1/2}}. \qquad (3.55)$$

This period is, in general, different from $T_\varphi$. Indeed, the change of angle $\varphi$ between two sequential points of the nearest approach, which follows from Eq. (48),



15 In the opposite case when the interaction is repulsive, U(r) > 0, the addition of the positive angular energy term 
only increases the trend, and only the scattering scenario is possible. 
$$\Delta\varphi = 2\,\frac{L_z}{(2m)^{1/2}}\int_{r_{\min}}^{r_{\max}}\frac{dr}{r^2[E - U(r) - L_z^2/2mr^2]^{1/2}}, \qquad (3.56)$$

is generally different from $2\pi$. Hence, the general trajectory of the bound motion has a spiral shape - see, e.g., an illustration in Fig. 6.




Fig. 3.6. Typical open orbit of a particle
moving in a non-Coulomb central field.
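Eq. (56) is easy to evaluate numerically for an arbitrary U(r). In the sketch below (an illustration with hypothetical helper names), the turning points are found by bisection. The substitution $r = (r_{\max} + r_{\min})/2 - [(r_{\max} - r_{\min})/2]\cos t$ then removes the inverse-square-root singularities of the integrand at both of them, so that a plain midpoint rule in t converges quickly. The code assumes that $U_{\text{ef}}(r) \to +\infty$ at $r \to 0$ (no capture) and that the caller supplies a radius r_in inside the classically allowed region. Two cases give checks with known answers:

- The Coulomb potential (49) should give $\Delta\varphi = 2\pi$, as the next section shows.
- The 3D harmonic oscillator, $U = \kappa r^2/2$, whose orbits are ellipses centered at the origin, should give $\Delta\varphi = \pi$.

```python
import math

def _bisect(g, lo, hi, iters=200):
    """Root of g between lo and hi, assuming g(lo) and g(hi) have opposite signs."""
    glo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if (gm > 0) == (glo > 0):
            lo, glo = mid, gm
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_phi(U, m, E, lz, r_in, n=2000):
    """Angle change (3.56) between two successive points of nearest approach.

    U is the potential energy as a function of r; r_in must lie inside the
    classically allowed region. Assumes U_ef -> +inf at r -> 0, and E < U_ef far away.
    """
    g = lambda r: E - U(r) - lz ** 2 / (2 * m * r ** 2)      # E - U_ef(r)
    if g(r_in) <= 0:
        raise ValueError("r_in is not inside the classically allowed region")
    lo, hi = r_in, r_in
    while g(lo) > 0:          # step inward until we pass r_min
        lo /= 2
    while g(hi) > 0:          # step outward until we pass r_max
        hi *= 2
    r_min, r_max = _bisect(g, lo, r_in), _bisect(g, r_in, hi)
    # r = c - h cos t maps t in (0, pi) onto (r_min, r_max) and cancels the
    # inverse-square-root singularities of the integrand at both turning points
    c, h = 0.5 * (r_max + r_min), 0.5 * (r_max - r_min)
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * math.pi / n                      # midpoint rule in t
        r = c - h * math.cos(t)
        total += h * math.sin(t) / (r ** 2 * math.sqrt(g(r)))
    return 2 * lz / math.sqrt(2 * m) * total * math.pi / n

coulomb = delta_phi(lambda r: -1.0 / r, m=1.0, E=-0.3, lz=1.0, r_in=1.0)
oscillator = delta_phi(lambda r: 0.5 * r ** 2, m=1.0, E=2.0, lz=1.0, r_in=1.0)
generic = delta_phi(lambda r: -r ** -1.5, m=1.0, E=-0.3, lz=1.0, r_in=1.0)
print(coulomb / math.pi, oscillator / math.pi, generic / math.pi)
```

After the substitution, the integrand becomes a smooth, even, $2\pi$-periodic function of t. This is why the simple midpoint rule is accurate here. For the sample potential $-r^{-1.5}$, the result differs from $2\pi$, in line with Fig. 6.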



3.6. 1st and 3rd Kepler laws

The situation is special, however, for a very important particular case, namely that of the Coulomb potential described by Eq. (49). Indeed, plugging this potential into Eq. (48), we get

$$\varphi = \pm\frac{L_z}{(2m)^{1/2}}\int\frac{dr}{r^2[E + \alpha/r - L_z^2/2mr^2]^{1/2}}. \qquad (3.57)$$



This is a table integral, 16 equal to 



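$$\varphi = \pm\arccos\frac{L_z^2/m\alpha r - 1}{(1 + 2EL_z^2/m\alpha^2)^{1/2}} + \text{const}. \qquad (3.58)$$

The reciprocal function, $r(\varphi)$, is $2\pi$-periodic:

$$r = \frac{p}{1 + e\cos(\varphi + \text{const})}, \qquad (3.59)$$

so that at $E < 0$ the orbit is a closed line, 17 characterized by the following parameters:

$$p \equiv \frac{L_z^2}{m\alpha}, \qquad e \equiv \left(1 + \frac{2EL_z^2}{m\alpha^2}\right)^{1/2}. \qquad (3.60)$$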



The physical meaning of these parameters is very simple. Indeed, according to the general Eq. (52), in the Coulomb potential, for which $dU/dr = \alpha/r^2$, p is just the circular orbit radius 18 for the given $L_z$: $r_0 = L_z^2/m\alpha = p$, and



16 See, e.g., MA Eq. (6.3).

17 It may be proved that for the power-law interaction, $U \propto r^\nu$, the orbits are closed lines only if $\nu = -1$ (i.e. our current case of the Coulomb potential) or $\nu = +2$ (the 3D harmonic oscillator); this is the so-called Bertrand theorem.
$$\min[U_{\text{ef}}(r)] = U_{\text{ef}}(r_0) = -\frac{\alpha^2m}{2L_z^2}. \qquad (3.61)$$



Using this equality, parameter e (called eccentricity) may be presented just as 



$$e = \left(1 - \frac{E}{\min[U_{\text{ef}}(r)]}\right)^{1/2}. \qquad (3.62)$$



Analytical geometry tells us that Eq. (59), with e < 1, is one of the canonical forms of an ellipse, with one of its two foci located at the origin. This fact is known as the 1st Kepler law. Figure 7 shows the relation between the main dimensions of the ellipse and parameters p and e. 19



Fig. 3.7. Ellipse, and its special points and dimensions (center, one of the two foci, perihelion, aphelion; axes $x = r\cos\varphi$, $y = r\sin\varphi$).



In particular, the semi-major axis a and the semi-minor axis b are simply related to p and e and hence, via Eqs. (60), to the motion integrals E and $L_z$:

$$a = \frac{p}{1 - e^2} = \frac{\alpha}{2|E|}, \qquad b = \frac{p}{(1 - e^2)^{1/2}} = \frac{L_z}{(2m|E|)^{1/2}}. \qquad (3.63)$$

As was mentioned above, at $E \to \min[U_{\text{ef}}(r)]$ the orbit is almost circular, with $r(\varphi) \approx r_0 = p$. On the contrary, as E is increased to approach zero (its maximum value for a closed orbit), $e \to 1$, so that the aphelion point $r_{\max} = p/(1 - e)$ tends to infinity, i.e. the orbit becomes extremely extended. If the energy is exactly zero, Eq. (59) (with e = 1) is still valid for all values of $\varphi$ (except for one special point, $\varphi = \pi$, where r becomes infinite) and describes a parabolic (i.e. open) trajectory. At E > 0, Eq. (59) is still valid within a certain sector of angles $\varphi$ (in which it yields positive values of r), and describes an open, hyperbolic trajectory - see the next section.

For E < 0, the above relations also allow a ready calculation of the rotation period $T \equiv T_r = T_\varphi$. (In the case of a closed trajectory, $T_r$ and $T_\varphi$ have to coincide.) Indeed, it is well known that the ellipse area is $A = \pi ab$. But according to the 2nd Kepler law (40), $dA/dt = L_z/2m = \text{const}$. Hence

$$T = \frac{A}{dA/dt} = \frac{\pi ab}{L_z/2m}. \qquad (3.64a)$$



18 Mathematicians prefer a more solemn terminology: parameter 2p is called the latus rectum of the elliptic 
trajectory - see Fig. 7. 

19 In this figure, the constant participating in Eqs. (58)-(59) is assumed to be zero. It is evident that a different 
choice of the constant corresponds just to a constant turn of the ellipse about the origin. 
Using Eqs. (60) and (63), this result may be presented in several other forms:

$$T = \frac{\pi p^2}{(1 - e^2)^{3/2}(L_z/2m)} = \pi\alpha\left(\frac{m}{2|E|^3}\right)^{1/2} = 2\pi a^{3/2}\left(\frac{m}{\alpha}\right)^{1/2}. \qquad (3.64b)$$



For the Newtonian gravity, $\alpha = Gm_1m_2 = GmM$, so at $m_1 \ll m_2$ (i.e. $m \ll M$) this constant is proportional to m. The last form of Eq. (64b) then yields the 3rd Kepler law: periods of motion of different planets in the same central field, say that of our Sun, scale as $T \propto a^{3/2}$. Note that in contrast to the 2nd Kepler law (which is valid for any central field), the 1st and 3rd Kepler laws are potential-specific.
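The chain of relations (60), (63), and (64) fits into a few lines of code. The sketch below uses a hypothetical helper `kepler_orbit` and arbitrary units. It computes the orbit parameters from the motion integrals E and $L_z$, and checks two things:

- the area-based period (64a) coincides with the last form of Eq. (64b);
- $T^2/a^3$ is the same for two different orbits in the same field, as the 3rd Kepler law requires.

```python
import math

def kepler_orbit(m, alpha, E, lz):
    """Parameters of a bound orbit in the Coulomb potential U = -alpha/r, E < 0."""
    if not (alpha > 0 and E < 0):
        raise ValueError("closed orbits require alpha > 0 and E < 0")
    p = lz ** 2 / (m * alpha)                              # Eq. (3.60)
    e = math.sqrt(1 + 2 * E * lz ** 2 / (m * alpha ** 2))  # Eq. (3.60); fails if E < min U_ef
    a = p / (1 - e ** 2)                                   # Eq. (3.63), semi-major axis
    b = p / math.sqrt(1 - e ** 2)                          # Eq. (3.63), semi-minor axis
    return {
        "p": p, "e": e, "a": a, "b": b,
        "r_min": p / (1 + e), "r_max": p / (1 - e),        # perihelion, aphelion
        "T_area": math.pi * a * b / (lz / (2 * m)),        # Eq. (3.64a)
        "T_kepler": 2 * math.pi * a ** 1.5 * math.sqrt(m / alpha),  # Eq. (3.64b)
    }

orbit1 = kepler_orbit(m=1.0, alpha=1.0, E=-0.3, lz=1.0)
orbit2 = kepler_orbit(m=1.0, alpha=1.0, E=-0.1, lz=0.7)
for o in (orbit1, orbit2):
    # T^2 / a^3 should equal 4 pi^2 m / alpha for both orbits (3rd Kepler law)
    print(o["T_area"], o["T_kepler"], o["T_kepler"] ** 2 / o["a"] ** 3)
```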



3.7. Classical theory of elastic scattering 

If E > 0, the motion is unbound for any interaction potential. In this case, the two most important parameters of the particle trajectory are the scattering angle $\theta$ and the impact parameter b (Fig. 8). The main task for theory is to find the relation between them in the given potential U(r). For that, it is convenient to note that b is related in a simple way to two conserved quantities, the particle's energy E 20 and its angular momentum $L_z$: 21

$$L_z = b\,(2mE)^{1/2}. \qquad (3.65)$$
Hence the angular contribution to the effective potential (44) may be presented as

$$\frac{L_z^2}{2mr^2} = E\,\frac{b^2}{r^2}. \qquad (3.66)$$



Second, according to Eq. (48), the trajectory sections from infinity to the nearest approach point ($r = r_{\min}$), and from that point to infinity, have to be similar, and hence correspond to equal angle changes $\varphi_0$ - see Fig. 8.



Fig. 3.8. Main geometric parameters of the scattering problem.



Hence we may apply the general Eq. (48) to just one of the sections, say $[r_{\min}, \infty]$, to find the scattering angle:



20 The energy conservation law is frequently emphasized by calling this process elastic scattering.

21 Indeed, at $r \gg b$, the definition $\mathbf L = \mathbf r \times (m\mathbf v)$ yields $L_z = bmv_\infty$, where $v_\infty = (2E/m)^{1/2}$ is the initial (and hence the final) velocity of the particle.
$$\theta = \pi - 2\varphi_0 = \pi - \frac{2L_z}{(2m)^{1/2}}\int_{r_{\min}}^{\infty}\frac{dr}{r^2[E - U(r) - L_z^2/2mr^2]^{1/2}} = \pi - 2\int_{r_{\min}}^{\infty}\frac{b\,dr}{r^2[1 - U(r)/E - b^2/r^2]^{1/2}}. \qquad (3.67)$$



In particular, for the Coulomb potential (49), now with an arbitrary sign of $\alpha$, we can apply the same table integral as in the previous section to get 22



$$\theta = \pi - 2\arccos\frac{\alpha/2Eb}{[1 + (\alpha/2Eb)^2]^{1/2}}. \qquad (3.68a)$$



This result may be more conveniently rewritten as

$$\tan\frac{|\theta|}{2} = \frac{|\alpha|}{2Eb}. \qquad (3.68b)$$



Very clearly, the scattering angle's magnitude increases with the potential strength $|\alpha|$, and decreases as either the particle energy or the impact parameter (or both) are increased.
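Eq. (67) can also be evaluated numerically for any potential, which gives a convenient check of Eq. (68b). The sketch below is only an illustration, with hypothetical names. It assumes E > 0, $U(r) \to 0$ at $r \to \infty$, and a single turning point. The method has three steps:

1. rewrite the integral in terms of $u = 1/r$;
2. find $u_{\max} = 1/r_{\min}$ by bisection;
3. remove the integrable singularity at the turning point with the substitution $u = u_{\max}(1 - s^2)$.

Since Eq. (68b) involves only magnitudes, the comparison is made for $|\theta|$; the sign of $\theta$ returned by Eq. (67) depends on the sign of $\alpha$.

```python
import math

def scattering_angle(U, E, b, n=20000):
    """Scattering angle theta from Eq. (3.67), written in terms of u = 1/r.

    Assumes E > 0, U(r) -> 0 at r -> infinity, and a single turning point r_min.
    """
    g = lambda u: 1.0 - U(1.0 / u) / E - (b * u) ** 2   # bracket in (3.67) at r = 1/u
    hi = 1.0 / b
    while g(hi) > 0:            # move inward (larger u) until past the turning point
        hi *= 2
    lo = hi
    while g(lo) <= 0:           # and outward until back in the allowed region
        lo /= 2
    for _ in range(200):        # bisection for u_max = 1/r_min
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    u_max = 0.5 * (lo + hi)
    # u = u_max (1 - s^2) removes the 1/sqrt singularity at u = u_max
    total = 0.0
    for j in range(n):
        s = (j + 0.5) / n                                 # midpoint rule in s
        u = u_max * (1.0 - s * s)
        total += 2.0 * u_max * s * b / math.sqrt(g(u))
    return math.pi - 2.0 * total / n

alpha, E, b = 1.0, 0.5, 0.8
theta_att = scattering_angle(lambda r: -alpha / r, E, b)   # attractive, Eq. (3.49)
theta_rep = scattering_angle(lambda r: alpha / r, E, b)    # repulsive Coulomb field
print(theta_att, theta_rep, 2 * math.atan(alpha / (2 * E * b)))
```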

The general equation (67) and the Coulomb-specific relations (68) present a formally complete solution of the scattering problem. However, in a typical experiment on elementary particle scattering, the impact parameter b of a single particle is random and unknown. In this case, our results may be used to obtain statistics of the scattering angle $\theta$, in particular the so-called differential cross-section 23

$$\frac{d\sigma}{d\Omega} \equiv \frac{1}{n}\frac{dN}{d\Omega}, \qquad (3.69)$$

where n is the average number of the incident particles per unit area, and dN is the average number of particles scattered into a small solid angle range $d\Omega$. For a spherically-symmetric scattering center, which provides an axially-symmetric scattering pattern, $d\sigma/d\Omega$ may be calculated by counting the number of incident particles within a small range db of the impact parameter:




$$dN = n\,2\pi b\,db, \qquad (3.70)$$

and hence scattered into the corresponding small solid angle range $d\Omega = 2\pi\sin\theta\,d\theta$. Plugging these relations into Eq. (69), we get the following general geometric relation:

$$\frac{d\sigma}{d\Omega} = \frac{b}{\sin\theta}\left|\frac{db}{d\theta}\right|. \qquad (3.71)$$



In particular, for the Coulomb potential (49), a straightforward differentiation of Eq. (68) yields 
the so-called Rutherford scattering formula 



$$\frac{d\sigma}{d\Omega} = \left(\frac{\alpha}{4E}\right)^2\frac{1}{\sin^4(\theta/2)}. \qquad (3.72)$$



22 Alternatively, this result may be recovered directly from Eq. (59), whose parameters, at E > 0, may be expressed via the same dimensionless parameter $(2Eb/\alpha)$: $p = b(2Eb/\alpha)$, $e = [1 + (2Eb/\alpha)^2]^{1/2} > 1$.

23 This terminology stems from the fact that the integral of $d\sigma/d\Omega$ over the full solid angle, called the full cross-section $\sigma$, has the dimension of area: $\sigma = N/n$, where N is the total number of scattered particles.
This result shows very strong scattering to small angles, so strong that the integral expressing the full cross-section $\sigma$ formally diverges at $\theta \to 0$, 24 and weak backscattering (scattering to angles $\theta \approx \pi$). It was historically extremely significant: in the early 1910s its good agreement with $\alpha$-particle scattering experiments carried out by E. Rutherford's group gave a strong justification for "planetary" models of atoms, with electrons moving about very small nuclei.

Note that elementary particle scattering is frequently accompanied by electromagnetic radiation and/or other processes leading to the loss of the initial mechanical energy of the system, i.e. to inelastic scattering, which may give significantly different results. (In particular, a capture of an incoming particle becomes possible even for a Coulomb attracting center.) Also, quantum-mechanical effects may be important in scattering, so that the above results should be used with caution.
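The Rutherford formula (72) can be cross-checked against the general geometric relation (71). Invert Eq. (68b) to get $b(\theta)$, differentiate it numerically, and compare. This is a minimal sketch; the function names, the finite-difference step h, and the parameter values are illustrative assumptions.

```python
import math

def impact_parameter(theta, alpha, E):
    """b(theta) for the Coulomb potential, obtained by inverting Eq. (3.68b)."""
    return abs(alpha) / (2 * E * math.tan(theta / 2))

def cross_section_numeric(theta, alpha, E, h=1e-5):
    """d(sigma)/d(Omega) from the geometric relation (3.71), with a central difference."""
    db_dtheta = (impact_parameter(theta + h, alpha, E)
                 - impact_parameter(theta - h, alpha, E)) / (2 * h)
    return impact_parameter(theta, alpha, E) / math.sin(theta) * abs(db_dtheta)

def rutherford(theta, alpha, E):
    """Rutherford formula (3.72)."""
    return (alpha / (4 * E)) ** 2 / math.sin(theta / 2) ** 4

for theta in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(theta, cross_section_numeric(theta, 1.0, 2.0), rutherford(theta, 1.0, 2.0))
```

The rapid growth of both columns at small $\theta$ is the $\sin^{-4}(\theta/2)$ behavior responsible for the divergence of the full cross-section mentioned above.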

3.8. Exercise problems 

3.1. Use Eq. (27) to calculate the functional dependence of the period T of oscillations of a 1D particle of mass m in the potential $U(q) = \alpha q^n$ (where $\alpha > 0$, and n is a positive integer) on its energy E. Explore the limit $n \to \infty$.



3.2. Explain why the term $mr^2\dot\varphi^2/2$, recast in accordance with Eq. (42), cannot be merged with U(r) in Eq. (38) to form an effective 1D potential energy $U(r) - L_z^2/2mr^2$, with the second term's sign opposite to that given by Eq. (44). We have done an apparently similar thing for our testbed, bead-on-rotating-ring problem at the very end of Sec. 1 - see Eq. (3.6); why cannot the same trick work for the planetary problem?



3.3. For motion in the central potential

$$U(r) = -\frac{\alpha}{r} + \frac{\beta}{r^2},$$

(i) find the orbit $r(\varphi)$, for positive $\alpha$ and $\beta$, and all possible ranges of energy E;

(ii) prove that in the limit $\beta \to 0$, and for energy E < 0, the orbit may be represented as a slowly rotating ellipse;

(iii) express the angular velocity of this slow orbit rotation via parameters $\alpha$ and $\beta$ of the potential, the particle's mass m, its energy E, and the angular momentum $L_z$.

3.4. A particle is launched from afar, with impact parameter b, toward an attracting center with the central potential

$$U(r) = -\frac{\alpha}{r^n}, \qquad \text{with } n > 2,\ \alpha > 0.$$



24 This divergence, which persists in the quantum-mechanical treatment of the problem, is due to particles with large values of b. It disappears when, for example, a finite concentration of the scattering centers is taken into account.
The initial kinetic energy E of the particle is barely sufficient for escaping capture by the attracting
center. Express the minimum distance between the particle and the center via b.

3.5. For the same attractive potential as in Problem 4, with n > 2, $\alpha > 0$, find the full cross-section of capture.
Chapter 4. Oscillations

In this course, oscillations in 1D (and effectively 1D) systems are discussed in detail, because of their key importance for physics and engineering. We will start with the so-called "linear" oscillator, whose differential equation of motion is linear and hence allows a full analytical solution. Then we will proceed to "nonlinear" and parametric systems, whose dynamics may be explored only by approximate analytical or numerical methods.



4.1. Free and forced oscillations 

In Sec. 3.2 we briefly discussed oscillations in a very important Hamiltonian system - a 1D harmonic oscillator described by a simple 1D Lagrangian 1



$$L = T(\dot q) - U(q) = \frac{m}{2}\dot q^2 - \frac{\kappa}{2}q^2, \qquad (4.1)$$

whose Lagrangian equation of motion,

$$m\ddot q + \kappa q = 0, \quad \text{i.e.} \quad \ddot q + \omega_0^2q = 0, \qquad \text{with } \omega_0^2 \equiv \frac{\kappa}{m} > 0, \qquad (4.2)$$



is a linear homogeneous differential equation. Its general solution is presented by Eq. (3.16), but it is 
frequently useful to recast it into another, amplitude-phase form: 



$$q(t) = u\cos\omega_0t + v\sin\omega_0t = A\cos(\omega_0t - \varphi), \qquad (4.3a)$$



where A is the amplitude and $\varphi$ the phase of the oscillations, which are determined by the initial conditions. Mathematically, it is frequently easier to work with sinusoidal functions as complex exponentials, by rewriting Eq. (3a) in one more form: 2

$$q(t) = \mathrm{Re}\left[Ae^{-i(\omega_0t - \varphi)}\right] = \mathrm{Re}\left[a\,e^{-i\omega_0t}\right], \qquad (4.3b)$$

where a is the complex amplitude of the oscillations:

$$a \equiv Ae^{i\varphi}, \qquad |a| = A, \qquad \mathrm{Re}\,a = A\cos\varphi = u, \qquad \mathrm{Im}\,a = A\sin\varphi = v. \qquad (4.4)$$
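The equivalence of the three forms (3a), (3b), and (4) is easy to verify numerically. The sketch below builds the complex amplitude a = u + iv and evaluates all three expressions with Python's standard `cmath` module. It uses the physics sign convention $\exp\{-i\omega_0t\}$ of footnote 2. The helper names and the values of u, v, and $\omega_0$ are illustrative assumptions.

```python
import cmath
import math

def complex_amplitude(u, v):
    """Complex amplitude a = u + i v of Eq. (4.4), with its modulus A and phase phi."""
    a = complex(u, v)
    return a, abs(a), cmath.phase(a)

def q_sin_cos(t, u, v, w0):
    return u * math.cos(w0 * t) + v * math.sin(w0 * t)       # Eq. (4.3a), first form

def q_amp_phase(t, A, phi, w0):
    return A * math.cos(w0 * t - phi)                         # Eq. (4.3a), second form

def q_complex(t, a, w0):
    return (a * cmath.exp(-1j * w0 * t)).real                 # Eq. (4.3b), physics convention

u, v, w0 = 0.3, -1.2, 2.0
a, A, phi = complex_amplitude(u, v)
for t in (0.0, 0.4, 1.7, 5.0):
    print(q_sin_cos(t, u, v, w0), q_amp_phase(t, A, phi, w0), q_complex(t, a, w0))
```

With the engineering convention $\exp\{+i\omega t\}$, the same real q(t) would correspond to the complex-conjugate amplitude $a^*$.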



Equations (3) represent the so-called free oscillations of the system, which are physically due to the initial energy of the system. When dissipation, i.e. energy leakage out of the system, is taken into account, such oscillations decay with time. The simplest model of this effect is represented by an additional viscosity force that is proportional to the generalized velocity and directed opposite to it:



1 For notational simplicity, in this chapter I will drop the index "ef" in the energy components T and U, and in parameters like m, $\kappa$, etc. However, the reader should still remember that T and U do not necessarily coincide with the real kinetic and potential energies (even if those energies may be uniquely identified) - see Sec. 3.1.

2 Note that this is the so-called physics convention. Most engineering texts use the opposite sign in the imaginary exponent, $\exp\{-i\omega t\} \to \exp\{i\omega t\}$, with the corresponding sign implications for intermediate formulas, but (of course) similar final results for real variables.
$$F_v = -\eta\dot q, \qquad (4.5)$$



where constant $\eta$ is called the viscosity coefficient. 3 The inclusion of this force modifies the equation of motion (2) to become



$$m\ddot q + \eta\dot q + \kappa q = 0. \qquad (4.6a)$$

This equation is frequently presented in the form

$$\ddot q + 2\delta\dot q + \omega_0^2q = 0, \qquad \text{with } \delta \equiv \frac{\eta}{2m}, \qquad (4.6b)$$

where parameter $\delta$ is called the damping coefficient. Note that Eq. (6) is still a linear homogeneous second-order differential equation, and its general solution still has the form of the sum (3.13) of two exponents of the type $\exp\{\lambda t\}$, with arbitrary pre-exponential coefficients. Plugging such an exponent into Eq. (6), we get the following algebraic characteristic equation for $\lambda$:



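$$\lambda^2 + 2\delta\lambda + \omega_0^2 = 0. \qquad (4.7)$$

Solving this quadratic equation, we get

$$\lambda_\pm = -\delta \pm i\omega_0', \qquad \text{where } \omega_0' \equiv (\omega_0^2 - \delta^2)^{1/2}, \qquad (4.8)$$

so that for not very high damping ($\delta < \omega_0$) 4 we get the following generalization of Eq. (3):

$$q_{\text{free}}(t) = c_+e^{\lambda_+t} + c_-e^{\lambda_-t} = (u_0\cos\omega_0't + v_0\sin\omega_0't)\,e^{-\delta t} = A_0e^{-\delta t}\cos(\omega_0't - \varphi_0). \qquad (4.9)$$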



The result shows that, besides a certain correction to the free oscillation frequency (which is very small in the most interesting case of low damping, $\delta \ll \omega_0$), the energy dissipation leads to an exponential decay of the oscillation amplitude with the time constant $\tau = 1/\delta$:

$$A = A_0e^{-t/\tau}, \qquad \text{where } \tau \equiv \frac{1}{\delta} = \frac{2m}{\eta}. \qquad (4.10)$$

A convenient, dimensionless measure of damping is the so-called quality factor Q (or just Q-factor), which is defined as $\omega_0/2\delta$, and may be rewritten in several other useful forms:

$$Q \equiv \frac{\omega_0}{2\delta} = \frac{m\omega_0}{\eta} = \frac{(m\kappa)^{1/2}}{\eta} = \pi\frac{\tau}{T} = \frac{\omega_0\tau}{2}, \qquad (4.11)$$



3 Here I treat Eq. (5) as a phenomenological model. In statistical mechanics, however, such a dissipative term may be derived as an average force exerted on a body by its environment, whose numerous degrees of freedom are in random, though possibly thermodynamically-equilibrium states. Since such an environmental force also has a random component, the dissipation is fundamentally related to fluctuations. The latter effects may be neglected (as they are in this course) only if the oscillation energy is much higher than the energy scale of random fluctuations of the environment; in thermal equilibrium at temperature T, this scale is the larger of $k_BT$ and $\hbar\omega_0/2$ - see, e.g., SM Chapter 5 and QM Chapter 7.

4 Systems with very high damping ($\delta > \omega_0$) can hardly be called oscillators. They are used in engineering and physics experiment (e.g., for shock, vibration, and sound isolation), but for their discussion I have to refer the interested reader to special literature - see, e.g., C. Harris and A. Piersol, Shock and Vibration Handbook, 5th ed., McGraw Hill, 2002.
where $T = 2\pi/\omega_0$ is the oscillation period in the absence of damping - see Eq. (3.29). The oscillation energy E is proportional to the amplitude squared, i.e. decays as $\exp\{-2t/\tau\}$, with the time constant $\tau/2$. Hence the last form of Eq. (11) may be used to rewrite the Q-factor in one more form:

$$Q = \omega_0\frac{E}{\mathcal P}, \qquad (4.12)$$

where $\mathcal P$ is the dissipation power. (Two other useful ways to measure Q will be discussed in a minute.) The range of Q-factors of important oscillators is very broad, all the way from $Q \sim 10$ for a human leg (with relaxed muscles), to $Q \sim 10^4$ for the quartz crystals used in "electronic" clocks and watches, all the way up to $Q \sim 10^{12}$ for microwave cavities with superconducting walls.
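Eqs. (8)-(12) can be checked numerically. The sketch below does three things:

1. integrates Eq. (6b) with a classical RK4 scheme;
2. compares the result with the analytical solution (9), whose coefficients $u_0$, $v_0$ are fitted to assumed initial values q(0) and dq/dt(0);
3. estimates Q from the energy decay over one period $2\pi/\omega_0'$ of the damped oscillations.

The solution (9) is $e^{-\delta t}$ times a function periodic with that period. The energy ratio over one such period is therefore exactly $\exp\{-2\delta\cdot2\pi/\omega_0'\}$, and the estimate should reproduce $Q = \omega_0/2\delta$. All parameter values and helper names are illustrative.

```python
import math

def damped_analytic(t, q0, qd0, w0, delta):
    """Eq. (4.9) for delta < w0, with u0, v0 fitted to q(0) = q0, dq/dt(0) = qd0."""
    w0p = math.sqrt(w0 ** 2 - delta ** 2)            # Eq. (4.8)
    u0, v0 = q0, (qd0 + delta * q0) / w0p
    return (u0 * math.cos(w0p * t) + v0 * math.sin(w0p * t)) * math.exp(-delta * t)

def damped_rk4(t_end, q0, qd0, w0, delta, n=20000):
    """Integrate Eq. (4.6b), q'' + 2 delta q' + w0^2 q = 0, with classical RK4."""
    def f(q, p):
        return p, -2 * delta * p - w0 ** 2 * q
    h = t_end / n
    q, p = q0, qd0
    for _ in range(n):
        k1 = f(q, p)
        k2 = f(q + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = f(q + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = f(q + h * k3[0], p + h * k3[1])
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, p

def q_factor_from_decay(w0, delta, q0=1.0, qd0=0.0):
    """Estimate Q from the energy lost over one period 2 pi / w0' of damped motion."""
    period = 2 * math.pi / math.sqrt(w0 ** 2 - delta ** 2)
    q, p = damped_rk4(period, q0, qd0, w0, delta)
    e_start = 0.5 * (qd0 ** 2 + w0 ** 2 * q0 ** 2)     # energy per unit mass
    e_end = 0.5 * (p ** 2 + w0 ** 2 * q ** 2)
    # E ~ exp(-2 delta t): 2 delta = ln(e_start / e_end) / period, and Q = w0 / (2 delta)
    return w0 * period / math.log(e_start / e_end)

w0, delta = 2.0, 0.1
print(damped_rk4(7.3, 1.0, 0.5, w0, delta)[0], damped_analytic(7.3, 1.0, 0.5, w0, delta))
print(q_factor_from_decay(w0, delta), w0 / (2 * delta))
```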

In contrast to the decaying free oscillations, the forced oscillations, induced by an external force F(t), may maintain their amplitude indefinitely, even at nonvanishing damping. This process may be described by a still linear but now inhomogeneous differential equation

$$m\ddot q + \eta\dot q + \kappa q = F(t), \qquad (4.13a)$$

or, more conveniently, by the following generalization of Eq. (6b):

$$\ddot q + 2\delta\dot q + \omega_0^2q = f(t), \qquad \text{where } f(t) \equiv F(t)/m. \qquad (4.13b)$$



For a particle of mass m, confined to a straight line, Eq. (13a) is just an expression of the 2nd Newton law (or rather of one of its Cartesian components). More generally, according to Eq. (1.41), Eq. (13) is valid for any dissipative 1D system whose Gibbs potential energy (1.39) has the form $U_G(q, t) = \kappa q^2/2 - F(t)q$.

The forced-oscillation solutions may be analyzed by two mathematically equivalent methods whose relative convenience depends on the character of function f(t).

(i) Frequency domain. Let us present function f(t) as a Fourier sum of sinusoidal harmonics: 5

$$f(t) = \sum_\omega f_\omega e^{-i\omega t}. \qquad (4.14)$$



Then, due to linearity of Eq. (13), its general solution may be presented as a sum of the decaying free oscillations (9) with frequency $\omega_0'$, independent of function F(t), and forced oscillations due to each of the Fourier components of the force: 6



$$q(t) = q_{\text{free}}(t) + q_{\text{forced}}(t), \qquad q_{\text{forced}}(t) = \sum_\omega a_\omega e^{-i\omega t}. \qquad (4.15)$$



Plugging Eq. (15) into Eq. (13), and requiring the factors before each $e^{-i\omega t}$ in both sides to be equal, we get

$$a_\omega = f_\omega\,\chi(\omega), \qquad (4.16)$$



where the complex function $\chi(\omega)$, in our particular case equal to



5 Operator Re, used in Eq. (3), may be dropped here, because for any physical (real) force, the imaginary components of the sum compensate each other. This imposes the following condition on the complex Fourier amplitudes: $f_{-\omega} = f_\omega^*$, where the star means complex conjugation.

6 In physics, this mathematical property of linear equations is frequently called the linear superposition principle. 
$$\chi(\omega) = \frac{1}{(\omega_0^2 - \omega^2) - 2i\omega\delta}, \qquad (4.17)$$



is called either the response function or (especially for non-mechanical oscillators) the generalized 
susceptibility. From here, the real amplitude of oscillations under the effect of a sinusoidal force that 
may be represented by just one Fourier harmonic of the sum (15), is 

$$A_\omega \equiv |a_\omega| = |f_\omega|\,|\chi(\omega)|, \qquad \text{with } |\chi(\omega)| = \frac{1}{[(\omega_0^2 - \omega^2)^2 + (2\omega\delta)^2]^{1/2}}. \qquad (4.18)$$



This formula describes, in particular, an increase of the oscillation amplitude $A_\omega$ at $\omega \to \omega_0$ - see Fig. 1. According to Eq. (18), at the exact resonance,

$$|\chi(\omega)|_{\omega=\omega_0} = \frac{1}{2\omega_0\delta}, \qquad (4.19)$$



so that, according to Eq. (11), the ratio of the oscillator response magnitudes at $\omega = \omega_0$ and at $\omega = 0$ (where $|\chi(0)| = 1/\omega_0^2$) is exactly equal to the Q-factor. Thus, the response increase is especially strong in the low damping limit ($\delta \ll \omega_0$, i.e. $Q \gg 1$); moreover, at $Q \to \infty$ and $\omega \to \omega_0$ the response diverges. (This fact is very useful for the approximate methods to be discussed later in this chapter.) This is of course the classical description of the famous phenomenon of resonance, so ubiquitous in physics.




As the resonance peak height grows with Q, its width shrinks, being inversely proportional to Q. Quantitatively, in the most interesting low-damping limit, $Q \gg 1$, the reciprocal Q-factor gives the normalized value of the so-called FWHM ("full width at half-maximum") of the resonance curve:

$$\frac{\Delta\omega}{\omega_0} = \frac{1}{Q}. \qquad (4.20)$$

Indeed, $\Delta\omega$ is defined as the difference $(\omega_+ - \omega_-)$ between the two values of $\omega$ at which the square of the oscillator response function, $|\chi(\omega)|^2$ (which is, in particular, proportional to the oscillation energy), equals half of its resonance value (19). In the low damping limit, both these points are very close to $\omega_0$. Hence, in the first (linear) approximation in $|\omega - \omega_0| \ll \omega_0$, we can take $(\omega_0^2 - \omega^2) = -(\omega + \omega_0)(\omega - \omega_0) \approx -2\omega_0\xi$, where

$$\xi \equiv \omega - \omega_0 \qquad (4.21)$$

is a convenient parameter called detuning. (We will repeatedly use it later in this chapter.) In this approximation, the second of Eqs. (18) is reduced to

$$|\chi(\omega)|^2 = \frac{1}{4\omega_0^2(\xi^2 + \delta^2)}. \qquad (4.22)$$

As a result, points $\omega_\pm$ correspond to $\xi^2 = \delta^2$, i.e. $\omega_\pm = \omega_0 \pm \delta = \omega_0(1 \pm 1/2Q)$, so that $\Delta\omega = \omega_+ - \omega_- = \omega_0/Q$, thus proving Eq. (20).
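Eq. (20) can be checked numerically by locating the half-maximum points of $|\chi(\omega)|^2$ without the linear approximation used above. In the sketch below (hypothetical helper names), the peak of $|\chi(\omega)|^2$ is taken at $\omega^2 = \omega_0^2 - 2\delta^2$. This location follows from minimizing the denominator of the second of Eqs. (18) over $\omega^2$. Both half-maximum points are then found by bisection. The bracket $[0, \omega_{\text{peak}}]$ for the lower point assumes Q well above 1, and the result should approach $\omega_0/Q$ as Q grows.

```python
import math

def chi(w, w0, delta):
    """Response function (4.17): 1 / (w0^2 - w^2 - 2 i w delta)."""
    return 1.0 / complex(w0 ** 2 - w ** 2, -2.0 * w * delta)

def fwhm(w0, delta, iters=200):
    """Full width at half maximum of |chi(w)|^2, without the low-damping approximation."""
    w_peak = math.sqrt(w0 ** 2 - 2 * delta ** 2)   # maximum of |chi|^2 (needs delta < w0/sqrt(2))
    half = abs(chi(w_peak, w0, delta)) ** 2 / 2
    f = lambda w: abs(chi(w, w0, delta)) ** 2 - half

    def root(lo, hi):                  # bisection; f(lo) and f(hi) must differ in sign
        flo = f(lo)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            fm = f(mid)
            if (fm > 0) == (flo > 0):
                lo, flo = mid, fm
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return root(w_peak, 10 * w0) - root(0.0, w_peak)

for Q in (10, 50, 200):
    w0 = 1.0
    print(Q, fwhm(w0, w0 / (2 * Q)), w0 / Q)   # approaches w0/Q as Q grows
```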

(ii) Time domain. Returning to the general problem of linear oscillations, one may argue that Eqs. (9) and (15)-(17) provide a full solution of the forced oscillation problem. This is formally correct, but this solution may be very inconvenient if the external force is far from a sinusoidal function of time. In this case, we should first calculate the complex amplitudes $f_\omega$ participating in the Fourier sum (14). In the general case of a non-periodic f(t), this is actually the Fourier integral,

$$f(t) = \int_{-\infty}^{+\infty}f_\omega e^{-i\omega t}d\omega, \qquad (4.23)$$

so that $f_\omega$ should be calculated using the reciprocal Fourier transform,

$$f_\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}f(t')\,e^{i\omega t'}dt'. \qquad (4.24)$$

Now we can use Eq. (16) for each Fourier component of the resulting forced oscillations, and rewrite the 
last of Eqs. (15) as 

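$$q_{\text{forced}}(t) = \int_{-\infty}^{+\infty}a_\omega e^{-i\omega t}d\omega = \int_{-\infty}^{+\infty}\chi(\omega)f_\omega e^{-i\omega t}d\omega = \int_{-\infty}^{+\infty}d\omega\,\chi(\omega)\frac{1}{2\pi}\int_{-\infty}^{+\infty}dt'\,f(t')\,e^{i\omega(t'-t)}, \qquad (4.25)$$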

with the response function $\chi(\omega)$ given, in our case, by Eq. (17). Besides requiring two integrations, Eq. (25) is conceptually discomforting: it seems to indicate that the oscillator's coordinate at time t depends not only on the external force exerted at earlier times $t' < t$, but also at future times. This would contradict one of the most fundamental principles of physics (and indeed, science as a whole), causality: no effect may precede its cause.

Fortunately, a straightforward calculation (left for the reader's exercise) shows that the response function (17) satisfies the following rule: 7

$$\int_{-\infty}^{+\infty}\chi(\omega)\,e^{-i\omega\tau}d\omega = 0, \qquad \text{for } \tau < 0. \qquad (4.26)$$



7 This is true for all systems in which f(t) represents a cause, and q(t) its effect. Following tradition, I discuss the frequency-domain expression of this causality relation (called the Kramers-Kronig relations) in the Classical Electrodynamics part of this lecture series - see EM Sec. 7.3.
This fact allows the last form of Eq. (25) to be rewritten in either of the following equivalent forms:



$$q_{\text{forced}}(t) = \int_{-\infty}^{t}f(t')\,G(t - t')\,dt' = \int_0^{\infty}f(t - \tau)\,G(\tau)\,d\tau, \qquad (4.27)$$



where $G(\tau)$, defined as the Fourier transform of the response function,

$$G(\tau) \equiv \frac{1}{2\pi}\int_{-\infty}^{+\infty}\chi(\omega)\,e^{-i\omega\tau}d\omega, \qquad (4.28)$$



is called the (temporal) Green 's function of the system. According to Eq. (26), G(r) = 0 for all r < 0. 

While the second form of Eq. (27) is more convenient for calculations, its first form is conceptually clearer. Namely, it expresses the linear superposition principle in the time domain, and may be interpreted as follows: the full effect of $f(t)$ on an oscillator (actually, on any linear system 8) may be described as a sum of the effects of short pulses of duration $dt'$ and magnitude $f(t')$:

$$q_{\rm forced}(t) = \lim_{\Delta t' \to 0} \sum_{t'=-\infty}^{t} G(t-t')\, f(t')\, \Delta t' \tag{4.29}$$

- see Fig. 2. The Green's function $G(\tau)$ thus describes the oscillator's response to a unit pulse of force, measured at time $\tau = t - t'$ after the pulse.
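Eq. (29) can be taken almost literally as a numerical recipe. The following minimal Python sketch evaluates the causal Riemann sum for a force sampled on a uniform time grid; the function name `forced_response` and the test case are illustrative assumptions, not part of the text. Only pulses at times $t_k \le t_n$ contribute to $q(t_n)$, which is how causality enters.

```python
def forced_response(f_samples, G, dt):
    """Causal Riemann sum of Eq. (4.29): q(t_n) = sum_{k<=n} G(t_n - t_k) f(t_k) dt.

    f_samples[k] is the force at t_k = k*dt (assumed zero before t_0 = 0);
    G is a callable Green's function of the delay tau = t_n - t_k >= 0.
    """
    n_pts = len(f_samples)
    q = [0.0] * n_pts
    for n in range(n_pts):
        # only pulses at times t_k <= t_n contribute (causality: G = 0 for tau < 0)
        q[n] = sum(G((n - k) * dt) * f_samples[k] * dt for k in range(n + 1))
    return q


# Usage (assumed test case): free particle with G(tau) = tau, unit force switched on at t = 0
dt = 1e-3
q = forced_response([1.0] * 1001, lambda tau: tau, dt)
print(q[-1])   # about 0.5005, versus the exact t**2/2 = 0.5 at t = 1
```

The test kernel $G(\tau) = \tau$ is what the recipe of Eqs. (31)-(33) would give if applied to the equation $\ddot q = f$ (my assumption for illustration); the residual error of the sum is of the order of $\Delta t'$.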



[Figure: the force $f(t')$ plotted versus $t'$ and split into short pulses of duration $dt'$]

Fig. 4.2. Representation of the force as a function of time as a sum of short pulses.



Mathematically, it is more convenient to go to the limit $dt' \to 0$ and describe the elementary, unit-area pulse by Dirac's $\delta$-function, 9 thus returning to Eq. (27). This line of reasoning also gives a convenient way to calculate the Green's function. Indeed, for the particular case

$$f(t) = \delta(t - t_0), \qquad \text{with } t_0 < t, \tag{4.30}$$

Eq. (27) yields $q(t) = G(t - t_0)$. In particular, if $t > 0$, we may take $t_0 = 0$; then $q(t) = G(t)$. Hence the Green's function may be calculated as a solution of the differential equation of motion of the system, in our case Eq. (13), with the $\delta$-functional right-hand part:



$$\frac{d^2 G(\tau)}{d\tau^2} + 2\delta\,\frac{dG(\tau)}{d\tau} + \omega_0^2\, G(\tau) = \delta(\tau), \tag{4.31}$$

and zero initial conditions:



8 This is a very unfortunate, but common jargon, meaning "the system described by linear equations of motion".

9 For a reminder of the basic properties of the $\delta$-function, see MA Sec. 14.
$$G(-0) = \frac{dG}{d\tau}(-0) = 0, \tag{4.32}$$

where $\tau = -0$ means the instant immediately preceding $\tau = 0$.

This calculation may be simplified even further. Let us integrate both sides of Eq. (31) over an infinitesimal interval including the origin, e.g. $[-d\tau/2, +d\tau/2]$, and then take the limit $d\tau \to 0$. Since the Green's function has to be continuous because of its physical sense as the (generalized) coordinate, all terms in the left-hand part but the first one vanish, while the first term yields $dG/d\tau|_{+0} - dG/d\tau|_{-0}$. Due to the second of Eqs. (32), the last of these two terms equals zero, while the right-hand part yields 1. Thus, $G(\tau)$ may be calculated for $\tau > 0$ (i.e. for all times when the Green's function is different from zero) by solving the homogeneous version of the system's equation of motion for $\tau > 0$, with the following special initial conditions:

$$G(0) = 0, \qquad \frac{dG}{d\tau}(0) = 1. \tag{4.33}$$



This approach gives us a convenient way to calculate the Green's functions of linear systems. In particular, for an oscillator with not very high damping ($\delta < \omega_0$, i.e. $Q > 1/2$), imposing the initial conditions (33) on the general free-oscillation solution (9), we immediately get 10

$$G(\tau) = \frac{1}{\omega_0'}\, e^{-\delta\tau} \sin\omega_0'\tau. \tag{4.34}$$
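The recipe (31)-(33) is easy to check numerically: integrating the homogeneous equation of motion for $\tau > 0$ from the special initial conditions (33) should reproduce Eq. (34). The sketch below does this with classical fourth-order Runge-Kutta steps; the function names, the integrator choice and the parameter values are assumptions made for illustration.

```python
import math

def green_by_ode(delta, omega0, tau_max, n_steps):
    """Integrate G'' + 2*delta*G' + omega0**2*G = 0 for tau > 0, starting from the
    special initial conditions (4.33), G(0) = 0 and G'(0) = 1, with classical RK4 steps."""
    h = tau_max / n_steps
    g, v = 0.0, 1.0                      # G and dG/dtau at tau = +0

    def rhs(g, v):
        return v, -2.0 * delta * v - omega0**2 * g

    for _ in range(n_steps):
        k1 = rhs(g, v)
        k2 = rhs(g + 0.5 * h * k1[0], v + 0.5 * h * k1[1])
        k3 = rhs(g + 0.5 * h * k2[0], v + 0.5 * h * k2[1])
        k4 = rhs(g + h * k3[0], v + h * k3[1])
        g += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        v += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return g                             # G(tau_max)

def green_exact(delta, omega0, tau):
    """Eq. (4.34); requires delta < omega0, so that omega0' is real."""
    w = math.sqrt(omega0**2 - delta**2)   # omega0'
    return math.exp(-delta * tau) * math.sin(w * tau) / w

# Usage (assumed values, delta/omega0 = 0.1): the two numbers should agree closely
print(green_by_ode(0.1, 1.0, 5.0, 5000), green_exact(0.1, 1.0, 5.0))
```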



Equations (27) and (34) provide a very convenient recipe for solving most forced oscillation problems. As a very simple example, let us calculate the transient process in an oscillator under the effect of a constant force being turned on at $t = 0$:

$$f(t) = \begin{cases} 0, & t < 0, \\ f_0, & t > 0, \end{cases} \tag{4.35}$$

provided that at $t < 0$ the oscillator was at rest, so that $q_{\rm free}(t) = 0$. Then the second form of Eq. (27) yields



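$$q(t) = \int_0^{\infty} f(t-\tau)\, G(\tau)\, d\tau = f_0 \int_0^{t} \frac{1}{\omega_0'}\, e^{-\delta\tau} \sin\omega_0'\tau\, d\tau. \tag{4.36}$$

The simplest way to work out such integrals is to represent the sine function as the imaginary part of $\exp\{i\omega_0'\tau\}$, and merge the two exponents, getting

$$q(t) = \frac{f_0}{\omega_0'}\, {\rm Im} \int_0^t e^{-\delta\tau + i\omega_0'\tau}\, d\tau = \frac{f_0}{\omega_0'}\, {\rm Im}\left[\frac{1 - e^{-\delta t + i\omega_0' t}}{\delta - i\omega_0'}\right] = \frac{f_0}{\omega_0^2}\left[1 - e^{-\delta t}\left(\cos\omega_0' t + \frac{\delta}{\omega_0'}\sin\omega_0' t\right)\right], \tag{4.37}$$

where the last step uses the relation $\delta^2 + \omega_0'^2 = \omega_0^2$.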



This result, plotted in Fig. 3, is rather natural: it describes nothing more than the transient from the initial equilibrium position $q = 0$ to the new equilibrium position $q_0 = f_0/\omega_0^2 = F_0/\kappa$, accompanied by decaying oscillations. For this particular simple function $f(t)$, the same result might also be obtained by introducing a new variable $\tilde q(t) \equiv q(t) - q_0$ and solving the resulting homogeneous equation for $\tilde q$ (with the appropriate initial condition $\tilde q(0) = -q_0$), but for more complicated functions $f(t)$ the Green's function approach is irreplaceable.

10 The same result may be obtained from Eq. (28) with the response function $\chi(\omega)$ given by Eq. (19). This, more cumbersome, way is left for the reader's exercise.
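As a numerical cross-check of Eqs. (36) and (37), one may evaluate the Green's function integral (36) by quadrature and compare it with the closed form. In this sketch, the use of Simpson's rule and the parameter values ($\delta/\omega_0 = 0.1$, as in Fig. 3, and $f_0 = \omega_0 = 1$) are my assumptions, and the function names are hypothetical.

```python
import math

def step_response_closed_form(t, f0, delta, omega0):
    """Eq. (4.37): response to a constant force f0 switched on at t = 0 (delta < omega0)."""
    w = math.sqrt(omega0**2 - delta**2)              # omega0'
    return f0 / omega0**2 * (1.0 - math.exp(-delta * t) * (math.cos(w * t) + delta / w * math.sin(w * t)))

def step_response_by_green(t, f0, delta, omega0, n=2000):
    """Eq. (4.36): q(t) = f0 * integral_0^t G(tau) dtau, with G from Eq. (4.34),
    evaluated by the composite Simpson's rule (n must be even)."""
    w = math.sqrt(omega0**2 - delta**2)
    G = lambda tau: math.exp(-delta * tau) * math.sin(w * tau) / w
    h = t / n
    s = G(0.0) + G(t)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * G(k * h)
    return f0 * s * h / 3.0

# Usage (assumed values: delta/omega0 = 0.1 as in Fig. 4.3, f0 = omega0 = 1)
for t in (1.0, 5.0, 20.0):
    print(t, step_response_by_green(t, 1.0, 0.1, 1.0), step_response_closed_form(t, 1.0, 0.1, 1.0))
```

At large $t$ both functions approach the new equilibrium $q_0 = f_0/\omega_0^2$.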




Fig. 4.3. Transient process in a linear oscillator, induced by a step-like force $f(t)$, for the particular case $\delta/\omega_0 = 0.1$ (i.e., $Q = 5$).



Note that for any particular linear system, its Green's function needs to be calculated only once, and then may be repeatedly used in Eq. (27) to calculate the system's response to various external forces - either analytically or numerically. This property makes the Green's function approach very popular in many other fields of physics - with the corresponding generalization or re-definition of the function. 11



4.2. Weakly nonlinear oscillations 

In comparison with the systems discussed in the last section, which are described by linear differential equations with constant coefficients and thus allow a complete and exact analytical solution, oscillations in nonlinear systems generally present a complex and, as a rule, analytically intractable problem. Let us start a discussion of such nonlinear oscillations 12 from an important case that may be explored analytically. In many important 1D oscillators, higher terms in the potential expansion (3.10) cannot be neglected, but are small and may be accounted for approximately. If, in addition, damping is low (or negligible), the equation of motion may be presented as a slightly modified Eq. (13):



$$\ddot q + \omega^2 q = f(t, q, \dot q, \ldots), \tag{4.38}$$

where $\omega \approx \omega_0$ is the anticipated frequency of oscillations (whose choice is to a certain extent arbitrary - see below), and the right-hand part $f$ is small (say, scales as some small dimensionless parameter $\varepsilon \ll 1$), so that it may be considered as a perturbation.

Since at $\varepsilon = 0$ this equation has the sinusoidal solution given by Eq. (3), one might naively think that at nonvanishing but small $\varepsilon$, the approximate solution to Eq. (38) should be sought in the form

$$q(t) = q^{(0)} + q^{(1)} + q^{(2)} + \ldots, \qquad \text{where } q^{(n)} \propto \varepsilon^n, \tag{4.39}$$

with $q^{(0)} = A\cos(\omega_0 t - \varphi) \propto \varepsilon^0$. This is a good example of apparently impeccable mathematical reasoning that would lead to a very inefficient procedure. Indeed, let us apply it to a problem whose exact solution we already know, namely the free oscillations in a linear but damped oscillator, on this occasion assuming the damping to be very low, $\delta/\omega_0 \sim \varepsilon \ll 1$. The corresponding equation of motion, Eq. (6), may be presented in the form (38) if we take $\omega = \omega_0$ and

$$f = -2\delta\dot q, \qquad \delta \propto \varepsilon. \tag{4.40}$$

The naive approach described above would allow us to find small corrections, of the order of $\delta$, to the free, non-decaying oscillations $A\cos(\omega_0 t - \varphi)$. However, we already know from Eq. (9) that the main effect of damping is a gradual decrease of the free oscillation amplitude to zero, i.e. a very large change of the amplitude, though at low damping, $\delta \ll \omega_0$, this decay takes a long time $t \sim 1/\delta \gg 1/\omega_0$. Hence, if we want our approximate method to be productive (i.e. to work at all time scales, in particular for forced oscillations with established, constant amplitude and phase), we need to account for the fact that the small right-hand part of Eq. (38) may eventually lead to essential changes of the oscillation amplitude $A$ (and sometimes, as we will see below, also of the oscillation phase $\varphi$) at large times, because of the slowly accumulating effects of the small perturbation. 13

11 See, e.g., EM Sec. 2.7, and QM Sec. 2.2.

12 Again, "nonlinear oscillations" is a generally accepted slang term for oscillations in systems described by nonlinear equations of motion.

This goal may be achieved by accounting for these slow changes already in the "0th approximation", i.e. the basic part of the solution in the expansion (39):

$$q^{(0)} = A(t)\cos[\omega t - \varphi(t)], \qquad \text{with } \dot A, \dot\varphi \to 0 \text{ at } \varepsilon \to 0. \tag{4.41}$$



The approximate methods based on Eqs. (39) and (41) have several varieties and several names, 14 but their basic idea and the results in the most important approximation (41) are the same. Let me illustrate this approach on a particular, simple but representative example of a dissipative (but high-Q) pendulum driven by a weak sinusoidal external force with a nearly-resonant frequency:

$$\ddot q + 2\delta\dot q + \omega_0^2 \sin q = f_0\cos\omega t, \tag{4.42}$$

with $|\omega - \omega_0|, \delta \ll \omega_0$, and the force amplitude $f_0$ so small that $|q| \ll 1$ at all times. From what we know about forced oscillations from Sec. 1, it is natural to identify $\omega$ in the left-hand part of Eq. (38) with the force frequency. Expanding $\sin q$ into the Taylor series in small $q$, keeping only the first two terms of this expansion, and moving all the small terms to the right-hand part, we can bring Eq. (42) to the canonical form (38):

$$\ddot q + \omega^2 q = -2\delta\dot q + 2\xi\omega q + \alpha q^3 + f_0\cos\omega t \equiv f(t, q, \dot q). \tag{4.43}$$

Here $\alpha = \omega_0^2/6$ in the case of the pendulum (though the calculations below will be valid for any $\alpha$), and the second term in the right-hand part was obtained using the approximation already employed in Sec. 1: $(\omega^2 - \omega_0^2)q \approx 2\omega(\omega - \omega_0)q = 2\omega\xi q$, where $\xi \equiv \omega - \omega_0$ is the detuning parameter that was already used earlier - see Eq. (21).



13 The same flexible approach is necessary for approximations used in quantum mechanics. The method discussed here is close in spirit (but not identical) to the WKB approximation (see, e.g., QM Sec. 2.4) rather than to the perturbation theory varieties (QM Ch. 6).

14 In various texts, one can meet references to either the small parameter method or asymptotic methods. The list of scientists credited for the development of this method and its variations notably includes J. Poincaré, B. van der Pol, N. Krylov, N. Bogolyubov, and Yu. Mitropolsky. Expression (41) itself is frequently called the Rotating-Wave Approximation - RWA. (The origin of the term will be discussed in Sec. 6 below.) In view of the pioneering role of B. van der Pol in the development of this approach, in some older textbooks the rotating-wave approximation is called the "van der Pol method".
Now, following the general recipe expressed by Eqs. (39) and (41), in the 1st approximation in $f \propto \varepsilon$, 15 we may look for the solution to Eq. (43) in the form

$$q(t) = A\cos\Psi + q^{(1)}(t), \qquad \text{where } \Psi \equiv \omega t - \varphi, \quad q^{(1)} \sim \varepsilon. \tag{4.44}$$

Let us plug this assumed solution into both parts of Eq. (43), keeping only the terms of the first order in $\varepsilon$. Thanks to our (smart :-) choice of $\omega$ in the left-hand part of that equation, the two zero-order terms in that part cancel each other. Moreover, since each term in the right-hand part of Eq. (43) is already of the order of $\varepsilon$, we may drop $q^{(1)} \propto \varepsilon$ from the substitution into that part altogether, because this would give us only terms $O(\varepsilon^2)$ or higher. As a result, we get the following approximate equation:

$$\ddot q^{(1)} + \omega^2 q^{(1)} = f^{(0)} \equiv -2\delta\,\frac{d}{dt}(A\cos\Psi) + 2\xi\omega A\cos\Psi + \alpha(A\cos\Psi)^3 + f_0\cos\omega t. \tag{4.45}$$

According to Eq. (41), generally $A$ and $\varphi$ should be considered as (slow) functions of time. However, let us leave the analysis of transient processes and system stability until the next section, and use Eq. (45) to find stationary oscillations in the system, which are established after the initial transient. For that limited task, we may take $A = {\rm const}$, $\varphi = {\rm const}$, so that $q^{(0)}$ represents sinusoidal oscillations of frequency $\omega$. Sorting the terms in the right-hand part according to their time dependence, 16 we see that it has terms with frequencies $\omega$ and $3\omega$:

$$f^{(0)} = \left(2\xi\omega A + \frac{3}{4}\alpha A^3 + f_0\cos\varphi\right)\cos\Psi + \left(2\delta\omega A - f_0\sin\varphi\right)\sin\Psi + \frac{1}{4}\alpha A^3\cos 3\Psi. \tag{4.46}$$
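The bookkeeping behind Eq. (46) can be verified by projecting $f^{(0)}$ numerically onto $\cos\Psi$, $\sin\Psi$ and $\cos 3\Psi$ over one period. In this minimal sketch the function names and parameter values are hypothetical; a uniform 64-point average is exact, up to rounding, for the low-order trigonometric polynomials involved here.

```python
import math

def f_zeroth(psi, A, phi, omega, xi, delta, alpha, f0):
    """Right-hand part f^(0) of Eq. (4.45) at constant A, phi, as a function of Psi = omega*t - phi."""
    q = A * math.cos(psi)
    qdot = -A * omega * math.sin(psi)            # d/dt (A cos Psi) at constant A and phi
    # cos(omega*t) = cos(Psi + phi)
    return -2*delta*qdot + 2*xi*omega*q + alpha*q**3 + f0*math.cos(psi + phi)

def fourier_coeff(func, trig, n=64):
    """Amplitude of one harmonic: (2/n) * sum of func(psi_k)*trig(psi_k) over one period."""
    return 2.0 / n * sum(func(2*math.pi*k/n) * trig(2*math.pi*k/n) for k in range(n))

# Usage with hypothetical parameter values:
p = dict(A=0.3, phi=0.7, omega=1.0, xi=-0.02, delta=0.01, alpha=1/6, f0=0.01)
f = lambda psi: f_zeroth(psi, **p)
c1 = fourier_coeff(f, math.cos)                  # coefficient of cos Psi
s1 = fourier_coeff(f, math.sin)                  # coefficient of sin Psi
c3 = fourier_coeff(f, lambda x: math.cos(3*x))   # coefficient of cos 3Psi
# differences from the coefficients of Eq. (4.46); all should be at the rounding level
print(c1 - (2*p['xi']*p['omega']*p['A'] + 0.75*p['alpha']*p['A']**3 + p['f0']*math.cos(p['phi'])))
print(s1 - (2*p['delta']*p['omega']*p['A'] - p['f0']*math.sin(p['phi'])))
print(c3 - 0.25*p['alpha']*p['A']**3)
```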

Now comes the main trick of the rotating-wave approximation: mathematically, Eq. (45) may be viewed as the equation of oscillations in a linear, dissipation-free harmonic oscillator of frequency $\omega$ (not $\omega_0$) under the action of an external force represented by the right-hand part of the equation. In our particular case, this force has three terms: two quadrature components at that very frequency $\omega$, and the third one at frequency $3\omega$. As we know from our analysis of this problem in Sec. 1, if either of the first two components is nonvanishing, $q^{(1)}$ grows to infinity - see Eq. (19) with $\delta = 0$. At the same time, by the very structure of the rotating-wave approximation, $q^{(1)}$ has to be finite - moreover, small! The only way out of this contradiction is to require that the amplitudes of both quadrature components of $f^{(0)}$ with frequency $\omega$ are equal to zero:

$$2\xi\omega A + \frac{3}{4}\alpha A^3 + f_0\cos\varphi = 0, \qquad 2\delta\omega A - f_0\sin\varphi = 0. \tag{4.47}$$

These two harmonic balance equations enable us to find both parameters of the forced oscillations: their amplitude $A$ and phase $\varphi$. In particular, the phase may be readily eliminated from this system (most easily, by expressing $\sin\varphi$ and $\cos\varphi$ from the corresponding equations, and then requiring the sum $\sin^2\varphi + \cos^2\varphi$ to equal 1), and the solution for the amplitude $A$ presented in the following implicit but convenient form:



15 For a mathematically rigorous treatment of the higher approximations, see, e.g., Yu. Mitropolsky and N. Dao, Applied Asymptotic Methods in Nonlinear Oscillations, Springer, 2004. A less formal (and somewhat verbose) discussion of various oscillatory phenomena may be found in the classical text A. Andronov, A. Vitt, and S. Khaikin, Theory of Oscillators, Dover, 2011.

16 Using the second of Eqs. (44), $\cos\omega t$ may be rewritten as $\cos(\Psi + \varphi) = \cos\Psi\cos\varphi - \sin\Psi\sin\varphi$. Then using the trigonometric identity $\cos^3\Psi = (3/4)\cos\Psi + (1/4)\cos 3\Psi$ (see, e.g., MA Eq. (3.3)) results in Eq. (46).
$$A^2 = \frac{f_0^2}{4\omega^2}\,\frac{1}{\xi^2(A) + \delta^2}, \qquad \text{where } \xi(A) \equiv \xi + \frac{3}{8}\frac{\alpha A^2}{\omega}. \tag{4.48}$$



This expression differs from Eq. (22) for the linear resonance in the low-damping limit only by the replacement of the detuning $\xi$ with its effective amplitude-dependent value $\xi(A)$ or, equivalently, of the eigenfrequency $\omega_0$ of the resonator with its effective, amplitude-dependent value

$$\omega_0(A) = \omega_0 - \frac{3}{8}\frac{\alpha A^2}{\omega}. \tag{4.49}$$

The physical meaning of $\omega_0(A)$ is simple: this is just the frequency of free oscillations of amplitude $A$ in a similar nonlinear system, but with zero damping. Indeed, for $\delta = 0$ and $f_0 = 0$ we could repeat our calculations, assuming that $\omega$ is an amplitude-dependent eigenfrequency $\omega_0(A)$, to be found. Then the second of Eqs. (47) is trivially satisfied, while the first of them gives Eq. (49).

Expression (48) allows one to draw the curves of this nonlinear resonance just by bending the linear resonance plots (Fig. 1) according to the so-called skeleton curve expressed by Eq. (49). Figure 4 shows the result of this procedure. Note that at small amplitude, $\omega_0(A) \to \omega_0$, and we return to the usual, "linear" resonance (22).



[Figure: resonance curves plotted versus $\omega/\omega_0$, horizontal range from 0.9 to 1.1]

Fig. 4.4. Nonlinear resonance as described by the rotating-wave approximation result (48), for the particular case $\alpha = \omega_0^2/6$, $\delta/\omega = 0.01$ (i.e. $Q = 50$), and seven values of the parameter $f_0/\omega_0^2$, increased by equal steps from 0 to 0.035.
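Since Eq. (48) is a cubic equation for $x \equiv A^2$, its real roots may be found numerically at each frequency $\omega$. The sketch below is a hypothetical helper that uses a plain scan plus bisection rather than any particular library routine. It takes the Fig. 4 parameters $\alpha = \omega_0^2/6$ and $\delta/\omega_0 = 0.01$ together with an assumed drive $f_0 = 0.015\,\omega_0^2$, inside the range shown in that figure; for these values, the scan finds three amplitudes at $\omega = 0.97\,\omega_0$ and a single one at $\omega = 1.03\,\omega_0$.

```python
import math

def stationary_amplitudes(omega, omega0, delta, alpha, f0, n_grid=20000):
    """Positive roots A of Eq. (4.48), A^2 [xi(A)^2 + delta^2] = f0^2/(4 omega^2),
    with xi(A) = (omega - omega0) + 3*alpha*A^2/(8*omega).
    Works with x = A^2; every root obeys x <= F/delta^2, since xi(A)^2 >= 0."""
    xi, k = omega - omega0, 3.0 * alpha / (8.0 * omega)
    F = f0**2 / (4.0 * omega**2)
    g = lambda x: x * ((xi + k * x)**2 + delta**2) - F    # cubic in x, g(0) = -F < 0
    x_max = F / delta**2
    roots = []
    for i in range(n_grid):
        a, b = x_max * i / n_grid, x_max * (i + 1) / n_grid
        if g(a) * g(b) < 0.0:                               # sign change: refine by bisection
            lo, hi = a, b
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return [math.sqrt(x) for x in roots]

# Usage: Fig. 4.4 parameters (omega0 = 1, alpha = 1/6, delta = 0.01) and an assumed drive f0 = 0.015
print(stationary_amplitudes(0.97, 1.0, 0.01, 1/6, 0.015))   # three amplitudes below resonance
print(stationary_amplitudes(1.03, 1.0, 0.01, 1/6, 0.015))   # a single amplitude above it
```

Which of the three solutions are stable is exactly the question taken up in the next section.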



To bring our solution to its logical completion, we should still find the first perturbation $q^{(1)}(t)$ from what is left of Eq. (45). Since the structure of this equation is similar to Eq. (13) with the force of frequency $3\omega$ and zero damping, we may use Eqs. (16)-(17) to obtain

$$q^{(1)}(t) = -\frac{\alpha A^3}{32\omega^2}\cos 3(\omega t - \varphi). \tag{4.50}$$



Adding this perturbation (note the negative sign!) to the sinusoidal oscillation (41), we see that as the amplitude $A$ of oscillations in a system with $\alpha > 0$ (e.g., a pendulum) grows, their waveform becomes a bit more "blunt" near the maximum deviations from the equilibrium.

Expression (50) also allows an estimate of the range of validity of the rotating-wave approximation: since it has been based on the assumption $|q^{(1)}| \ll |q^{(0)}| \le A$, for this particular problem we have to require $\alpha A^2/32\omega^2 \ll 1$. For a pendulum (with $\alpha = \omega_0^2/6$), this condition becomes $A^2/192 \ll 1$. Though numerical coefficients in such strong inequalities should be taken with a grain of salt, the smallness of this particular coefficient gives a good hint that the method should give very good results even for relatively large oscillations with $A \sim 1$. In Sec. 7 below, we will see that this is indeed the case.

From the mathematical viewpoint, the next step would be to calculate the next approximation

$$q(t) = A\cos\Psi + q^{(1)}(t) + q^{(2)}(t), \qquad q^{(2)} \sim \varepsilon^2, \tag{4.51}$$

and plug it into Eq. (43), which (thanks to our special choice of $q^{(0)}$ and $q^{(1)}$) would retain only $\ddot q^{(2)} + \omega^2 q^{(2)}$ in its left-hand part. Again, requiring the amplitudes of the two quadrature components of frequency $\omega$ in the right-hand part to be zero, we may get the second-order corrections to $A$ and $\varphi$. Then we may use the remaining part of the equation to calculate $q^{(2)}$, and then go after the third-order terms, etc. However, for most purposes the sum $q^{(0)} + q^{(1)}$, and sometimes even just the crudest approximation $q^{(0)}$ alone, are completely sufficient. For example, according to Eq. (50), for a simple pendulum ($\alpha = \omega_0^2/6$) swinging as much as between the opposite horizontal positions ($A = \pi/2$), the 1st order correction $q^{(1)}$ is of the order of 0.5%. (Beyond this range, completely new dynamic phenomena start - see Sec. 7 below, but these phenomena cannot be covered by the rotating-wave approximation, at least in its current form.) For this reason, higher approximations are rarely pursued either in physics or engineering.



4.3. RWA equations 

A much more important issue is the stability of solutions described by Eq. (48). Indeed, Fig. 4 shows that within a certain range of parameters, these equations give three different values of the oscillation amplitude (and phase), and it is important to understand which of these solutions are stable. Since these solutions are not fixed points in the sense discussed in Sec. 3.2 (each point in Fig. 4 represents a nearly-sinusoidal oscillation), their stability analysis needs a more general approach that would be valid for oscillations with amplitude and phase slowly evolving in time. This approach will also enable the analysis of non-stationary (especially initial transient) processes that are of key importance for some dynamic systems.

First of all, let us formalize the way the harmonic balance equations, such as Eqs. (47), are 
obtained for the general case (38) - rather than for the particular Eq. (43) considered in the last section. 
After plugging the 0th approximation (41) into the right-hand part of Eq. (38), we have to require the amplitudes of both its quadrature components of frequency $\omega$ to be zero. From the standard Fourier analysis we know that these requirements may be presented as

$$\overline{f^{(0)}\sin\Psi} = 0, \qquad \overline{f^{(0)}\cos\Psi} = 0, \tag{4.52}$$

where the bar means time averaging - in our current case, over the period $2\pi/\omega$ of the functions in Eq. (52), with their arguments calculated in the 0th approximation:

$$f^{(0)} \equiv f(t, q^{(0)}, \dot q^{(0)}, \ldots) = f(t, A\cos\Psi, -A\omega\sin\Psi, \ldots), \qquad \text{with } \Psi \equiv \omega t - \varphi. \tag{4.53}$$

Now, for a transient process the contribution of $q^{(0)}$ to the left-hand part of Eq. (38) is not zero any longer, because both the amplitude and the phase may be slow functions of time - see Eq. (41). Let us calculate this contribution. The exact result would be
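$$\ddot q^{(0)} + \omega^2 q^{(0)} \equiv \frac{d^2}{dt^2}\left[A\cos(\omega t - \varphi)\right] + \omega^2 A\cos(\omega t - \varphi) = \left(\ddot A + 2\dot\varphi\omega A - \dot\varphi^2 A\right)\cos(\omega t - \varphi) - \left(2\dot A\omega - 2\dot A\dot\varphi - A\ddot\varphi\right)\sin(\omega t - \varphi). \tag{4.54}$$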



However, in the first approximation in $\varepsilon$, we may neglect the second derivatives of $A$ and $\varphi$, and also the squares and products of the first derivatives of $A$ and $\varphi$ (which are all of the second order in $\varepsilon$), so that Eq. (54) is reduced to

$$\ddot q^{(0)} + \omega^2 q^{(0)} \approx 2A\dot\varphi\omega\cos(\omega t - \varphi) - 2\dot A\omega\sin(\omega t - \varphi). \tag{4.55}$$

In the right-hand part of Eq. (38), i.e. in $f^{(0)}$ given by Eq. (53), we can neglect the time derivatives of the amplitude and phase altogether, because this part is already proportional to the small parameter. Hence, in the first order in $\varepsilon$, Eq. (38) becomes



$$\ddot q^{(1)} + \omega^2 q^{(1)} = f^{(0)}_{\rm ef} \equiv f^{(0)} - \left(2A\dot\varphi\omega\cos\Psi - 2\dot A\omega\sin\Psi\right). \tag{4.56}$$



Now, applying Eqs. (52) to the function $f^{(0)}_{\rm ef}$, and taking into account that the time averages of $\sin^2\Psi$ and $\cos^2\Psi$ are both equal to 1/2, while the time average of the product $\sin\Psi\cos\Psi$ vanishes, we get a pair of so-called RWA equations (alternatively called "the reduced equations" or sometimes "the van der Pol equations") for the time evolution of the amplitude and phase:

$$\dot A = -\frac{1}{\omega}\,\overline{f^{(0)}\sin\Psi}, \qquad \dot\varphi = \frac{1}{\omega A}\,\overline{f^{(0)}\cos\Psi}. \tag{4.57a}$$



Extending the definition (4) of the complex amplitude of oscillations to their slow evolution in time, $a(t) \equiv A(t)\exp\{i\varphi(t)\}$, and differentiating this relation, we see that the two equations (57a) may also be rewritten in the form of either one equation for $a$:

$$\dot a = \frac{i}{\omega}\,\overline{f^{(0)}\exp\{i\omega t\}}, \tag{4.57b}$$

or two equations for the real and imaginary parts of $a(t) = u(t) + iv(t)$:

$$\dot u = -\frac{1}{\omega}\,\overline{f^{(0)}\sin\omega t}, \qquad \dot v = \frac{1}{\omega}\,\overline{f^{(0)}\cos\omega t}. \tag{4.57c}$$



The first-order harmonic balance equations (52) are evidently just the particular case of the RWA equations (57) for stationary oscillations ($\dot A = \dot\varphi = 0$). 17

Superficially, the system (57a) of two coupled, first-order differential equations may look more complex than the initial, second-order differential equation (38), but actually it is usually much simpler. For example, let us spell them out for the easy case of free oscillations of a linear oscillator with damping. For that, we may reuse the ready Eq. (46) with $\alpha = f_0 = 0$, turning Eqs. (57a) into



17 One may ask why we cannot stick to just one, the most compact, complex-amplitude form (57b) of the RWA equations. The main reason is that when the function $f(q, \dot q, t)$ is nonlinear, we cannot replace its real arguments, such as $q = A\cos(\omega t - \varphi)$, with their complex-function representations like $a\exp\{-i\omega t\}$ (as could be done in the linear problems considered in Sec. 4.1), and need to use real variables, such as either $\{A, \varphi\}$ or $\{u, v\}$, anyway.
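$$\dot A = -\frac{1}{\omega}\,\overline{f^{(0)}\sin\Psi} = -\frac{1}{\omega}\,\overline{(2\xi\omega A\cos\Psi + 2\delta\omega A\sin\Psi)\sin\Psi} = -\delta A, \tag{4.58a}$$

$$\dot\varphi = \frac{1}{\omega A}\,\overline{f^{(0)}\cos\Psi} = \frac{1}{\omega A}\,\overline{(2\xi\omega A\cos\Psi + 2\delta\omega A\sin\Psi)\cos\Psi} = \xi. \tag{4.58b}$$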

The solution of Eq. (58a) gives us the same "envelope" law $A(t) = A(0)e^{-\delta t}$ as the exact solution (10) of the initial differential equation, while the elementary integration of Eq. (58b) yields $\varphi(t) = \xi t + \varphi(0) = \omega t - \omega_0 t + \varphi(0)$. This means that our approximate solution,

$$q^{(0)}(t) = A(t)\cos[\omega t - \varphi(t)] = A(0)e^{-\delta t}\cos[\omega_0 t - \varphi(0)], \tag{4.59}$$

agrees with the exact Eq. (9), and misses only the correction (8) to the oscillation frequency, which is of the second order in $\delta$, i.e. of the order of $\varepsilon^2$, beyond the accuracy of our first approximation. It is remarkable how nicely the RWA equations recover the proper frequency of free oscillations in this autonomous system, in which the very notion of $\omega$ is ambiguous.
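Eqs. (57a) reduce the analysis to two period averages, which are easy to evaluate numerically for any given $f$. The minimal sketch below (the function name and the uniform sampling of $\Psi$ are my choices) computes $\dot A$ and $\dot\varphi$ this way; as a usage check, it recovers Eqs. (58), $\dot A = -\delta A$ and $\dot\varphi = \xi$. For polynomial nonlinearities such as those in Eq. (43), the 64-point average is exact up to rounding; for other functions $f$ it is only an approximation.

```python
import math

def rwa_rhs(f, A, phi, omega, n=64):
    """Right-hand parts of the RWA equations (4.57a): returns (dA/dt, dphi/dt).

    f(t, q, qdot) is the small right-hand part of Eq. (4.38); per Eq. (4.53) the 0th
    approximation q = A cos Psi, qdot = -A*omega*sin Psi, Psi = omega*t - phi, is
    substituted, and the products with sin Psi and cos Psi are averaged over one period."""
    s_avg = c_avg = 0.0
    for k in range(n):
        psi = 2.0 * math.pi * k / n
        t = (psi + phi) / omega                  # time corresponding to this Psi
        val = f(t, A * math.cos(psi), -A * omega * math.sin(psi))
        s_avg += val * math.sin(psi) / n
        c_avg += val * math.cos(psi) / n
    return -s_avg / omega, c_avg / (omega * A)

# Usage with hypothetical values: linear damped oscillator, f = -2*delta*qdot + 2*xi*omega*q
delta, xi, omega = 0.01, -0.02, 1.0
f_lin = lambda t, q, qdot: -2*delta*qdot + 2*xi*omega*q
print(rwa_rhs(f_lin, A=0.3, phi=0.4, omega=omega))   # close to (-delta*0.3, xi), cf. Eqs. (4.58)
```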

The situation is different for forced oscillations. For example, for the (generally, nonlinear) oscillator described by Eq. (43) with $f_0 \neq 0$, Eqs. (57a) yield the RWA equations

$$\dot A = -\delta A + \frac{f_0}{2\omega}\sin\varphi, \qquad A\dot\varphi = \xi(A)A + \frac{f_0}{2\omega}\cos\varphi, \tag{4.60}$$

which are valid for an arbitrary function $\xi(A)$, provided that the nonlinear detuning remains much smaller than the oscillation frequency. Here (after a transient), the amplitude and phase tend to the stationary states described by Eqs. (47). This means that $\varphi$ becomes a constant, so that $q^{(0)} \to A\cos(\omega t - {\rm const})$, i.e. the RWA equations again automatically recover the correct frequency of the solution, in this case equal to that of the external force.
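Eqs. (60) can also be integrated in time, to watch the amplitude and phase settle to a stationary state. In the sketch below, I have combined them (my own rearrangement, not given in the text) into one equation for the complex amplitude $a = Ae^{i\varphi}$, namely $\dot a = [-\delta + i\xi(A)]\,a + if_0/2\omega$, which stays regular at $A = 0$; the RK4 integrator, the function name and the parameter values are assumptions. For the linear case $\alpha = 0$, the final $|a|$ should approach the value given by Eq. (48).

```python
import math

def integrate_rwa(omega, omega0, delta, alpha, f0, a0=0j, t_end=1000.0, dt=0.05):
    """RK4 integration of Eqs. (4.60) in an assumed complex form:
    da/dt = (-delta + 1j*xi(A))*a + 1j*f0/(2*omega), with a = A*exp(1j*phi) and
    xi(A) = (omega - omega0) + 3*alpha*A**2/(8*omega), as in Eq. (4.48)."""
    xi0 = omega - omega0
    def rhs(a):
        xi_A = xi0 + 3.0 * alpha * abs(a)**2 / (8.0 * omega)
        return (-delta + 1j * xi_A) * a + 1j * f0 / (2.0 * omega)
    a = a0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(a)
        k2 = rhs(a + 0.5 * dt * k1)
        k3 = rhs(a + 0.5 * dt * k2)
        k4 = rhs(a + dt * k3)
        a += dt * (k1 + 2*k2 + 2*k3 + k4) / 6.0
    return a

# Usage (hypothetical values): linear oscillator driven slightly below resonance, starting at rest
a = integrate_rwa(omega=0.99, omega0=1.0, delta=0.01, alpha=0.0, f0=0.01)
print(abs(a), 0.01 / (2 * 0.99 * math.hypot(-0.01, 0.01)))   # nearly equal, cf. Eq. (4.48)
```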

Note that each stationary oscillation regime, with a certain amplitude and phase, corresponds to a fixed point of the RWA equations, so that the stability of those fixed points determines that of the oscillations. In what follows, we will carry out such an analysis for several simple systems of key importance for physics and engineering.



4.4. Self-oscillations and phase locking 

The rotating-wave approximation was pioneered by B. van der Pol in the late 1920s for the analysis of one more type of oscillatory motion: self-oscillations. Several systems, e.g., electronic rf amplifiers with positive feedback, and optical media with quantum level population inversion, provide convenient means for the compensation, and even over-compensation, of the intrinsic energy losses in oscillators. Phenomenologically, this effect may be described as the change of sign of the damping coefficient $\delta$ from positive to negative. Since for small oscillations the equation of motion is still linear, we may use Eq. (9) to describe its general solution. This equation shows that at $\delta < 0$, even infinitesimal deviations from equilibrium (say, due to unavoidable fluctuations) lead to oscillations with exponentially growing amplitude. Of course, in any real system such growth cannot persist indefinitely, and is eventually limited by this or that effect - e.g., in the above examples, respectively, by amplifier saturation or electron population exhaustion.
In many cases, the amplitude limitation may be described reasonably well by nonlinear damping:

$$2\delta\dot q \to 2\delta\dot q + \beta\dot q^3, \tag{4.61}$$

with $\beta > 0$. Let us analyze this phenomenon, applying the rotating-wave approximation to the corresponding homogeneous differential equation:

$$\ddot q + 2\delta\dot q + \omega_0^2 q + \beta\dot q^3 = 0. \tag{4.62}$$

Moving the dissipative and detuning terms to the right-hand part as $f$, we can readily calculate the right-hand parts of the RWA equations (57a), getting 18

$$\dot A = -\delta(A)A, \qquad \text{where } \delta(A) \equiv \delta + \frac{3}{8}\beta\omega^2 A^2, \tag{4.63a}$$

$$A\dot\varphi = \xi A. \tag{4.63b}$$

The second of these equations has exactly the same form as Eq. (58b) for the case of decaying oscillations, and hence shows that the self-oscillations (if they happen, i.e. if $A \neq 0$) have the frequency $\omega_0$ of the oscillator itself - see Eq. (59). Equation (63a) is more interesting. If the initial damping $\delta$ is positive, it has only the trivial fixed point, $A_0 = 0$ (which describes the oscillator at rest), but if $\delta$ is negative, there is also another fixed point,

$$A_1 = \left(\frac{8|\delta|}{3\beta\omega^2}\right)^{1/2}, \tag{4.64}$$

which describes steady self-oscillations with a non-zero amplitude.

Let us apply the general approach discussed in Sec. 3.2, the linearization of equations of motion, to this RWA equation. For the trivial fixed point $A_0 = 0$, the linearization of Eq. (63a) is reduced to discarding the nonlinear term in the definition of the amplitude-dependent damping $\delta(A)$. The resulting linear equation evidently shows that the system's equilibrium point, $A = A_0 = 0$, is stable at $\delta > 0$ and unstable at $\delta < 0$. (We have already discussed this self-excitation condition above.) The linearization of Eq. (63a) near the non-trivial fixed point $A_1$ requires a bit more math: in the first order in $\tilde A \equiv A - A_1 \to 0$, we get

$$\dot{\tilde A} = \dot A = -\delta(A_1 + \tilde A) - \frac{3}{8}\beta\omega^2(A_1 + \tilde A)^3 \approx -\delta\tilde A - \frac{3}{8}\beta\omega^2\, 3A_1^2\,\tilde A = (-\delta + 3\delta)\tilde A = 2\delta\tilde A, \tag{4.65}$$

where Eq. (64) has been used to eliminate $A_1$. We see that the fixed point $A_1$ (and hence the self-oscillation process) is stable as soon as it exists ($\delta < 0$) - very much similar to the situation in our "testbed problem" (Fig. 2.1).
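The stability conclusions drawn from Eqs. (63a)-(65) are easy to confirm by direct integration of the amplitude equation. In this minimal sketch (the function name, RK4 stepping, and the values $\delta = \mp 0.01$, $\beta = \omega = 1$ are all assumptions), the amplitude approaches $A_1$ of Eq. (64) from either side when $\delta < 0$, and decays to zero when $\delta > 0$.

```python
import math

def self_oscillation_amplitude(delta, beta, omega, A_init, t_end, dt=0.05):
    """RK4 integration of Eq. (4.63a): dA/dt = -(delta + (3/8)*beta*omega**2*A**2)*A."""
    def rhs(A):
        return -(delta + 0.375 * beta * omega**2 * A**2) * A
    A = A_init
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(A)
        k2 = rhs(A + 0.5 * dt * k1)
        k3 = rhs(A + 0.5 * dt * k2)
        k4 = rhs(A + dt * k3)
        A += dt * (k1 + 2*k2 + 2*k3 + k4) / 6.0
    return A

# Usage (hypothetical values): negative damping, small initial fluctuation
A1 = math.sqrt(8 * 0.01 / (3 * 1.0 * 1.0**2))          # Eq. (4.64) with |delta| = 0.01
print(self_oscillation_amplitude(-0.01, 1.0, 1.0, 1e-3, 2000.0), A1)
```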

Now let us consider another important problem: the effect of an external sinusoidal force on a self-excited oscillator. If the force is sufficiently small, its effects on the self-excitation condition and the oscillation amplitude are negligible. However, if the frequency $\omega$ of such a weak force is close to the eigenfrequency $\omega_0$ of the oscillator, it may lead to a very important effect of phase locking (also called "synchronization"). In this effect, the oscillator's frequency deviates from $\omega_0$, and becomes exactly equal to the external force's frequency $\omega$, within a certain range

$$-\Delta \le \omega - \omega_0 \le +\Delta. \tag{4.66}$$

18 For that, one needs to use the trigonometric identity $\sin^3\Psi = (3/4)\sin\Psi - (1/4)\sin 3\Psi$ - see, e.g., MA Eq. (3.3).

In order to prove this fact, and also to calculate the phase locking range width $2\Delta$, we may repeat the calculation of the right-hand parts of the RWA equations (57a), adding the term $f_0\cos\omega t$ to the right-hand part of Eq. (62) - cf. Eqs. (42)-(43). This addition modifies Eqs. (63) as follows: 19



$$\dot A = -\delta(A)A + \frac{f_0}{2\omega}\sin\varphi, \tag{4.67a}$$

$$A\dot\varphi = \xi A + \frac{f_0}{2\omega}\cos\varphi. \tag{4.67b}$$



If the system is self-excited, and the external force is weak, its effect on the oscillation amplitude is small, and in the first approximation in $f_0$ we can take $A$ to be constant and equal to the value $A_1$ given by Eq. (64). Plugging this approximation into Eq. (67b), we get a very simple equation 20

$$\dot\varphi = \xi + \Delta\cos\varphi, \tag{4.68}$$

where in our current case

$$\Delta \equiv \frac{f_0}{2\omega A_1}. \tag{4.69}$$

Within the range $-|\Delta| \le \xi \le +|\Delta|$, Eq. (68) has two fixed points on each $2\pi$-segment of the variable $\varphi$:

$$\varphi_\pm = \pm\arccos\left(-\frac{\xi}{\Delta}\right) + 2\pi n. \tag{4.70}$$



It is easy to linearize Eq. (68) near each point to analyze their stability in our usual way; however, let me use this case to demonstrate another convenient way to do this in 1D systems, using the so-called phase plane - the plot of the right-hand part of Eq. (68) as a function of $\varphi$ - see Fig. 5.




Fig. 4.5. Phase plane of a phase-locked oscillator, for the particular case $\xi = \Delta/2$, $f_0 > 0$.



19 Actually, this result should be evident, even without calculations, from the comparison of Eqs. (60) and (63). 

20 This equation is ubiquitous in phase locking systems, including even some digital electronic circuits used for 
that purpose. 
Since positive values of this function correspond to the growth of $\varphi$ in time, and vice versa, we may draw the arrows showing the direction of phase evolution. From this plot, it is clear that one of these fixed points (for $f_0 > 0$, $\varphi_+$) is stable, while its counterpart is unstable. Hence the magnitude of $\Delta$ given by Eq. (69) is indeed the phase locking range (or rather its half) that we wanted to find. Note that the range is proportional to the amplitude of the phase locking signal - perhaps the most important feature of phase locking.
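The phase plane argument can be double-checked by integrating Eq. (68) directly. In the minimal sketch below (hypothetical function name and parameter values), a start inside the locking range relaxes to $\varphi_+$ of Eq. (70), while outside it the phase keeps running. For the latter case, the mean rate of phase slip is compared with $(\xi^2 - \Delta^2)^{1/2}$, which follows from the standard integral $\int_0^{2\pi} d\varphi/(\xi + \Delta\cos\varphi) = 2\pi/(\xi^2 - \Delta^2)^{1/2}$ for $|\xi| > |\Delta|$; this formula is not derived in the text.

```python
import math

def integrate_phase(xi, Delta, phi0, t_end, dt=0.01):
    """RK4 integration of the phase locking equation (4.68): dphi/dt = xi + Delta*cos(phi)."""
    rhs = lambda phi: xi + Delta * math.cos(phi)
    phi = phi0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(phi)
        k2 = rhs(phi + 0.5 * dt * k1)
        k3 = rhs(phi + 0.5 * dt * k2)
        k4 = rhs(phi + dt * k3)
        phi += dt * (k1 + 2*k2 + 2*k3 + k4) / 6.0
    return phi

# Inside the locking range (|xi| < |Delta|, hypothetical values): phi settles at phi_+ of Eq. (4.70)
print(integrate_phase(0.5, 1.0, 0.0, 50.0), math.acos(-0.5 / 1.0))
# Outside it: mean slip rate versus sqrt(xi**2 - Delta**2)
T = 1000.0
print(integrate_phase(2.0, 1.0, 0.0, T) / T, math.sqrt(2.0**2 - 1.0**2))
```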

In order to complete our simple analysis, based on the assumption of a fixed oscillation amplitude, we need to find the condition of validity of this assumption. For that, we may linearize Eq. (67a), for the stationary case, near the value $A_1$, just as we have done in Eq. (65) for the transient process. The stationary result,

$$\tilde A \equiv A - A_1 = \frac{1}{2|\delta|}\,\frac{f_0}{2\omega}\sin\varphi_\pm \equiv \frac{\Delta}{2|\delta|}\,A_1\sin\varphi_\pm, \tag{4.71}$$

shows that our assumption, $|\tilde A| \ll A_1$, and hence the final result (69), are valid if the phase locking range, $2\Delta$, is much smaller than $4|\delta|$.



4.5. Parametric excitation 



In both problems solved in the last section, the stability analysis was easy because it could be carried out for just one slow variable, either amplitude or phase. Generally, such an analysis of the RWA equations involves both these variables. The classical example of such a situation is provided by one important physical phenomenon - the parametric excitation of oscillations. An elementary example of such oscillations is given by a pendulum with an externally-changed parameter, for example its length $l(t)$ - see Fig. 6. Experiments (including those with playground swings :-) and numerical simulations show that if the length is changed (modulated) periodically, with a frequency $2\omega$ that is close to $2\omega_0$, and with a sufficiently large swing $\Delta l$, the equilibrium position of the pendulum becomes unstable, and it starts swinging with a frequency $\omega$ equal exactly to half of the length modulation frequency (and hence only approximately equal to the average eigenfrequency $\omega_0$ of the oscillator).






Fig. 4.6. Parametric excitation of pendulum oscillations. 



For an elementary analysis of this effect we may consider the simplest case when the oscillations are small. At the lowest point (θ = 0), where the pendulum moves with the highest velocity v_max, the string's tension F is higher than mg by the centripetal force: F_max = mg + mv_max²/l. On the contrary, at the maximum deviation of the pendulum from the equilibrium, the force is weakened by the string's tilt: F_min = mg cos θ_max. Using the energy conservation, E = mv_max²/2 = mgl(1 - cos θ_max), we may express these values as F_max = mg + 2E/l and F_min = mg - E/l. Now, if during each oscillation period the string is pulled up sharply and slightly by Δl (|Δl| ≪ l) at each of its two passages through the lowest point, and is let down by the same amount at each of the two points of maximum deviation, the net work of the external force per period is positive:

$$W \approx 2\left(F_{\max} - F_{\min}\right)\Delta l \approx 6\,\frac{\Delta l}{l}\,E, \qquad (4.72)$$

and hence results in an increase of the oscillator's energy. If the so-called modulation depth Δl/2l is sufficient, this increase may overcompensate the energy drained out by damping. Quantitatively, Eq. (10) shows that low damping (δ ≪ ω0) leads to the following energy decrease,

$$\Delta E \approx -4\pi\,\frac{\delta}{\omega_0}\,E, \qquad (4.73)$$

per oscillation period. Comparing Eqs. (72) and (73), we see that the net energy flow into the oscillations is positive, W + ΔE > 0, i.e. the oscillation amplitude has to grow, if 21

$$\frac{\Delta l}{l} > \frac{2\pi\delta}{3\omega_0} \equiv \frac{\pi}{3Q}. \qquad (4.74)$$

Since this result is independent of E, the growth of energy and amplitude is exponential (for sufficiently low E), so that Eq. (74) is the condition of parametric excitation, in this simple model.
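The energy-balance estimate of Eqs. (72)-(74) reduces to a few lines of arithmetic. The minimal Python sketch below (function names hypothetical, parameter values arbitrary) evaluates W + ΔE per period and the threshold (74), and shows that the sign of the balance does not depend on E, which is why the growth, once started, is exponential.

```python
import math


def net_energy_gain_per_period(E, l, dl, delta, omega0):
    """W + dE for the 'sharp pulling' pendulum model, per oscillation period.

    W  = 2 (F_max - F_min) dl, with F_max - F_min = 3E/l   (Eq. 4.72)
    dE = -4 pi (delta/omega0) E                            (Eq. 4.73)
    """
    work = 2.0 * (3.0 * E / l) * dl
    loss = -4.0 * math.pi * (delta / omega0) * E
    return work + loss


def threshold_dl_over_l(delta, omega0):
    """Eq. (4.74): dl/l > 2 pi delta / (3 omega0) = pi / (3Q), with Q = omega0 / (2 delta)."""
    return 2.0 * math.pi * delta / (3.0 * omega0)


delta, omega0, l = 0.01, 1.0, 1.0           # illustrative values, Q = 50
thr = threshold_dl_over_l(delta, omega0)
for E in (1.0, 1e-6):                       # the sign of the balance is the same for any E
    above = net_energy_gain_per_period(E, l, 1.1 * thr * l, delta, omega0) > 0
    below = net_energy_gain_per_period(E, l, 0.9 * thr * l, delta, omega0) > 0
    print(E, above, below)
```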

However, this result does not account for the possible difference between the oscillation frequency ω and the eigenfrequency ω0, and also does not clarify whether the best phase shift between the parametric oscillations and the parameter modulation, assumed in the above calculation, may be sustained automatically. In order to address these issues, we may apply the rotating-wave approximation to a simple but reasonable linear equation

$$\ddot q + 2\delta\dot q + \omega_0^2\left(1 + \mu\cos 2\omega t\right)q = 0, \qquad (4.75)$$

describing the parametric excitation for the particular case of a sinusoidal modulation of ω0²(t). Rewriting this equation in the canonical form (38),

$$\ddot q + \omega^2 q = f(t, q, \dot q) = -2\delta\dot q + 2\xi\omega q - \mu\omega_0^2 q\cos 2\omega t, \qquad (4.76)$$

and assuming that the dimensionless ratios δ/ω and |ξ|/ω, and the modulation depth μ, are all much less than 1, we may use the general Eqs. (57a) to get the following RWA equations:

$$\dot A = -\delta A - \frac{\mu\omega}{4}A\sin 2\varphi,$$
$$A\dot\varphi = A\xi - \frac{\mu\omega}{4}A\cos 2\varphi. \qquad (4.77)$$

These equations evidently have a fixed point A0 = 0, but its stability analysis (though possible) is not quite straightforward, because the phase φ of oscillations is undetermined at that point. In order to avoid this (technical rather than conceptual) difficulty, we may use, instead of the real



21 A modulation of the pendulum's mass (say, by periodic pumping of water in and out of a suspended bottle) gives a qualitatively similar result. Note, however, that parametric oscillations cannot be excited by modulating just any parameter of the oscillator, for example, its damping coefficient (at least if it stays positive at all times), because it does not change the system's energy, just the energy drain rate.
amplitude and phase of oscillations, either their complex amplitude a = A exp{iφ} or its Cartesian components u and v (see Eqs. (4)). Indeed, for our function f, Eq. (57b) gives



$$\dot a = \left(-\delta + i\xi\right)a - i\,\frac{\mu\omega}{4}\,a^*, \qquad (4.78)$$



while Eqs. (57c) yield 



$$\dot u = -\delta u - \left(\xi + \frac{\mu\omega}{4}\right)v,$$
$$\dot v = -\delta v + \left(\xi - \frac{\mu\omega}{4}\right)u. \qquad (4.79)$$



We see that, in contrast to Eqs. (77), in the Cartesian coordinates {u, v} the trivial fixed point a0 = 0 (i.e. u0 = v0 = 0) is absolutely regular. Moreover, equations (78)-(79) are already linear, so they do not require any additional linearization. Thus we may use the same approach as was already used in Secs. 3.2 and 4.1, i.e. look for the solution of Eqs. (79) in the exponential form exp{λt}. However, now we are dealing with two variables, and should allow them to have, for each value of λ, a certain ratio u/v. For that, we should take the partial solution in the form



$$u = c_u e^{\lambda t}, \qquad v = c_v e^{\lambda t}, \qquad (4.80)$$



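where the constants c_u and c_v are frequently called the distribution coefficients. Plugging this solution into Eqs. (79), we get for them the following system of two linear algebraic equations:

$$\left(-\delta - \lambda\right)c_u - \left(\xi + \frac{\mu\omega}{4}\right)c_v = 0,$$
$$\left(\xi - \frac{\mu\omega}{4}\right)c_u + \left(-\delta - \lambda\right)c_v = 0. \qquad (4.81)$$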



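The characteristic equation of this system,

$$\begin{vmatrix} -\delta - \lambda & -\left(\xi + \dfrac{\mu\omega}{4}\right) \\ \xi - \dfrac{\mu\omega}{4} & -\delta - \lambda \end{vmatrix} = \lambda^2 + 2\delta\lambda + \delta^2 + \xi^2 - \left(\frac{\mu\omega}{4}\right)^2 = 0, \qquad (4.82)$$

has two roots:

$$\lambda_\pm = -\delta \pm \left[\left(\frac{\mu\omega}{4}\right)^2 - \xi^2\right]^{1/2}. \qquad (4.83)$$

Requiring the fixed point to be unstable, Re λ+ > 0, we get the parametric excitation condition

$$\frac{\mu\omega}{4} > \left(\delta^2 + \xi^2\right)^{1/2}. \qquad (4.84)$$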



Thus the parametric excitation may indeed happen without any artificial phase adjustment: the arising 
oscillations self-adjust their phase to pick up energy from the external source responsible for the 
parameter variation. 
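As a quick consistency check of Eqs. (79), (83), and (84), one may compare the roots (83) with the eigenvalues of the 2×2 matrix of the linear system (79) computed numerically. The sketch below uses NumPy; the shorthand p ≡ μω/4, the parameter values, and the function names are introduced only for illustration.

```python
import numpy as np


def rwa_matrix(delta, xi, p):
    """Matrix of the linear RWA equations (4.79) for (u, v); p stands for mu*omega/4."""
    return np.array([[-delta, -(xi + p)],
                     [xi - p, -delta]])


def lambdas_formula(delta, xi, p):
    """Roots (4.83): lambda_pm = -delta +/- sqrt(p**2 - xi**2), complex when |xi| > p."""
    root = np.sqrt(complex(p**2 - xi**2))
    return np.array([-delta + root, -delta - root])


def excited(delta, xi, p):
    """Parametric excitation condition (4.84): p > sqrt(delta**2 + xi**2)."""
    return p > np.hypot(delta, xi)


cases = [(0.01, 0.0, 0.02), (0.01, 0.01, 0.02), (0.02, 0.01, 0.02), (0.01, 0.03, 0.02)]
for delta, xi, p in cases:
    numeric = np.sort_complex(np.linalg.eigvals(rwa_matrix(delta, xi, p)))
    analytic = np.sort_complex(lambdas_formula(delta, xi, p))
    print(delta, xi, p, np.allclose(numeric, analytic), excited(delta, xi, p))
```

The growth (max Re λ > 0) appears exactly in the cases satisfying condition (84), whether the roots are real or complex.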
Our key result (84) may be compared with two other calculations. First, in the case of negligible damping (δ = 0), Eq. (84) turns into the condition μω/4 > |ξ|. This result may be compared with the well-developed theory of the so-called Mathieu equation, whose canonical form is

$$\frac{d^2y}{dv^2} + \left(a - 2b\cos 2v\right)y = 0. \qquad (4.85)$$

It is evident that with the substitutions y → q, v → ωt, a → (ω0/ω)², b → -μ/2, this equation is just a particular case of Eq. (75) for δ = 0. In terms of Eq. (85), the result of our approximate analysis may be rewritten just as |b| > |a - 1|, and is supposed to be valid for |b| ≪ 1. This condition is shown in Fig. 7 together with the numerically calculated 22 stability boundaries of the Mathieu equation.
Fig. 4.7. Stability boundaries of the Mathieu equation (85), as calculated numerically (curves) and using the rotating-wave approximation (dashed straight lines). In the regions numbered by various n, the trivial solution y = 0 of the equation is unstable, i.e. its general solution y(v) includes an exponentially growing term.



One can see that the rotating-wave approximation works just fine within its applicability limit (and beyond :-), though it fails to predict some other important features of the Mathieu equation, such as the existence of higher, narrower regions of parametric excitation (at a ≈ n², i.e. ω0 ≈ nω, for all integer n), and some spill-over of the stability region into the lower half-plane a < 0. 23 The reason for these failures is the fact that, as can be seen in Fig. 7, these phenomena do not appear in the first approximation in the parameter modulation amplitude μ ∝ b, which is the RWA applicability realm.
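Footnote 22 mentions that the Mathieu stability boundaries are conveniently computed using the Floquet theorem. A minimal sketch of that idea, assuming that a plain fixed-step 4th-order Runge-Kutta integration (see Sec. 7) is accurate enough: integrate two independent solutions of Eq. (85) over one period π of its coefficient, form the monodromy matrix, and declare the trivial solution unstable when the magnitude of its trace exceeds 2. The RWA estimate is written here as |b| > |a - 1|; all function names are hypothetical.

```python
import math


def monodromy_trace(a, b, steps=4000):
    """Trace of the one-period (v: 0 -> pi) monodromy matrix of y'' + (a - 2b cos 2v) y = 0.

    By the Floquet theorem, the trivial solution y = 0 is unstable if |trace| > 2.
    The two fundamental solutions are integrated by fixed-step 4th-order Runge-Kutta.
    """
    h = math.pi / steps

    def rhs(v, y, dy):
        return dy, -(a - 2.0 * b * math.cos(2.0 * v)) * y

    def propagate(y, dy):
        for i in range(steps):
            v = i * h
            k1y, k1d = rhs(v, y, dy)
            k2y, k2d = rhs(v + h / 2, y + h / 2 * k1y, dy + h / 2 * k1d)
            k3y, k3d = rhs(v + h / 2, y + h / 2 * k2y, dy + h / 2 * k2d)
            k4y, k4d = rhs(v + h, y + h * k3y, dy + h * k3d)
            y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
            dy += h / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        return y, dy

    y1, _ = propagate(1.0, 0.0)     # fundamental solution with y(0) = 1, y'(0) = 0
    _, dy2 = propagate(0.0, 1.0)    # fundamental solution with y(0) = 0, y'(0) = 1
    return y1 + dy2


def unstable_numeric(a, b):
    return abs(monodromy_trace(a, b)) > 2.0


def unstable_rwa(a, b):
    """RWA estimate |b| > |a - 1|, meaningful only near a = 1 and for |b| << 1."""
    return abs(b) > abs(a - 1.0)


for a, b in [(1.0, 0.1), (1.5, 0.1), (4.1, 1.0)]:
    print(a, b, unstable_numeric(a, b), unstable_rwa(a, b))
```

The last case lies inside the second excitation region near a = 4: the numerical test finds it unstable although the RWA condition is not met, which is one of the higher regions the approximation misses.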



In the opposite case of finite damping but exact tuning (ξ = 0, ω ≈ ω0), Eq. (84) gives

$$\mu > \frac{4\delta}{\omega_0} = \frac{2}{Q}. \qquad (4.86)$$



22 Such calculations may be substantially simplified by the use of the so-called Floquet theorem, which is also 
the mathematical basis for the discussion of wave propagation in periodic media - see the next chapter. 

23 This region describes, for example, the counter-intuitive stability of an inverted pendulum with a periodically modulated length, within a limited range of the modulation depth μ.
This condition may be compared with Eq. (74), taking Δl/l = 2μ. The comparison shows that though the structure of these conditions is similar, the numerical coefficients are different by a factor close to 2. The first reason for this difference is that the instant parameter change at optimal moments of time is more efficient than the smooth, sinusoidal variation described by (75). Even more significantly, the change of the pendulum's length modulates not only its eigenfrequency ω0, as Eq. (75) implies, but also its mechanical impedance Z = (gl)^{1/2}, a notion to be discussed in detail in the next chapter. (Due to time restrictions, I have to leave the analysis of the general case of the simultaneous modulation of ω0 and Z for the reader's exercise.)
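The threshold (86) can also be checked directly against Eq. (75), without the RWA, using the Floquet approach: compute the largest magnitude of the multipliers over one modulation period π/ω and compare it with 1. The sketch below assumes exact tuning, ω = ω0 = 1, and an arbitrary damping δ = 0.01 (so that Q = 50 and 4δ/ω0 = 0.04); it is an illustration, not a precise computation of the boundary, and the function name is hypothetical.

```python
import cmath
import math


def max_floquet_multiplier(mu, delta, omega0=1.0, omega=1.0, steps=4000):
    """Largest |multiplier| over one modulation period pi/omega for Eq. (4.75):
    q'' + 2 delta q' + omega0**2 (1 + mu cos 2 omega t) q = 0.
    Values above 1 mean exponential growth, i.e. parametric excitation."""
    h = (math.pi / omega) / steps

    def rhs(t, q, dq):
        return dq, -2.0 * delta * dq - omega0**2 * (1.0 + mu * math.cos(2.0 * omega * t)) * q

    def propagate(q, dq):                       # fixed-step 4th-order Runge-Kutta
        for i in range(steps):
            t = i * h
            k1q, k1d = rhs(t, q, dq)
            k2q, k2d = rhs(t + h / 2, q + h / 2 * k1q, dq + h / 2 * k1d)
            k3q, k3d = rhs(t + h / 2, q + h / 2 * k2q, dq + h / 2 * k2d)
            k4q, k4d = rhs(t + h, q + h * k3q, dq + h * k3d)
            q += h / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
            dq += h / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        return q, dq

    q1, dq1 = propagate(1.0, 0.0)
    q2, dq2 = propagate(0.0, 1.0)
    trace, det = q1 + dq2, q1 * dq2 - q2 * dq1  # monodromy matrix [[q1, q2], [dq1, dq2]]
    disc = cmath.sqrt(trace**2 / 4 - det)
    return max(abs(trace / 2 + disc), abs(trace / 2 - disc))


delta = 0.01                       # illustrative damping, Q = omega0/(2 delta) = 50
mu_threshold = 4 * delta           # Eq. (4.86) for omega = omega0 = 1
for mu in (0.5 * mu_threshold, 1.5 * mu_threshold):
    print(mu, max_floquet_multiplier(mu, delta))
```

A modulation depth below 4δ/ω0 gives a maximal multiplier below 1 (decay), while one above it gives a value above 1 (growth), in agreement with Eq. (86).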

Before moving on, let me summarize the most important differences between the parametric and 
forced oscillations: 

(i) Parametric oscillations completely disappear outside of their excitation range, while the 
forced oscillations have a non-zero amplitude for any frequency and amplitude of the external force - 
see Eq. (18). 

(ii) Parametric excitation may be described by a linear homogeneous equation - e.g., Eq. (75) - 
which cannot predict any finite oscillation amplitude within the excitation range, even at finite damping. 
In order to describe stationary parametric oscillations, some nonlinear effect has to be taken into 
account. (Again, I am leaving analyses of such effects for reader's exercises.) 

One more important feature of parametric oscillations will be discussed at the end of the next section.



4.6. Fixed point classification 

RWA equations (79) give us a good pretext for a brief discussion of fixed points of a dynamic 
system described by two time-independent, first-order differential equations. 24 After their linearization 
near a fixed point, the equations for deviations can always be presented in the form similar to Eq. (79): 

$$\dot q_1 = M_{11}q_1 + M_{12}q_2,$$
$$\dot q_2 = M_{21}q_1 + M_{22}q_2, \qquad (4.87)$$

where M_jj' (with j, j' = 1, 2) are some real scalars that may be understood as elements of a 2×2 matrix M.
Looking for an exponential solution of the type (80), 

$$q_1 = c_1 e^{\lambda t}, \qquad q_2 = c_2 e^{\lambda t}, \qquad (4.88)$$

we get a more general system of two linear equations for the distribution coefficients c1,2:

$$\left(M_{11} - \lambda\right)c_1 + M_{12}c_2 = 0,$$
$$M_{21}c_1 + \left(M_{22} - \lambda\right)c_2 = 0. \qquad (4.89)$$

These equations are consistent if 



24 Autonomous systems described by a single second-order differential equation, say F(q, q̇, q̈) = 0, also belong to this class, because we may treat the velocity q̇ ≡ v as a new variable, and use this definition as one first-order differential equation, and the initial equation, in the form F(q, v, v̇) = 0, as the second first-order equation.
$$\begin{vmatrix} M_{11} - \lambda & M_{12} \\ M_{21} & M_{22} - \lambda \end{vmatrix} = 0, \qquad (4.90)$$

giving us a quadratic characteristic equation

$$\lambda^2 - \lambda\left(M_{11} + M_{22}\right) + \left(M_{11}M_{22} - M_{12}M_{21}\right) = 0. \qquad (4.91)$$

Its solution, 25 

$$\lambda_\pm = \frac{1}{2}\left(M_{11} + M_{22}\right) \pm \frac{1}{2}\left[\left(M_{11} - M_{22}\right)^2 + 4M_{12}M_{21}\right]^{1/2}, \qquad (4.92)$$

shows that the following situations are possible: 

A. The expression under the square root, (M11 - M22)² + 4M12M21, is positive. In this case, both characteristic exponents λ± are real, and we can distinguish three sub-cases:

(i) Both λ+ and λ- are negative. In this case, the fixed point is evidently stable. Because of the generally different magnitudes of the exponents λ±, the process presented on the phase plane [q1, q2] (Fig. 8a) may be seen as consisting of two stages: first, a faster (with rate |λ-|) relaxation to a linear asymptote, 26 and then a slower decline, with rate |λ+|, along this line, i.e. at a virtually fixed ratio of the variables. Such a fixed point is called the stable node.

(ii) Both λ+ and λ- are positive. This case (rarely met in actual physical systems) of the unstable node differs from the previous one only by the direction of motion along the phase plane trajectories (see the dashed arrows in Fig. 8a). Here the variable ratio also soon approaches a constant, but now the one corresponding to the larger of the rates, λ+.

(iii) Finally, in the case of a saddle (λ+ > 0, λ- < 0) the system dynamics is different (Fig. 8b): after the rate-|λ-| relaxation to the λ+-asymptote, the perturbation starts to grow, with rate λ+, along one of two opposite directions. (The direction is determined by which side of another straight line, called the separatrix, the system has been on initially.) It is evident that the saddle 27 is an unstable fixed point.

B. The expression under the square root, (M11 - M22)² + 4M12M21, is negative. In this case the square root in Eq. (92) is imaginary, making the real parts of both roots equal, Re λ± = (M11 + M22)/2, and their imaginary parts equal but sign-opposite. As a result, here there can be just two types of fixed points:

(i) Stable focus, at (M11 + M22) < 0. The phase plane trajectories are spirals going to the center (i.e. toward the fixed point); see Fig. 8c with the solid arrow.

(ii) Unstable focus, taking place at (M11 + M22) > 0, differs from the stable one only by the direction of motion along the phase trajectories; see the dashed arrow in Fig. 8c.



25 In terms of linear algebra, λ± are the eigenvalues, and the corresponding sets {c1, c2}±, the eigenvectors of the matrix M with elements M_jj'.

26 The asymptote direction may be found by plugging the value λ+ back into Eq. (89) and finding the corresponding ratio c1/c2.

27 The term "saddle" is due to the fact that the system's dynamics in this case is qualitatively similar to that of a particle's motion in a 2D potential U(q1, q2) having the shape of a horse saddle (or a mountain pass).
Fig. 4.8. Typical trajectories on the phase plane [q1, q2] near fixed points of various types: (a) node, (b) saddle, (c) focus, and (d) center. The particular values of the matrix M used in the first three panels correspond to the RWA equations (81) for parametric oscillators with ξ = δ, and three different values of the parameter μω/4δ: (a) 1.25, (b) 1.6, and (c) 0.



C. Sometimes the border case, M11 + M22 = 0, is also distinguished, and the corresponding fixed point is referred to as the center (Fig. 8d). Considering centers a special category makes sense because such fixed points are typical for Hamiltonian systems, whose first integral of motion may frequently be presented as the distance of the system's point from a fixed point. For example, a harmonic oscillator without dissipation may be described by the system

$$\dot q = \frac{p}{m}, \qquad \dot p = -m\omega_0^2 q, \qquad (4.94)$$

which is evidently a particular case of Eq. (87), with M11 = M22 = 0, M12M21 = -ω0² < 0, and hence (M11 - M22)² + 4M12M21 = -4ω0² < 0, and M11 + M22 = 0. The phase plane of the system may be symmetrized by plotting q vs. the properly normalized momentum p/mω0. On the symmetrized plane, sinusoidal oscillations of amplitude A are represented by a circle of radius A about the center-type fixed point A = 0. Such a circular trajectory corresponds to the conservation of the oscillator's energy

$$E = \frac{m\dot q^2}{2} + \frac{m\omega_0^2 q^2}{2} = \frac{m\omega_0^2}{2}\left[\left(\frac{p}{m\omega_0}\right)^2 + q^2\right]. \qquad (4.95)$$
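The case analysis A-C above maps directly onto a short decision procedure based on the trace M11 + M22 and the expression under the square root in Eq. (92). The Python sketch below is one possible encoding (function names hypothetical); it reproduces the fixed-point types of Fig. 8a-c for the parametric-oscillator matrix of Eqs. (79) with ξ = δ (the scale δ = 1 is arbitrary, since only ratios matter), and the center of the undamped oscillator (94).

```python
import math


def classify_fixed_point(M, eps=1e-12):
    """Type of the fixed point q = 0 of dq/dt = M q, with M a real 2x2 matrix (Sec. 4.6).

    Borderline cases with a zero eigenvalue are not treated separately.
    """
    (m11, m12), (m21, m22) = M
    trace = m11 + m22
    disc = (m11 - m22) ** 2 + 4.0 * m12 * m21    # expression under the square root in Eq. (4.92)
    if disc >= 0:                                # case A: real exponents
        root = math.sqrt(disc)
        lam_plus, lam_minus = (trace + root) / 2, (trace - root) / 2
        if lam_plus < 0:
            return "stable node"
        if lam_minus > 0:
            return "unstable node"
        return "saddle"
    if abs(trace) < eps:                         # case C: purely imaginary exponents
        return "center"
    return "stable focus" if trace < 0 else "unstable focus"   # case B


def parametric_M(delta, xi, p):
    """Matrix of Eqs. (4.79) for (u, v), with p = mu*omega/4."""
    return [[-delta, -(xi + p)], [xi - p, -delta]]


# Parameters of Fig. 4.8a-c: xi = delta, and mu*omega/(4 delta) = 1.25, 1.6, 0
for ratio in (1.25, 1.6, 0.0):
    print(ratio, classify_fixed_point(parametric_M(1.0, 1.0, ratio)))
# Undamped harmonic oscillator, Eq. (4.94), with m = 1 and omega0 = 2
print(classify_fixed_point([[0.0, 1.0], [-4.0, 0.0]]))
```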



This is a convenient moment for a brief discussion of the so-called Poincaré (or "slow-variable", or "stroboscopic") plane. 28 From the point of view of the rotating-wave approximation, sinusoidal oscillations q(t) = A cos(ωt - φ), in particular those described by a circular trajectory on the real (or "fast") phase plane (Fig. 8d), correspond to a fixed point {A, φ}, which may be conveniently presented by a steady geometrical point on a plane with these polar coordinates (Fig. 9a). (As follows from Eq. (4), the Cartesian coordinates on that plane are u and v.) The quasi-sinusoidal process (41), with slowly changing A and φ, may be represented by a slow motion of that point on this Poincaré plane.



Fig. 4.9. (a) Presentation of a sinusoidal oscillation (point) and a slow transient (line) on the Poincaré plane, and (b) transfer from the "fast" phase plane to the "slow" (Poincaré) plane.



Figure 9b shows one possible way to visualize the relation between the "real" phase plane of an oscillator, with symmetrized Cartesian coordinates q and p/mω0, and the Poincaré plane with Cartesian coordinates u and v: the latter reference frame rotates relative to the former one about the origin clockwise, with angular velocity ω. 29 Another, "stroboscopic" way to generate the Poincaré plane pattern is to take a fast glance at the "real" phase plane just once during each oscillation period T = 2π/ω.
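The relation between the "fast" phase plane and the Poincaré plane amounts to a rotation by the angle ωt. A minimal sketch, assuming the representation q = u cos ωt + v sin ωt with u = A cos φ, v = A sin φ (consistent with q = A cos(ωt - φ)) and the symmetrized momentum p/mω; the function name and parameter values are hypothetical. For a purely sinusoidal oscillation the computed (u, v) stays put, and at the stroboscopic moments t = nT it simply coincides with (q, p/mω).

```python
import math


def to_poincare(q, p, t, m, omega):
    """Map a 'fast' phase-plane point (q, p) at time t to slow coordinates (u, v).

    Assumes q = u cos(omega t) + v sin(omega t) and p/(m omega) = -u sin(omega t) + v cos(omega t),
    i.e. q = A cos(omega t - phi) with u = A cos(phi), v = A sin(phi) when u, v are constant.
    """
    x, y = q, p / (m * omega)                   # symmetrized coordinates of the fast plane
    c, s = math.cos(omega * t), math.sin(omega * t)
    return x * c - y * s, x * s + y * c         # rotation by the angle omega*t


A, phi, m, omega = 2.0, 0.7, 1.5, 3.0           # arbitrary illustrative values
for t in (0.0, 0.4, 1.3, 2.0):
    q = A * math.cos(omega * t - phi)
    p = -m * omega * A * math.sin(omega * t - phi)
    print(t, to_poincare(q, p, t, m, omega))    # the same (u, v) at every moment
```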

In many cases, such a presentation is more convenient than that on the "real" phase plane. In particular, we have already seen that the RWA equations for such important phenomena as phase locking and parametric oscillations, whose original differential equations include time explicitly, are time-independent; cf., e.g., Eqs. (75) and (79) describing the latter effect. This simplification brings the


28 Named after J. H. Poincaré (1854-1912), who is credited, among many other achievements, with contributions to special relativity (see, e.g., EM Chapter 9) and the idea of deterministic chaos (to be discussed in Chapter 9 below).

29 This notion of phase plane rotation is the basis for the rotating-wave approximation's name. (The word "wave" has sneaked in from this method's wide application in classical and quantum optics.)
equations into the category considered in this section, and enables the classification of their fixed points, which may shed additional light on their dynamic properties.

In particular, Fig. 10 shows the classification of the trivial fixed point of a parametric oscillator, which follows from Eq. (83). As the parameter modulation depth μ is increased, the type of the trivial fixed point A0 = 0 on the Poincaré plane changes from a stable focus (typical for a simple oscillator with damping) to a stable node and then to a saddle describing the parametric excitation. In the last case, the two directions of the perturbation growth, so prominently featured in Fig. 8b, correspond to the two possible values of the oscillation phase φ, with the phase choice determined by the initial conditions.




This double degeneracy of the parametric oscillation's phase could already be noticed from Eqs. (77), because they are evidently invariant with respect to the replacement φ → φ + π. Moreover, the degeneracy is not an artifact of the rotating-wave approximation, because the initial Eq. (75) is already invariant with respect to the corresponding replacement q(t) → q(t - π/ω). This invariance means that all other characteristics (e.g., the amplitude) of the parametric oscillations excited with either of the two phases are exactly the same. At the dawn of the computer age (in the late 1950s and early 1960s), there were substantial attempts, especially in Japan, to use this property for the storage and processing of digital information coded in the phase-binary form.



4.7. Numerical approach 

If the amplitude of oscillations, for whatever reason, becomes so large that the nonlinear terms in the equation describing a system are comparable to its linear terms, numerical methods are virtually the only avenue available for their study. In Hamiltonian 1D systems, such methods may be applied directly to integral (3.26), but dissipative and/or parametric systems typically lack first integrals of motion similar to Eq. (3.24), so that the initial differential equation has to be solved.

Let us discuss the general idea of such methods on the example of what mathematicians call the Cauchy problem (finding the solution for all moments of time, starting from known initial conditions) for a first-order differential equation

$$\dot q = f(t, q). \qquad (4.96)$$

(The generalization to a set of several such equations is straightforward.) Breaking the time axis into small, equal steps h (Fig. 11), we can reduce the equation integration problem to finding the function value at the next time point, q_{n+1} ≡ q(t_{n+1}) = q(t_n + h), from the previously found value q_n = q(t_n) and, if necessary, the values of q at other previous time steps. In the most basic approach (called the Euler method), q_{n+1} is found using the following formula:



$$q_{n+1} = q_n + k, \qquad k = hf(t_n, q_n). \qquad (4.97)$$

It is evident that this approximation is equivalent to the replacement of the genuine function q(t), on the segment [t_n, t_{n+1}], with the first two terms of its Taylor expansion at the point t_n:

$$q(t_n + h) \approx q(t_n) + \dot q(t_n)\,h = q(t_n) + hf(t_n, q_n). \qquad (4.98)$$




Fig. 4.11. The basic notions used at numerical integration of ordinary differential equations.



Such an approximation has an error proportional to h² (per step). One could argue that, by making the step h sufficiently small, the error of Euler's method might be made arbitrarily small, but even with the number-crunching power of modern computers, the computation time necessary to reach sufficient accuracy may be too high for large problems. 30 Besides that, the increase of the number of time steps, which is necessary at h → 0, increases the total rounding errors, and eventually may cause an increase, rather than a reduction, of the overall error of the computed result.
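A minimal Python sketch of the Euler method (97), applied to a test equation with a known solution, dq/dt = -q. The local error O(h²) accumulates over ~1/h steps, so the global error should scale as h: halving h should roughly halve the error. The function name `euler` and the test problem are illustrative choices.

```python
import math


def euler(f, t0, q0, h, n):
    """n forward-Euler steps of Eq. (4.97) for dq/dt = f(t, q); returns q(t0 + n h)."""
    t, q = t0, q0
    for i in range(n):
        q = q + h * f(t, q)
        t = t0 + (i + 1) * h
    return q


# Test problem with a known solution: dq/dt = -q, q(0) = 1, so that q(1) = exp(-1)
f = lambda t, q: -q
err = {h: abs(euler(f, 0.0, 1.0, h, round(1 / h)) - math.exp(-1)) for h in (0.01, 0.005)}
print(err, err[0.01] / err[0.005])   # the ratio is close to 2: the global error scales as h
```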

A more efficient way is to modify Eq. (97) to include the terms of the second order in h. There are several ways to do this, for example using the 2nd-order Runge-Kutta method:

$$q_{n+1} = q_n + k_2, \qquad k_2 = hf\!\left(t_n + \frac{h}{2},\, q_n + \frac{k_1}{2}\right), \qquad k_1 = hf(t_n, q_n). \qquad (4.99)$$



One can readily check that this method gives the exact result if the function q(t) is a quadratic polynomial, and hence in the general case its errors are of the order of h³. We see that the main idea here is to first break the segment [t_n, t_{n+1}] in half (Fig. 11), then evaluate the right-hand part of the differential equation (96) at the point intermediate (in both t and q) between points n and (n + 1), and then use this information to predict q_{n+1}.
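The 2nd-order Runge-Kutta scheme (99) can be sketched in the same style. The example checks the exactness claim for a right-hand part that depends on t only (so that q(t) = t² is reproduced up to rounding), and the h² scaling of the global error on the test equation dq/dt = -q; all names are illustrative.

```python
import math


def rk2_step(f, t, q, h):
    """One step of the 2nd-order (midpoint) Runge-Kutta method, Eq. (4.99)."""
    k1 = h * f(t, q)
    k2 = h * f(t + h / 2, q + k1 / 2)
    return q + k2


def integrate(step, f, t0, q0, h, n):
    """Apply n steps of the given one-step method; returns q(t0 + n h)."""
    t, q = t0, q0
    for i in range(n):
        q = step(f, t, q, h)
        t = t0 + (i + 1) * h
    return q


# q(t) = t**2 is a quadratic polynomial, so the method reproduces q(1) = 1 up to rounding
print(integrate(rk2_step, lambda t, q: 2 * t, 0.0, 0.0, 0.1, 10))
# For dq/dt = -q the global error scales as h**2: halving h divides it by about 4
e1 = abs(integrate(rk2_step, lambda t, q: -q, 0.0, 1.0, 0.01, 100) - math.exp(-1))
e2 = abs(integrate(rk2_step, lambda t, q: -q, 0.0, 1.0, 0.005, 200) - math.exp(-1))
print(e1 / e2)
```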

The advantage of the Runge-Kutta approach is that it can be readily extended to the 4th order, without an additional breaking of the interval [t_n, t_{n+1}]:



30 In addition, the Euler method is not time-reversible, a handicap which may be essential for the integration of Hamiltonian systems described by systems of second-order differential equations. However, this drawback may be readily overcome by the so-called leapfrogging: the overlap of time steps h for a generalized coordinate and the corresponding generalized velocity.
$$q_{n+1} = q_n + \frac{1}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right),$$
$$k_4 = hf(t_n + h, q_n + k_3), \quad k_3 = hf\!\left(t_n + \frac{h}{2}, q_n + \frac{k_2}{2}\right), \quad k_2 = hf\!\left(t_n + \frac{h}{2}, q_n + \frac{k_1}{2}\right), \quad k_1 = hf(t_n, q_n). \qquad (4.100)$$



This method reaches a much lower error, O(h⁵) per step, without being too cumbersome. These features have made the 4th-order Runge-Kutta method the default in most numerical libraries. Its extension to higher orders is possible but requires more complex formulas and is justified only for some special cases, e.g., very abrupt functions q(t). 31 The most frequent enhancement of the method is the automatic adjustment of the step h to reach the specified accuracy.
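The 4th-order scheme (100), again as an illustrative Python sketch on the test equation dq/dt = -q: the global error should now fall by a factor of about 2⁴ = 16 when h is halved. The automatic step adjustment mentioned above is not included; names are illustrative.

```python
import math


def rk4_step(f, t, q, h):
    """One step of the 4th-order Runge-Kutta method, Eq. (4.100)."""
    k1 = h * f(t, q)
    k2 = h * f(t + h / 2, q + k1 / 2)
    k3 = h * f(t + h / 2, q + k2 / 2)
    k4 = h * f(t + h, q + k3)
    return q + (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(f, t0, q0, h, n):
    """n fixed RK4 steps; returns q(t0 + n h)."""
    t, q = t0, q0
    for i in range(n):
        q = rk4_step(f, t, q, h)
        t = t0 + (i + 1) * h
    return q


f = lambda t, q: -q                              # test problem, exact q(1) = exp(-1)
e1 = abs(integrate(f, 0.0, 1.0, 0.1, 10) - math.exp(-1))
e2 = abs(integrate(f, 0.0, 1.0, 0.05, 20) - math.exp(-1))
print(e1, e2, e1 / e2)                           # the ratio is close to 16
```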

Figure 12 shows a typical example of the application of that method to the very simple problem of a damped linear oscillator, for two values of the fixed time step h (expressed in terms of the number N of such steps per oscillation period). Black lines connect the points obtained by the 4th-order Runge-Kutta method, while the points connected by green straight lines present the exact analytical solution (22). Errors of a few percent start to appear only at as few as ~10 time steps per period, so that the method is indeed very efficient. I will illustrate the convenience and handicaps of the numerical approach to the solution of dynamics problems in the discussion of the following topic.
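A sketch of the kind of computation behind Fig. 12, for the pair of variables {q, q̇}: a fixed-step RK4 integration of the damped linear oscillator, compared with the standard free-decay solution q(t) = e^{-δt}[cos ω't + (δ/ω') sin ω't], ω' = (ω0² - δ²)^{1/2}, valid for q(0) = 1, q̇(0) = 0. Apart from δ/ω0 = 0.03 and N = 30 and 6, the initial conditions, the number of periods, and the helper names are assumptions of this illustration, not the settings actually used for the figure.

```python
import math


def rk4_system_step(f, t, y, h):
    """One 4th-order Runge-Kutta step for a system dy/dt = f(t, y), y being a list of floats.

    Same scheme as Eq. (4.100), except that here the k's are slopes (Eq. (4.100)'s k divided by h).
    """
    def shifted(vec, slope, c):                   # vec + c*slope, elementwise
        return [vi + c * si for vi, si in zip(vec, slope)]

    k1 = f(t, y)
    k2 = f(t + h / 2, shifted(y, k1, h / 2))
    k3 = f(t + h / 2, shifted(y, k2, h / 2))
    k4 = f(t + h, shifted(y, k3, h))
    return [yi + h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
            for yi, s1, s2, s3, s4 in zip(y, k1, k2, k3, k4)]


def max_error(N, delta=0.03, omega0=1.0, periods=5):
    """Largest |q_numeric - q_exact| for q'' + 2 delta q' + omega0**2 q = 0, q(0) = 1, q'(0) = 0,
    integrated with N fixed RK4 steps per period 2 pi / omega0."""
    h = 2 * math.pi / (omega0 * N)
    w = math.sqrt(omega0**2 - delta**2)           # frequency of the free damped oscillations
    f = lambda t, y: [y[1], -2 * delta * y[1] - omega0**2 * y[0]]
    y, err = [1.0, 0.0], 0.0
    for n in range(N * periods):
        y = rk4_system_step(f, n * h, y, h)
        t = (n + 1) * h
        exact = math.exp(-delta * t) * (math.cos(w * t) + delta / w * math.sin(w * t))
        err = max(err, abs(y[0] - exact))
    return err


for N in (30, 6):                                 # the two step sizes of Fig. 4.12
    print(N, max_error(N))
```

The maximal error for N = 30 is far below that for N = 6, in line with the two panels of Fig. 12.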






Fig. 4.12. Results of the fixed-step Runge-Kutta solution of the equation of a linear oscillator with damping (with δ/ω0 = 0.03) for: (a) 30 and (b) 6 points per oscillation period. The results are shown by points; lines are only guides for the eye.



4.8. Higher harmonic and subharmonic oscillations 

Figure 13 shows the numerically calculated 32 transient process and stationary oscillations in a linear oscillator and in a very representative nonlinear system, the pendulum described by Eq. (42), both with the same resonance frequency ω0 for small oscillations. Both systems are driven by a sinusoidal



31 The most popular approaches in such cases are the Richardson extrapolation, the Bulirsch-Stoer algorithm, and a set of prediction-correction techniques, e.g. the Adams-Bashforth-Moulton method; see the literature recommended in MA Sec. 16(iii).

32 All numerical results shown in this section have been obtained by the 4th-order Runge-Kutta method with automatic step adjustment, which guarantees a relative error of the order of 10⁻⁴, much smaller than the pixel size in the plots.
external force of the same amplitude and frequency, in this illustration equal to the small-oscillation eigenfrequency ω0 of both systems. The plots show that despite a very substantial amplitude of the pendulum oscillations (an angle amplitude of about one radian), their waveform remains almost exactly sinusoidal. 33 On the other hand, the nonlinearity affects the oscillation amplitude very substantially. These results illustrate that the validity of the small-parameter method and its RWA extension far exceeds what might be expected from the formal requirement |q| ≪ 1.






Fig. 4.13. Oscillations induced by a similar sinusoidal external force (turned on at t = 0) in two systems with the same small-oscillation frequency ω0 and low damping: a linear oscillator (two top panels) and a pendulum (two bottom panels). δ/ω0 = 0.03, f0 = 0.1, and ω = ω0.



The higher harmonic content in the oscillation waveform may be sharply increased 34 by reducing the external force frequency to ~ω0/n, where the integer n is the number of the desired harmonic. For example, Fig. 14a shows oscillations in a pendulum described by the same Eq. (42), but driven at frequency ω0/3. One can see that the 3rd harmonic amplitude may be comparable with that of the basic harmonic, especially if the external frequency is additionally lowered (Fig. 14b) to accommodate the deviation of the effective frequency ω0(A) of the system's own oscillations from its small-oscillation value ω0; see Eq. (49), Fig. 4, and their discussion in Sec. 2 above.

Generally, the higher harmonic generation by nonlinear systems might be readily anticipated. Indeed, the Fourier theorem tells us that any non-sinusoidal periodic function of time, e.g., an initially sinusoidal waveform of frequency ω distorted by nonlinearity, may be presented as a sum of its basic harmonic and higher harmonics with frequencies nω. Note that an effective generation of higher



33 In this particular case, the higher harmonic content is about 0.5%, dominated by the 3rd harmonic, whose amplitude and phase are in very good agreement with Eq. (50).

34 This method is used in practice, for example, for the generation of electromagnetic waves with frequencies in the terahertz range (10¹² to 10¹³ Hz), which still lacks efficient electronic self-oscillators.
harmonics is only possible with an adequate nonlinearity of the system. For example, consider the nonlinear term αq³ used in the equations explored in Secs. 2 and 3. If the waveform q(t) is approximately sinusoidal, such a term can create only the basic and 3rd harmonics. The "pendulum nonlinearity" sin q cannot produce, without a constant component in the process q(t), any even (e.g., the 2nd) harmonic. The most efficient generation of harmonics may be achieved using systems with the sharpest nonlinearities, e.g., semiconductor diodes whose current may follow an exponential dependence on the applied voltage through several orders of magnitude.
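The statements about harmonic content are easy to verify with a discrete Fourier transform over one period of a sinusoidal input, θ = ωt. In the NumPy sketch below (function name hypothetical, test waveforms chosen for illustration), cos³θ = (3 cos θ + cos 3θ)/4 contains only the 1st and 3rd harmonics, sin(cos θ) contains only odd harmonics, and adding a constant component to the argument, as in sin(0.3 + cos θ), makes even harmonics appear.

```python
import numpy as np


def harmonic_amplitudes(waveform, n_max=6, samples=1024):
    """Amplitudes of harmonics 0..n_max of a 2*pi-periodic function, via the discrete Fourier transform."""
    theta = 2 * np.pi * np.arange(samples) / samples
    X = np.fft.rfft(waveform(theta))
    amps = 2 * np.abs(X[: n_max + 1]) / samples
    amps[0] /= 2                                   # the constant term is not doubled
    return amps


np.set_printoptions(precision=4, suppress=True)
print(harmonic_amplitudes(lambda th: np.cos(th) ** 3))          # cubic term: harmonics 1 and 3 only
print(harmonic_amplitudes(lambda th: np.sin(np.cos(th))))       # sin q, no constant: odd harmonics only
print(harmonic_amplitudes(lambda th: np.sin(0.3 + np.cos(th)))) # sin q with a constant: even ones appear
```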





Fig. 4.14. Oscillations induced in a pendulum with damping δ/ω0 = 0.03, driven by a sinusoidal external force of amplitude f0 = 0.75 and frequency ω0/3 (top panel) and 0.8ω0/3 (bottom panel).



However, numerical modeling of nonlinear oscillators, as well as experiments with their physical implementations, brings more surprises. For example, the bottom panel of Fig. 15 shows oscillations in a pendulum under the effect of a strong sinusoidal force with a frequency close to 3ω0. One can see that at some parameter values and initial conditions the system's oscillation spectrum is heavily contributed to (almost dominated) by the 3rd subharmonic, i.e. a component that is synchronous with the driving force of frequency 3ω, but has the frequency ω that is close to the eigenfrequency ω0 of the system.

This counter-intuitive phenomenon may be explained as follows. Let us assume that subharmonic oscillations of frequency ω ≈ ω0 have somehow appeared, and coexist with the forced oscillations of frequency 3ω:

$$q(t) \approx A\cos\Psi + A_{\rm sub}\cos\Psi_{\rm sub}, \qquad \text{where } \Psi = 3\omega t - \varphi, \quad \Psi_{\rm sub} = \omega t - \varphi_{\rm sub}. \qquad (4.101)$$

Then the first nonlinear term, αq³, of the Taylor expansion of the pendulum's nonlinearity sin q yields

$$q^3 = \left(A\cos\Psi + A_{\rm sub}\cos\Psi_{\rm sub}\right)^3 = A^3\cos^3\Psi + 3A^2A_{\rm sub}\cos^2\Psi\cos\Psi_{\rm sub} + 3AA_{\rm sub}^2\cos\Psi\cos^2\Psi_{\rm sub} + A_{\rm sub}^3\cos^3\Psi_{\rm sub}. \qquad (4.102)$$
While the first and the last terms of this expression depend only on the amplitudes of the individual components of oscillations, the two middle terms are more interesting because they produce the so-called combinational frequencies of the two components. For our case, the third term,

$$3AA_{\rm sub}^2\cos\Psi\cos^2\Psi_{\rm sub} = \frac{3}{4}AA_{\rm sub}^2\cos\left(\Psi - 2\Psi_{\rm sub}\right) + \ldots, \qquad (4.103)$$

is of special importance, because it produces, besides other combinational frequencies, the subharmonic component with the total phase

$$\Psi - 2\Psi_{\rm sub} = \omega t - \varphi + 2\varphi_{\rm sub}. \qquad (4.104)$$

Thus, within a certain range of the mutual phase shift between the Fourier components, this nonlinear contribution is synchronous with the subharmonic oscillations, and describes the interaction that can deliver to them the energy from the external force, so that the oscillations may be self-sustained. Note, however, that the amplitude of the term (103) describing this energy exchange is proportional to the square of A_sub, and vanishes at the linearization of the equations of motion near the trivial fixed point. This means that the point is always stable, i.e., the 3rd subharmonic cannot be self-excited and always needs an initial "kick-off"; compare the two panels of Fig. 15. The same is evidently true for higher subharmonics.
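Eqs. (103) and (104) can be checked numerically by projecting the term 3AA_sub² cos Ψ cos² Ψ_sub onto the frequency ω. In the NumPy sketch below (function name hypothetical) the amplitudes and phases are arbitrary illustrative values; the recovered component R cos(ωt - θ) should have R = (3/4)AA_sub² and θ = φ - 2φ_sub, i.e. it equals (3/4)AA_sub² cos(ωt - φ + 2φ_sub). Doubling A_sub multiplies R by 4: the quadratic dependence that prevents the 3rd subharmonic from starting out of infinitesimal fluctuations.

```python
import numpy as np


def omega_component(A, A_sub, phi, phi_sub, omega=1.0, samples=4096):
    """Amplitude R and phase theta of the frequency-omega part, R cos(omega t - theta),
    of the term 3 A A_sub**2 cos(Psi) cos(Psi_sub)**2 of Eq. (4.102)."""
    t = (2 * np.pi / omega) * np.arange(samples) / samples   # one period of the subharmonic
    Psi = 3 * omega * t - phi
    Psi_sub = omega * t - phi_sub
    g = 3 * A * A_sub**2 * np.cos(Psi) * np.cos(Psi_sub) ** 2
    c = 2 * np.mean(g * np.cos(omega * t))                   # Fourier projections onto omega
    s = 2 * np.mean(g * np.sin(omega * t))
    return np.hypot(c, s), np.arctan2(s, c)


A, A_sub, phi, phi_sub = 1.3, 0.4, 0.5, 0.2                   # arbitrary illustrative values
R, theta = omega_component(A, A_sub, phi, phi_sub)
print(R, 0.75 * A * A_sub**2)        # amplitude predicted by Eq. (4.103)
print(theta, phi - 2 * phi_sub)      # phase predicted by Eq. (4.104)
```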
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Fig. 4.15. Oscillations induced in a pendulum with d/co 0 = 0.03 by a sinusoidal external force of 
amplitude f 0 = 3 and frequency 3<x>ox0.8, with initial conditions g(0) = 0 (the top row) and g(0) = 1 
(the bottom row). 
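The trigonometric reduction behind Eq. (103) may be spot-checked numerically. The Python sketch below is only an illustration: it uses random amplitudes and phases rather than any values from Fig. 15, and compares the third term of Eq. (102) with its expansion over the combinational phases $\Psi$ and $\Psi \pm 2\Psi_{\rm sub}$. The $\Psi - 2\Psi_{\rm sub}$ component carries the coefficient $3/4$ and, like the whole term, scales as $A_{\rm sub}^2$.

```python
import math
import random

def third_term(A, A_sub, psi, psi_sub):
    """Third term of Eq. (4.102): 3 A A_sub^2 cos(Psi) cos^2(Psi_sub)."""
    return 3 * A * A_sub**2 * math.cos(psi) * math.cos(psi_sub) ** 2

def expanded(A, A_sub, psi, psi_sub):
    """The same term as a sum of combinational harmonics; the first one is Eq. (4.103)."""
    c = 0.75 * A * A_sub**2
    return c * (math.cos(psi - 2 * psi_sub)
                + 2 * math.cos(psi)
                + math.cos(psi + 2 * psi_sub))

def max_mismatch(trials=1000, seed=0):
    """Largest |difference| between the two forms over random test points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        A, A_sub = rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)
        psi, psi_sub = rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)
        worst = max(worst, abs(third_term(A, A_sub, psi, psi_sub)
                               - expanded(A, A_sub, psi, psi_sub)))
    return worst

print(max_mismatch())  # floating-point round-off only
```

Because every harmonic in this expansion is proportional to $A_{\rm sub}^2$, none of them survives the linearization in $A_{\rm sub}$, which is the point made above.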



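Only the second subharmonic presents a special case. Indeed, let us make a calculation similar to Eq. (102), by replacing Eq. (101) with

$$q(t) \approx A\cos\Psi + A_{\rm sub}\cos\Psi_{\rm sub}, \quad \text{where } \Psi = 2\omega t - \varphi, \quad \Psi_{\rm sub} = \omega t - \varphi_{\rm sub}, \qquad (4.105)$$

and assuming a quadratic nonlinearity, proportional to $q^2$:

$$q^2 = A^2\cos^2\Psi + 2AA_{\rm sub}\cos\Psi\cos\Psi_{\rm sub} + A_{\rm sub}^2\cos^2\Psi_{\rm sub}. \qquad (4.106)$$

Here the combinational-frequency term capable of supporting the 2nd subharmonic,

$$2AA_{\rm sub}\cos\Psi\cos\Psi_{\rm sub} = AA_{\rm sub}\cos(\Psi - \Psi_{\rm sub}) + \dots = AA_{\rm sub}\cos(\omega t - \varphi + \varphi_{\rm sub}) + \dots, \qquad (4.107)$$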



is linear in the subharmonic amplitude, i.e. survives the linearization of the equation near the trivial fixed point. This means that the second subharmonic may arise spontaneously, from infinitesimal fluctuations.
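The contrast with the third subharmonic can also be seen in a discrete Fourier transform. In the sketch below (Python with NumPy), the phases $\varphi = 0.3$, $\varphi_{\rm sub} = 1.1$ and the units with $\omega = 1$ are arbitrary illustrative choices. The code samples $q^2$ over one period of the subharmonic and extracts its component at frequency $\omega$. For the trial solution (105) this component comes only from the cross term (107), so its complex amplitude should equal $AA_{\rm sub}e^{i(\varphi_{\rm sub} - \varphi)}$, i.e. it should be linear in $A_{\rm sub}$.

```python
import numpy as np

def omega_component(A, A_sub, phi=0.3, phi_sub=1.1, n=256):
    """Complex amplitude C of the component Re[C exp(i*omega*t)] of q**2, where
    q = A cos(2*omega*t - phi) + A_sub cos(omega*t - phi_sub); units with omega = 1."""
    t = 2 * np.pi * np.arange(n) / n        # n samples over one subharmonic period
    q = A * np.cos(2 * t - phi) + A_sub * np.cos(t - phi_sub)
    spectrum = np.fft.rfft(q**2) / n        # harmonic index 1 <-> frequency omega
    return 2 * spectrum[1]

A, phi, phi_sub = 1.0, 0.3, 1.1
for A_sub in (1e-3, 1e-2, 1e-1):
    C = omega_component(A, A_sub, phi, phi_sub)
    expected = A * A_sub * np.exp(1j * (phi_sub - phi))   # from Eq. (4.107)
    print(f"A_sub = {A_sub:g}: |C|/A_sub = {abs(C) / A_sub:.6f}, "
          f"error = {abs(C - expected):.1e}")
```

The ratio $|C|/A_{\rm sub}$ stays equal to $A$ for all subharmonic amplitudes, however small, which is why this term survives the linearization.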

Moreover, such excitation of the second subharmonic is very similar to the parametric excitation that was discussed in detail in Sec. 5, and this similarity is not coincidental. Indeed, let us redo expansion (106) under a somewhat different assumption, namely that the oscillations are a sum of the forced oscillations at the external force frequency $2\omega$ and an arbitrary but weak perturbation:

$$q(t) = A\cos(2\omega t - \varphi) + \tilde q(t), \qquad |\tilde q| \ll A. \qquad (4.108)$$

Then, neglecting the small term proportional to $\tilde q^2$, we get

$$q^2 \approx A^2\cos^2(2\omega t - \varphi) + 2\tilde q(t)A\cos(2\omega t - \varphi). \qquad (4.109)$$

Besides the inconsequential phase $\varphi$, the second term in the last formula is exactly similar to the term describing the parametric effects in Eq. (75). This fact means that for a weak perturbation, a system with a quadratic nonlinearity in the presence of a strong "pumping" signal of frequency $2\omega$ is equivalent to a system with parameters changing in time with frequency $2\omega$. This fact is broadly used for parametric excitation at high (e.g., optical) frequencies, where mechanical means of parameter modulation (see, e.g., Fig. 5) are not practicable. The necessary quadratic nonlinearity at optical frequencies may be provided by several nonlinear crystals, e.g., lithium niobate (LiNbO$_3$).

Before finishing this chapter, let me elaborate a bit on a general topic: the relation between the numerical and analytical approaches to problems of dynamics (and physics as a whole). We have just seen that numerical solutions, like those shown in Fig. 15b, may sometimes give vital clues about previously unanticipated phenomena such as the excitation of subharmonics. (The phenomenon of deterministic chaos, which will be discussed in Chapter 9 below, presents another example of such "numerical discoveries".) One might also argue that in the absence of exact analytical solutions, numerical simulations may be the main theoretical tool for the study of such phenomena. These hopes are, however, muted by the problem that is frequently called the curse of dimensionality,35 in which the last word refers to the number of input parameters of the problem to be solved.36

35 This term was coined in 1957 by R. Bellman in the context of optimal control theory (where the dimensionality typically means the number of parameters affecting the system under control), but has gradually spread all over the quantitative sciences using numerical methods.

36 In EM Sec. 1.2, I discuss the implications of the curse for a different case, when both analytical and numerical solutions to the same problem are possible.

Indeed, let us have another look at Fig. 15. OK, we have been lucky to find a new phenomenon, the 3rd subharmonic generation, for a particular set of parameters; in that case, five of them: $\delta/\omega_0 = 0.03$, $\omega/\omega_0 = 2.4$, $f_0 = 3$, $q(0) = 1$, and $dq/dt(0) = 0$. Could we tell anything about how common this effect is? Are subharmonics with different $n$ possible in the system? The only way to address these questions computationally is to carry out similar numerical simulations at many points of the $d$-dimensional (in this case, $d = 5$) space of parameters. Say, we have decided that breaking the reasonable range of each parameter into $N = 100$ points is sufficient. (For many problems, even more points are necessary; see, e.g., Sec. 9.1.) Then the total number of numerical experiments to carry out is $N^d = (10^2)^5 = 10^{10}$, not a simple task even for powerful modern computing facilities. (Besides the pure number of required CPU cycles, consider the storage and analysis of the results.) For many important problems of nonlinear dynamics, e.g., turbulence, the parameter dimensionality $d$ is substantially larger, and the computer resources necessary for one numerical experiment are much greater.
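The count above is easy to tabulate for other dimensionalities. In this Python sketch, the cost of one simulation (1 s) is a purely hypothetical figure chosen for illustration; only the $N^d$ scaling is taken from the text.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def grid_runs(points_per_axis: int, dimensions: int) -> int:
    """Number of simulations needed to cover a full grid in parameter space."""
    return points_per_axis ** dimensions

def grid_years(points_per_axis: int, dimensions: int, seconds_per_run: float) -> float:
    """Wall-clock time, in years, for running all grid points one after another."""
    return grid_runs(points_per_axis, dimensions) * seconds_per_run / SECONDS_PER_YEAR

for d in (1, 2, 3, 5):
    print(f"d = {d}: {grid_runs(100, d):.0e} runs, "
          f"{grid_years(100, d, 1.0):.3g} years at 1 s per run")
```

For $d = 5$, even this optimistic one-second-per-run assumption translates into roughly three centuries of sequential computing.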

In view of the curse of dimensionality, approximate analytical considerations, like those outlined above for the subharmonic excitation, are invaluable. More generally, physics used to stand on two legs, experiment and (analytical) theory. The enormous progress of computer performance during the last few decades has provided it with one more point of support (a tail? :-): numerical simulation. This does not mean we can afford to cut and throw away any of the legs we are standing on.



4.9. Exercise problems 
4.1. Prove Eq. (26) for the response function given by Eq. (17).

Hint: You may like to use the Cauchy integral for analytic functions of a complex variable.37



4.2. A square-wave pulse of force (see Fig. on the right) is exerted on a linear oscillator with eigenfrequency $\omega_0$ (no damping), initially at rest. Calculate the law of motion $q(t)$, sketch it, and interpret the result.



4.3. Calculate the law of motion of a linear harmonic oscillator with low damping, induced by a resonant force that is suddenly turned on at $t = 0$:

$$f(t) = \begin{cases} 0, & \text{for } t < 0, \\ f_0\cos\omega_0 t, & \text{for } t > 0, \end{cases}$$

in the first approximation in $\delta/\omega_0 \ll 1$. Sketch (or plot) the resulting function $q(t)$, and give its physical interpretation. Explore the result's trend at $\delta \to 0$.



4.4. The figure below shows the initial stage of oscillations in a weakly damped linear oscillator (initially at rest), driven by a sinusoidal external force with a frequency 10% higher than $\omega_0$, which was turned on at $t = 0$. Explain the origin of the decaying modulation of the oscillation amplitude, and estimate the Q-factor of the oscillator.



37 See, e.g., MA Eq. (15.2).

[Figure for Problem 4.4: $q(t)$ versus $t$]




4.5. For a system with the Lagrangian function



L = — q q + sq , 



30(1 



with small a, use the rotating-wave approximation to find the frequency of free oscillations as a function 
of their amplitude. 



4.6. Find the regions of real, time-independent parameters $a_1$ and $a_2$, in which the fixed point of the following system of equations,

$$\dot q_1 = a_1(q_2 - q_1),$$

is unstable. On the $[a_1, a_2]$ plane, sketch the regions of each fixed point type: stable and unstable nodes, focuses, etc.



4.7. Use the rotating-wave approximation to analyze forced oscillations in an oscillator with weak nonlinear damping, described by the equation

$$\ddot q + 2\delta\dot q + \omega_0^2q + \beta\dot q^3 = f_0\cos\omega t,$$

with $\omega \approx \omega_0$; $\beta, \delta > 0$; $\beta\omega A^2 \ll 1$. In particular, find the stationary amplitude of forced oscillations and analyze their stability. Discuss the effect(s) of the nonlinear term on the resonance.



4.8. Analyze the stability of the forced nonlinear oscillations described by Eq. (43). Relate the result to the slope of the resonance curves (Fig. 4).



4.9. Adding the nonlinear term $\alpha q^3$ to the left-hand part of Eq. (76),

(i) find the corresponding addition to the RWA equations,

(ii) find the stationary amplitude $A$ of parametric oscillations,

(iii) sketch and discuss the $A(\xi)$ dependence,

(iv) find the type and stability of each fixed point of the RWA equations,






(v) sketch the Poincaré phase plane of the system in the main parameter regions.

4.10. Use the rotating-wave approximation to find the conditions of parametric excitation in an oscillator with weak modulation of both the effective mass $m(t) = m_0(1 + \mu_m\cos 2\omega t)$ and the spring constant $\kappa(t) = \kappa_0[1 + \mu_\kappa\cos(2\omega t - \psi)]$, with the same frequency $2\omega \approx 2\omega_0$, but an arbitrary ratio $\mu_m/\mu_\kappa$ of the modulation depths and an arbitrary phase shift $\psi$. Interpret the result in terms of modulation of the instantaneous frequency $\omega(t) \equiv [\kappa(t)/m(t)]^{1/2}$ and the impedance $Z(t) \equiv [\kappa(t)m(t)]^{1/2}$ of the oscillator.

4.11. Find the condition of parametric excitation of a nonlinear oscillator described by the equation

$$\ddot q + 2\delta\dot q + \omega_0^2q + \gamma q^2 = f_0\cos 2\omega t,$$

with sufficiently small $\delta$, $\gamma$, $f_0$, and $\xi \equiv \omega - \omega_0$.

Chapter 5. From Oscillations to Waves

In this chapter, the discussion of oscillations is extended to systems with two and more degrees of freedom. This extension naturally leads to another key notion, waves. The discussion of waves (at this stage, in 1D systems) is focused on such key phenomena as their dispersion and reflection from interfaces/boundaries.



5.1. Two coupled oscillators 

Let us move on to discuss oscillations in systems with more than one degree of freedom, starting from the simplest case of two linear, dissipation-free oscillators. If the Lagrangian of the system may be presented as a sum of those for two harmonic oscillators,

$$L = L_1 + L_2, \qquad L_{1,2} = T_{1,2} - U_{1,2} = \frac{m_{1,2}}{2}\dot q_{1,2}^2 - \frac{\kappa_{1,2}}{2}q_{1,2}^2 \qquad (5.1)$$

(plus arbitrary, inconsequential constants if you like), then according to Eq. (2.19), the equations of motion of the oscillators are independent of each other, and each one is similar to Eq. (1.1), with its partial frequency $\Omega_{1,2}$ equal to

$$\Omega_{1,2}^2 = \frac{\kappa_{1,2}}{m_{1,2}}. \qquad (5.2)$$

This means that in this simplest case, an arbitrary motion of the system is just a sum of independent sinusoidal oscillations at two frequencies equal to the partial frequencies (2).

Hence, in order to describe the oscillator coupling (i.e. interaction), the full Lagrangian $L$ should contain an additional mixed term $L_{\rm int}$ depending on both generalized coordinates $q_1$ and $q_2$ and/or generalized velocities. The simplest, and most frequently met, type of such an interaction term is the bilinear form $U_{\rm int} = -\kappa q_1q_2$, where $\kappa$ is a constant, giving $L_{\rm int} = -U_{\rm int} = \kappa q_1q_2$. Figure 1 shows the simplest example of a system with such an interaction.1 In it, three springs, keeping two massive particles between two stiff walls, have generally different spring constants.




Fig. 5.1. The simplest system of two coupled harmonic oscillators.



Indeed, in this case the kinetic energy is still separable, $T = T_1 + T_2$, but the total potential energy, consisting of the elastic energies of three springs, is not:

$$U = \frac{\kappa_L}{2}q_1^2 + \frac{\kappa_M}{2}(q_1 - q_2)^2 + \frac{\kappa_R}{2}q_2^2, \qquad (5.3a)$$

where $q_{1,2}$ are the horizontal displacements of the particles from their equilibrium positions. It is convenient to rewrite this expression as

$$U = \frac{\kappa_1}{2}q_1^2 + \frac{\kappa_2}{2}q_2^2 - \kappa q_1q_2, \quad \text{where } \kappa_1 \equiv \kappa_L + \kappa_M, \quad \kappa_2 \equiv \kappa_R + \kappa_M, \quad \kappa \equiv \kappa_M, \qquad (5.3b)$$

showing that the Lagrangian $L = T - U$ of this system indeed contains a bilinear interaction term:

$$L = L_1 + L_2 + L_{\rm int}, \qquad L_{\rm int} = \kappa q_1q_2. \qquad (5.4)$$

1 Here it is assumed that the particles are constrained to move in only one dimension (shown horizontal).

The resulting Lagrange equations of motion are

$$m_1\ddot q_1 + m_1\Omega_1^2q_1 = \kappa q_2, \qquad m_2\ddot q_2 + m_2\Omega_2^2q_2 = \kappa q_1. \qquad (5.5)$$



Thus the interaction energy describes an effective generalized force $\kappa q_2$ exerted on subsystem 1 by subsystem 2, and the reciprocal effective force $\kappa q_1$. Note that, in contrast to real physical forces (such as $F_{12} = -F_{21} = \kappa_M(q_2 - q_1)$ for the system shown in Fig. 1), the effective forces in the right-hand parts of Eqs. (5) do not obey the 3rd Newton law. Note also that they are proportional to the same coefficient $\kappa$; this feature is a result of the general bilinear structure (4) of the interaction energy rather than of any special symmetry.

We already know how to solve Eqs. (5), because it is still a system of linear and homogeneous differential equations, so that its general solution is a sum of particular solutions of a form similar to Eqs. (4.88),

$$q_1 = c_1e^{\lambda t}, \qquad q_2 = c_2e^{\lambda t}, \qquad (5.6)$$



for all possible values of $\lambda$. These values may be found by plugging Eq. (6) into Eqs. (5), and requiring the resulting system of two linear algebraic equations for the distribution coefficients $c_{1,2}$,

$$m_1\lambda^2c_1 + m_1\Omega_1^2c_1 = \kappa c_2, \qquad m_2\lambda^2c_2 + m_2\Omega_2^2c_2 = \kappa c_1, \qquad (5.7)$$

to be self-consistent. In our particular case, we get the characteristic equation

$$\begin{vmatrix} m_1(\lambda^2 + \Omega_1^2) & -\kappa \\ -\kappa & m_2(\lambda^2 + \Omega_2^2) \end{vmatrix} = 0, \qquad (5.8)$$



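that is quadratic in $\lambda^2$, and thus allows a simple solution:

$$(\lambda^2)_\pm = -\frac{1}{2}(\Omega_1^2 + \Omega_2^2) \mp \left[\frac{1}{4}(\Omega_1^2 + \Omega_2^2)^2 - \Omega_1^2\Omega_2^2 + \frac{\kappa^2}{m_1m_2}\right]^{1/2} = -\frac{1}{2}(\Omega_1^2 + \Omega_2^2) \mp \left[\frac{1}{4}(\Omega_1^2 - \Omega_2^2)^2 + \frac{\kappa^2}{m_1m_2}\right]^{1/2}. \qquad (5.9)$$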



According to Eqs. (2) and (3b), for any positive values of the spring constants, the product $\Omega_1\Omega_2 = [(\kappa_L + \kappa_M)(\kappa_R + \kappa_M)/(m_1m_2)]^{1/2}$ is always larger than $\kappa/(m_1m_2)^{1/2} = \kappa_M/(m_1m_2)^{1/2}$, so that the square root in Eq. (9) is always less than $(\Omega_1^2 + \Omega_2^2)/2$. As a result, both values of $\lambda^2$ are negative, i.e. the general solution to Eq. (5) is a sum of four terms, each proportional to $\exp\{\pm i\omega_\pm t\}$, where both eigenfrequencies $\omega_\pm \equiv [-(\lambda^2)_\pm]^{1/2}$ are real:

$$\omega_\pm^2 \equiv -(\lambda^2)_\pm = \frac{1}{2}(\Omega_1^2 + \Omega_2^2) \pm \left[\frac{1}{4}(\Omega_1^2 - \Omega_2^2)^2 + \frac{\kappa^2}{m_1m_2}\right]^{1/2}. \qquad (5.10)$$



A plot of these eigenfrequencies as a function of one of the partial frequencies (say, $\Omega_1$), with the other partial frequency fixed, gives the famous anticrossing diagram (Fig. 2). One can see that at weak coupling, the frequencies $\omega_\pm$ are close to the partial frequencies everywhere except in a narrow range near the anticrossing point $\Omega_1 = \Omega_2$. Most remarkably, on passing through this region, $\omega_\pm$ smoothly "switch" from following $\Omega_2$ to following $\Omega_1$ and vice versa.




Fig. 5.2. Anticrossing diagram for two values of the oscillator coupling strength $\kappa/(m_1m_2)^{1/2}\Omega_2^2$: 0.3 (red lines) and 0.1 (blue lines). In this plot, $\Omega_1$ is assumed to be changed by varying $\kappa_1$ rather than $m_1$, but in the opposite case the diagram is qualitatively similar.
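The anticrossing may be reproduced with a few lines of Python (NumPy assumed). The sketch below evaluates Eq. (10) and, as an independent check, obtains the same frequencies as the eigenvalues $\omega^2$ of $M^{-1}K$, where $M = \mathrm{diag}(m_1, m_2)$ and $K$ is the coefficient matrix of Eqs. (5) written as $M\ddot q + Kq = 0$. The parameter values (units with $\Omega_2 = 1$, coupling $\kappa = 0.1$) are illustrative.

```python
import numpy as np

def omegas_formula(Om1, Om2, m1, m2, kappa):
    """Eigenfrequencies omega_+, omega_- from Eq. (5.10)."""
    half_sum = (Om1**2 + Om2**2) / 2
    root = np.sqrt((Om1**2 - Om2**2) ** 2 / 4 + kappa**2 / (m1 * m2))
    return np.sqrt(half_sum + root), np.sqrt(half_sum - root)

def omegas_numeric(Om1, Om2, m1, m2, kappa):
    """Same frequencies from Eqs. (5.5) written as M q'' + K q = 0."""
    M = np.diag([m1, m2])
    K = np.array([[m1 * Om1**2, -kappa],
                  [-kappa, m2 * Om2**2]])
    w2 = np.sort(np.linalg.eigvals(np.linalg.solve(M, K)).real)
    return np.sqrt(w2[1]), np.sqrt(w2[0])

# Sweep Omega_1 at fixed Omega_2 = 1, with a weak coupling (illustrative values)
m1 = m2 = 1.0
kappa = 0.1
for Om1 in np.linspace(0.6, 1.4, 9):
    wp, wm = omegas_formula(Om1, 1.0, m1, m2, kappa)
    print(f"Omega_1 = {Om1:.2f}: omega_- = {wm:.4f}, omega_+ = {wp:.4f}")
```

At $\Omega_1 = \Omega_2$ the printed pair reduces to $\omega_\pm^2 = \Omega^2 \pm \kappa/(m_1m_2)^{1/2}$, in agreement with Eq. (11).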



The reason for this counterintuitive behavior may be found by examining the distribution coefficients $c_{1,2}$ corresponding to each branch of the diagram, which may be obtained by plugging the corresponding value of $\lambda_\pm = -i\omega_\pm$ back into Eqs. (7). For example, at the anticrossing point $\Omega_1 = \Omega_2 \equiv \Omega$, Eq. (10) is reduced to

$$\omega_\pm^2 = \Omega^2 \pm \frac{\kappa}{(m_1m_2)^{1/2}} = \Omega^2\left[1 \pm \frac{\kappa}{(\kappa_1\kappa_2)^{1/2}}\right]. \qquad (5.11)$$



Plugging this expression back into either of Eqs. (7), we see that for the two branches of the anticrossing diagram, the distribution coefficient ratio is the same in magnitude but opposite in sign:2

$$\left(\frac{c_1}{c_2}\right)_\pm = \mp\left(\frac{m_2}{m_1}\right)^{1/2}, \qquad \text{at } \Omega_1 = \Omega_2. \qquad (5.12)$$



In particular, if the system is symmetric ($m_1 = m_2$, $\kappa_L = \kappa_R$), then on the upper branch, corresponding to $\omega_+ > \omega_-$, $c_1 = -c_2$. This means that in this hard mode,3 the masses oscillate in anti-phase: $q_1(t) \equiv -q_2(t)$. The resulting substantial extension/compression of the middle spring yields an additional returning force which increases the oscillation frequency. On the contrary, on the lower branch, corresponding to $\omega_-$, the particle oscillations are in phase: $c_1 = c_2$, $q_1(t) \equiv q_2(t)$, so that the middle spring is never stretched at all. As a result, the soft mode oscillation frequency $\omega_-$ is lower than $\omega_+$ and does not depend on $\kappa$:

$$\omega_-^2 = \Omega^2 - \frac{\kappa}{m} = \frac{\kappa_1 - \kappa_M}{m} = \frac{\kappa_L}{m}. \qquad (5.13)$$

2 It is useful to rewrite Eq. (12) as $Z_1^{1/2}c_1 = \mp Z_2^{1/2}c_2$, where $Z_{1,2} \equiv (\kappa_{1,2}m_{1,2})^{1/2}$ are the partial oscillator impedances, the notion already mentioned in Chapter 4 and to be discussed in more detail in Sec. 4 below.

Note that for both modes, the oscillations equally engage both particles. 
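Both mode structures, and Eq. (13), may be checked numerically. The Python sketch below (NumPy assumed; all parameter values are illustrative) builds the matrices of Eqs. (3b) and (5) for the system of Fig. 1 and returns the ratio $c_1/c_2$ for each mode, soft mode first. The second call uses unequal masses with spring constants tuned so that $\Omega_1 = \Omega_2 = 1$; for $m_2/m_1 = 4$, Eq. (12) then predicts the ratios $+2$ (soft mode) and $-2$ (hard mode).

```python
import numpy as np

def chain_modes(m1, m2, kL, kM, kR):
    """Eigenfrequencies (ascending) and coefficient ratios c1/c2 for the system of Fig. 5.1."""
    M = np.diag([m1, m2])
    K = np.array([[kL + kM, -kM],
                  [-kM, kR + kM]])          # Eq. (5.3b): kappa_1, kappa_2, kappa = kappa_M
    w2, vecs = np.linalg.eig(np.linalg.solve(M, K))
    order = np.argsort(w2.real)            # soft mode first, hard mode second
    w2, vecs = w2.real[order], vecs[:, order].real
    return np.sqrt(w2), vecs[0] / vecs[1]

# Symmetric case: omega_-^2 = kappa_L/m = 1, independently of kappa_M
print(chain_modes(1.0, 1.0, 1.0, 0.3, 1.0))
# Unequal masses, Omega_1 = Omega_2 = 1: ratios +2 and -2, i.e. -/+ (m2/m1)**0.5
print(chain_modes(1.0, 4.0, 0.7, 0.3, 3.7))
```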

Far from the anticrossing point, the situation is completely different. Indeed, a similar calculation of $c_{1,2}$ shows that on each branch of the diagram, one of the distribution coefficients is much larger (in magnitude) than its counterpart. Hence, in this limit any particular mode of oscillations involves virtually only one particle. A slow change of the system's parameters, bringing it through the anticrossing, results first in a maximal delocalization of each mode, and then in the restoration of the localization, but in a different partial degree of freedom.
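The handover of a mode from one particle to the other may be visualized by computing, for each mode, the share $c_1^2/(c_1^2 + c_2^2)$ of its squared amplitude that resides on particle 1. The Python sketch below assumes equal masses (so that the problem reduces to a symmetric matrix) and an illustrative coupling $\kappa = 0.05$ in units with $m = \Omega_2 = 1$.

```python
import numpy as np

def weight_on_particle_1(Om1_sq, Om2_sq=1.0, kappa=0.05, m=1.0):
    """Share c1^2/(c1^2 + c2^2) of each mode (soft mode first) residing on particle 1.

    Equal masses are assumed, so that M^{-1}K is symmetric and eigh applies.
    """
    W = np.array([[Om1_sq, -kappa / m],
                  [-kappa / m, Om2_sq]])
    _, vecs = np.linalg.eigh(W)            # columns are unit-norm eigenvectors
    return vecs[0] ** 2

for Om1_sq in (0.5, 0.9, 1.0, 1.1, 1.5):
    soft, hard = weight_on_particle_1(Om1_sq)
    print(f"Omega_1^2 = {Om1_sq}: soft mode {soft:.3f}, hard mode {hard:.3f}")
```

Far below the anticrossing the soft mode sits almost entirely on particle 1, far above it on particle 2, and exactly at $\Omega_1 = \Omega_2$ both modes are shared equally.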

We could readily carry out similar calculations for the case when the systems are coupled via their velocities, $L_{\rm int} = m\dot q_1\dot q_2$, where $m$ is a coupling coefficient, not necessarily a certain physical mass. (In mechanics, with $q_{1,2}$ standing for actual particle displacements, such coupling is hard to implement, but there are many dynamic systems of non-mechanical nature in which such coupling is the most natural one.) The results are generally similar to those discussed above, again with the maximum level splitting at $\Omega_1 = \Omega_2 \equiv \Omega$:

$$\omega_\pm = \frac{\Omega}{\left[1 \mp |m|/(m_1m_2)^{1/2}\right]^{1/2}} \approx \Omega\left[1 \pm \frac{|m|}{2(m_1m_2)^{1/2}}\right], \qquad (5.14)$$

the last relation being valid for weak coupling. The generalization to the case of both coordinate and velocity coupling is also straightforward; see the next section.
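Eq. (14) may be verified in the same way. In the Python sketch below (NumPy assumed), the velocity coupling enters the off-diagonal elements of the kinetic-energy matrix $M$, and the eigenvalues of $M^{-1}K$ give $\omega^2$. The masses and the coupling coefficient are illustrative, chosen so that $|m|/(m_1m_2)^{1/2} = 0.1$.

```python
import numpy as np

def omegas_velocity_coupled(m1, m2, m, k1, k2):
    """Eigenfrequencies (ascending) for
    L = m1 q1'^2/2 + m2 q2'^2/2 + m q1' q2' - k1 q1^2/2 - k2 q2^2/2."""
    M = np.array([[m1, m], [m, m2]])       # kinetic-energy matrix with velocity coupling
    K = np.diag([k1, k2])
    w2 = np.sort(np.linalg.eigvals(np.linalg.solve(M, K)).real)
    return np.sqrt(w2)

m1, m2, m = 1.0, 4.0, 0.2                  # illustrative; requires |m| < (m1 m2)^{1/2}
Omega = 1.0
mu = abs(m) / np.sqrt(m1 * m2)             # = 0.1 here
w_minus, w_plus = omegas_velocity_coupled(m1, m2, m, m1 * Omega**2, m2 * Omega**2)
print(w_plus, Omega / np.sqrt(1 - mu), Omega * (1 + mu / 2))    # exact, Eq. (5.14), weak-coupling form
print(w_minus, Omega / np.sqrt(1 + mu), Omega * (1 - mu / 2))
```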

The anticrossing diagram shown in Fig. 2 may be met not only in classical mechanics. It is even more ubiquitous in quantum mechanics because, due to the time-oscillatory character of the Schrödinger equation's solutions, weak coupling of any two quantum states leads to a qualitatively similar behavior of the eigenfrequencies $\omega_\pm$, and hence of the eigenenergies ("energy levels") $E_\pm = \hbar\omega_\pm$.4



5.2. N coupled oscillators 

The calculations of the previous section may be readily generalized to the case of an arbitrary number (say, $N$) of coupled harmonic oscillators, with an arbitrary type of coupling. It is evident that in this case Eq. (4) should be replaced with

$$L = \sum_{j=1}^{N}L_j + \sum_{j,j'=1}^{N}L_{jj'}. \qquad (5.15)$$

3 In physics, the term "mode" is typically used for a particular type of variable distribution in space (in our current case, a certain set of distribution coefficients $c_{1,2}$) that sustains oscillations at a single frequency.

4 One more property of weakly coupled oscillators, a periodic slow transfer of energy from one oscillator to the other and back, is more important for quantum than for classical mechanics. This is why I refer the reader to QM Secs. 2.5 and 5.1 for a detailed discussion of this phenomenon.

Moreover, we can generalize the above relations for the mixed terms $L_{jj'}$, taking into account their possible dependence not only on the generalized coordinates, but also on the generalized velocities, in a bilinear form similar to Eq. (4). The resulting Lagrangian may be presented in a compact form,

$$L = \sum_{j,j'=1}^{N}\left(\frac{m_{jj'}}{2}\dot q_j\dot q_{j'} - \frac{\kappa_{jj'}}{2}q_jq_{j'}\right), \qquad (5.16)$$

where the off-diagonal terms are index-symmetric, $m_{jj'} = m_{j'j}$ and $\kappa_{jj'} = \kappa_{j'j}$, and the factors $1/2$ compensate the double counting of each term with $j \neq j'$ that takes place at the summation over two independently running indices. One may argue that Eq. (16) is quite general if we still want the equations of motion to be linear, as they have to be if the oscillations are small enough.

Plugging Eq. (16) into the general form (2.19) of the Lagrange equation, we get $N$ equations of motion of the system, one for each value of the index $j = 1, 2, \dots, N$:

$$\sum_{j'=1}^{N}\left(m_{jj'}\ddot q_{j'} + \kappa_{jj'}q_{j'}\right) = 0. \qquad (5.17)$$

Just as in the previous section, let us look for a particular solution to this system in the form

$$q_j = c_je^{\lambda t}. \qquad (5.18)$$

As a result, we get a system of $N$ linear, homogeneous algebraic equations,

$$\sum_{j'=1}^{N}\left(m_{jj'}\lambda^2 + \kappa_{jj'}\right)c_{j'} = 0, \qquad (5.19)$$

for the set of $N$ distribution coefficients $c_j$. The condition that this system is self-consistent is that the determinant of its matrix equals zero:

$$\mathrm{Det}\left(m_{jj'}\lambda^2 + \kappa_{jj'}\right) = 0. \qquad (5.20)$$



This characteristic equation is an algebraic equation of degree $N$ for $\lambda^2$, and so has $N$ roots $(\lambda^2)_n$. For any Hamiltonian system with a stable equilibrium, the matrices $m_{jj'}$ and $\kappa_{jj'}$ ensure that all these roots are real and negative. As a result, the general solution to Eq. (17) is the sum of $2N$ terms proportional to $\exp\{\pm i\omega_nt\}$, $n = 1, 2, \dots, N$, where all $N$ eigenfrequencies $\omega_n$ are real.
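For a general $N$, Eq. (20) is a generalized symmetric eigenvalue problem, $\sum_{j'}(\kappa_{jj'} - \omega^2m_{jj'})c_{j'} = 0$. One standard way to solve it, sketched below in Python with NumPy, uses the Cholesky factorization $m = LL^{\rm T}$ of the mass matrix to turn it into an ordinary symmetric eigenproblem. This is one common numerical technique, not a method taken from these notes, and the random matrices serve only as a test case.

```python
import numpy as np

def normal_modes(M, K):
    """Eigenfrequencies omega_n (ascending) and distribution coefficients c_jn (columns)
    solving sum_j' (kappa_jj' - omega^2 m_jj') c_j' = 0, i.e. Eq. (5.19) with lambda^2 = -omega^2.

    M and K are symmetric N x N matrices of the coefficients m_jj' and kappa_jj' of
    Eq. (5.16); M must be positive definite.
    """
    L = np.linalg.cholesky(M)                  # M = L L^T
    L_inv = np.linalg.inv(L)
    A = L_inv @ K @ L_inv.T                    # symmetric, same eigenvalues as M^{-1} K
    w2, y = np.linalg.eigh(A)
    if np.any(w2 <= 0):
        raise ValueError("no stable equilibrium: some roots (lambda^2)_n are not negative")
    return np.sqrt(w2), L_inv.T @ y            # back-transform c = L^{-T} y

rng = np.random.default_rng(1)
N = 5
B, C = rng.normal(size=(N, N)), rng.normal(size=(N, N))
M = B @ B.T + N * np.eye(N)                    # random symmetric positive-definite "mass" matrix
K = C @ C.T + N * np.eye(N)                    # random symmetric positive-definite "spring" matrix
omega, c = normal_modes(M, K)
residual = max(np.abs((K - w**2 * M) @ c[:, n]).max() for n, w in enumerate(omega))
print(omega, residual)
```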

Plugging each of these $2N$ values of $\lambda = \pm i\omega_n$ back into the set of linear equations (19), one can find the corresponding set of distribution coefficients $c_{j\pm}$. Generally, the coefficients are complex, but in order to keep $q_j(t)$ real, the coefficients $c_{j+}$ corresponding to $\lambda = +i\omega_n$ and $c_{j-}$ corresponding to $\lambda = -i\omega_n$ have to be complex conjugates of each other. Since the sets of distribution coefficients may be different for each $\lambda_n$, they should be marked with two indices, $j$ and $n$. Thus, at general initial conditions, the time evolution of the $j$-th coordinate may be presented as

$$q_j = \frac{1}{2}\sum_{n=1}^{N}\left(c_{jn}\exp\{+i\omega_nt\} + c_{jn}^*\exp\{-i\omega_nt\}\right) = \mathrm{Re}\sum_{n=1}^{N}c_{jn}\exp\{i\omega_nt\}. \qquad (5.21)$$

This formula shows very clearly, once again, the physical sense of the distribution coefficients $c_{jn}$: a set of these coefficients, with different values of the index $j$ but the same $n$, gives the complex amplitudes of oscillations of the coordinates for the special choice of initial conditions that ensures a purely sinusoidal motion of the system, with frequency $\omega_n$. Moreover, these coefficients show exactly how such special initial conditions should be selected, within a common constant factor.

Calculation of the eigenfrequencies and distribution coefficients of a coupled system with many degrees of freedom from Eq. (20) is a task that frequently may only be done numerically.5 Let us discuss just two particular but very important cases. First, let all the coupling coefficients be small ($|m_{jj'}| \ll m_j \equiv m_{jj}$ and $|\kappa_{jj'}| \ll \kappa_j \equiv \kappa_{jj}$ for all $j \neq j'$), and all partial frequencies $\Omega_j \equiv (\kappa_j/m_j)^{1/2}$ not too close to each other:

$$\frac{|\Omega_j^2 - \Omega_{j'}^2|}{\Omega_j^2} \gg \frac{|m_{jj'}|}{m_j}, \ \frac{|\kappa_{jj'}|}{\kappa_j}, \qquad \text{for all } j \neq j'. \qquad (5.22)$$

(Such a situation frequently happens if the parameters of the system are "random" in the sense that they do not follow any special, simple rule.) The results of the previous section imply that in this case the coupling does not produce a noticeable change of the oscillation frequencies: $\{\omega_n\} \approx \{\Omega_j\}$. In this situation, oscillations at each eigenfrequency are heavily concentrated in one degree of freedom, i.e. in each set of distribution coefficients $c_{jn}$ (for a given $n$), one coefficient's magnitude is much larger than all the others.

Now let the conditions (22) be valid for all but one pair of partial frequencies, say $\Omega_1$ and $\Omega_2$, while these two frequencies are so close that the coupling of the corresponding partial oscillators becomes essential. In this case, the approximation $\{\omega_n\} \approx \{\Omega_j\}$ is still valid for all other degrees of freedom, and the corresponding terms may be neglected in Eqs. (19) for $j = 1$ and $2$. As a result, we return to Eqs. (7) (perhaps generalized for velocity coupling), and hence to the anticrossing diagram (Fig. 2) discussed in the previous section. Consequently, an extended change of only one partial frequency (say, $\Omega_1$) of a weakly coupled system produces a series of eigenfrequency anticrossings; see Fig. 3.
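A numerical sweep reproduces this pattern. In the Python sketch below (NumPy assumed), four unit-mass oscillators have partial frequencies $\Omega_1^2$ (swept) and $\Omega_{2,3,4}^2 = 1, 2, 3$, with the same weak coupling $\kappa = 0.02$ for every pair; all these values are illustrative. Near each crossing point, the minimal spacing of the squared eigenfrequencies should be close to $2\kappa/m$, as Eq. (11) suggests for each isolated pair.

```python
import numpy as np

def level_sweep(Om1_sq_values, others_sq=(1.0, 2.0, 3.0), kappa=0.02):
    """Squared eigenfrequencies (ascending, one row per value of Omega_1^2) for unit
    masses, partial frequencies Omega_1^2 and others_sq, and the same weak coupling
    -kappa between every pair of oscillators (an illustrative choice)."""
    N = 1 + len(others_sq)
    rows = []
    for Om1_sq in Om1_sq_values:
        W = np.diag([Om1_sq, *others_sq]) - kappa * (np.ones((N, N)) - np.eye(N))
        rows.append(np.linalg.eigvalsh(W))
    return np.array(rows)

sweep = np.linspace(0.5, 3.5, 3001)
levels = level_sweep(sweep)
gaps = np.diff(levels, axis=1)                 # spacings between adjacent levels
for n in range(gaps.shape[1]):
    i = np.argmin(gaps[:, n])
    print(f"levels {n}-{n + 1}: minimal spacing of omega^2 = {gaps[i, n]:.4f} "
          f"at Omega_1^2 = {sweep[i]:.3f}")
```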




Fig. 5.3. Level anticrossing in a system of N 
weakly coupled oscillators - schematically. 



5 Fortunately, very effective algorithms have been developed for this matrix diagonalization task; see, e.g., references in MA Sec. 16(iii)-(iv). For example, the popular MATLAB package was initially created for this purpose. ("MAT" in its name stands for "matrix" rather than "mathematics".)

5.3. 1D waves in periodic systems

For coupled systems with a considerable degree of symmetry, the general results of the last section may be simplified, some with very profound implications. Perhaps the most important of them are waves. Figure 4 shows a classical example of a wave-supporting system: a long 1D chain of massive particles with elastic nearest-neighbor coupling.



Fig. 5.4. Uniform 1D chain of elastically coupled particles (masses $m$, spring constants $\kappa$, equilibrium positions $z_j = jd$, displacements $q_j$).



Let us start from the case when the system is so long (formally, infinite) that the boundary effects may be neglected; then its Lagrangian may be represented by an infinite sum of similar terms, each including the kinetic energy of the $j$-th particle and the potential energy of the spring on one (say, right) side of it:

$$L = \sum_j\left[\frac{m}{2}\dot q_j^2 - \frac{\kappa}{2}(q_{j+1} - q_j)^2\right]. \qquad (5.23)$$



From here, the Lagrange equations of motion (2.19) have the same form for each particle:

$$m\ddot q_j - \kappa(q_{j+1} + q_{j-1} - 2q_j) = 0. \qquad (5.24)$$

Apart from the (formally) infinite size of the system, this is evidently just a particular case of Eq. (17), and thus its particular solution may be looked for in the form (18), with $\lambda^2 \to -\omega^2 < 0$. With this substitution, Eq. (24) gives the following simple form of the general system (19) for the distribution coefficients $c_j$:

$$(-m\omega^2 + 2\kappa)c_j - \kappa c_{j+1} - \kappa c_{j-1} = 0. \qquad (5.25)$$

Now comes the most important conceptual step toward the wave theory: the translational symmetry of Eq. (23), i.e. its invariance to the replacement $j \to j + 1$, allows it to have a particular solution of the following form:

$$c_j = ae^{i\alpha j}, \qquad (5.26)$$

where the coefficient $\alpha$ may depend on $\omega$ (and the system's parameters), but not on the particle number $j$. Indeed, plugging Eq. (26) into Eq. (25) and cancelling the common factor $e^{i\alpha j}$, we see that it is identically satisfied if $\alpha$ obeys the following algebraic equation:

$$(-m\omega^2 + 2\kappa) - \kappa e^{+i\alpha} - \kappa e^{-i\alpha} = 0. \qquad (5.27)$$
The physical sense of solution (26) becomes clear if we use it and Eq. (18) with $\lambda = -i\omega$ to write

$$q_j(t) = \mathrm{Re}\left[ae^{i(\alpha j - \omega t)}\right] = \mathrm{Re}\left[ae^{i(kz_j - \omega t)}\right] = \mathrm{Re}\left[ae^{ik(z_j - v_{\rm ph}t)}\right], \qquad (5.28)$$

where the wave number $k$ is defined as $k \equiv \alpha/d$, and $z_j \equiv jd$ is the equilibrium position of the $j$-th particle, a notion that should not be confused with the particle's displacement $q_j$ from that equilibrium position; see Fig. 4. Relation (28) describes nothing else than a sinusoidal traveling wave of particle displacements (and hence of spring extensions/compressions) that propagates, depending on the sign of $v_{\rm ph}$, to the right or to the left along the particle chain with the phase velocity

$$v_{\rm ph} \equiv \frac{\omega}{k}. \qquad (5.29)$$



Perhaps the most important characteristic of a wave is the so-called dispersion relation, i.e. the relation between its frequency $\omega$ and wave number $k$, essentially between the temporal and spatial frequencies of the wave. For our current system, this relation is given by Eq. (27) with $\alpha = kd$. Taking into account that $(2 - e^{+i\alpha} - e^{-i\alpha}) = 2(1 - \cos\alpha) = 4\sin^2(\alpha/2)$, it may be rewritten in a simpler form:

$$\omega = \pm\omega_0\sin\frac{\alpha}{2} = \pm\omega_0\sin\frac{kd}{2}, \qquad \text{where } \omega_0 \equiv 2\left(\frac{\kappa}{m}\right)^{1/2}. \qquad (5.30)$$

This result, frequently called the Debye dispersion relation,6 is sketched in Fig. 5, and is rather remarkable in several aspects.

Fig. 5.5. The Debye dispersion relation.
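Eq. (30) may be checked against a direct numerical diagonalization. Since an infinite chain cannot be diagonalized directly, the Python sketch below (NumPy assumed) closes $N$ particles into a ring. This periodic boundary condition is an assumption made only for this check; it allows exactly the wave numbers $k = 2\pi n/Nd$, $n = 0, 1, \dots, N-1$. The code compares the squared normal-mode frequencies of the ring with those given by Eq. (30).

```python
import numpy as np

def ring_frequencies(N, kappa=1.0, m=1.0):
    """Normal-mode frequencies (ascending) of N particles obeying Eq. (5.24), closed
    into a ring (periodic boundary conditions, standing in for the infinite chain)."""
    I = np.eye(N)
    K = kappa * (2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1))
    w2 = np.linalg.eigvalsh(K / m)
    return np.sqrt(np.clip(w2, 0.0, None))      # clip removes round-off below zero

def debye_frequencies(N, kappa=1.0, m=1.0, d=1.0):
    """Eq. (5.30) at the N wave numbers k = 2*pi*n/(N d) allowed on such a ring."""
    omega0 = 2 * np.sqrt(kappa / m)
    k = 2 * np.pi * np.arange(N) / (N * d)
    return np.sort(np.abs(omega0 * np.sin(k * d / 2)))

N = 12
print(np.max(np.abs(ring_frequencies(N) ** 2 - debye_frequencies(N) ** 2)))  # round-off level
```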



First, if the wavelength $\lambda = 2\pi/|k|$ is much larger than the spatial period $d$ of the structure, i.e. if $|kd| \ll 1$ (so that $|\omega| \ll \omega_0$), the dispersion relation is approximately linear:

$$\omega \approx \pm\omega_0\frac{kd}{2} \equiv \pm vk, \qquad (5.31)$$

where the parameter $v$ is frequency-independent:

$$v \equiv \frac{\omega_0d}{2} = \left(\frac{\kappa}{m}\right)^{1/2}d. \qquad (5.32)$$



Comparison of Eq. (31) with Eq. (28) shows that this constant plays, in the low-frequency region, the role of the phase velocity for any frequency component of a waveform created in the system, say, by initial conditions. As a result, low-frequency waves of arbitrary form can propagate in the system without deformation (called dispersion). Such waves are called acoustic,7 and are a general property of any elastic continuous medium.

6 Named after P. Debye, who developed this theory in 1912 in the context of the specific heat of solids at low temperatures (beating nobody else than A. Einstein on the way :-); see, e.g., SM Sec. 2.6.
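How low the frequency must be for the acoustic approximation to hold may be quantified by the ratio $v_{\rm ph}/v = \sin(kd/2)/(kd/2)$, which follows from Eqs. (29), (30), and (32). The Python sketch below tabulates it. For small $kd$ the deviation from unity is approximately $(kd)^2/24$, while at $kd = \pi$ the ratio drops to $2/\pi$.

```python
import numpy as np

def phase_velocity_ratio(kd):
    """v_ph / v for the Debye relation (5.30), with v = omega_0 d / 2 from Eq. (5.32)."""
    return np.sin(kd / 2) / (kd / 2)

for kd in (0.01, 0.1, 0.5, 1.0, np.pi):
    print(f"kd = {kd:.3f}: v_ph/v = {phase_velocity_ratio(kd):.4f}")
```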

Indeed, the limit $|kd| \ll 1$ means that the distance $d$ between adjacent particles is much smaller than the wavelength $\lambda = 2\pi/|k|$, i.e. that the differences $q_{j+1}(t) - q_j(t)$ and $q_j(t) - q_{j-1}(t)$, participating in Eq. (24), are relatively small and may be approximated with $\partial q/\partial j = \partial q/\partial(z/d) = d(\partial q/\partial z)$, with the derivatives taken at the middle points between the particles: respectively, $z_+ \equiv (z_{j+1} + z_j)/2$ and $z_- \equiv (z_j + z_{j-1})/2$. Here $z$ is now considered as a continuous argument (and hence the system, as a 1D continuum), and $q(z,t)$ as a continuous function of space and time. In this approximation, the sum of the last two terms of Eq. (24) is equal to $-\kappa d[\partial q/\partial z(z_+) - \partial q/\partial z(z_-)]$, and may be similarly approximated by $-\kappa d^2(\partial^2q/\partial z^2)$, with the second derivative taken at the point $(z_+ + z_-)/2 = z_j$, i.e. exactly at the same point as the time derivative. As a result, the system of ordinary differential equations (24) is reduced to a partial differential equation,

$$m\frac{\partial^2q}{\partial t^2} - \kappa d^2\frac{\partial^2q}{\partial z^2} = 0. \qquad (5.33a)$$

Using Eqs. (30) and (32), we may present this equation in a more general form,

$$\left(\frac{1}{v^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial z^2}\right)q(z,t) = 0, \qquad (5.33b)$$

which describes a scalar acoustic wave (of any physical nature) in a 1D linear, dispersion-free continuum; cf. Eq. (1.2). In our current simple model (Fig. 4), the direction $z$ of wave propagation coincides with the direction of particle displacements $q$; such acoustic waves are called longitudinal. However, in Chapter 7 we will see that 3D elastic media may also support different, transverse waves that also obey Eq. (33b), but with a different acoustic velocity $v$.
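The key approximation of this derivation, replacing the second difference $q_{j+1} + q_{j-1} - 2q_j$ with $d^2\,\partial^2q/\partial z^2$, may be tested on any smooth profile. The Python sketch below (NumPy assumed) uses the Gaussian $q(z) = \exp\{-z^2\}$, an arbitrary test function of unit width, and shows that the error decreases as $d^2$ when the spacing is halved.

```python
import numpy as np

def gaussian(z):
    """Smooth test profile q(z) = exp(-z^2)."""
    return np.exp(-z**2)

def max_discretization_error(d, z=None):
    """Largest deviation of [q(z+d) + q(z-d) - 2 q(z)] / d^2 from the exact q''(z)."""
    if z is None:
        z = np.linspace(-3.0, 3.0, 601)
    second_difference = (gaussian(z + d) + gaussian(z - d) - 2 * gaussian(z)) / d**2
    second_derivative = (4 * z**2 - 2) * gaussian(z)
    return np.max(np.abs(second_difference - second_derivative))

e1, e2 = max_discretization_error(0.1), max_discretization_error(0.05)
print(e1, e2, e1 / e2)   # the ratio is close to 4: the error scales as d^2
```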

Second, when the wavelength is comparable with the structure period $d$ (i.e. the product $kd$ is not small), the dispersion relation is not linear, and the system is dispersive. This means that as a wave whose Fourier spectrum has several essential components with frequencies of the order of $\omega_0$ travels along the structure, its waveform (which may be defined as the shape of a snapshot of all $q_j$ at the same time) changes.8 This effect may be analyzed by presenting the general solution of Eq. (24) as the sum (more generally, an integral) of components (28) with different complex amplitudes $a$:

$$q_j(t) = \mathrm{Re}\int a_ke^{i[kz_j - \omega(k)t]}\,dk. \qquad (5.34)$$



This notation emphasizes the dependence of the partial wave amplitudes at and frequencies on 
the wave number k. While the latter dependence is given by the dispersion relation, in our current case 
by Eq. (30), function at is determined by the initial conditions. For applications, the case when au is 
substantially different from zero only is a narrow interval, of width Ak « ko around some central value 
/Co, is of special importance. (The Fourier transform reciprocal to Eq. (34) shows that this is true, in 
particular for a so-called wave packet - a sinusoidal wave modulated by an envelope with a large width 



7 This term is purely historical. Though the usual sound waves in air belong to this class, the waves we 
are discussing may have frequency both well below and well above human ear's sensitivity range. 

8 The waveform deformation due to dispersion (which we are considering now) should be clearly distinguished 
from its possible change due to attenuation, i.e. energy loss - which is not taken into account is our energy- 
conserving model (23) - cf. Sec. 5 below. 
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Az ~ 1/AA: » Vko - see Fig. 6.) Using that strong inequality, the wave packet propagation may be 
analyzed by expending the dispersion relation co(k) into the Taylor series at point ko, and, in the first 
approximation in Ak/ ko, restricting the expansion by its first two terms: 

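$$\omega(k) \approx \omega_0 + \frac{d\omega}{dk}\bigg|_{k_0}\tilde k, \quad\text{where } \omega_0 \equiv \omega(k_0), \text{ and } \tilde k \equiv k - k_0. \qquad (5.35)$$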



In this approximation, Eq. (34) yields 



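$$q_j(t) \approx \text{Re}\int a_k \exp\left\{i\left[(k_0 + \tilde k)z_j - \left(\omega_0 + \frac{d\omega}{dk}\tilde k\right)t\right]\right\}dk = \text{Re}\left[\exp\{i(k_0z_j - \omega_0t)\}\int a_k \exp\left\{i\tilde k\left(z_j - \frac{d\omega}{dk}t\right)\right\}dk\right]. \qquad (5.36)$$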



Comparing this expression with the initial form of the wave packet, 



$$q_j(0) = \text{Re}\int a_k e^{ikz_j}\,dk = \text{Re}\left[\exp\{ik_0z_j\}\int a_k \exp\{i\tilde kz_j\}\,dk\right], \qquad (5.37)$$



and taking into account that the phase factors before the integrals in the last forms of Eqs. (36) and (37) do not affect the envelope, we see that in this approximation 9 the envelope keeps its initial form and propagates along the system with the so-called group velocity



$$v_{gr} \equiv \frac{d\omega}{dk}. \qquad (5.38)$$



Note that, with the exception of the acoustic wave limit (31), this velocity (which characterizes the propagation of the waveform's envelope) is different from the phase velocity (28), which describes the propagation of the "carrier" sine wave, for example of one of its zeros; see Fig. 6.




Fig. 5.6. Phase and group 
velocities of a wave packet. 
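The claim that the envelope moves at $v_{gr}$ rather than at the phase velocity can be checked numerically by summing the partial waves of Eq. (34) directly. The following minimal sketch rests on three assumptions: the Debye law (30) in the form $\omega = \omega_0|\sin(kd/2)|$, a Gaussian spectrum $a_k$ (a hypothetical choice), and the arbitrary values $d = 1$, $\omega_0 = 2$, $k_0 = 1/d$. The function names are illustrative only.

```python
import numpy as np

def debye_omega(k, d=1.0, omega0=2.0):
    # Assumed form of the Debye law (30): omega = omega0 * |sin(k d / 2)|
    return omega0 * np.abs(np.sin(k * d / 2))

def packet_envelope(z, t, k0, dk, d=1.0, omega0=2.0, n_modes=801):
    # |sum over k of a_k exp{i[k z - omega(k) t]}|, i.e. the envelope of Eq. (34),
    # for a hypothetical Gaussian spectrum a_k of width dk centered at k0
    k = np.linspace(k0 - 5 * dk, k0 + 5 * dk, n_modes)
    a_k = np.exp(-0.5 * ((k - k0) / dk) ** 2)
    phase = np.outer(z, k) - debye_omega(k, d, omega0) * t   # rows: z_j, columns: k
    return np.abs(np.exp(1j * phase) @ a_k)

d, omega0, k0, t = 1.0, 2.0, 1.0, 200.0
z = np.arange(0, 400) * d                        # particle positions z_j = j d
envelope = packet_envelope(z, t, k0, dk=0.05, d=d, omega0=omega0)
z_peak = z[np.argmax(envelope)]

v_gr = omega0 * d / 2 * np.cos(k0 * d / 2)       # d(omega)/dk at k0, Eq. (38)
v_ph = debye_omega(k0, d, omega0) / k0           # omega/k at k0
print(f"envelope peak: {z_peak:.1f}, v_gr*t = {v_gr * t:.1f}, v_ph*t = {v_ph * t:.1f}")
```

With these numbers the envelope peak should land close to $v_{gr}t \approx 175.5\,d$, well behind $v_{ph}t \approx 191.8\,d$. The slight broadening of the envelope comes from the $d^2\omega/dk^2$ term neglected in Eq. (35).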



Next, for the Debye dispersion law (30), the difference between $v_{ph}$ and $v_{gr}$ increases as the average frequency $\omega$ approaches $\omega_0$. The group velocity tends to zero, while the phase velocity changes only moderately (at $\omega = \omega_0$ it drops to $2/\pi$ of its acoustic-limit value). The existence of such a maximum for the wave propagation frequency



9 Taking into account the next term in the Taylor expansion of the function $\omega(k)$, proportional to $d^2\omega/dk^2$, we would find that the dispersion actually leads to a gradual change of the envelope form. Such changes play an important role in quantum mechanics, so I discuss them in that part of my notes (see QM Sec. 2.1).
presents one more remarkable feature of this system. Its physics may be readily understood by noticing that, according to Eq. (30), at $\omega = \omega_0$ the wave number $k$ equals $\pi n/d$, where $n$ is an odd integer. Hence the phase shift $\alpha = kd$ is an odd multiple of $\pi$. Plugging this value into Eq. (28), we see that at the Debye frequency, oscillations of two adjacent particles are in anti-phase, for example:

$$q_0(t) = a\exp\{-i\omega t\}, \qquad q_1(t) = a\exp\{i(\pi - \omega t)\} = -a\exp\{-i\omega t\} = -q_0(t). \qquad (5.39)$$

It is clear from Fig. 4 that at such a phase shift, all the springs are maximally stretched or compressed (just as in the hard mode of the two coupled oscillators analyzed in Sec. 1). It is therefore natural that this mode has the highest frequency.
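Both velocities can be tabulated directly for the Debye law. This is a sketch assuming $\omega = \omega_0\sin(kd/2)$ for $0 < k \le \pi/d$, with $\omega_0 = 2(\kappa/m)^{1/2}$ (which is what Eq. (24) gives for this chain). The values $d = 1$ and $\omega_0 = 2$ are hypothetical, and $v = \omega_0d/2$ denotes the common acoustic-limit value of both velocities.

```python
import numpy as np

def debye_velocities(k, d=1.0, omega0=2.0):
    # Assumed Debye law omega = omega0 sin(k d / 2), valid for 0 < k <= pi/d
    omega = omega0 * np.sin(k * d / 2)
    v_ph = omega / k                                  # phase velocity
    v_gr = omega0 * d / 2 * np.cos(k * d / 2)         # group velocity d(omega)/dk
    return omega, v_ph, v_gr

d, omega0 = 1.0, 2.0
v_acoustic = omega0 * d / 2          # common limit of both velocities at k -> 0
for k in [0.01, np.pi / 4, np.pi / 2, 3 * np.pi / 4, np.pi]:
    omega, v_ph, v_gr = debye_velocities(k, d, omega0)
    print(f"kd = {k * d:5.3f}  omega/omega0 = {omega / omega0:5.3f}  "
          f"v_ph/v = {v_ph / v_acoustic:5.3f}  v_gr/v = {v_gr / v_acoustic:5.3f}")
```

At $kd = \pi$ the group velocity vanishes, while the phase velocity equals $2v/\pi \approx 0.64\,v$. At that point the particle factor $e^{ikd} = -1$ reproduces the anti-phase motion of Eq. (39).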

This invites a natural question: what happens to the system if it is excited at a frequency $\omega > \omega_0$, say by an external force applied at the system's boundary? The boundary phenomena will be considered in the next section, but the most essential part of the answer may be obtained immediately from Eqs. (26) and (30). Indeed, reviewing the calculations that have led to these results, we see that they are valid not only for real but also for any complex values of $\alpha$. In particular, at $\omega > \omega_0$ the dispersion relation (30) gives

$$\alpha = \pi n \pm i\frac{d}{\lambda}, \quad\text{where } \lambda = \frac{d}{2\,\text{arccosh}(\omega/\omega_0)}. \qquad (5.40)$$

Plugging this relation into Eq. (26), we see that the wave's amplitude becomes an exponential function 
of position: 

$$|q_j| = |a|\,e^{-j\,\text{Im}\,\alpha} \propto e^{\mp z_j/\lambda}. \qquad (5.41)$$

Physically this means that the wave decays as it penetrates into the structure from the excitation point, dropping by a factor of $e \approx 3$ over the so-called penetration depth $\lambda$. (According to Eq. (40), this depth decreases with frequency, but rather slowly. Except very close to $\omega_0$, where it diverges, it remains of the order of the distance between adjacent particles.) Such a limited penetration is a very common property of various waves, including electromagnetic waves in plasmas and superconductors, and quantum-mechanical "de Broglie waves" (wavefunctions) in classically forbidden regions. Note that this effect of "wave expulsion" from media in which the waves cannot propagate does not require any energy dissipation.
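Eqs. (40)-(41) can be cross-checked without complex trigonometry by going back to the recurrence that Eq. (24) gives at a fixed frequency. We assume $\omega_0 = 2(\kappa/m)^{1/2}$. Then $a_{j+1} + a_{j-1} = [2 - 4(\omega/\omega_0)^2]\,a_j$, so that $a_j \propto r^j$ with $r + 1/r = 2 - 4(\omega/\omega_0)^2$. The sketch below uses hypothetical helper names. It takes the root with $|r| < 1$ and compares $-d/\ln|r|$ with $\lambda$ from Eq. (40).

```python
import numpy as np

def penetration_depth(omega_ratio, d=1.0):
    # Eq. (40): lambda = d / (2 arccosh(omega/omega0)), valid for omega > omega0
    return d / (2 * np.arccosh(omega_ratio))

def decaying_root(omega_ratio):
    # Root of r^2 - c r + 1 = 0 with |r| < 1, where c = 2 - 4 (omega/omega0)^2 < -2
    c = 2 - 4 * omega_ratio ** 2
    return (c + np.sqrt(c ** 2 - 4)) / 2

d = 1.0
for ratio in [1.01, 1.1, 1.5, 2.0, 5.0]:
    r = decaying_root(ratio)                 # a_{j+1}/a_j for the decaying solution
    depth_from_r = -d / np.log(abs(r))       # |a_j| ~ exp(-z_j / depth)
    print(f"omega/omega0 = {ratio:4.2f}  r = {r:+.4f}  "
          f"lambda/d = {penetration_depth(ratio, d):.4f}  from r: {depth_from_r:.4f}")
```

The negative sign of $r$ reproduces the anti-phase factor $e^{i\pi n}$ (odd $n$) in Eq. (40). The printed depths fall below $d$ well above $\omega_0$ and grow without bound as $\omega \to \omega_0$.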

Finally, one more fascinating feature of the dispersion relation (30) is that if it is satisfied by some wave number $k_0(\omega)$, it is also satisfied by any $k_n(\omega) = k_0(\omega) + 2\pi n/d$, where $n$ is any integer. This property is independent of the particular dynamics of the system. It follows already from Eq. (27), before its substitution into Eq. (25): such a wave number translation by $2\pi/d$, i.e. the addition of $2\pi$ to the phase shift $\alpha$, is equivalent to the multiplication of $q_j(t)$ by $\exp\{i2\pi j\} = 1$. Thus, such $(2\pi/d)$-periodicity in the wave number space is a common property of all systems that are $d$-periodic in the usual ("direct") space. 10
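A short numerical illustration of this $(2\pi/d)$-periodicity follows. For particles sitting at $z_j = jd$, the factors $e^{ikz_j}$ of Eq. (27) do not change when $k$ is shifted by any multiple of $2\pi/d$. The wave number $k_0 = 0.7/d$ is an arbitrary, hypothetical value.

```python
import numpy as np

d = 1.0
z = np.arange(-5, 6) * d              # positions z_j = j d of eleven particles
k0 = 0.7 / d                          # arbitrary (hypothetical) wave number
for n in [-2, -1, 1, 3]:
    k_n = k0 + 2 * np.pi * n / d
    # exp(i k_n z_j) = exp(i k0 z_j) * exp(i 2 pi n j) = exp(i k0 z_j) for every particle j
    assert np.allclose(np.exp(1j * k_n * z), np.exp(1j * k0 * z))
print("k0 and k0 + 2*pi*n/d give identical particle displacements")
```

The wave is defined only at the particle sites. Between them the two exponentials generally differ, which is why this periodicity is a property of the lattice rather than of the dynamics.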

Besides dispersion, one more key characteristic of any wave-supporting system is its wave 
impedance - the notion strangely missing from many physics (but not engineering) textbooks. It may be 



10 This property has especially important implications for quantum properties of periodic structures, e.g., crystals. It means, in particular, that the product $\hbar k$ cannot represent the actual momentum of the particle (which is not conserved in periodic systems), but rather serves as its quasi-momentum (or "crystal momentum"); see, e.g., QM Sec. 2.5.
revealed by calculating the forces in the sinusoidal wave (28). For example, the force exerted by the $j$-th particle on its right neighbor, given by the second term in Eq. (24), equals



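$$F_{j+}(t) = \kappa[q_j(t) - q_{j+1}(t)] = \text{Re}\left[\kappa\left(1 - e^{ikd}\right)a\,e^{i(kz_j - \omega t)}\right] \approx \text{Re}\left[-ikd\,\kappa\,a\,e^{i(kz_j - \omega t)}\right], \qquad (5.42)$$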



where the last form is valid in the most important acoustic wave limit, $kd \to 0$. Let us compare this expression for the wave of forces with that for the corresponding wave of particle velocities:



$$\dot q_j(t) = \text{Re}\left[-i\omega a\,e^{i(kz_j - \omega t)}\right]. \qquad (5.43)$$



We see that these two waves have the same phase, and hence their ratio depends on neither time nor the particle number. Moreover, this ratio,



$$\frac{F_{j+}}{\dot q_j} = \frac{\kappa kd}{\omega} = \pm\frac{\kappa d}{v} \equiv \pm Z, \qquad (5.44)$$




is a real constant, independent even of the wave's frequency. Its magnitude is called the wave impedance:



$$Z \equiv \frac{\kappa d}{v} = (m\kappa)^{1/2}, \qquad (5.45)$$



and characterizes the dynamic "stiffness" of the system for the propagating waves. 
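The ratio (44) can also be evaluated without the acoustic approximation, directly from the exact middle form of Eq. (42) and from Eq. (43). The sketch below assumes the Debye law in the form $\omega = 2(\kappa/m)^{1/2}|\sin(kd/2)|$ and uses hypothetical parameter values. It shows that the magnitude of the complex ratio equals $(m\kappa)^{1/2}$ at any $kd$. The only effect of a finite $kd$ is a phase shift $kd/2$, which vanishes in the acoustic limit.

```python
import numpy as np

def force_to_velocity_ratio(k, d, kappa, m):
    # Complex amplitude of F_{j+} (exact middle form of Eq. 42) divided by that of
    # dq_j/dt (Eq. 43); both multiply the common factor a*exp{i(k z_j - omega t)}
    omega = 2 * np.sqrt(kappa / m) * abs(np.sin(k * d / 2))   # assumed Debye law
    force_amplitude = kappa * (1 - np.exp(1j * k * d))
    velocity_amplitude = -1j * omega
    return force_amplitude / velocity_amplitude

kappa, m, d = 4.0, 0.25, 1.0                  # hypothetical parameters
Z = np.sqrt(m * kappa)                         # Eq. (45)
for kd in [0.01, 0.1, 1.0, 3.0]:
    ratio = force_to_velocity_ratio(kd / d, d, kappa, m)
    print(f"kd = {kd:4.2f}  |F/qdot| = {abs(ratio):.6f}  phase = {np.angle(ratio):.4f}  "
          f"Z = {Z:.6f}")
```

The phase $kd/2$ simply reflects that $F_{j+}$ acts on the bond between particles $j$ and $j + 1$, half a period away from the site where $\dot q_j$ is taken.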

In particular, the impedance scales the power carried by the wave. Consider the instantaneous power $\mathscr{P}_j(t) = F_{j+}(t)\,dq_j/dt$ transferred through particle $j$ to the subsystem on its right. Its direct time averaging, using Eqs. (42)-(43), yields a position-independent result



$$\bar{\mathscr{P}} = \pm\frac{Z\omega^2A^2}{2}, \qquad (5.46)$$



where $A = |a|$ is the real amplitude of the wave and, as before, the positive sign corresponds to the wave propagating to the right (and vice versa). Note that $\bar{\mathscr{P}}$ is the power flow in the acoustic wave. Its spatial and temporal independence means that the wave's energy is conserved, as could be expected from the Hamiltonian system we are considering. 11 Hence, the wave impedance $Z$ characterizes the energy transfer along the system rather than its dissipation.
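Eq. (46) can be checked by brute-force time averaging of $F_{j+}(t)\,\dot q_j(t)$ over one period, using the exact forms of Eqs. (42)-(43). This sketch uses hypothetical parameters in the acoustic limit $kd = 0.02$, again assuming $\omega = 2(\kappa/m)^{1/2}\sin(kd/2)$. The helper `average_power` is an illustrative name.

```python
import numpy as np

kappa, m, d = 1.0, 1.0, 1.0                  # hypothetical parameters
A, phi = 0.3, 0.4                            # real amplitude and phase: a = A e^{i phi}
k = 0.02 / d                                 # acoustic limit, kd << 1
omega = 2 * np.sqrt(kappa / m) * np.sin(k * d / 2)   # assumed Debye law
Z = np.sqrt(m * kappa)                       # Eq. (45)
a = A * np.exp(1j * phi)
t = np.linspace(0.0, 2 * np.pi / omega, 4001)[:-1]   # one period, endpoint dropped

def q(j):
    return np.real(a * np.exp(1j * (k * j * d - omega * t)))

def q_dot(j):
    return np.real(-1j * omega * a * np.exp(1j * (k * j * d - omega * t)))

def average_power(j):
    force = kappa * (q(j) - q(j + 1))        # F_{j+}(t), Eq. (42)
    return np.mean(force * q_dot(j))         # time average of F_{j+} dq_j/dt

for j in [0, 10, 100]:
    print(f"j = {j:3d}  <P> = {average_power(j):.6e}  "
          f"Z omega^2 A^2 / 2 = {Z * omega**2 * A**2 / 2:.6e}")
```

The three values of $j$ give the same average, and the agreement with $Z\omega^2A^2/2$ improves further as $kd \to 0$.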



5.4. Interfaces and boundaries 

The importance of the wave impedance notion becomes even more evident when we consider waves in non-uniform and finite-size systems. Indeed, our previous analysis assumed that the 1D system supporting the waves (Fig. 4) is exactly periodic, i.e. macroscopically uniform, and extends all the way from $-\infty$ to $+\infty$. Now let us examine what happens when this is not true. The simplest (and very important) example of such non-uniform systems is an interface, i.e. a point at which the system parameters change. Figure 7 shows a simple and representative example of such a sharp interface, for the same 1D wave system that was analyzed in the last section.



11 The direct calculation of the energy (per unit length) is a simple but useful exercise, left for the reader.
The parameters $\kappa$ and $m$ are still constant on each side of the interface (put, for convenience, at $z_j = 0$). Hence the equations of motion (24) are still valid for $j < 0$ and $j > 0$ (in the latter case, with the primed parameters), and show that at a fixed frequency $\omega$ they can sustain sinusoidal waves of the type (28). However, the finite jump of parameters at the interface ($m' \neq m$, $\kappa' \neq \kappa$) leads to a partial reflection of the incident wave from the interface. So, at least on the side of incidence (say, $z_j < 0$), we need to assume two such waves, one describing the incident wave and the other the reflected wave:



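$$q_j(t) = \begin{cases} \text{Re}\left[a_\rightarrow e^{i(kz_j - \omega t)} + a_\leftarrow e^{i(-kz_j - \omega t)}\right], & \text{for } j \le 0,\\ \text{Re}\left[a'_\rightarrow e^{i(k'z_j - \omega t)}\right], & \text{for } j \ge 0. \end{cases} \qquad (5.47)$$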



In order to obtain boundary conditions for "stitching" these waves (i.e. getting relations between their complex amplitudes) at $j = 0$, i.e. $z_j = 0$, we first take into account the displacement $q_0(t)$ of the interface particle. It has to be the same whether the particle is considered a part of the left or the right sub-system, and hence it participates in Eqs. (24) for both $j < 0$ and $j > 0$. This gives us the first boundary condition,

$$a_\rightarrow + a_\leftarrow = a'_\rightarrow. \qquad (5.48)$$

Second, writing the equation of motion for the special particle with j = 0, 

$$m_0\ddot q_0 - \kappa'(q_1 - q_0) + \kappa(q_0 - q_{-1}) = 0, \qquad (5.49)$$



and plugging into it the solution (47), we get the second boundary condition



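$$-\omega^2m_0a'_\rightarrow - \kappa'a'_\rightarrow\left(e^{ik'd} - 1\right) + \kappa\left[a_\rightarrow\left(1 - e^{-ikd}\right) + a_\leftarrow\left(1 - e^{ikd}\right)\right] = 0. \qquad (5.50)$$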



The system of two linear equations (48) and (50) allows one to express both $a_\leftarrow$ and $a'_\rightarrow$ via the amplitude $a_\rightarrow$ of the incident wave. This gives the reflection ($R$) and transmission ($T$) coefficients of the interface: 12



$$R \equiv \frac{a_\leftarrow}{a_\rightarrow}, \qquad T \equiv \frac{a'_\rightarrow}{a_\rightarrow}. \qquad (5.51)$$



The general result for $R$ and $T$ is a bit bulky, but it may be simplified in the most important acoustic wave limit: $k'd, kd \to 0$. Indeed, in this limit all three parentheses in Eq. (50) may be approximated by the first terms of their Taylor expansions, e.g., $\exp\{ik'd\} - 1 \approx ik'd$, etc. Moreover, in this limit the first term of Eq. (50) is of the second order in the small parameter $\omega/\omega_0 \sim kd \ll 1$ (unless the



12 Sorry, one more traditional usage of the letter T. I do not think there is any chance to confuse it with the kinetic energy.
interface particle mass $m_0$ is much larger than both $m$ and $m'$), and hence may be neglected. As a result, Eq. (50) takes a very simple form 13

$$\kappa k(a_\rightarrow - a_\leftarrow) = \kappa'k'a'_\rightarrow. \qquad (5.52a)$$

According to Eqs. (31), (32) and (45), consider the factors $\kappa k$ of the waves (with the same frequency $\omega$) propagating at $z < 0$ and $z > 0$. In the acoustic limit, their ratio is equal to that of the wave impedances $Z$ of the corresponding parts of the system, so that Eq. (52a) may be rewritten as

$$Z(a_\rightarrow - a_\leftarrow) = Z'a'_\rightarrow. \qquad (5.52b)$$
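Before passing to the acoustic limit, one can simply solve the pair (48), (50) numerically. The sketch below does this under the following assumptions:

- hypothetical parameters $m' = 4m$, $\kappa' = \kappa$, and an interface mass $m_0 = (m + m')/2$ (all arbitrary choices);
- the Debye law $\omega = 2(\kappa/m)^{1/2}\sin(kd/2)$ on each side.

It reports how well the exact solution satisfies the approximate condition (52b) as $\omega$ decreases. The function names are illustrative.

```python
import numpy as np

def wave_number(omega, kappa, m, d=1.0):
    # Inverse of the assumed Debye law omega = 2 sqrt(kappa/m) sin(k d/2), 0 < k <= pi/d
    return 2 * np.arcsin(omega / (2 * np.sqrt(kappa / m))) / d

def interface_amplitudes(omega, m, kappa, m_p, kappa_p, m0, d=1.0):
    # Solve Eqs. (48) and (50) for R = a_left/a_right and T = a'_right/a_right,
    # setting the incident amplitude a_right = 1
    k = wave_number(omega, kappa, m, d)
    k_p = wave_number(omega, kappa_p, m_p, d)
    M = np.array([[1.0, -1.0],                                        # Eq. (48): R - T = -1
                  [kappa * (1 - np.exp(1j * k * d)),                  # Eq. (50): coefficient of R
                   -omega**2 * m0 - kappa_p * (np.exp(1j * k_p * d) - 1)]],   # coefficient of T
                 dtype=complex)
    rhs = np.array([-1.0, -kappa * (1 - np.exp(-1j * k * d))], dtype=complex)
    R, T = np.linalg.solve(M, rhs)
    return R, T

m, kappa, m_p, kappa_p = 1.0, 1.0, 4.0, 1.0     # hypothetical: Z = 1, Z' = 2
m0 = (m + m_p) / 2                              # hypothetical interface-particle mass
Z, Z_p = np.sqrt(m * kappa), np.sqrt(m_p * kappa_p)
for omega in [0.5, 0.1, 0.01]:                  # the right chain's cutoff here is omega = 1
    R, T = interface_amplitudes(omega, m, kappa, m_p, kappa_p, m0)
    residual = Z * (1 - R) - Z_p * T            # Eq. (52b) with a_right = 1
    print(f"omega = {omega:4.2f}  R = {R:.4f}  T = {T:.4f}  |residual| = {abs(residual):.1e}")
```

The residual of Eq. (52b) should shrink as $\omega$ decreases. The relation $1 + R = T$ holds exactly, as Eq. (48) requires.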
Now, solving the simple system of linear equations (48) and (52b), we get very important formulas,

$$R = \frac{Z - Z'}{Z + Z'}, \qquad T = \frac{2Z}{Z + Z'}, \qquad (5.53)$$

which are valid for any waves in 1D continua, with the corresponding re-definition of the impedance. 14
Note that the coefficients $R$ and $T$ characterize the ratios of wave amplitudes rather than of their powers. Using Eq. (46), for the time-averaged power flows we get the relations

$$\frac{\bar{\mathscr{P}}_\leftarrow}{\bar{\mathscr{P}}_\rightarrow} = R^2 = \frac{(Z - Z')^2}{(Z + Z')^2}, \qquad \frac{\bar{\mathscr{P}}'_\rightarrow}{\bar{\mathscr{P}}_\rightarrow} = \frac{Z'}{Z}T^2 = \frac{4ZZ'}{(Z + Z')^2}. \qquad (5.54)$$

(Note that $\bar{\mathscr{P}}_\leftarrow + \bar{\mathscr{P}}'_\rightarrow = \bar{\mathscr{P}}_\rightarrow$, again reflecting energy conservation.)

The first important result of this calculation is that the wave is fully transmitted through the interface if the so-called impedance matching condition $Z' = Z$ is satisfied. This holds even if the wave velocities $v$ (32) are different on the left and right sides of the interface. Conversely, the equality of the acoustic velocities in two media does not guarantee full transmission through their interface. Again, this is a very general result.

Now let us consider the two limits in which Eq. (53) predicts total wave reflection, $\bar{\mathscr{P}}'_\rightarrow/\bar{\mathscr{P}}_\rightarrow \to 0$: $Z'/Z \to \infty$ (when $R = -1$) and $Z'/Z \to 0$ (when $R = +1$). According to Eq. (45), the former limit corresponds to an infinite product $\kappa'm'$, so that particles on the right side of the interface cannot move at all. This means that this particular case also describes a perfectly rigid boundary (Fig. 8a) for arbitrary $\omega$, i.e. not necessarily in the acoustic wave limit. The negative sign of $R$ in the relation $R = -1$ means that in the reflected wave, the phase of particle oscillations is shifted by $\pi$ relative to the incident wave: $a_\leftarrow = -a_\rightarrow$. Hence the sum of these two traveling waves may also be viewed as a single standing wave

$$q_{j\le 0}(t) = \text{Re}\left[a\,e^{i(kz_j - \omega t)} - a\,e^{i(-kz_j - \omega t)}\right] = \text{Re}\left[2ia\,e^{-i\omega t}\sin kz_j\right] = 2A\sin(\omega t - \varphi)\sin kz_j, \qquad (5.55)$$

where $a \equiv a_\rightarrow \equiv Ae^{i\varphi}$. At the boundary ($z_j = 0$) this expression yields $q_0(t) = 0$, i.e., a node of particle displacements. In contrast, the corresponding standing wave of spring forces, described by Eq. (42), has a maximum at $z = 0$.



13 This equation could also be obtained using Eq. (42), as the condition of balance of the forces exerted on the interface particle with $j = 0$ from the left and right, again neglecting the inertia of that particle.

14 See, e.g., the corresponding parts of my lecture notes: QM Sec. 2.3 and EM Sec. 7.4. In 2D and 3D systems, Eqs. (53) are valid for normal wave incidence only; otherwise they have to be modified; see, e.g., EM Sec. 7.4.
A similar standing wave forms in the opposite limit $Z'/Z \to 0$, which describes an "open" boundary, shown in Fig. 8b. However, in this limit (with $R = +1$), the standing wave of displacements has a maximum at $z_j = 0$,

$$q_{j\le 0}(t) = \text{Re}\left[a\,e^{i(kz_j - \omega t)} + a\,e^{i(-kz_j - \omega t)}\right] = \text{Re}\left[2a\,e^{-i\omega t}\cos kz_j\right] = 2A\cos(\omega t - \varphi)\cos kz_j, \qquad (5.56)$$

while the corresponding wave of forces has a node at that point. Most importantly, for both boundaries shown in Fig. 8, the standing waves are formed at any ratio $\omega/\omega_0$.

Fig. 5.8. (a) Rigid and (b) open boundaries of a 1D chain.
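The acoustic-limit formulas are easy to tabulate: $R = (Z - Z')/(Z + Z')$ and $T = 2Z/(Z + Z')$, with the reflected and transmitted power fractions $R^2$ and $(Z'/Z)T^2$. The sketch below uses hypothetical impedance values and illustrative function names. It shows the two total-reflection limits, the impedance-matched case, and an interface between two chains with equal acoustic velocities but different impedances.

```python
import numpy as np

def reflection_transmission(Z, Z_p):
    # Acoustic-limit amplitude coefficients, Eq. (53)
    return (Z - Z_p) / (Z + Z_p), 2 * Z / (Z + Z_p)

def power_fractions(Z, Z_p):
    # Reflected and transmitted fractions of the incident power flow, Eq. (54)
    R, T = reflection_transmission(Z, Z_p)
    return R**2, (Z_p / Z) * T**2

for Z_p in [1e-6, 0.5, 1.0, 2.0, 1e6]:          # Z = 1; hypothetical values of Z'
    R, T = reflection_transmission(1.0, Z_p)
    p_refl, p_trans = power_fractions(1.0, Z_p)
    print(f"Z'/Z = {Z_p:7.1e}  R = {R:+.4f}  T = {T:.4f}  "
          f"reflected = {p_refl:.4f}  transmitted = {p_trans:.4f}")

# Equal acoustic velocities v = d (kappa/m)^{1/2} on both sides, but Z' = 4Z:
m, kappa, m_p, kappa_p = 1.0, 1.0, 4.0, 4.0
R, T = reflection_transmission(np.sqrt(m * kappa), np.sqrt(m_p * kappa_p))
print(R)   # -0.6: strong reflection despite equal velocities
```

The reflected and transmitted fractions always add up to one, and the matched case $Z' = Z$ gives $R = 0$, $T = 1$.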



If the opposite boundary of a finite-length chain also provides total wave reflection, the system may only support standing waves with certain wave numbers $k_n$, and hence certain eigenfrequencies $\omega_n$. These may be found from the set of $k_n$ and the dispersion relation $\omega_n = \omega(k_n)$, in our case given by Eq. (30). For example, if both boundaries of a chain with length $L$ are rigid (Fig. 8a), then the standing wave (55) should have nodes at both of them, giving the wave number quantization condition 15

$$\sin k_nL = 0, \quad\text{i.e. } k_n = \frac{\pi n}{L}, \qquad (5.57a)$$

where $n$ is an integer. In order to count the number of different modes in a chain with a finite number $N$ of oscillating particles, let us take into account, first, that adding one period $\Delta k = 2\pi/d$ of the dispersion relation to any $k_n$ leads to the same mode. Moreover, changing the sign of $k_n$ in the standing wave (55) is equivalent to changing the sign of its amplitude. Hence, there are only $N$ different modes, for example with

$$n = 1, 2, \ldots, N, \quad\text{i.e. } k_n = \frac{\pi}{L}, \frac{2\pi}{L}, \ldots, \frac{N\pi}{L}. \qquad (5.57b)$$

This fact is of course just a particular case of the general result obtained in Sec. 2. 
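The counting argument can be checked by diagonalizing the equations of motion of a short chain directly. The sketch assumes $N$ movable particles between two rigid walls located at $z = 0$ and $z = (N+1)d$, so that $L = (N+1)d$ in Eq. (57). For $N \gg 1$ the difference from $L = Nd$ is immaterial. It also assumes the Debye law $\omega_n = 2(\kappa/m)^{1/2}\sin(k_nd/2)$. The parameter values are hypothetical.

```python
import numpy as np

def rigid_chain_frequencies(N, kappa=1.0, m=1.0):
    # Eigenfrequencies of N particles between two rigid walls, from Eq. (24)
    # written as q'' = -(kappa/m) K q with the tridiagonal matrix K
    K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    return np.sqrt(np.linalg.eigvalsh(K * kappa / m))   # ascending order

N, d, kappa, m = 8, 1.0, 1.0, 1.0             # hypothetical parameters
L = (N + 1) * d                               # assumption: walls at z = 0 and z = (N + 1) d
n = np.arange(1, N + 1)
k_n = np.pi * n / L                           # Eq. (57)
omega_formula = 2 * np.sqrt(kappa / m) * np.sin(k_n * d / 2)   # assumed Debye law
omega_numeric = rigid_chain_frequencies(N, kappa, m)
print(np.round(omega_numeric, 4))
print(np.round(omega_formula, 4))
```

The matrix has exactly $N$ eigenvalues, matching the $N$ wave numbers of Eq. (57b) one by one.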

According to Eq. (56), if both boundaries are open (Fig. 8b), the oscillation modes are different, 
but their wave numbers form the same set (57). Finally, if the types of boundary conditions on the 
chain's ends are opposite, the wave number set is somewhat different, 



$$k_n = \frac{\pi}{L}\left(n + \frac{1}{2}\right), \qquad (5.58)$$



15 This result should be very familiar to the reader from freshman-level "guitar string" problems. Note, however, that Eqs. (55)-(56) are valid not only for continuous 1D systems like a string, but also for (uniform) chains with a finite and arbitrary number $N$ of particles, a fact we will use below.
but since the distance between the adjacent values of $k_n$ is still the same ($\pi/Nd$), the system still has exactly $N$ such values within each period $2\pi/d$ of the dispersion law, and hence, again, exactly $N$ different oscillation modes.

The number of modes and their equal spacing (called equidistance) on the $k$ axis are insensitive to the boundary conditions. This enables the following useful (and very popular) trick. In many applications, it is preferable to speak about the number of different traveling, rather than standing, waves in a system of a large but finite size, with coordinates $z_0$ and $z_N$ describing the same particle. One can plausibly argue that the local dynamics of a chain of $N \gg 1$ particles should not be affected if it is gradually bent into a large closed loop of length $L = Nd \gg d$. Such a loop may sustain traveling waves if they satisfy the following periodic Born-Karman condition: $q_0(t) = q_N(t)$. (A popular vivid image is that the wave "catches its own tail with its teeth".) According to Eq. (27), this condition is equivalent to



$$e^{ik_nL} = 1, \quad\text{i.e. } k_n = \frac{2\pi}{L}n. \qquad (5.59)$$



This equation gives a set of wave numbers twice as sparse as that described by Eqs. (57). However, now we can use $N$ values of $n$, for example from $-N/2$ to $+N/2$, i.e. $k_n$ from $-\pi/d$ to $+\pi/d$. (Strictly speaking, one of the boundary values should be excluded to avoid double counting of the identical modes with $n = \pm N/2$.) This is possible because traveling waves (28) with equal but opposite values of $k_n$ propagate in opposite directions and hence present different modes. As a result, the total number of different traveling-wave modes is the same ($N$) as that of different standing-wave modes, and they are similarly (uniformly) distributed along the wave number axis. For $N \gg 1$ the exact values of $k_n$ are not important. This is why the Born-Karman boundary conditions and the resulting set (59) of wave numbers are frequently used even for multi-dimensional systems whose bending into a ring along each axis is hardly physically plausible.
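Similarly, the Born-Karman counting can be checked on a ring of $N$ particles, whose dynamical matrix is circulant. The sketch below uses hypothetical parameters and the same assumed Debye law. It compares the ring's eigenfrequencies with those for the $N$ wave numbers (59) with $-N/2 < n \le N/2$. Modes with $\pm k_n$ are degenerate, as expected for waves running in opposite directions.

```python
import numpy as np

def ring_frequencies(N, kappa=1.0, m=1.0):
    # Eigenfrequencies of N particles closed into a ring (q_0 = q_N): circulant matrix
    identity = np.eye(N)
    K = 2 * identity - np.roll(identity, 1, axis=1) - np.roll(identity, -1, axis=1)
    eigenvalues = np.linalg.eigvalsh(K * kappa / m)
    return np.sqrt(np.clip(eigenvalues, 0.0, None))   # clip round-off of the k = 0 mode

N, d, kappa, m = 10, 1.0, 1.0, 1.0            # hypothetical parameters
L = N * d
n = np.arange(-N // 2 + 1, N // 2 + 1)        # N values, -N/2 < n <= N/2
k_n = 2 * np.pi * n / L                       # Eq. (59)
omega_formula = np.sort(2 * np.sqrt(kappa / m) * np.abs(np.sin(k_n * d / 2)))
omega_numeric = ring_frequencies(N, kappa, m)
print(n)
print(np.round(omega_numeric, 4))
print(np.round(omega_formula, 4))
```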






5.5. Dissipative, parametric, and nonlinear phenomena 

In conclusion, let us discuss more complex effects in oscillatory systems with more than one degree of freedom. Starting from linear systems, energy dissipation may be readily introduced, just as for a single oscillator, by adding terms proportional to $\eta_j\dot q_j$ to the equations of motion such as Eqs. (5), (17), or (24). In the general case, the viscosity coefficients $\eta_j$ are different for different particles. However, in many uniform systems like that shown in Fig. 4, the coefficients are naturally equal, turning Eq. (24) into



$$m\ddot q_j + \eta\dot q_j - \kappa(q_{j+1} - q_j) + \kappa(q_j - q_{j-1}) = 0. \qquad (5.60)$$



In the most important limit of acoustic waves, we may now repeat the arguments that have led to the 
wave equation (33) to get its generalization 




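$$\frac{\partial^2q}{\partial t^2} + 2\delta\frac{\partial q}{\partial t} - v^2\frac{\partial^2q}{\partial z^2} = 0, \quad\text{where } \delta \equiv \frac{\eta}{2m}. \qquad (5.61)$$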



Such a dissipative equation may describe two major effects. First, it describes the decay in time of the standing waves in an autonomous wave system (say, of a finite length $L$) that have been caused by some initial push, described by non-trivial initial conditions, say $q(z,0) \neq 0$. In order to analyze these decaying oscillations, one may look for the solution of Eq. (61) in the form of a sum of standing wave modes (that satisfy the given boundary conditions), each with its own, time-dependent amplitude $A_n(t)$. For example, for rigid boundary conditions ($q = 0$) at $z = 0$ and $z = L$, we can use Eq. (55) as a hint to write



$$q(z,t) = \sum_{n=1}^{N}A_n(t)\sin k_nz, \qquad (5.62)$$



where the set of $k_n$ is given by Eq. (57). Plugging this solution into Eq. (61), 16 we get

$$\sum_{n=1}^{N}\left(\ddot A_n + 2\delta\dot A_n + \omega_n^2A_n\right)\sin k_nz = 0, \quad\text{with } \omega_n = vk_n, \quad k_n = \frac{\pi}{L}n. \qquad (5.63)$$



Since the functions $\sin k_nz$ are mutually orthogonal, Eq. (63) may only be satisfied if all $N$ expressions in parentheses are equal to zero. As a result, the amplitude of each mode satisfies an ordinary differential equation absolutely similar to that studied in Sec. 4.1, with a similar solution describing the free oscillation decay with the relaxation constant (4.23). The wave character of the system gives nothing new here, besides the fact that different modes have different Q-factors: $Q_n = \omega_n/2\delta$.
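Footnote 16's remark, that the mode-by-mode decay also holds beyond the acoustic limit, can be checked on the discrete model (60) itself. The sketch below writes Eq. (60) for a short chain with rigid ends in first-order form and computes its eigenvalues. With $\delta \equiv \eta/2m$, each eigenvalue should equal $-\delta \pm i(\omega_n^2 - \delta^2)^{1/2}$. So every mode decays at the same rate $\delta$, while the Q-factors $\omega_n/2\delta$ differ. The parameter values are hypothetical.

```python
import numpy as np

N, kappa, m, eta = 6, 1.0, 1.0, 0.1          # hypothetical parameters (weak damping)
delta = eta / (2 * m)
K = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) * kappa / m   # rigid ends
# Eq. (60) in first-order form: d/dt [q, dq/dt] = A [q, dq/dt]
A = np.block([[np.zeros((N, N)), np.eye(N)],
              [-K, -2 * delta * np.eye(N)]])
lam = np.linalg.eigvals(A)
omega_n = np.sqrt(np.linalg.eigvalsh(K))     # undamped mode frequencies, ascending

print("decay rates:", np.round(-lam.real, 6))                    # all equal to delta
print("oscillation frequencies:", np.round(np.sort(lam.imag[lam.imag > 0]), 6))
print("expected:", np.round(np.sqrt(omega_n**2 - delta**2), 6))
print("Q-factors omega_n/(2 delta):", np.round(omega_n / (2 * delta), 2))
```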

A more wave-specific situation arises when the waves are due to their persistent excitation by some actuator at one of the ends (say, $z = 0$) of a very long structure. In this case, an initial transient process settles to a wave with a time-independent waveform limited by a certain envelope $A(z)$ that decays at $z \to \infty$. In order to find the envelope for the simplest case of sinusoidal excitation of frequency $\omega$, one may look for a particular solution of Eq. (61) in a form very different from Eq. (62):



$$q(z,t) = \text{Re}\left[a(z)e^{-i\omega t}\right], \qquad (5.64)$$



generally with complex $a(z)$. Plugging this solution into Eq. (61), we see that it is indeed valid, provided two conditions hold. First, $q(0,t) = a(0)\exp\{-i\omega t\}$ must satisfy the boundary condition (now describing the wave excitation). Second, $a(z)$ must obey the following ordinary differential equation, which describes the wave's evolution in space rather than in time: 17



$$\frac{d^2a}{dz^2} + k^2a = 0, \quad\text{with } k^2 \equiv \left(\frac{\omega}{v}\right)^2 + 2i\frac{\delta\omega}{v^2}. \qquad (5.65)$$

The general solution of such a differential equation is

$$a(z) = a_+e^{ikz} + a_-e^{-ikz}, \qquad (5.66)$$

with $k$ now having both real and imaginary parts, $k = k' + ik''$, so that the wave (64) is

$$q(z,t) = \text{Re}\left[a_+e^{i(k'z - \omega t)}e^{-k''z} + a_-e^{i(-k'z - \omega t)}e^{k''z}\right]. \qquad (5.67)$$



If our boundary conditions correspond to the wave propagating to the right, we have to keep only the 
first term of this expression, with positive k". The first exponent of that term describes the wave 
propagating from the boundary into the system (at low damping, with velocity virtually equal to v), 
while the second exponent describes an exponential decay of the wave's amplitude in space: 



16 Actually, this result may also be obtained from Eq. (60), and hence is valid for an arbitrary ratio $\omega/\omega_0$.

17 Equation (65), as well as its multi-dimensional generalizations, is frequently called the Helmholtz equation, 
named after H. von Helmholtz (1821-1894). 
$$|q(z,t)| \propto e^{-k''z} \equiv e^{-\alpha z/2}, \quad\text{with } \alpha \equiv 2k'' \approx \frac{2\delta}{v}, \qquad (5.68)$$



where the last, approximate relation is valid in the weak damping limit ($\delta \ll \omega$, i.e. $\delta/v \ll k'$). The constant $\alpha$ is called the attenuation coefficient, and in more general wave systems it may depend on the frequency $\omega$. Physically, $2/\alpha$ is the scale of wave penetration into a dissipative system. 18 Note that our simple solution (68) is only valid if the system length $L$ is much larger than $2/\alpha$. Otherwise we would need to use the second term in Eq. (67) to describe wave reflection from the second end.
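To see how good the weak-damping approximation is, one can take the root of $k^2 = (\omega/v)^2 + 2i\delta\omega/v^2$ with $\text{Im}\,k > 0$ and compare $\alpha \equiv 2k''$ with $2\delta/v$. This is a sketch under our reading of Eq. (68) (amplitude $\propto e^{-\alpha z/2}$), with the hypothetical values $v = \omega = 1$.

```python
import numpy as np

def complex_wave_number(omega, v, delta):
    # Root of k^2 = (omega/v)^2 + 2i delta omega / v^2 with Im k > 0 (principal square root)
    return np.sqrt((omega / v) ** 2 + 2j * delta * omega / v ** 2)

v, omega = 1.0, 1.0                              # hypothetical values
for delta in [1e-3, 1e-2, 1e-1, 1.0]:
    k = complex_wave_number(omega, v, delta)
    alpha = 2 * k.imag                           # alpha = 2 k'' (amplitude ~ exp(-alpha z / 2))
    print(f"delta/omega = {delta / omega:6.3f}  k' = {k.real:.5f}  k'' = {k.imag:.5f}  "
          f"alpha = {alpha:.5f}  2 delta / v = {2 * delta / v:.5f}")
```

At $\delta = \omega$ the exact $\alpha$ is already noticeably below $2\delta/v$, which marks the end of the weak-damping regime.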

Now let me discuss (because of the lack of time, on a semi-quantitative level only) nonlinear and parametric phenomena in oscillatory systems with more than one degree of freedom. One important new effect here is the mutual phase locking of (two or more) weakly coupled self-excited oscillators with close frequencies. If the eigenfrequencies of the oscillators are sufficiently close, their oscillation frequencies "stick together" to become exactly equal. The dynamics of this process is very close to that of the phase locking of a single oscillator by an external signal, discussed in Sec. 4.4. Still, it is rather counter-intuitive in view of the results of Sec. 1, in particular the anticrossing diagram shown in Fig. 2. The analysis of the effect using the rotating-wave approximation (which is highly recommended to the reader) shows the origin of the difference. It is the oscillators' nonlinearity, which makes the oscillation amplitude virtually independent of the phase evolution; see Eq. (4.68) and its discussion.

One more new effect is the so-called non-degenerate parametric excitation. It may be illustrated on the example of just two coupled oscillators; see Sec. 1 above. Let us assume that the coupling constant $\kappa$ participating in Eqs. (5) is not constant, but oscillates in time, say with frequency $\omega_p$. In this case the forces acting on each oscillator from its counterpart, described by the right-hand sides of Eqs. (5), will be proportional to $\kappa q_{2,1}(1 + \mu\cos\omega_pt)$. Assuming that the oscillations of $q_1$ and $q_2$ are close to sinusoidal, with frequencies $\omega_{1,2}$, we see that the force acting on each oscillator will contain the so-called combinational frequencies

$$\omega_p \pm \omega_{2,1}. \qquad (5.69)$$

Suppose one of these frequencies in the right-hand side of each equation coincides with that oscillator's own oscillation frequency. Then we can expect a substantial parametric interaction between the oscillators, on top of the constant coupling effects discussed in Sec. 1. According to Eq. (69), this may happen in two cases:



$$\omega_p = \omega_1 \pm \omega_2. \qquad (5.70)$$



The quantitative analysis (also highly recommended as the reader's exercise) shows that in the positive sign case, the parameter modulation indeed leads to energy "pumping" into the oscillations. As a result, a sufficiently large $\mu$, at sufficiently low damping coefficients $\delta_{1,2}$ and effective detuning

$$\xi \equiv \omega_p - (\Omega_1 + \Omega_2), \qquad (5.71)$$

may lead to the simultaneous excitation of two frequency components $\omega_{1,2}$. These frequencies are close to the corresponding eigenfrequencies of the system and are related to the pumping frequency $\omega_p$ by the exact relation (70). Otherwise they are arbitrary, e.g., incommensurate (Fig. 9), thus justifying the term



18 In engineering, the attenuation coefficient of wave-carrying systems is most frequently characterized by a logarithmic measure called decibel per meter (or just dB/m): $\alpha_{\text{dB/m}} = 10\log_{10}e^{\alpha} \approx 4.34\,\alpha$, with $\alpha$ expressed in m$^{-1}$.
non-degenerate parametric excitation. (The parametric excitation of a single oscillator, analyzed in Sec. 4.5, is a particular, degenerate case of such excitation, with $\omega_1 = \omega_2 = \omega_p/2$.) On the other hand, for the case described by Eq. (70) with the negative sign, the parameter modulation always pumps energy out of the oscillations, effectively increasing the system's damping.

Somewhat counter-intuitively, this difference between the two cases (70) is most simply interpreted using the notions of quantum mechanics. Namely, the equality $\omega_p = \omega_1 + \omega_2$ enables the decay of an external photon of energy $\hbar\omega_p$ into two photons of energies $\hbar\omega_1$ and $\hbar\omega_2$ going into the oscillatory system. (The complementary relation, $\omega_1 = \omega_p + \omega_2$, results in the oscillation photon decay.)
Fig. 5.9. Spectrum of oscillations at the non-degenerate parametric excitation (schematically). The arrow directions symbolize the power flow in and out of the system.
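The difference between the two cases (70) can be seen in a direct simulation. The sketch below uses a simplified, hypothetical model rather than the exact Eqs. (5). It has two unit-mass oscillators with frequencies $\omega_1 = 1$ and $\omega_2 = \sqrt 2$ (incommensurate) and damping $\delta$. There is no static coupling: the oscillators interact only through the modulated term $\varepsilon\cos(\omega_pt)\,q_{2,1}$. For this model, a rotating-wave estimate predicts exponential growth at $\omega_p = \omega_1 + \omega_2$ once $\varepsilon > 4\delta(\omega_1\omega_2)^{1/2}$, and no growth at $\omega_p = \omega_2 - \omega_1$. The integrator is a plain fixed-step fourth-order Runge-Kutta scheme, and all parameter values are illustrative.

```python
import numpy as np

def rhs(t, y, w1, w2, delta, eps, wp):
    # Hypothetical model: each oscillator is driven by the other one through the
    # modulated coupling eps*cos(wp t); there is no static coupling
    q1, q2, v1, v2 = y
    c = eps * np.cos(wp * t)
    return np.array([v1, v2,
                     -2 * delta * v1 - w1**2 * q1 + c * q2,
                     -2 * delta * v2 - w2**2 * q2 + c * q1])

def energy_gain(wp, w1=1.0, w2=np.sqrt(2.0), delta=0.005, eps=0.2, t_end=250.0, dt=0.01):
    # Fixed-step RK4 integration; returns final/initial total oscillation energy
    def energy(y):
        return 0.5 * (y[2]**2 + (w1 * y[0])**2 + y[3]**2 + (w2 * y[1])**2)
    y, t = np.array([1e-3, 1e-3, 0.0, 0.0]), 0.0
    e_initial = energy(y)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(t, y, w1, w2, delta, eps, wp)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1, w1, w2, delta, eps, wp)
        k3 = rhs(t + dt / 2, y + dt / 2 * k2, w1, w2, delta, eps, wp)
        k4 = rhs(t + dt, y + dt * k3, w1, w2, delta, eps, wp)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return energy(y) / e_initial

w1, w2 = 1.0, np.sqrt(2.0)
gain_sum = energy_gain(w1 + w2)       # omega_p = omega_1 + omega_2
gain_diff = energy_gain(w2 - w1)      # omega_p = omega_2 - omega_1
print(f"sum-frequency pumping:        energy grew by a factor {gain_sum:.3e}")
print(f"difference-frequency pumping: energy changed by a factor {gain_diff:.3e}")
```

In the sum-frequency case the energy grows by many orders of magnitude. In the difference-frequency case it only decays, with some energy exchange between the two oscillators along the way.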



Proceeding to nonlinear phenomena, let us note, first of all, that the simple reasoning that accompanied Eq. (4.109) is also valid when the oscillations consist of two (or more) sinusoidal components with incommensurate frequencies. Replacing the notation $2\omega$ with $\omega_p$, we see that non-degenerate parametric excitation of the type (70) (with the plus sign) may be implemented in a system of two coupled oscillators with a quadratic nonlinearity (of the type $\gamma q^2$). Such a system is "pumped" by an intensive external signal at frequency $\omega_p \approx \Omega_1 + \Omega_2$. This is exactly how it is done in optics, where the nonlinearity is provided by media with a nonlinear relation between the electric polarization and the electric field.

At optical frequencies, however, it is hard to couple a sufficient volume of the nonlinear medium with lumped-type resonators which would have just two resonant frequencies $\Omega_1$ and $\Omega_2$. This is why it is easier to implement the parametric excitation of light (as well as nonlinear phenomena like higher harmonic generation) in distributed systems of a size much larger than the involved wavelengths. In such systems, the energy transfer from the initial (say, pumping) wave to the generated waves accumulates during their joint propagation along the system. Eq. (65) describes the evolution of the wave's amplitude in space, and the usual equation of the harmonic oscillator describes its evolution in time. From the analogy between the two, it is clear that this energy transfer accumulation requires not only the frequencies $\omega$, but also the wave numbers $k$ to be in similar relations. For example, the non-degenerate parametric excitation requires not only the frequency balance (70), $\omega_p = \omega_1 + \omega_2$, but also a similar relation

$$k_p = k_1 + k_2, \qquad (5.72)$$

to be exactly fulfilled. This is only possible if the dispersion relation $\omega(k)$ of the medium is suitable.
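A quick way to see why the dispersion law matters for Eq. (72): for a dispersion-free medium, $k = \omega/v$, the frequency balance $\omega_p = \omega_1 + \omega_2$ implies $k_p = k_1 + k_2$ automatically, while for a dispersive law it generally does not. The sketch compares the phase mismatch $k_p - k_1 - k_2$ for the linear law and for the Debye law of our chain (assumed in the form $\omega = \omega_0\sin(kd/2)$, with $\omega_0 = 2$, $d = 1$). The frequencies are hypothetical.

```python
import numpy as np

def k_linear(omega, v=1.0):
    # Dispersion-free medium: omega = v k
    return omega / v

def k_debye(omega, omega0=2.0, d=1.0):
    # Inverse of the assumed Debye law omega = omega0 sin(k d / 2), 0 <= omega <= omega0
    return 2 * np.arcsin(omega / omega0) / d

w1, w2 = 0.3, 0.5                  # hypothetical frequencies (omega0 = 2)
wp = w1 + w2                       # frequency balance: Eq. (70) with the plus sign
mismatch_linear = k_linear(wp) - (k_linear(w1) + k_linear(w2))
mismatch_debye = k_debye(wp) - (k_debye(w1) + k_debye(w2))
print(f"linear law: k_p - k_1 - k_2 = {mismatch_linear:+.5f}")
print(f"Debye law:  k_p - k_1 - k_2 = {mismatch_debye:+.5f}")
```

Since $\arcsin$ is convex on $[0, 1]$ and vanishes at zero, the Debye-law mismatch is positive for any such triple of co-propagating waves. In this chain, Eq. (72) therefore cannot be satisfied by such waves.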

It may look like using a dispersion-free medium, with $\omega/k = v = \text{const}$, is the perfect solution for this task, because in such a medium Eq. (72) automatically follows from Eq. (70) with the plus sign. However, in such media not only the three desirable waves, but also all their harmonics, have the same velocity. Under these conditions, the energy transfer rates between all harmonics are of the same order. Perhaps the most important result of such multi-harmonic interaction is that intensive waves interacting with nonlinear media may develop sharply non-sinusoidal waveforms, in particular those with an almost instant change of the field at a certain moment. Such shock waves, especially those of mechanical nature, are of great interest for certain applications, some not quite innocent, e.g., the explosion of conventional and nuclear bombs. I will only briefly return to shock waves in Sec. 8.5. 19

On the other hand, for the parametric excitation (Fig. 9), shock waves should be avoided. Various methods of arranging a suitable dispersion of optical waves, including various multi-dimensional geometric effects, are an important part of the field called nonlinear optics. Unfortunately, due to the lack of time and space, for more information on this interesting subject I have to refer the reader to special literature. 20



5.6. Exercise problems 

5.1 . For the system of two elastically coupled pendula, confined to a 
vertical plane, with the parameters shown in Fig. on the right (cf. Problem 1.3), 
find possible frequencies of small sinusoidal oscillations, and the 
corresponding distribution coefficients. Sketch both oscillation modes. 






5.2 . The same task as in Problem 1, for the double pendulum confined to the vertical plane containing the support point (considered in Problem 2.1), with $m' = m$ and $l' = l$; see Fig. on the right.




5.3 . The same task as in Problem 1, for a linear, symmetric system of 3 particles, shown in Fig. on the right. Assume that the connections between the particles not only act as usual elastic springs (as described by their potential energies $U = \kappa l^2/2$), but also resist the system's bending, giving an additional potential energy $U' = \kappa'l^2\theta^2/2$, where $\theta$ is the (small) bending angle. 21
5.4 . Calculate the energy (per unit length) of a sinusoidal traveling wave propagating in the 1D system shown in Fig. 4 (reproduced on the right). Use your result to calculate the average power flow created by the wave, and compare it with Eq. (46) (valid in the acoustic wave limit).



19 The classical (and perhaps still the best) monograph on the subject is Ya. B. Zeldovich, Physics of Shock Waves 
and High-Temperature Phenomena, Dover, 2002. 

20 See, e.g., the classical monograph by N. Bloembergen, Nonlinear Optics, 4th ed., World Scientific, 1996, or a more modern treatment by R. W. Boyd, Nonlinear Optics, 3rd ed., Academic Press, 2008.

21 This is a good model for small oscillations of linear molecules such as CO₂ (for which the values of the elastic constants $\kappa$ and $\kappa'$ are well known).
5.5 . Calculate the dispersion law $\omega(k)$ and the maximum frequency of small longitudinal waves in an infinite line of similar, spring-coupled pendula; see Fig. on the right.



g 



1 



^^^^^^ 



K 



K 



m 



m 



m 



K 



K 



K 



m 



5.6. Calculate and analyze the dispersion relation ω(k) for longitudinal waves in an infinite 1D chain of coupled oscillators with alternating masses; see the figure on the right. In particular, find and discuss the dispersion relation's period Δk.
5.7. Calculate the longitudinal wave reflection from a "point inhomogeneity": a single particle with a different mass $m_0 \neq m$, in an otherwise uniform 1D chain; see the figure on the right. Analyze the result.

[Figure: a uniform chain of particles of mass m coupled by springs κ, with one particle of mass m₀]



5.8. Analyze the mutual phase locking of two weakly coupled self-oscillators with the dissipative nonlinearity described by Eq. (4.62), using the rotating-wave approximation.



5.9. Find the condition of non-degenerate parametric excitation in a system of two coupled oscillators, described by Eqs. (5) with time-dependent coupling: $\kappa \to \kappa(1 + \mu\cos\omega_p t)$, with $\omega_p \approx \Omega_1 + \Omega_2$, and $\Omega_2 - \Omega_1 \gg \kappa/m$.

Hint: Assuming the modulation depth μ, static coupling κ, and detuning $\xi \equiv \omega_p - (\Omega_1 + \Omega_2)$ sufficiently small, use the rotating-wave approximation for each of the coupled oscillators.
Chapter 6. Rigid Body Motion

This chapter discusses the motion of rigid bodies, with a focus on their rotation. Some byproduct results of this analysis will enable us to discuss, at the end of the chapter, the description of motion of point particles in non-inertial reference frames.



6.1. Angular velocity vector 

Our study of 1D waves in the previous chapter has prepared us for a discussion of 3D systems of particles. We will start it with a (relatively :-) simple limit when the changes of distances $r_{kk'} = |\mathbf{r}_k - \mathbf{r}_{k'}|$ between particles of the system are negligibly small. Such an abstraction is called the (absolutely) rigid body, and is a reasonable approximation in many practical problems, including the motion of solids. In this model we neglect deformations; they will be the subject of the next two chapters.

The rigid body approximation reduces the number of degrees of freedom of the system from 3N to just 6: for example, 3 Cartesian coordinates of one point (say, O), and 3 angles of the system rotation about 3 mutually perpendicular axes passing through this point. (An alternative way to arrive at the same number 6 is to consider 3 points of the body, which uniquely define its position. If movable independently, the points would have 9 degrees of freedom, but since 3 distances $r_{kk'}$ between them are now fixed, the resulting 3 constraints reduce the number of degrees of freedom to 6.)

Let us show that an arbitrary elementary displacement of such a rigid body may always be considered as a sum of a translational motion and a rotation. Consider a "moving" reference frame, firmly bound to the body, and an arbitrary vector A (Fig. 1).




Fig. 6.1. Deriving Eq. (8). 



The vector may be represented by its Cartesian components $A_j$ in that reference frame:

$$\mathbf{A} = \sum_{j=1}^{3} A_j\,\mathbf{n}_j . \qquad (6.1)$$

Let us calculate its time derivative in an arbitrary, possibly different ("lab") frame, taking into account that if the body rotates relative to this frame, then the directions of the unit vectors $\mathbf{n}_j$ change in time. Hence, we have to differentiate both operands in each product contributing to sum (1):

$$\left.\frac{d\mathbf{A}}{dt}\right|_{\text{in lab}} = \sum_{j=1}^{3}\frac{dA_j}{dt}\,\mathbf{n}_j + \sum_{j=1}^{3} A_j\,\frac{d\mathbf{n}_j}{dt} . \qquad (6.2)$$
In this expression, the first sum evidently describes the change of vector A as observed from the moving frame. Each of the infinitesimal vectors $d\mathbf{n}_j$ participating in the second sum may be presented by its Cartesian components in the moving frame:

$$d\mathbf{n}_j = \sum_{j'=1}^{3} d\varphi_{jj'}\,\mathbf{n}_{j'} . \qquad (6.3)$$



In order to find more about the set of scalar coefficients $d\varphi_{jj'}$, let us scalar-multiply each part of this relation by an arbitrary unit vector $\mathbf{n}_{j''}$, and take into account the evident orthogonality condition:

$$\mathbf{n}_{j'}\cdot\mathbf{n}_{j''} = \delta_{j'j''} . \qquad (6.4)$$



As a result, we get

$$d\varphi_{jj''} = d\mathbf{n}_j\cdot\mathbf{n}_{j''} . \qquad (6.5)$$

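Now let us calculate the first differential of Eq. (4), and use Eq. (5) to express the result:

$$d\mathbf{n}_j\cdot\mathbf{n}_{j'} + \mathbf{n}_j\cdot d\mathbf{n}_{j'} = d\varphi_{jj'} + d\varphi_{j'j} = 0; \quad \text{in particular,} \quad 2\,d\mathbf{n}_j\cdot\mathbf{n}_j = 2\,d\varphi_{jj} = 0 . \qquad (6.6)$$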

These relations, valid for any choice of indices j, j', and j'' from the set {1, 2, 3}, mean that the matrix of elements $d\varphi_{jj'}$ is antisymmetric; in other words, there are not 9, but just 3 independent coefficients $d\varphi_{jj'}$, all with $j \neq j'$. Hence it is natural to renumber them in a simpler way: $d\varphi_{jj'} = -d\varphi_{j'j} \equiv d\varphi_{j''}$, where indices j, j', and j'' follow in a "correct" order: either {1,2,3}, or {2,3,1}, or {3,1,2}. Now it is easy to check (say, just by a component-by-component comparison) that in this new notation, Eq. (3) may be presented just as a vector product:



$$d\mathbf{n}_j = d\boldsymbol\varphi\times\mathbf{n}_j , \qquad (6.7)$$



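where $d\boldsymbol\varphi$ is the infinitesimal vector defined by its Cartesian components $d\varphi_j$ (in the moving frame). Relation (7) is the basis of all rotation kinematics. Using it, Eq. (2) may be rewritten as

$$\left.\frac{d\mathbf{A}}{dt}\right|_{\text{in lab}} = \left.\frac{d\mathbf{A}}{dt}\right|_{\text{in mov}} + \sum_{j=1}^{3} A_j\,\frac{d\boldsymbol\varphi}{dt}\times\mathbf{n}_j = \left.\frac{d\mathbf{A}}{dt}\right|_{\text{in mov}} + \boldsymbol\omega\times\mathbf{A}, \qquad \text{where} \quad \boldsymbol\omega \equiv \frac{d\boldsymbol\varphi}{dt} . \qquad (6.8)$$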



In order to interpret the physical sense of vector ω, let us apply Eq. (8) to the particular case when A is the radius-vector r of a point of the body, and the lab frame is selected in a special way: its origin moves with the same velocity as that of the moving frame in the particular instant under consideration. In this case the first term in the right-hand part of Eq. (8) is zero, and we get

$$\left.\frac{d\mathbf{r}}{dt}\right|_{\text{in special lab frame}} = \boldsymbol\omega\times\mathbf{r} , \qquad (6.9)$$



where vector r is the same in both frames. According to the vector product definition, the particle velocity described by this formula has a direction perpendicular to vectors ω and r (Fig. 2), and magnitude $\omega r\sin\theta$. As Fig. 2 shows, this expression may be rewritten as $\omega\rho$, where $\rho = r\sin\theta$ is the distance from the line that is parallel to vector ω and passes through point O. This is of course just the pure rotation about that line (called the instantaneous axis of rotation), with angular velocity ω. Since, according to Eqs. (3) and (8), the angular velocity vector ω is defined by the time evolution of the moving frame alone, it is the same for all points r, i.e. for the rigid body as a whole. Note that nothing in



Chapter 6 



Page 2 of 28 



Essential Graduate Physics 



CM: Classical Mechanics 



our calculations forbids not only the magnitude but also the direction of vector ω, and thus of the instantaneous axis of rotation, to change in time (and in many cases it does change); hence the name.




Fig. 6.2. Instantaneous axis of rotation. 
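As a numerical sanity check of Eqs. (7) and (9), here is a minimal sketch in Python with NumPy (our choice; nothing in the text depends on it). The helper names `skew` and `rotation_matrix` are hypothetical, and the finite rotation is built with the standard Rodrigues formula, which the text itself does not use. For shrinking rotation angles, the true displacement of a body point should approach $d\boldsymbol\varphi\times\mathbf{r}$, with a residual falling off roughly as the square of the angle.

```python
import numpy as np

def skew(v):
    """Antisymmetric matrix [v]x, so that skew(v) @ r equals np.cross(v, r)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotation_matrix(phi):
    """Finite rotation by angle |phi| about the unit vector phi/|phi| (Rodrigues formula)."""
    angle = np.linalg.norm(phi)
    if angle == 0.0:
        return np.eye(3)
    K = skew(phi / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

rng = np.random.default_rng(0)
r = rng.normal(size=3)                  # radius-vector of a body point, measured from O
axis = rng.normal(size=3)
axis /= np.linalg.norm(axis)            # direction of the elementary rotation

errors = []
for size in (1e-2, 1e-3, 1e-4):
    dphi = size * axis                          # elementary rotation vector d(phi)
    exact = rotation_matrix(dphi) @ r - r       # true displacement of the point
    linear = np.cross(dphi, r)                  # Eqs. (7) and (9): d(phi) x r
    errors.append(np.linalg.norm(exact - linear))
    print(f"|dphi| = {size:.0e}:  |exact - linear| = {errors[-1]:.1e}")
```

Each tenfold reduction of the angle should shrink the residual about a hundredfold, which is the signature of a first-order (linear) relation such as Eq. (7).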



Now let us generalize our result a step further, considering two laboratory reference frames that do not rotate relative to each other: one arbitrary, and another one selected in the special way described above, so that Eq. (9) is valid in it. Since the relative motion of these two reference frames is purely translational, we can use the simple velocity addition rule given by Eq. (1.8) to write

$$\mathbf{v}\big|_{\text{in lab}} = \mathbf{v}_O\big|_{\text{in lab}} + \mathbf{v}\big|_{\text{in special lab frame}} = \mathbf{v}_O\big|_{\text{in lab}} + \boldsymbol\omega\times\mathbf{r} , \qquad (6.10)$$

where r is the radius-vector of the point, measured in the body-bound ("moving") frame O.
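A quick check that the velocity field (10) indeed describes a rigid motion: for arbitrary (randomly chosen, hypothetical) values of $\mathbf{v}_O$, ω, and point positions, the rate of change of every squared inter-particle distance should vanish. This is only a sketch, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(5, 3))       # hypothetical radius-vectors r of 5 body points
v_O = rng.normal(size=3)               # velocity of the origin O in the lab frame
omega = rng.normal(size=3)             # angular velocity vector

velocities = v_O + np.cross(omega, points)     # Eq. (10), one row per point

# For a rigid body, d/dt |r_k - r_k'|^2 = 2 (r_k - r_k') . (v_k - v_k') must vanish for every pair
rates = [2.0 * np.dot(points[k] - points[kp], velocities[k] - velocities[kp])
         for k in range(len(points)) for kp in range(k + 1, len(points))]
print(max(abs(x) for x in rates))      # zero up to round-off
```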



6.2. Inertia tensor 

Since the dynamics of each point of a rigid body is strongly constrained by the conditions $r_{kk'}$ = const, rigid body motion is one of the most important fields of application of the Lagrangian formalism that was discussed in Chapter 2. The first thing we need to know for using this approach is the kinetic energy of the body in an inertial reference frame. It is just the sum of kinetic energies of all its points, so that we can use Eq. (10) to write: 1

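$$T = \sum\frac{m}{2}v^2 = \sum\frac{m}{2}\left(\mathbf{v}_O + \boldsymbol\omega\times\mathbf{r}\right)^2 = \sum\frac{m}{2}v_O^2 + \sum m\,\mathbf{v}_O\cdot(\boldsymbol\omega\times\mathbf{r}) + \sum\frac{m}{2}(\boldsymbol\omega\times\mathbf{r})^2 . \qquad (6.11)$$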

Let us apply to the right-hand part of Eq. (11) two general vector analysis formulas, listed in the Math 
Appendix: the operand rotation rule MA Eq. (7.6) to the second term, and MA Eq. (7.7b) to the third 
term. The result is 

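$$T = \sum\frac{m}{2}v_O^2 + \sum m\,\mathbf{r}\cdot(\mathbf{v}_O\times\boldsymbol\omega) + \sum\frac{m}{2}\left[\omega^2 r^2 - (\boldsymbol\omega\cdot\mathbf{r})^2\right] . \qquad (6.12)$$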

This expression may be further simplified by making a specific choice of point O (from which the radius-vectors r of all particles are measured), namely if we use for this point the center of mass of the body. As was already mentioned in Sec. 3.4, radius-vector R of this point is defined as

$$M\mathbf{R} = \sum m\mathbf{r}, \qquad M = \sum m , \qquad (6.13)$$



1 Actually, all symbols for particle masses, coordinates and velocities should carry the particle index, say k, over 
which the summation is carried out. However, for the sake of notation simplicity, this index is just implied. 






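where M is just the total mass of the body. In the reference frame centered at that point, R = 0, so that in that frame the second sum in Eq. (12) vanishes, and the kinetic energy is a sum of two terms:

$$T = T_{\text{tran}} + T_{\text{rot}}, \qquad T_{\text{tran}} \equiv \frac{M}{2}V^2, \qquad T_{\text{rot}} \equiv \sum\frac{m}{2}\left[\omega^2 r^2 - (\boldsymbol\omega\cdot\mathbf{r})^2\right], \qquad (6.14)$$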

where V = dR/dt is the center-of-mass velocity in our inertial reference frame, and all particle positions r have to be measured in the center-of-mass frame. Since the angular velocity vector ω is common for all points of a rigid body, it is more convenient to rewrite the rotational energy in a form in which the summation over the components of this vector is clearly separated from the summation over the points of the body:

$$T_{\text{rot}} = \frac{1}{2}\sum_{j,j'=1}^{3} I_{jj'}\,\omega_j\,\omega_{j'} , \qquad (6.15)$$

where the 3×3 matrix with elements

$$I_{jj'} \equiv \sum m\left(r^2\delta_{jj'} - r_j r_{j'}\right) \qquad (6.16)$$

is called the inertia tensor of the body.
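To see Eqs. (15) and (16) at work, the following sketch (assuming NumPy, and a randomly generated hypothetical set of point masses) builds the inertia tensor and compares the resulting rotational energy with the direct sum $\sum (m/2)(\boldsymbol\omega\times\mathbf{r})^2$ of Eq. (11). The function name `inertia_tensor` is our own.

```python
import numpy as np

def inertia_tensor(masses, positions):
    """Eq. (16): I_jj' = sum over particles of m (r^2 delta_jj' - r_j r_j')."""
    r2 = np.sum(positions**2, axis=1)
    return np.sum(masses * r2) * np.eye(3) - (masses[:, None] * positions).T @ positions

rng = np.random.default_rng(2)
masses = rng.uniform(0.5, 2.0, size=6)            # hypothetical particle masses
positions = rng.normal(size=(6, 3))               # hypothetical particle positions
positions -= masses @ positions / masses.sum()    # measure r from the center of mass, Eq. (13)
omega = rng.normal(size=3)                        # angular velocity vector

I = inertia_tensor(masses, positions)
T_tensor = 0.5 * omega @ I @ omega                # Eq. (15)
T_direct = 0.5 * np.sum(masses * np.sum(np.cross(omega, positions)**2, axis=1))
print(T_tensor, T_direct)                         # equal up to round-off
```

The tensor is symmetric by construction, which the text uses below to argue that it can be diagonalized.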

Actually, the term "tensor" for the matrix has to be justified, because in physics this name 
implies a certain reference-frame-independent notion, so that its elements have to obey certain rules at 
the transfer between reference frames. In order to show that the inertia tensor deserves its title, let us 
calculate another key quantity, the total angular momentum L of the same body. 2 Summing up the 
angular momenta of each particle, defined by Eq. (1.31), and using Eq. (10) again, in our inertial 
reference frame we get 

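$$\mathbf{L} \equiv \sum\mathbf{r}\times\mathbf{p} = \sum m\,\mathbf{r}\times\mathbf{v} = \sum m\,\mathbf{r}\times(\mathbf{v}_O + \boldsymbol\omega\times\mathbf{r}) = \sum m\,\mathbf{r}\times\mathbf{v}_O + \sum m\,\mathbf{r}\times(\boldsymbol\omega\times\mathbf{r}) . \qquad (6.17)$$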

We see that the angular momentum may be presented as a sum of two terms. The first one,

$$\mathbf{L}_O \equiv \sum m\,\mathbf{r}\times\mathbf{v}_O = M\mathbf{R}\times\mathbf{v}_O , \qquad (6.18)$$

describes possible rotation of the center of mass about the inertial frame origin. This term evidently 
vanishes if the moving reference frame's origin O is positioned at the center of mass. In this case we are 
left with only the second term, which describes the rotation of the body about its center of mass: 

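$$\mathbf{L} = \mathbf{L}_{\text{rot}} \equiv \sum m\,\mathbf{r}\times(\boldsymbol\omega\times\mathbf{r}) . \qquad (6.19)$$

Using one more vector algebra formula, the "bac minus cab" rule, 3 we may rewrite this expression as

$$\mathbf{L} = \sum m\left[\boldsymbol\omega\,r^2 - \mathbf{r}\,(\mathbf{r}\cdot\boldsymbol\omega)\right] . \qquad (6.20)$$

Let us spell out an arbitrary Cartesian component of this vector: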






2 Hopefully, there is little chance of confusion between the angular momentum L (a vector) and its Cartesian components $L_j$ (scalars with an index) on one hand, and the Lagrange function L (a scalar without an index) on the other hand.

3 See, e.g., MA Eq. (7.5). 
$$L_j = \sum m\left[\omega_j r^2 - r_j\sum_{j'=1}^{3} r_{j'}\,\omega_{j'}\right] = \sum m\sum_{j'=1}^{3}\omega_{j'}\left(r^2\delta_{jj'} - r_j r_{j'}\right) . \qquad (6.21)$$



Changing the order of summations, and comparing the result with Eq. (16), we see that the angular momentum may be conveniently expressed via the same matrix elements $I_{jj'}$ as the rotational kinetic energy:

$$L_j = \sum_{j'=1}^{3} I_{jj'}\,\omega_{j'} . \qquad (6.22)$$
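The same kind of sketch, again assuming NumPy and a random hypothetical body, confirms that the tensor form (22) reproduces the direct sum (19) for the angular momentum about the center of mass.

```python
import numpy as np

rng = np.random.default_rng(3)
masses = rng.uniform(0.5, 2.0, size=6)            # hypothetical particle masses
positions = rng.normal(size=(6, 3))
positions -= masses @ positions / masses.sum()    # center-of-mass frame, so the term (18) drops out
omega = rng.normal(size=3)

r2 = np.sum(positions**2, axis=1)
I = np.sum(masses * r2) * np.eye(3) - (masses[:, None] * positions).T @ positions   # Eq. (16)

L_tensor = I @ omega                                                     # Eq. (22)
L_direct = np.sum(masses[:, None] * np.cross(positions, np.cross(omega, positions)), axis=0)  # Eq. (19)
print(L_tensor, L_direct)
```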



Since L and ω are both legitimate vectors (meaning that they describe physical vectors independent of the reference frame choice), their connection, the matrix of elements $I_{jj'}$, is a legitimate tensor. This fact, and the symmetry of the tensor ($I_{jj'} = I_{j'j}$), which is evident from its definition (16), allow the tensor to be further simplified. In particular, mathematics tells us that by a certain choice of the axis orientation, any symmetric tensor may be reduced to a diagonal form




$$I_{jj'} = I_j\,\delta_{jj'} , \qquad (6.23)$$

where, in our case,

$$I_j = \sum m\left(r^2 - r_j^2\right) = \sum m\left(r_{j'}^2 + r_{j''}^2\right) = \sum m\,\rho_j^2 , \qquad (6.24)$$



$\rho_j$ being the distance of the particle from the j-th axis, i.e. the length of the perpendicular dropped from the point to that axis. The axes of such a special coordinate system are called the principal axes, while the diagonal elements $I_j$ given by Eq. (24) are called the principal moments of inertia of the body. In such a special reference frame, Eqs. (15) and (22) are reduced to very simple forms:



$$T_{\text{rot}} = \sum_{j=1}^{3}\frac{I_j}{2}\,\omega_j^2 , \qquad (6.25)$$

$$L_j = I_j\,\omega_j . \qquad (6.26)$$



Both these results resemble the corresponding relations for the translational motion, $T_{\text{tran}} = MV^2/2$ and $\mathbf{P} = M\mathbf{V}$, with the angular velocity ω replacing the "linear" velocity V, and the tensor of inertia playing the role of scalar mass M. However, let me emphasize that even in the specially selected coordinate system, with axes pointing in principal directions, the analogy is incomplete, and rotation is generally more complex than translation, because the measures of inertia, $I_j$, are generally different for each principal axis.
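Numerically, finding the principal axes is a symmetric eigenvalue problem. In the sketch below (assuming NumPy and a random hypothetical body), `numpy.linalg.eigh` returns the principal moments $I_j$ and the unit vectors $\mathbf{n}_j$ as matrix columns; the code then checks Eqs. (23)-(26) in that frame.

```python
import numpy as np

rng = np.random.default_rng(4)
masses = rng.uniform(0.5, 2.0, size=8)            # hypothetical body
positions = rng.normal(size=(8, 3))
positions -= masses @ positions / masses.sum()
r2 = np.sum(positions**2, axis=1)
I = np.sum(masses * r2) * np.eye(3) - (masses[:, None] * positions).T @ positions   # Eq. (16)

# eigh is meant for symmetric matrices: columns of `axes` are the principal axes n_j,
# and `moments` are the principal moments I_j (in ascending order)
moments, axes = np.linalg.eigh(I)

I_principal = axes.T @ I @ axes                    # diagonal, Eq. (23)
rho2 = r2[:, None] - (positions @ axes)**2         # rho_j^2 = r^2 - (r . n_j)^2
moments_from_rho = masses @ rho2                   # Eq. (24): I_j = sum m rho_j^2

omega = rng.normal(size=3)
omega_p = axes.T @ omega                           # components omega_j in principal axes
L_p = axes.T @ (I @ omega)                         # components L_j in principal axes
print(np.allclose(L_p, moments * omega_p))         # Eq. (26): L_j = I_j omega_j
print(np.isclose(0.5 * omega @ I @ omega, 0.5 * np.sum(moments * omega_p**2)))   # Eq. (25)
```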

Let me illustrate this fact with a simple but instructive system of three similar massive particles fixed at the vertices of an equilateral triangle (Fig. 3).




Fig. 6.3. Principal moments of 
inertia: a simple case study. 
Due to the symmetry of the configuration, one of the principal axes has to pass through the center of mass O, perpendicular to the plane of the triangle. For the corresponding principal moment of inertia, Eq. (24) readily yields $I_3 = 3m\rho^2$. If we want to express the result in terms of the triangle side a, we may notice that due to the system's symmetry, the angle marked in Fig. 3 equals π/6, and from the corresponding right triangle, $a/2 = \rho\cos(\pi/6) = \rho\sqrt{3}/2$, giving $\rho = a/\sqrt{3}$, so that, finally, $I_3 = ma^2$.

Another way to get the same result is to use the following general axis shift theorem, which may be rather useful, especially for more complex cases. Let us relate the inertia tensor components $I_{jj'}$ and $I'_{jj'}$, calculated in two reference frames displaced by a certain vector d (Fig. 4a), so that for an arbitrary point, r' = r + d. Plugging this relation into Eq. (16), we get



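$$I'_{jj'} = \sum m\left(r'^2\delta_{jj'} - r'_j r'_{j'}\right) = \sum m\left[\left(r^2 + 2\mathbf{r}\cdot\mathbf{d} + d^2\right)\delta_{jj'} - \left(r_j r_{j'} + r_j d_{j'} + r_{j'} d_j + d_j d_{j'}\right)\right] . \qquad (6.27)$$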

Since in the center-of-mass frame O, all sums $\sum m r_j$ equal zero, we may use Eq. (16) to finally obtain

$$I'_{jj'} = I_{jj'} + M\left(\delta_{jj'}\,d^2 - d_j d_{j'}\right) . \qquad (6.28)$$



In particular, this equation shows that if the shift vector d is perpendicular to one (say, j-th) of the principal axes (Fig. 4b), i.e. $d_j = 0$, then Eq. (28) is reduced to a very simple formula:

$$I'_j = I_j + Md^2 . \qquad (6.29)$$

Fig. 6.4. (a) General reference frame shift, and (b) a shift perpendicular to one of the principal axes.



Returning to the system shown in Fig. 3, let us perform such a shift so that the new ("primed") axis passes through the location of one of the particles, still perpendicular to the particles' plane. Then the contribution of that particular mass to the primed moment of inertia vanishes, and $I'_3 = 2ma^2$. Now, returning to the center of mass and applying Eq. (29), we get $I_3 = I'_3 - M\rho^2 = 2ma^2 - (3m)(a/\sqrt{3})^2 = ma^2$, i.e. the same result as above.
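The axis shift theorem is easy to verify numerically. The sketch below (assuming NumPy; the body is random and hypothetical) compares a direct evaluation of Eq. (16) in a shifted frame with Eq. (28), and then repeats the triangle calculation above, with unit values of m and a chosen only for illustration.

```python
import numpy as np

def inertia_tensor(masses, positions):
    """Eq. (16) for a set of point masses."""
    r2 = np.sum(positions**2, axis=1)
    return np.sum(masses * r2) * np.eye(3) - (masses[:, None] * positions).T @ positions

# Random hypothetical body, positions measured from its center of mass
rng = np.random.default_rng(5)
masses = rng.uniform(0.5, 2.0, size=7)
positions = rng.normal(size=(7, 3))
positions -= masses @ positions / masses.sum()
M = masses.sum()
d = rng.normal(size=3)                                       # shift vector: r' = r + d

I_direct = inertia_tensor(masses, positions + d)             # Eq. (16) in the shifted frame
I_theorem = inertia_tensor(masses, positions) + M * (np.dot(d, d) * np.eye(3) - np.outer(d, d))  # Eq. (28)
print(np.max(np.abs(I_direct - I_theorem)))                  # round-off level

# Fig. 3 example: masses m at the vertices of an equilateral triangle with side a
m, a = 1.0, 1.0
rho = a / np.sqrt(3)
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
tri = np.column_stack([rho * np.cos(angles), rho * np.sin(angles), np.zeros(3)])
I3 = inertia_tensor(np.full(3, m), tri)[2, 2]
I3_prime = inertia_tensor(np.full(3, m), tri - tri[0])[2, 2]   # axis through one particle
print(I3, I3_prime, I3_prime - 3 * m * rho**2)   # m a^2, 2 m a^2, and Eq. (29) gives m a^2 back
```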

The symmetry situation inside the triangle plane is somewhat less evident, so let us start with calculating the moments of inertia for the axes shown vertical and horizontal in Fig. 3. From Eq. (24) we readily get:

$$I_1 = 2mh^2 + m\rho^2 = m\left[2\left(\frac{a}{2\sqrt{3}}\right)^2 + \left(\frac{a}{\sqrt{3}}\right)^2\right] = \frac{ma^2}{2}, \qquad I_2 = 2m\left(\frac{a}{2}\right)^2 = \frac{ma^2}{2}, \qquad (6.30)$$
where I have taken into account the fact that the distance h from the center of mass to any side of the triangle is $h = \rho\sin(\pi/6) = \rho/2 = a/(2\sqrt{3})$. We see that $I_1 = I_2$, and mathematics tells us that in this case any in-plane axis (passing through the center of mass O) may be considered as principal, and has the same moment of inertia. A rigid body with this property, $I_1 = I_2 \neq I_3$, is called the symmetric top. (The last direction is called the main principal axis of the system.)
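A short numerical confirmation of Eq. (30) and of the symmetric-top property, assuming NumPy and unit values of m and a (chosen only for illustration): the eigenvalues of the triangle's inertia tensor should be $ma^2/2$, $ma^2/2$, and $ma^2$, and the moment about any in-plane axis through O should equal $ma^2/2$.

```python
import numpy as np

m, a = 1.0, 1.0                    # hypothetical unit mass and side length
rho = a / np.sqrt(3)               # distance from the center of mass O to each vertex
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
positions = np.column_stack([rho * np.cos(angles), rho * np.sin(angles), np.zeros(3)])
masses = np.full(3, m)

r2 = np.sum(positions**2, axis=1)
I = np.sum(masses * r2) * np.eye(3) - (masses[:, None] * positions).T @ positions   # Eq. (16)
moments = np.linalg.eigvalsh(I) / (m * a**2)     # principal moments in units of m a^2
print(moments)                                    # approximately [0.5, 0.5, 1.0]

# Moment of inertia about in-plane axes through O, at several orientations
in_plane = [n @ I @ n / (m * a**2)
            for n in (np.array([np.cos(b), np.sin(b), 0.0]) for b in np.linspace(0, np.pi, 7))]
print(in_plane)                                   # all approximately 0.5
```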

Despite the name, the situation may be even more symmetric in the so-called spherical tops, i.e. highly symmetric systems whose principal moments of inertia are all equal,

$$I_1 = I_2 = I_3 \equiv I . \qquad (6.31)$$

Mathematics says that in this case the moment of inertia for rotation about any axis (but still passing through the center of mass) is equal to the same I. Hence Eqs. (25) and (26) are further simplified for any direction of vector ω:

$$T_{\text{rot}} = \frac{I}{2}\,\omega^2 , \qquad \mathbf{L} = I\boldsymbol\omega , \qquad (6.32)$$



thus making the analogy of rotation and translation complete. (As will be discussed in the next section, 
the analogy is also complete if the rotation axis is fixed by external constraints.) 

An evident example of a spherical top is a uniform sphere or spherical shell; a less obvious 
example is a uniform cube - with masses either concentrated in vertices, or uniformly spread over the 
faces, or uniformly distributed over the volume. Again, in this case any axis passing through the center 
of mass is principal, and has the same principal moment of inertia. For a sphere, this is natural; for a 
cube, rather surprising - but may be confirmed by a direct calculation. 
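For the case of masses concentrated in the cube's vertices, the "rather surprising" statement above can be confirmed directly. In this sketch (assuming NumPy, with hypothetical unit side s and vertex mass m), the inertia tensor comes out as $4ms^2$ times the unit matrix, so every axis through the center gives the same moment.

```python
import numpy as np
from itertools import product

s, m = 1.0, 1.0                                                  # hypothetical cube side and vertex mass
positions = np.array(list(product((-s / 2, s / 2), repeat=3)))   # 8 vertices, center at the origin
masses = np.full(len(positions), m)

r2 = np.sum(positions**2, axis=1)
I = np.sum(masses * r2) * np.eye(3) - (masses[:, None] * positions).T @ positions   # Eq. (16)
print(I)                                  # 4 m s^2 times the unit matrix

# The same moment for randomly oriented axes through the center
rng = np.random.default_rng(8)
moments = []
for _ in range(5):
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)
    moments.append(n @ I @ n)
print(moments)                            # each equals 4 m s^2 up to round-off
```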



6.3. Fixed-axis rotation 

Now we are well equipped for a discussion of rigid body's rotational dynamics. The general equation of this dynamics is given by Eq. (1.38), which is valid for dynamics of any system of particles, either rigidly connected or not:

$$\dot{\mathbf{L}} = \boldsymbol\tau , \qquad (6.33)$$

where τ is the net torque of external forces. Let us start exploring this equation from the simplest case when the axis of rotation, i.e. the direction of vector ω, is fixed by some external constraints. Let us direct axis z along this vector; then $\omega_x = \omega_y = 0$. According to Eq. (22), in this case, the z-component of the angular momentum is

$$L_z = I_{zz}\,\omega_z , \qquad (6.34)$$

where $I_{zz}$, though not necessarily one of the principal moments of inertia, still may be calculated using Eq. (24):

$$I_{zz} = \sum m\,\rho_z^2 , \qquad (6.35)$$

with $\rho_z$ being the distance of each particle from the rotation axis z. According to Eq. (15), the rotational kinetic energy in this case is just

$$T_{\text{rot}} = \frac{I_{zz}}{2}\,\omega_z^2 . \qquad (6.36)$$



Moreover, it is straightforward to use Eqs. (12), (17), and (28) to show that if the rotation axis is fixed, Eqs. (34)-(36) are valid even if the axis does not pass through the center of mass, provided that the distances $\rho_z$ are now measured from that axis. (The proof is left for the reader's exercise.)

As a result, we need not care about other components of vector L, 4 and may use just one component of Eq. (33),

$$\dot L_z = \tau_z , \qquad (6.37)$$

because it, when combined with Eq. (34), completely determines the dynamics of rotation:

$$I_{zz}\,\dot\omega_z = \tau_z , \quad \text{i.e.} \quad I_{zz}\,\ddot\theta_z = \tau_z , \qquad (6.38)$$

where $\theta_z$ is the angle of rotation about the axis, so that $\omega_z = \dot\theta_z$. Scalar relations (34), (36), and (38), describing rotation about a fixed axis, are completely similar to the corresponding formulas of 1D motion of a single particle, with $\omega_z$ corresponding to the usual ("linear") velocity, the angular momentum component $L_z$ to the linear momentum, and $I_{zz}$ to the particle's mass.

The resulting motion about the axis is also frequently similar to that of a single particle. As a 
simple example, let us consider what is called the physical pendulum (Fig. 5) - a rigid body free to rotate 
about a fixed horizontal axis A that does not pass through the center of mass O, in the uniform gravity 
field g. 




Fig. 6.5. Physical pendulum. The 
fixed (horizontal) rotation axis A is 
perpendicular to the plane of drawing. 



Let us drop a perpendicular from point O to the rotation axis, and call the corresponding vector l (Fig. 5). Then the torque (relative to axis A) of the forces exerted by the axis constraint is zero, and the only contribution to the net torque is due to gravity alone:

$$\boldsymbol\tau\big|_{\text{in A}} \equiv \sum\mathbf{r}\big|_{\text{in A}}\times\mathbf{F} = \sum\left(\mathbf{l} + \mathbf{r}\big|_{\text{in O}}\right)\times m\mathbf{g} = \sum m\,(\mathbf{l}\times\mathbf{g}) + \sum m\,\mathbf{r}\big|_{\text{in O}}\times\mathbf{g} = M\,\mathbf{l}\times\mathbf{g} . \qquad (6.39)$$

(For the last transition, I have used the facts that point O is the center of mass, and that vectors l and g are the same for all particles of the body.) This result shows that the torque is directed along the rotation


4 Note that according to Eq. (22), other Cartesian components of the angular momentum, $L_x = I_{xz}\,\omega_z$ and $L_y = I_{yz}\,\omega_z$, may be different from zero, and even evolve in time. (Indeed, if axes x and y are fixed in the lab frame, $I_{xz}$ and $I_{yz}$ may change due to the body's rotation.) The corresponding torques $\tau_x^{\text{ext}}$ and $\tau_y^{\text{ext}}$, described by Eq. (33), are automatically provided by the external forces that keep the rotation axis fixed.
axis, and its (only) component $\tau_z$ is equal to $-Mgl\sin\theta$, where θ is the angle between vectors l and g, i.e. the angular deviation of the pendulum from the position of equilibrium. As a result, Eq. (38) takes the form

$$I_A\,\ddot\theta = -Mgl\sin\theta , \qquad (6.40)$$



where $I_A$ is the moment of inertia for rotation about axis A rather than about the center of mass. This equation is identical to that of the point-mass (sometimes called "mathematical") pendulum, with the small-oscillation frequency

$$\Omega = \left(\frac{Mgl}{I_A}\right)^{1/2} . \qquad (6.41)$$



As a sanity check, in the simplest case when the linear size of the body is much smaller than the suspension length l, Eq. (35) yields $I_A \approx Ml^2$, and Eq. (41) reduces to the familiar formula $\Omega = (g/l)^{1/2}$ for the mathematical pendulum.
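As an illustration of Eqs. (40) and (41), here is a sketch (assuming NumPy) for a hypothetical uniform rod of length D pivoted at one end, for which Eq. (29) gives $I_A = MD^2/12 + M(D/2)^2 = MD^2/3$, so that $\Omega = (3g/2D)^{1/2}$. The simple RK4 integrator and the function names are our own choices; for a small amplitude, the simulated period should be close to $2\pi/\Omega$.

```python
import numpy as np

def pendulum_frequency(M, l, I_A, g=9.81):
    """Small-oscillation frequency of a physical pendulum, Eq. (41)."""
    return np.sqrt(M * g * l / I_A)

def simulated_period(M, l, I_A, theta0=0.01, g=9.81, dt=1e-3):
    """Integrate Eq. (40), I_A theta'' = -M g l sin(theta), by the RK4 method, starting
    from rest at theta0 > 0, and return the time between two upward zero crossings."""
    def rhs(state):
        theta, theta_dot = state
        return np.array([theta_dot, -M * g * l * np.sin(theta) / I_A])

    state, t, crossings = np.array([theta0, 0.0]), 0.0, []
    while len(crossings) < 2:
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        new = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        if state[0] < 0.0 <= new[0]:      # upward crossing; interpolate its time linearly
            crossings.append(t + dt * state[0] / (state[0] - new[0]))
        state, t = new, t + dt
    return crossings[1] - crossings[0]

# Hypothetical example: uniform rod of length D and mass M, pivoted at one end
M, g, D = 1.0, 9.81, 1.0
l = D / 2                              # distance from axis A to the center of mass O
I_A = M * D**2 / 12 + M * l**2         # Eq. (29): I about O plus M l^2, i.e. M D^2 / 3
Omega = pendulum_frequency(M, l, I_A, g)
T_sim = simulated_period(M, l, I_A, g=g)
print(Omega, np.sqrt(3 * g / (2 * D)))   # the same number
print(T_sim, 2 * np.pi / Omega)          # close for a small amplitude

# Point-mass limit: I_A = M l^2 recovers Omega = (g/l)^(1/2)
print(pendulum_frequency(M, 2.0, M * 2.0**2, g), np.sqrt(g / 2.0))
```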

Now let us discuss the situations when a body not only rotates, but also moves as a whole. As we already know from our introductory chapter, the total momentum of the body,

$$\mathbf{P} \equiv \sum m\mathbf{v} = \sum m\dot{\mathbf{r}} = \frac{d}{dt}\sum m\mathbf{r} , \qquad (6.42)$$



satisfies the 2nd Newton law in the form (1.30). Using the definition (13) of the center of mass, the momentum may be presented as

$$\mathbf{P} = M\dot{\mathbf{R}} = M\mathbf{V} , \qquad (6.43)$$

so Eq. (1.30) may be rewritten as

$$M\dot{\mathbf{V}} = \mathbf{F} , \qquad (6.44)$$



where F is the vector sum of all external forces. This equation shows that the center of mass of the body 
moves exactly as a point particle of mass M, under the effect of the net force F. In many cases this fact 
makes the translational dynamics of a rigid body absolutely similar to that of a point particle. 

The situation becomes more complex if some of the forces contributing to the vector sum F 
depend on rotation of the same body, i.e. if its rotational and translational motions are coupled. Analysis 
of such coupled motion is rather straightforward if the direction of the rotation axis does not change in 
time, and hence Eqs. (35)-(36) are still valid. Possibly the simplest example is a round cylinder (say, a 
wheel) rolling on a surface without slippage (Fig. 6). 



Fig. 6.6. Round cylinder rolling over (a) plane surface and (b) concave surface.
The no-slippage condition may be presented as the requirement of zero net velocity of the particular wheel point A that touches the surface, in the reference frame connected to the surface. For the simplest case of a plane surface (Fig. 6a), the application of Eq. (10) shows that this requirement gives the following relation between the angular velocity ω of the wheel and the linear velocity V of its center:

$$V + r\omega = 0 . \qquad (6.45)$$

Such kinematic relations are essentially holonomic constraints, which reduce the number of degrees of freedom of the system. For example, without condition (45) the wheel on a plane surface has to be considered as a system with two degrees of freedom, so that its total kinetic energy (14) is a function of two independent generalized velocities, say V and ω:

$$T = T_{\text{tran}} + T_{\text{rot}} = \frac{M}{2}V^2 + \frac{I}{2}\omega^2 . \qquad (6.46)$$

Using Eq. (45) we may eliminate, for example, the linear velocity and reduce Eq. (46) to

$$T = \frac{M}{2}(\omega r)^2 + \frac{I}{2}\omega^2 = \frac{I_{\text{ef}}}{2}\omega^2 , \qquad \text{where} \quad I_{\text{ef}} \equiv I + Mr^2 . \qquad (6.47)$$

This result may be interpreted as the kinetic energy of pure rotation of the wheel about the instantaneous axis A, with $I_{\text{ef}}$ being the moment of inertia about that axis, satisfying Eq. (29).
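A numerical check of Eqs. (45)-(47), assuming NumPy and a hypothetical uniform disk ($I = Mr^2/2$, so that $I_{\text{ef}} = 3Mr^2/2$): with the no-slip condition imposed, Eq. (10) gives zero velocity at the contact point A, and the two forms of the kinetic energy coincide. The axis orientation (x along the surface, y up, z along the wheel axis) is an assumption of this sketch.

```python
import numpy as np

M, r = 2.0, 0.3               # hypothetical wheel mass (kg) and radius (m)
I = 0.5 * M * r**2            # assumed shape: uniform disk, I = M r^2 / 2 about its axis
omega = 4.0                   # angular velocity about the wheel axis (rad/s)
V = -r * omega                # no-slip condition, Eq. (45)

# Axes: x along the surface, y up, z along the wheel axis; contact point A is at -r e_y
v_center = np.array([V, 0.0, 0.0])
omega_vec = np.array([0.0, 0.0, omega])
v_A = v_center + np.cross(omega_vec, np.array([0.0, -r, 0.0]))   # Eq. (10) for point A

T_two_terms = 0.5 * M * V**2 + 0.5 * I * omega**2    # Eq. (46)
I_ef = I + M * r**2                                  # Eq. (47); also Eq. (29) with d = r
T_about_A = 0.5 * I_ef * omega**2                    # pure rotation about axis A
print(v_A, T_two_terms, T_about_A)
```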

Kinematic relations are not always as simple as Eq. (45). For example, if the wheel is rolling on a concave surface (Fig. 6b), we need to relate the angular velocity of the wheel rotation about its axis O (denoted ω) and that of its axis' rotation about the center O' of curvature of the surface (Ω). A popular error here is to write $\Omega = -(r/R)\,\omega$ [WRONG!]. A prudent way to get the correct relation is to note that Eq. (45) holds for this situation as well, and on the other hand the same linear velocity of the wheel's center may be expressed as $V = (R - r)\,\Omega$. Combining these equations, we get a (not quite evident) relation

$$\Omega = -\frac{r}{R - r}\,\omega . \qquad (6.48)$$
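The difference between Eq. (48) and the popular error is easy to see numerically. In this sketch (assuming NumPy, with hypothetical values of r, R, and ω), Ω follows from Eq. (45) and $V = (R - r)\Omega$; the two expressions approach each other only when $R \gg r$.

```python
import numpy as np

r, R = 0.1, 1.0          # hypothetical wheel radius and surface curvature radius, R > r
omega = 5.0              # wheel angular velocity about its own axis O

V = -r * omega           # Eq. (45): the no-slip condition still holds
Omega = V / (R - r)      # the center O moves along a circle of radius R - r about O'
print(Omega, -r / (R - r) * omega, -(r / R) * omega)   # Eq. (48) vs the popular error

# The two expressions merge only in the flat-surface limit R >> r
for R_big in (1e1, 1e3, 1e5):
    print(R_big, (-r / (R_big - r) * omega) / (-(r / R_big) * omega))   # ratio R/(R - r) -> 1
```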

Another famous example of the relation between the translational and rotational motion is given by the "sliding ladder" problem (Fig. 7). Let us analyze it for the simplest case of negligible friction, and the ladder's thickness small in comparison with its length l.




In order to use the Lagrangian formalism, we may write the kinetic energy of the ladder as the 
sum (14) of the translational and rotational parts: 






$$T = \frac{M}{2}\left(\dot X^2 + \dot Y^2\right) + \frac{I}{2}\,\dot\alpha^2 , \qquad (6.49)$$



where X and Y are the Cartesian coordinates of its center of mass in an inertial reference frame, and I is the moment of inertia for rotation about the z-axis passing through the center of mass. (For the uniformly distributed mass, an elementary integration of Eq. (35) yields $I = Ml^2/12$.) In the reference frame with the center in the corner O, both X and Y may be simply expressed via angle α:

$$X = \frac{l}{2}\cos\alpha , \qquad Y = \frac{l}{2}\sin\alpha . \qquad (6.50)$$



(The easiest way to obtain these relations is to notice that the dashed line in Fig. 7 has slope α and length l/2.) Plugging these expressions into Eq. (49), we get

$$T = \frac{I_{\text{ef}}}{2}\,\dot\alpha^2 , \qquad I_{\text{ef}} \equiv I + M\left(\frac{l}{2}\right)^2 = \frac{1}{3}Ml^2 . \qquad (6.51)$$



Since the potential energy of the ladder in the gravity field may be also expressed via the same angle,

$$U = MgY = Mg\,\frac{l}{2}\sin\alpha , \qquad (6.52)$$



α may be conveniently used as the (only) generalized coordinate of the system. Even without writing the Lagrangian equation of motion for that coordinate explicitly, we may notice that since the Lagrangian function (T - U) does not depend on time explicitly, and the kinetic energy (51) is a quadratic-homogeneous function of the generalized velocity $\dot\alpha$, the full mechanical energy,

$$E = T + U = \frac{I_{\text{ef}}}{2}\,\dot\alpha^2 + Mg\,\frac{l}{2}\sin\alpha = \frac{Mgl}{2}\left(\frac{l\,\dot\alpha^2}{3g} + \sin\alpha\right) , \qquad (6.53)$$

is conserved and gives us the first integral of motion. Moreover, Eq. (53) shows that the system's energy (and hence dynamics) is identical to that of a physical pendulum with an unstable fixed point $\alpha_1 = \pi/2$, a stable fixed point at $\alpha_2 = -\pi/2$, and frequency

$$\Omega = \left(\frac{3g}{2l}\right)^{1/2} \qquad (6.54)$$



of small oscillations near the latter point. (Of course, that fixed point cannot be reached in the simple geometry shown in Fig. 7, where the ladder hitting the floor would change its equations of motion.)
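A sketch of the ladder model of Eqs. (51)-(54), assuming NumPy and hypothetical values of M, l, and g. The equation of motion used below, $I_{\text{ef}}\ddot\alpha = -Mg(l/2)\cos\alpha$, is the Lagrange equation that follows from Eqs. (51) and (52). The script integrates it with a plain RK4 step near the stable point $\alpha_2 = -\pi/2$, checks that the energy (53) stays constant, and compares the measured period with $2\pi/\Omega$ from Eq. (54). As noted above, this point is not reachable in the geometry of Fig. 7, so this only checks the model equations.

```python
import numpy as np

M, l, g = 1.0, 2.0, 9.81                      # hypothetical ladder mass (kg), length (m), g (m/s^2)
I_ef = M * l**2 / 12 + M * (l / 2)**2         # Eq. (51): I + M (l/2)^2 = M l^2 / 3

def energy(alpha, alpha_dot):
    """Full mechanical energy, Eq. (53)."""
    return 0.5 * I_ef * alpha_dot**2 + M * g * (l / 2) * np.sin(alpha)

def rhs(state):
    """Lagrange equation following from Eqs. (51)-(52): I_ef alpha'' = -M g (l/2) cos(alpha)."""
    alpha, alpha_dot = state
    return np.array([alpha_dot, -M * g * (l / 2) * np.cos(alpha) / I_ef])

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Small oscillations of this one-coordinate model about its stable point alpha_2 = -pi/2
alpha_2 = -np.pi / 2
state, t, dt, crossings = np.array([alpha_2 + 0.01, 0.0]), 0.0, 1e-3, []
E0 = energy(*state)
while len(crossings) < 2:
    new = rk4_step(state, dt)
    if state[0] < alpha_2 <= new[0]:          # upward crossing; interpolate its time linearly
        crossings.append(t + dt * (alpha_2 - state[0]) / (new[0] - state[0]))
    state, t = new, t + dt

period = crossings[1] - crossings[0]
Omega = np.sqrt(3 * g / (2 * l))              # Eq. (54)
print(period, 2 * np.pi / Omega)              # close for a small amplitude
print(abs(energy(*state) - E0))               # tiny: Eq. (53) is conserved
```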



6.4. Free rotation 

Now let us proceed to the more complex case when the rotation axis is not fixed. A good illustration of the complexity arising in this case comes from the simplest case of a rigid body left alone, i.e. not subjected to external forces, so that its potential energy U is constant. Since in this case, according to Eq. (44), the center of mass moves (as measured from any inertial reference frame) with a constant velocity, we can always use a convenient inertial reference frame with the center at that point. From the point of view of such a frame, the body's motion is a pure rotation, and $T_{\text{tran}} = 0$. Hence, the system's Lagrangian equals just the rotational energy (15), which is, first, a quadratic-homogeneous function of components $\omega_j$ (that may be taken for generalized velocities), and, second, does not depend on time explicitly. As we know from Chapter 2, in this case the energy is conserved. For the components of vector ω in the principal axes, this means

$$T_{\text{rot}} = \sum_{j=1}^{3}\frac{I_j}{2}\,\omega_j^2 = \text{const} . \qquad (6.55)$$


Next, as Eq. (33) shows, in the absence of external forces the angular momentum L of the body is conserved as well. However, though we can certainly use Eq. (26) to present this fact as

$$\mathbf{L} = \sum_{j=1}^{3} I_j\,\omega_j\,\mathbf{n}_j = \text{const} , \qquad (6.56)$$



where $\mathbf{n}_j$ are the principal axes of inertia, this does not mean that the components $\omega_j$ of the angular velocity vector ω are constant, because the principal axes are fixed relative to the rigid body, and hence may rotate with it.



Before going after these complications, let us briefly mention two conceptually trivial, but practically very important, particular cases. The first is a spherical top ($I_1 = I_2 = I_3 = I$). In this case Eqs. (55) and (56) imply that all components of vector $\boldsymbol\omega = \mathbf{L}/I$ are conserved, i.e. both the magnitude and the direction of the angular velocity are conserved, for any initial spin. In other words, the body conserves its rotation speed and axis direction, as measured in an inertial frame.

The most obvious example is a spherical planet. For example, our Mother Earth, rotating about its axis with angular velocity $\omega = 2\pi/(1\ \text{day}) \approx 7.3\times10^{-5}\ \text{s}^{-1}$, keeps its axis at a nearly constant angle of 23°27' to the ecliptic pole, i.e. the normal to the plane of its motion around the Sun. (In Sec. 6 below, we will discuss some very slow motions of this axis, due to gravity effects.)

Spherical tops are also used in the most accurate gyroscopes, usually with gas or magnetic suspension in vacuum. If done carefully, such systems may have spectacular stability. For example, the gyroscope system of the Gravity Probe B satellite experiment, flown in 2004-2005, was based on quartz spheres, round with a precision of about 10 nm and covered by superconducting thin films (which enabled their magnetic suspension and SQUID monitoring). The whole system was stable enough to verify that the so-called geodetic effect in general relativity (essentially, the space curving by Earth's mass), resulting in the axis precession by just 6.6 arcseconds per year, i.e. with a precession frequency of just $\sim 10^{-12}\ \text{s}^{-1}$, agrees with theory with a record ~0.3% accuracy. 5
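The two rotation rates quoted above are simple unit conversions; a minimal Python sketch (standard library only) is below. It uses the solar day and the Julian year, which is precise enough for order-of-magnitude estimates.

```python
import math

day = 86400.0                              # s; the sidereal day is about 0.3% shorter
omega_earth = 2 * math.pi / day            # Earth's rotation, rad/s
print(f"{omega_earth:.2e} s^-1")           # 7.27e-05 s^-1

arcsec = math.pi / (180 * 3600)            # one arcsecond in radians
year = 365.25 * day                        # Julian year, s
omega_geodetic = 6.6 * arcsec / year       # 6.6 arcseconds per year, in rad/s
print(f"{omega_geodetic:.1e} s^-1")        # 1.0e-12 s^-1
```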

The second simple case is that of the symmetric top ($I_1 = I_2 \neq I_3$), with the initial vector L aligned with the main principal axis. In this case, $\boldsymbol\omega = \mathbf{L}/I_3$ = const, so that the rotation axis is conserved. 6 Such tops, typically in the shape of a flywheel (rotor) supported by a "gimbal" system (Fig. 8), are broadly used in more common gyroscopes, core parts of automatic guidance systems, for



5 Such beautiful experimental physics does not come cheap: the total Gravity Probe B project budget was about 
$750M. Even at this price tag, the declared main goal of the project, an accurate measurement of a more subtle 
relativistic effect, the so-called frame-dragging drift (or "the Schiff precession"), predicted to be about 0.04 
arcseconds per year, has not been achieved. 

6 This is also true for an asymmetric top, i.e. an arbitrary body (with, say, $I_1 < I_2 < I_3$), but in this case the alignment of vector L with axis $\mathbf{n}_2$, corresponding to the intermediate moment of inertia, is unstable.



Chapter 6 Page 12 of 28 



Essential Graduate Physics 



CM: Classical Mechanics 



example, in ships, airplanes, missiles, etc. Even if the ship's hull wobbles, the suspended gyroscope 
sustains its direction relative to Earth (which is sufficiently inertial for these applications). 7 



Fig. 6.8. Typical gyroscope (labeled parts: gyroscope frame, gimbal, rotor, and its axis). (Adapted from http://en.wikipedia.org/wiki/Gyroscope)



However, in the general case with no such special initial alignment, the dynamics of symmetric tops is more complex. In this case, vector $\mathbf L$ is still conserved, including its direction, but vector $\boldsymbol\omega$ is not. Indeed, let us direct axis $\mathbf n_2$ perpendicular to the common plane of vectors $\mathbf L$ and the instantaneous direction $\mathbf n_3$ of the main principal axis (in Fig. 9, the plane of drawing); then, at that particular instant, $L_2 = 0$. Now let us recall that in a symmetric top, any axis perpendicular to $\mathbf n_3$, including $\mathbf n_2$, is a principal one. According to Eq. (26) with $j = 2$, the corresponding component $\omega_2$ has to be equal to $L_2/I_2$, so it vanishes. This means that vector $\boldsymbol\omega$ lies in this plane (the common plane of vectors $\mathbf L$ and $\mathbf n_3$) as well - see Fig. 9a.

Fig. 6.9. Free rotation of a symmetric top: (a) the general configuration of vectors, and (b) calculating the free precession frequency.



Now consider any point of the body located on axis $\mathbf n_3$, and hence within plane $[\mathbf n_3, \mathbf L]$. Since $\boldsymbol\omega$ is directed along the instantaneous axis of rotation, according to Eq. (9) the point has the instantaneous velocity $\mathbf v = \boldsymbol\omega\times\mathbf r$ directed normally to that plane. Since this is true for each point of the main axis (except only one, with $\mathbf r = 0$, i.e. the center of mass, which does not move), this axis as a whole has to move perpendicular to the common plane of vectors $\mathbf L$, $\boldsymbol\omega$, and $\mathbf n_3$. Since such a conclusion is valid at any moment of time, it means that vectors $\boldsymbol\omega$ and $\mathbf n_3$ rotate together about the space-fixed vector $\mathbf L$, with some angular velocity $\omega_{\rm pre}$, at each moment staying in one plane. This effect is usually called the free (or "torque-free", or "regular") precession, and has to be clearly distinguished from the completely different effect of the torque-induced precession, which will be discussed in the next section.

7 Much more compact (and much less accurate) gyroscopes used, e.g., in smartphones and tablet computers, are based on the effect of rotation on oscillator frequency, and are implemented as micro-electromechanical systems (MEMS) on the silicon chip surface - see, e.g., Chapter 22 in V. Kaajakari, Practical MEMS, Small Gear Publishing, 2009.

To calculate $\omega_{\rm pre}$, let us represent the instantaneous vector $\boldsymbol\omega$ as a sum not of its Cartesian components (as in Fig. 9a), but rather of two non-orthogonal vectors directed along $\mathbf n_3$ and $\mathbf L$ (Fig. 9b):

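$$\boldsymbol\omega = \omega_{\rm rot}\mathbf n_3 + \omega_{\rm pre}\mathbf n_L, \qquad \mathbf n_L \equiv \frac{\mathbf L}{L}. \qquad (6.57)$$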

It is clear from Fig. 9b that $\omega_{\rm rot}$ has the meaning of the angular velocity of rotation of the body about its main principal axis, while $\omega_{\rm pre}$ is the angular velocity of rotation of that axis about the constant direction of vector $\mathbf L$, i.e. the frequency of precession. Now the latter frequency may be readily calculated by comparing the two panels of Fig. 9, and noticing that the same angle $\theta$ between vectors $\mathbf L$ and $\mathbf n_3$ participates in two relations:

$$\sin\theta = \frac{L_1}{L} = \frac{\omega_1}{\omega_{\rm pre}}. \qquad (6.58)$$

Since axis $\mathbf n_1$ is principal, we may use Eq. (26) for $j = 1$, i.e. $L_1 = I_1\omega_1$, to eliminate $\omega_1$ from Eq. (58), and get a very simple formula

$$\omega_{\rm pre} = \frac{L}{I_1}. \qquad (6.59)$$



This result shows that the precession frequency is constant and independent of the alignment of vector $\mathbf L$ with the main principal axis $\mathbf n_3$, while the amplitude of this motion (characterized by angle $\theta$) does depend on the alignment, and vanishes if $\mathbf L$ is parallel to $\mathbf n_3$.⁸ Note also that if all principal moments of inertia are of the same order, $\omega_{\rm pre}$ is of the same order as the total angular velocity $\omega \equiv |\boldsymbol\omega|$ of rotation.
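The geometric argument behind Eqs. (57)-(59) can be checked numerically. The sketch below (Python with NumPy; the function name and the sample moments of inertia are illustrative assumptions) builds $\mathbf L$ and $\boldsymbol\omega$ in the principal axes of a symmetric top, with $\mathbf n_2$ normal to the plane of $\mathbf L$ and $\mathbf n_3$ as in Fig. 9a, decomposes $\boldsymbol\omega$ along $\mathbf n_3$ and $\mathbf n_L$ as in Eq. (57), and compares the $\mathbf n_L$ coefficient with $L/I_1$ for several angles $\theta$.

```python
import numpy as np


def split_omega(I1, I3, L, theta):
    """Return (omega_rot, omega_pre) of Eq. (6.57) for a symmetric top (I1 = I2).

    Body axes: n3 = (0, 0, 1), and n2 is normal to the plane of L and n3,
    so that L = L * (sin(theta), 0, cos(theta)) and L_2 = omega_2 = 0.
    """
    L_vec = L * np.array([np.sin(theta), 0.0, np.cos(theta)])
    omega = L_vec / np.array([I1, I1, I3])    # Eq. (26): omega_j = L_j / I_j
    n3 = np.array([0.0, 0.0, 1.0])
    nL = L_vec / L
    # omega = omega_rot * n3 + omega_pre * nL; solve using components 1 and 3
    A = np.column_stack([n3[[0, 2]], nL[[0, 2]]])
    omega_rot, omega_pre = np.linalg.solve(A, omega[[0, 2]])
    return omega_rot, omega_pre


I1, I3, L = 2.0, 1.0, 3.0          # sample (hypothetical) values
for theta in (0.1, 0.7, 1.3):
    omega_rot, omega_pre = split_omega(I1, I3, L, theta)
    print(f"theta = {theta}: omega_pre = {omega_pre:.12f}, L/I1 = {L / I1}")
```

Only the coefficient along $\mathbf n_L$ is independent of $\theta$; the spin part, $\omega_{\rm rot} = L\cos\theta\,(1/I_3 - 1/I_1)$, does depend on the alignment.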

Now, let us briefly discuss the free precession in the general case of an "asymmetric top", i.e. a body with $I_1 \neq I_2 \neq I_3$. In this case the effect is more complex, because here not only the direction but also the magnitude of the instantaneous angular velocity $\boldsymbol\omega$ may evolve in time. If we are only interested in the relation between the instantaneous values of $\omega_j$ and $L_j$, i.e. in the "trajectories" of vectors $\boldsymbol\omega$ and $\mathbf L$ as observed from the reference frame $\{\mathbf n_1, \mathbf n_2, \mathbf n_3\}$ of the principal axes of the body (rather than in an explicit law of their time evolution), they may be found directly from the conservation laws. (Let me emphasize again that vector $\mathbf L$, being constant in an inertial frame, generally evolves in the frame rotating with the body.) Indeed, Eq. (55) may be understood as the equation of an ellipsoid in Cartesian coordinates $\{\omega_1, \omega_2, \omega_3\}$, so that for a free body, vector $\boldsymbol\omega$ has to stay on the surface of that ellipsoid.⁹ On the other hand, since the reference frame rotation preserves the length of any vector, the magnitude (but not the direction!) of vector $\mathbf L$ is also an integral of motion in the moving frame, and we can write

$$L^2 = \sum_{j=1}^{3}L_j^2 = \sum_{j=1}^{3}I_j^2\omega_j^2 = \text{const}. \qquad (6.60)$$



8 For Earth, the free precession amplitude is so small (below 10 m of linear displacement on the Earth's surface) that this effect is of the same order as other, irregular motions of the rotation axis, resulting from turbulent fluid flow effects in the planet's interior and its atmosphere.

9 It is frequently called the Poinsot ellipsoid, after L. Poinsot (1777-1859), who made several key contributions to rigid body mechanics.

Hence the trajectory of vector $\boldsymbol\omega$ follows the closed curve formed by the intersection of two ellipsoids, (55) and (60). It is evident that this trajectory is generally "taco-edge-shaped", i.e. more complex than a plane circle, but never very complex either.

The same argument may be repeated for vector $\mathbf L$, for which the first form of Eq. (60) describes a sphere, and Eq. (55), another ellipsoid:

$$T_{\rm rot} = \sum_{j=1}^{3}\frac{L_j^2}{2I_j} = \text{const}. \qquad (6.61)$$



On the other hand, if we are interested in the trajectory of vector $\boldsymbol\omega$ in an inertial frame (in which vector $\mathbf L$ stays still), we may note that the general relation (15) for the same rotational energy $T_{\rm rot}$ may also be rewritten as

$$T_{\rm rot} = \frac{1}{2}\sum_{j=1}^{3}\omega_j\sum_{j'=1}^{3}I_{jj'}\omega_{j'}. \qquad (6.62)$$



But according to Eq. (22), the second sum in the right-hand part is nothing more than $L_j$, so that

$$T_{\rm rot} = \frac{1}{2}\sum_{j=1}^{3}\omega_j L_j = \frac{1}{2}\boldsymbol\omega\cdot\mathbf L. \qquad (6.63)$$



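This equation shows that for a free body ($T_{\rm rot} = \text{const}$, $\mathbf L = \text{const}$), even if vector $\boldsymbol\omega$ changes in time, its projection on the fixed direction of $\mathbf L$, equal to $2T_{\rm rot}/L$, stays constant, so that the end point of $\boldsymbol\omega$ has to stay within a plane perpendicular to the angular momentum $\mathbf L$. (Earlier, we have seen this for the particular case of the symmetric top - see Fig. 9b; but for an asymmetric top, the trajectory of the end point may not be circular.)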

If we are interested not only in the trajectory of vector $\boldsymbol\omega$ but also in its explicit evolution in time, it may be calculated using the general Eq. (33), expressed in principal components $\omega_j$. For that, we have to recall that Eq. (33) is only valid in an inertial reference frame, while the frame $\{\mathbf n_1, \mathbf n_2, \mathbf n_3\}$ may rotate with the body and hence is generally not inertial. We may handle this problem by applying to vector $\mathbf L$ the general relation (8):

$$\left.\frac{d\mathbf L}{dt}\right|_{\rm in\ lab} = \left.\frac{d\mathbf L}{dt}\right|_{\rm in\ body} + \boldsymbol\omega\times\mathbf L. \qquad (6.64)$$

Combining it with Eq. (33), in the moving frame we get

$$\frac{d\mathbf L}{dt} + \boldsymbol\omega\times\mathbf L = \boldsymbol\tau, \qquad (6.65)$$

where $\boldsymbol\tau$ is the external torque. In particular, for the principal-axis components $L_j$, related to components $\omega_j$ by Eq. (26), Eq. (65) is reduced to a set of three scalar Euler equations

$$I_j\frac{d\omega_j}{dt} + (I_{j''} - I_{j'})\,\omega_{j'}\omega_{j''} = \tau_j, \qquad (6.66)$$

where the set of indices $\{j, j', j''\}$ has to follow the usual "right" order - e.g., $\{1, 2, 3\}$, $\{2, 3, 1\}$, etc.¹⁰

10 These equations are of course valid in the simplest case of a fixed rotation axis as well. For example, if $\boldsymbol\omega = \mathbf n_z\omega$, i.e. $\omega_x = \omega_y = 0$, Eq. (66) is reduced to Eq. (38).
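The Euler equations (66) are easy to integrate numerically. The following sketch (Python with NumPy; the classical fourth-order Runge-Kutta scheme, the step size, and the moments of inertia are illustrative choices, not taken from the text) integrates the torque-free case for an asymmetric top, checks that the two conserved quantities (60) and (61) indeed stay constant, and also illustrates the statement of footnote 6: rotation close to the axis of the intermediate moment of inertia is unstable, while rotation close to the axis of the smallest moment is not.

```python
import numpy as np


def euler_rhs(omega, I, tau=(0.0, 0.0, 0.0)):
    """Time derivatives d(omega_j)/dt given by the Euler equations (6.66)."""
    w1, w2, w3 = omega
    I1, I2, I3 = I
    return np.array([
        (tau[0] - (I3 - I2) * w2 * w3) / I1,   # {j, j', j''} = {1, 2, 3}
        (tau[1] - (I1 - I3) * w3 * w1) / I2,   # {2, 3, 1}
        (tau[2] - (I2 - I1) * w1 * w2) / I3,   # {3, 1, 2}
    ])


def integrate(omega0, I, t_end, dt=2e-3):
    """Integrate the torque-free Euler equations with classical RK4; return omega(t) samples."""
    omega = np.asarray(omega0, dtype=float)
    steps = int(round(t_end / dt))
    history = np.empty((steps + 1, 3))
    history[0] = omega
    for n in range(1, steps + 1):
        k1 = euler_rhs(omega, I)
        k2 = euler_rhs(omega + 0.5 * dt * k1, I)
        k3 = euler_rhs(omega + 0.5 * dt * k2, I)
        k4 = euler_rhs(omega + dt * k3, I)
        omega = omega + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        history[n] = omega
    return history


def invariants(history, I):
    """Rotational energy T_rot, Eq. (6.61), and L^2, Eq. (6.60), at every sample."""
    I = np.asarray(I, dtype=float)
    T = 0.5 * np.sum(I * history**2, axis=1)
    L2 = np.sum((I * history) ** 2, axis=1)
    return T, L2


I = (1.0, 2.0, 3.0)                        # hypothetical I1 < I2 < I3
T, L2 = invariants(integrate([0.3, 1.0, 0.2], I, t_end=20.0), I)
print("relative spread of T_rot:", (T.max() - T.min()) / T[0])
print("relative spread of L^2:  ", (L2.max() - L2.min()) / L2[0])

near_n2 = integrate([1e-3, 1.0, 1e-3], I, t_end=30.0)   # spin close to n2
near_n1 = integrate([1.0, 1e-3, 1e-3], I, t_end=20.0)   # spin close to n1
print("max |omega_1| when starting near n2:", np.abs(near_n2[:, 0]).max())
print("max |omega_2|, |omega_3| when starting near n1:", np.abs(near_n1[:, 1:]).max())
```

A fixed-step RK4 integrator is used only for transparency; any standard ODE solver would do.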
To get a feeling for how the Euler equations work, let us return to the case of a free symmetric top ($\tau_1 = \tau_2 = \tau_3 = 0$, $I_1 = I_2 \neq I_3$). In this case, $I_1 - I_2 = 0$, so that Eq. (66) with $j = 3$ yields $\omega_3 = \text{const}$, while the equations for $j = 1$ and $j = 2$ take the simple form

$$\dot\omega_1 = -\Omega_{\rm pre}\,\omega_2, \qquad \dot\omega_2 = \Omega_{\rm pre}\,\omega_1, \qquad (6.67)$$

where $\Omega_{\rm pre}$ is a constant determined by the system parameters and initial conditions:

$$\Omega_{\rm pre} \equiv \omega_3\,\frac{I_3 - I_1}{I_1}. \qquad (6.68)$$

Obviously, Eqs. (67) have a sinusoidal solution with frequency $\Omega_{\rm pre}$, and describe uniform rotation of vector $\boldsymbol\omega$, with that frequency, about the main axis $\mathbf n_3$. This is just another presentation of the torque-free precession analyzed above, this time as observed from the rotating body. Evidently, $\Omega_{\rm pre}$ is substantially different from the frequency $\omega_{\rm pre}$ (59) of the precession as observed from the lab frame; for example, the former frequency vanishes for the spherical top (with $I_1 = I_2 = I_3$), while the latter frequency tends to the rotation frequency.
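The torque-free solution of Eqs. (67)-(68) can be confirmed by integrating the full Euler equations (66) numerically for a symmetric top and comparing the result with the sinusoid $\omega_1 = A\cos\Omega_{\rm pre}t$, $\omega_2 = A\sin\Omega_{\rm pre}t$, which Eqs. (67) give for the initial conditions $\omega_1(0) = A$, $\omega_2(0) = 0$. The sketch below is plain Python; the RK4 integrator and all parameter values are illustrative assumptions.

```python
import math


def euler_rhs(w, I):
    """Torque-free Euler equations (6.66), solved for d(omega_j)/dt."""
    (w1, w2, w3), (I1, I2, I3) = w, I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)


def rk4_step(w, I, dt):
    """One classical Runge-Kutta step for the tuple w = (omega_1, omega_2, omega_3)."""
    def shifted(k, factor):
        return tuple(wi + factor * ki for wi, ki in zip(w, k))
    k1 = euler_rhs(w, I)
    k2 = euler_rhs(shifted(k1, dt / 2), I)
    k3 = euler_rhs(shifted(k2, dt / 2), I)
    k4 = euler_rhs(shifted(k3, dt), I)
    return tuple(wi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for wi, a, b, c, d in zip(w, k1, k2, k3, k4))


def precess(I1, I3, A, w3, t_end, dt=1e-3):
    """Integrate from omega = (A, 0, w3) for a symmetric top with I1 = I2."""
    w = (A, 0.0, w3)
    for _ in range(int(round(t_end / dt))):
        w = rk4_step(w, (I1, I1, I3), dt)
    return w


I1, I3, A, w3, t = 1.0, 1.5, 0.2, 2.0, 5.0      # hypothetical values
Omega_pre = w3 * (I3 - I1) / I1                 # Eq. (6.68)
w = precess(I1, I3, A, w3, t)
print("numerical: ", w[0], w[1])
print("Eqs. (67): ", A * math.cos(Omega_pre * t), A * math.sin(Omega_pre * t))
print("omega_3 stays at", w[2])
```

Setting `I3 = I1` (a spherical top) makes all derivatives vanish, so $\boldsymbol\omega$ stays constant in the body frame, in agreement with $\Omega_{\rm pre} = 0$.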

Unfortunately, for the rotation of an asymmetric top (i.e., an arbitrary rigid body), when no component $\omega_j$ is conserved, the Euler equations (66) are strongly nonlinear even in the absence of the external torque, and a discussion of their solutions would take more time than I can afford.¹¹



6.5. Torque-induced precession 

The dynamics of rotation becomes even more complex in the presence of external forces. Let us consider the most important and counter-intuitive effect of torque-induced precession, for the simplest case of an axially-symmetric body (which is a particular case of the symmetric top, $I_1 = I_2 \neq I_3$) rapidly spinning about its symmetry axis, and supported at some point A of that axis that does not coincide with the center of mass O - see Fig. 10. Without external forces, such a top would retain the direction of its rotation axis, which would always coincide with the direction of the angular momentum:

$$\mathbf L = I_3\boldsymbol\omega = I_3\omega_{\rm rot}\mathbf n_3. \qquad (6.69)$$



Fig. 6.10. Symmetric top in the gravity field: (a) a side view of the system, and (b) the top view of the evolution of the horizontal component of the angular momentum vector.



11 Such a discussion may be found, e.g., in Sec. 37 of L. Landau and E. Lifshitz, Mechanics, 3rd ed., Butterworth-Heinemann, 1976.

The uniform gravity field creates bulk-distributed forces that, as we know from the analysis of the physical pendulum in Sec. 3, are equivalent to a single force $M\mathbf g$ applied at the center of mass - in Fig. 10, point O. The torque of this force relative to the support point A is



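$$\boldsymbol\tau = \mathbf r_O\big|_{\rm in\ A}\times M\mathbf g = Ml\,\mathbf n_3\times\mathbf g. \qquad (6.70)$$

Hence the general equation (33) of the angular momentum (valid in the inertial "lab" frame, in which point A rests) becomes

$$\dot{\mathbf L} = Ml\,\mathbf n_3\times\mathbf g. \qquad (6.71)$$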



Despite the apparent simplicity of this (exact!) equation, its analysis is straightforward only in the limit of relatively high rotation velocity $\omega_{\rm rot}$ or, alternatively, very small torque. In this limit, we may, in the 0th approximation, still use Eq. (69) for $\mathbf L$. Then Eq. (71) shows that vector $\dot{\mathbf L}$ is perpendicular to both $\mathbf n_3$ (and hence $\mathbf L$) and $\mathbf g$, i.e. lies within the horizontal plane, and is perpendicular to the horizontal component $\mathbf L_{xy}$ of vector $\mathbf L$ - see Fig. 10b. Since the magnitude of this vector is constant, $|\dot{\mathbf L}| = Mgl\sin\theta$, while the horizontal component of $\mathbf L$ has the constant magnitude $L_{xy} = L\sin\theta$, vector $\mathbf L$ (and hence the body's main axis) rotates about the vertical axis with angular velocity

$$\omega_{\rm pre} = \frac{|\dot{\mathbf L}|}{L_{xy}} = \frac{Mgl}{I_3\omega_{\rm rot}}. \qquad (6.72)$$



Thus, very counter-intuitively, the fast-rotating top "does not want to" follow the external, vertical force and, in addition to fast spinning about the symmetry axis $\mathbf n_3$, also performs a revolution, called the torque-induced precession, about the vertical axis. Note that, similarly to the free-precession frequency (59), the torque-induced precession frequency (72) does not depend on the initial (and sustained) angle $\theta$. However, the torque-induced precession frequency is inversely (rather than directly) proportional to $\omega_{\rm rot}$, and is typically much lower. This relative slowness is also required for the validity of our simple theory of this effect. Indeed, in our approximate treatment we have used Eq. (69), i.e. neglected the precession's contribution to the angular momentum vector $\mathbf L$. This is only possible if this contribution is relatively small, $I\omega_{\rm pre} \ll I_3\omega_{\rm rot}$, where $I$ is a certain effective moment of inertia for the precession (to be worked out later). Using our result (72), this condition may be rewritten as

$$\omega_{\rm rot} \gg \left(\frac{MglI}{I_3^2}\right)^{1/2}. \qquad (6.73)$$



For a body of not too extreme proportions, i.e. with all linear dimensions of the order of a certain length $l$, all moments of inertia are of the order of $Ml^2$, so that the right-hand part of Eq. (73) is of the order of $(g/l)^{1/2}$, i.e. comparable with the eigenfrequency of the same body as a physical pendulum, i.e. in the absence of fast rotation.
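To get a feeling for the numbers in Eqs. (72) and (73), the sketch below evaluates them for a hypothetical toy gyroscope: a uniform disk of mass 50 g and radius 2 cm, spinning at 100 revolutions per second, with its center of mass 3 cm from the support point. The effective moment $I$ in Eq. (73) is not specified at this point; here it is assumed, for illustration only, to be the disk's moment of inertia about a horizontal axis through the support (the choice $I = I_A$ discussed later in this section). The code also confirms that the smallness condition $I\omega_{\rm pre} \ll I_3\omega_{\rm rot}$ and Eq. (73) are the same statement.

```python
import math

# Hypothetical toy gyroscope: a uniform disk spinning about its symmetry axis,
# whose center of mass lies at distance l from the support point A
M, R, l, g = 0.05, 0.02, 0.03, 9.81       # kg, m, m, m/s^2
I3 = M * R**2 / 2                         # moment of inertia about the symmetry axis
I = M * R**2 / 4 + M * l**2               # assumed effective I: horizontal axis through A
omega_rot = 2 * math.pi * 100             # 100 revolutions per second, in rad/s

omega_pre = M * g * l / (I3 * omega_rot)          # Eq. (6.72)
omega_bound = math.sqrt(M * g * l * I / I3**2)    # right-hand part of Eq. (6.73)
smallness = I * omega_pre / (I3 * omega_rot)      # the neglected ratio I*omega_pre / (I3*omega_rot)

print(f"omega_pre = {omega_pre:.2f} rad/s (period {2 * math.pi / omega_pre:.1f} s)")
print(f"omega_rot / bound of Eq. (6.73) = {omega_rot / omega_bound:.1f}")
print(f"I omega_pre / (I3 omega_rot) = {smallness:.4f}, "
      f"(bound / omega_rot)^2 = {(omega_bound / omega_rot)**2:.4f}")
```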

To develop a more complete theory that could take us beyond such an approximate treatment, the Euler equations (66) may be used, but they are not very convenient. A better approach, suggested by the same L. Euler, is to introduce a set of three independent angles between the principal axes $\{\mathbf n_1, \mathbf n_2, \mathbf n_3\}$ bound to the rigid body and the axes $\{\mathbf n_x, \mathbf n_y, \mathbf n_z\}$ of an inertial reference frame (Fig. 11), and then express the basic equation (33) of rotation via these angles. There are several possible options for the definition of such angles;¹² Fig. 11 shows the set of Euler angles most convenient for the discussion of fast rotation. As one can see in the figure, the first Euler angle, $\theta$, is the usual polar angle measured from axis $\mathbf n_z$ to axis $\mathbf n_3$. The second one is the azimuthal angle $\varphi$, measured from axis $\mathbf n_x$ to the so-called line of nodes formed by the intersection of planes $[\mathbf n_x, \mathbf n_y]$ and $[\mathbf n_1, \mathbf n_2]$. The last Euler angle, $\psi$, is measured within plane $[\mathbf n_1, \mathbf n_2]$, from the line of nodes to axis $\mathbf n_1$. In the simple picture of the torque-induced precession of a symmetric top, derived above, angle $\theta$ is constant, angle $\psi$ changes very rapidly, with the rotation velocity $\omega_{\rm rot}$, while angle $\varphi$ grows with the precession frequency $\omega_{\rm pre}$ (72).




Now we can express the principal-axes components of the instantaneous angular velocity vector, $\omega_1$, $\omega_2$, and $\omega_3$, as measured in the lab reference frame, in terms of the Euler angles. This may be easily done by calculating, from Fig. 11, the contributions of the time derivatives of each Euler angle to the angular velocity components along each principal axis, and then adding them up. The result is

$$\begin{aligned}
\omega_1 &= \dot\varphi\sin\theta\sin\psi + \dot\theta\cos\psi, \\
\omega_2 &= \dot\varphi\sin\theta\cos\psi - \dot\theta\sin\psi, \\
\omega_3 &= \dot\varphi\cos\theta + \dot\psi.
\end{aligned} \qquad (6.74)$$
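Eqs. (74) can be spot-checked numerically. The plain-Python sketch below (the sample values are arbitrary, and the helper names are hypothetical) evaluates them and confirms that for a symmetric top ($I_1 = I_2$) the rotational kinetic energy computed from these components does not depend on $\psi$, and coincides with the closed-form expression (75) used next.

```python
import math
import random


def body_omega(theta, psi, theta_dot, phi_dot, psi_dot):
    """Principal-axis components of omega via the Euler angles, Eqs. (6.74)."""
    w1 = phi_dot * math.sin(theta) * math.sin(psi) + theta_dot * math.cos(psi)
    w2 = phi_dot * math.sin(theta) * math.cos(psi) - theta_dot * math.sin(psi)
    w3 = phi_dot * math.cos(theta) + psi_dot
    return w1, w2, w3


def t_rot_symmetric(I1, I3, theta, psi, theta_dot, phi_dot, psi_dot):
    """Rotational kinetic energy of a symmetric top (I1 = I2) from the components (6.74)."""
    w1, w2, w3 = body_omega(theta, psi, theta_dot, phi_dot, psi_dot)
    return 0.5 * I1 * (w1**2 + w2**2) + 0.5 * I3 * w3**2


random.seed(1)
I1, I3 = 0.7, 1.9                                   # arbitrary sample moments
theta, theta_dot, phi_dot, psi_dot = (random.uniform(-2, 2) for _ in range(4))
energies = [t_rot_symmetric(I1, I3, theta, psi, theta_dot, phi_dot, psi_dot)
            for psi in (0.0, 0.9, 2.5, 4.0)]
closed_form = (0.5 * I1 * (theta_dot**2 + phi_dot**2 * math.sin(theta)**2)
               + 0.5 * I3 * (phi_dot * math.cos(theta) + psi_dot)**2)   # Eq. (6.75)
print(energies, closed_form)
```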





These formulas allow us to express the kinetic energy of rotation (25) and the angular momentum components (26) in terms of the generalized coordinates $\theta$, $\varphi$, and $\psi$, and then use the powerful Lagrangian formalism to derive their equations of motion. This is especially simple to do in the case of symmetric tops (with $I_1 = I_2$), because plugging Eqs. (74) into Eq. (25), we get the expression

$$T_{\rm rot} = \frac{I_1}{2}\left(\dot\theta^2 + \dot\varphi^2\sin^2\theta\right) + \frac{I_3}{2}\left(\dot\varphi\cos\theta + \dot\psi\right)^2, \qquad (6.75)$$

which does not explicitly include either $\varphi$ or $\psi$. (This reflects the fact that for a symmetric top we can always select axis $\mathbf n_1$ to coincide with the line of nodes, and hence take $\psi = 0$ at the considered moment of time. Note that this trick does not mean we can take $\dot\psi = 0$, because axis $\mathbf n_1$, as observed from the inertial reference frame, moves!) Now we should not forget that at the torque-induced precession, the center of mass moves as well (see Fig. 10), so that according to Eq. (14), the total kinetic energy of the body is the sum of two terms,



$$T = T_{\rm rot} + T_{\rm tran}, \qquad T_{\rm tran} = \frac{M}{2}V^2 = \frac{Ml^2}{2}\left(\dot\theta^2 + \dot\varphi^2\sin^2\theta\right), \qquad (6.76)$$

while the potential energy is just

$$U = Mgl\cos\theta + \text{const}. \qquad (6.77)$$

12 Of the several choices more convenient in the absence of fast rotation, the most common is the set of so-called Tait-Bryan angles (called the yaw, pitch, and roll) that are broadly used in airplane and maritime navigation.

Now we could readily write the Lagrangian equations of motion for the Euler angles, but it is better to immediately notice that according to Eqs. (75)-(77), the Lagrangian function, $T - U$, does not depend explicitly on the "cyclic" coordinates $\varphi$ and $\psi$, so that the corresponding generalized momenta are conserved:

$$p_\varphi \equiv \frac{\partial T}{\partial\dot\varphi} = I_A\dot\varphi\sin^2\theta + I_3(\dot\varphi\cos\theta + \dot\psi)\cos\theta = \text{const}, \qquad (6.78)$$

$$p_\psi \equiv \frac{\partial T}{\partial\dot\psi} = I_3(\dot\varphi\cos\theta + \dot\psi) = \text{const}, \qquad (6.79)$$

where, according to Eq. (29), $I_A = I_1 + Ml^2$ is just the body's moment of inertia for rotation about a horizontal axis passing through the support point A. According to the last of Eqs. (74), $p_\psi$ is just $L_3$, the angular momentum's component along the rotating axis $\mathbf n_3$. On the other hand, by its definition, $p_\varphi$ is $L_z$, the same vector $\mathbf L$'s component along the static axis z. (Actually, we could foresee the conservation of both these components of $\mathbf L$ in advance, because vector (70) of the external torque is perpendicular to both $\mathbf n_3$ and $\mathbf n_z$.) Using these notions, and solving the simple system of linear equations (78)-(79) for the angle derivatives, we get

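$$\dot\varphi = \frac{L_z - L_3\cos\theta}{I_A\sin^2\theta}, \qquad \dot\psi = \frac{L_3}{I_3} - \frac{L_z - L_3\cos\theta}{I_A\sin^2\theta}\cos\theta. \qquad (6.80)$$

One more conserved quantity in this problem is the full mechanical energy¹³

$$E = T + U = \frac{I_A}{2}\left(\dot\theta^2 + \dot\varphi^2\sin^2\theta\right) + \frac{I_3}{2}\left(\dot\varphi\cos\theta + \dot\psi\right)^2 + Mgl\cos\theta. \qquad (6.81)$$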

Plugging Eqs. (80) into Eq. (81), we get a first-order differential equation for angle $\theta$, which may be presented in the following physically transparent form:

$$\frac{I_A}{2}\dot\theta^2 + U_{\rm ef}(\theta) = E, \qquad U_{\rm ef}(\theta) \equiv \frac{(L_z - L_3\cos\theta)^2}{2I_A\sin^2\theta} + \frac{L_3^2}{2I_3} + Mgl\cos\theta. \qquad (6.82)$$
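The algebra leading from Eqs. (78)-(81) to Eq. (82) is easy to verify numerically: pick arbitrary values of $\theta$, $\dot\theta$, $L_z$, $L_3$ and of the parameters, compute $\dot\varphi$ and $\dot\psi$ from Eq. (80), and compare. The Python sketch below does exactly that; the random ranges are arbitrary assumptions, and each returned residual should vanish up to rounding errors.

```python
import math
import random


def residuals(theta, theta_dot, Lz, L3, I_A, I3, Mgl):
    """Residuals of Eqs. (6.78), (6.79), and (6.81)-(6.82) with the rates taken from Eq. (6.80)."""
    s, c = math.sin(theta), math.cos(theta)
    phi_dot = (Lz - L3 * c) / (I_A * s**2)              # Eq. (6.80), first formula
    psi_dot = L3 / I3 - phi_dot * c                     # Eq. (6.80), second formula
    p_phi = I_A * phi_dot * s**2 + I3 * (phi_dot * c + psi_dot) * c    # Eq. (6.78)
    p_psi = I3 * (phi_dot * c + psi_dot)                               # Eq. (6.79)
    E = (0.5 * I_A * (theta_dot**2 + phi_dot**2 * s**2)
         + 0.5 * I3 * (phi_dot * c + psi_dot)**2 + Mgl * c)            # Eq. (6.81)
    U_ef = (Lz - L3 * c)**2 / (2 * I_A * s**2) + L3**2 / (2 * I3) + Mgl * c   # Eq. (6.82)
    return p_phi - Lz, p_psi - L3, E - (0.5 * I_A * theta_dot**2 + U_ef)


random.seed(0)
for _ in range(5):
    args = (random.uniform(0.2, 2.9),                        # theta, away from 0 and pi
            *(random.uniform(-2.0, 2.0) for _ in range(3)),  # theta_dot, Lz, L3
            *(random.uniform(0.5, 2.0) for _ in range(3)))   # I_A, I3, Mgl
    print(residuals(*args))
```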

Thus, similarly to the planetary problems considered in Sec. 3.5, the symmetric top precession has been reduced (without any approximations!) to a 1D problem of motion of one of its degrees of freedom, the polar angle $\theta$, in an effective potential $U_{\rm ef}(\theta)$, which is the sum of the real potential energy $U$ (77) and a contribution from the kinetic energy of motion along the two other angles. In the absence of rotation about axes $\mathbf n_z$ and $\mathbf n_3$ (i.e., $L_z = L_3 = 0$), Eq. (82) is reduced to the first integral of the equation (40) of motion of a physical pendulum. If the rotation is present, then (besides the case of special initial conditions when $\theta(0) = 0$ and $L_z = L_3$),¹⁴ the first contribution to $U_{\rm ef}(\theta)$ diverges at $\theta \to 0$ and $\pi$, so that the effective potential energy has a minimum at some finite value $\theta_0$ of the polar angle $\theta$.

13 Indeed, since the Lagrangian does not depend on time explicitly, $H = \text{const}$, and since the full kinetic energy $T$ is a quadratic-homogeneous function of the generalized velocities, $E = H$.

If the initial angle $\theta(0)$ equals this $\theta_0$ and $\dot\theta(0) = 0$, i.e. if the energy $E$ is equal to the minimum value $U_{\rm ef}(\theta_0)$, the polar angle remains constant through the motion: $\theta(t) = \theta_0$. This corresponds to the pure torque-induced precession, whose angular velocity is given by the first of Eqs. (80):

$$\omega_{\rm pre} \equiv \dot\varphi = \frac{L_z - L_3\cos\theta_0}{I_A\sin^2\theta_0}. \qquad (6.83)$$



The condition for finding $\theta_0$, $dU_{\rm ef}/d\theta = 0$, is an algebraic equation of the fourth degree in $\cos\theta_0$, whose general analytical solution is too bulky to be practical. However, in the high spinning speed limit (73), a simple approximate solution is possible. Indeed, in this limit the potential energy contribution to $U_{\rm ef}$ is small, and we may analyze its effect by successive approximations. In the 0th approximation, i.e. at $Mgl = 0$, the minimum of $U_{\rm ef}$ is evidently achieved at $\cos\theta_0 = L_z/L_3$, giving zero precession frequency (83). In the next, 1st approximation, we may require that at $\theta = \theta_0$, the derivative of the first term in the right-hand part of Eq. (82) for $U_{\rm ef}$ over $\cos\theta$, equal to $-L_3(L_z - L_3\cos\theta)/I_A\sin^2\theta$,¹⁵ is cancelled by that of the gravity-induced term, equal to $Mgl$. This immediately yields $\omega_{\rm pre} = (L_z - L_3\cos\theta_0)/I_A\sin^2\theta_0 = Mgl/L_3$, so that taking $L_3 = I_3\omega_{\rm rot}$ (as we may in the high spinning speed limit), we recover the simple expression (72).
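For specific parameters, the fixed point $\theta_0$ is easy to find numerically, which lets us compare the exact precession frequency (83) with the approximate result (72). The sketch below (plain Python; the top's parameters are hypothetical values chosen to be deep in the fast-spin limit) finds the root of the analytical derivative $dU_{\rm ef}/d\theta$ by bisection, using the fact that for $|L_z| < L_3$ this derivative is negative near $\theta = 0$ and positive near $\theta = \pi$.

```python
import math


def dU_ef(theta, Lz, L3, I_A, Mgl):
    """d U_ef / d theta for the effective potential of Eq. (6.82)."""
    s, c = math.sin(theta), math.cos(theta)
    q = Lz - L3 * c
    return q * L3 / (I_A * s) - q**2 * c / (I_A * s**3) - Mgl * s


def fixed_point(Lz, L3, I_A, Mgl, eps=1e-9):
    """Bisection for theta_0 (assumes |Lz| < L3, so dU_ef/dtheta changes sign once)."""
    lo, hi = eps, math.pi - eps
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if dU_ef(mid, Lz, L3, I_A, Mgl) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical fast top: I_A = 4e-3, I3 = 2e-3 (kg m^2), Mgl = 0.5 J, omega_3 = 300 rad/s
I_A, I3, Mgl = 4e-3, 2e-3, 0.5
L3 = I3 * 300.0
Lz = L3 * math.cos(0.5)              # in the 0th approximation, theta_0 = 0.5 rad
theta0 = fixed_point(Lz, L3, I_A, Mgl)
omega_exact = (Lz - L3 * math.cos(theta0)) / (I_A * math.sin(theta0)**2)   # Eq. (6.83)
omega_approx = Mgl / L3                                                    # Eq. (6.72)
print(theta0, omega_exact, omega_approx)
```

For these parameters, the two frequencies differ by less than one percent, and $\theta_0$ stays close to its 0th-approximation value.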

The second important result that readily follows from Eq. (82) is the exact expression for the threshold value of the spinning speed for a vertically rotating top ($\theta = 0$, $L_z = L_3$). Indeed, in the limit $\theta \to 0$ this expression may be readily simplified:

$$U_{\rm ef}(\theta) \approx \text{const} + \left(\frac{L_3^2}{8I_A} - \frac{Mgl}{2}\right)\theta^2. \qquad (6.84)$$




This formula shows that if $\omega_3 = L_3/I_3$ (i.e. the angular velocity that was called $\omega_{\rm rot}$ in the approximate theory) is higher than the following threshold value,

$$\omega_{\rm th} \equiv \frac{2(MglI_A)^{1/2}}{I_3}, \qquad (6.85)$$

then the coefficient of $\theta^2$ in Eq. (84) is positive, so that $U_{\rm ef}$ has a stable minimum at $\theta_0 = 0$. On the other hand, if $\omega_3$ is decreased below $\omega_{\rm th}$, the fixed point becomes unstable, so that the top falls down. Note that if we take $I = I_A$ in condition (73) of the approximate treatment, it acquires a very simple sense: $\omega_{\rm rot} \gg \omega_{\rm th}$.
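The threshold (85) can be illustrated by evaluating the exact effective potential (82) at a small tilt of a vertically spinning top. The plain-Python sketch below (all parameters are arbitrary, hypothetical values in consistent units) compares $[U_{\rm ef}(\theta) - U_{\rm ef}(0)]/\theta^2$ with the coefficient of $\theta^2$ in Eq. (84), for spin rates 10% below and 10% above $\omega_{\rm th}$; the sign change marks the loss of stability.

```python
import math


def U_ef(theta, Lz, L3, I_A, I3, Mgl):
    """Effective potential U_ef(theta) of Eq. (6.82)."""
    return ((Lz - L3 * math.cos(theta))**2 / (2 * I_A * math.sin(theta)**2)
            + L3**2 / (2 * I3) + Mgl * math.cos(theta))


def omega_th(Mgl, I_A, I3):
    """Threshold spinning speed, Eq. (6.85)."""
    return 2 * math.sqrt(Mgl * I_A) / I3


I_A, I3, Mgl = 1.0, 1.0, 1.0      # arbitrary units (hypothetical values)
theta = 1e-2                      # small tilt from the vertical
results = {}
for factor in (0.9, 1.1):         # spinning slightly below and above the threshold
    L3 = I3 * factor * omega_th(Mgl, I_A, I3)
    Lz = L3                       # vertically spinning top: L_z = L_3
    U0 = L3**2 / (2 * I3) + Mgl   # limit of U_ef(theta) at theta -> 0
    numeric_coeff = (U_ef(theta, Lz, L3, I_A, I3, Mgl) - U0) / theta**2
    coefficient = L3**2 / (8 * I_A) - Mgl / 2      # coefficient of theta^2 in Eq. (6.84)
    results[factor] = (numeric_coeff, coefficient)
    print(f"omega_3 = {factor} * omega_th: numerical {numeric_coeff:+.5f}, "
          f"Eq. (6.84) {coefficient:+.5f}")
```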

Finally, Eq. (82) gives a natural description of one more phenomenon. If the initial energy is larger than $U_{\rm ef}(\theta_0)$, angle $\theta$ oscillates between two classical turning points on both sides of the fixed point $\theta_0$. The law and frequency of these oscillations may be found exactly as in Sec. 3.3 - see Eqs. (3.27) and (3.28). At $\omega_{\rm rot} \gg \omega_{\rm th}$, this motion is a fast rotation of the symmetry axis $\mathbf n_3$ of the body about its average position, which performs the slow precession. These oscillations are called nutations, but physically they are absolutely similar to the free precession that was analyzed in the previous section, and the order of magnitude of their frequency is still given by Eq. (59).

14 In that simple case the body continues to rotate about the vertical symmetry axis: $\theta(t) = 0$. Note, however, that such motion is stable only if the spinning speed is sufficiently high - see Eqs. (84)-(85).

15 Indeed, the derivative of the fraction $1/2I_A\sin^2\theta$, taken at the point $\cos\theta = L_z/L_3$, is multiplied by the numerator, $(L_z - L_3\cos\theta)^2$, which vanishes at this point.

It may be proved that small energy dissipation (not taken into account in our analysis) leads first to a decay of nutations, then to a slower drift of the precession angle $\theta_0$ to zero and, finally, to a gradual decay of the spinning speed $\omega_{\rm rot}$, until it reaches the threshold (85) and the top falls down.



6.6. Non-inertial reference frames 

Before moving on to the next chapter, let us use the results of our discussion of rotation 
kinematics in Sec. 1 to complete the analysis of transfer between two reference frames, started in the 
introductory Chapter 1 - see Fig. 1.2. Indeed, the differentiation rule described by Eq. (8) and derived 
for an arbitrary vector A enables us to relate not only radius-vectors, but also the velocities and 
accelerations of a particle as measured in two reference frames: the "lab" frame O' (which will be later 
assumed inertial) and the "moving" (possibly rotating) frame O - see Fig. 12. 




Fig. 6.12. General case of transfer 
between two reference frames. 



As this picture shows, even if frame O rotates relative to the lab frame, the radius-vectors are 
still related, at any moment of time, by the simple Eq. (1.7). In the notation of Fig. 12 it reads 



$$\mathbf r\big|_{\rm in\ lab} = \mathbf r_O\big|_{\rm in\ lab} + \mathbf r\big|_{\rm in\ lab}. \qquad (6.86)$$



However, as was discussed in Sec. 1, for velocities the general addition rule is already more complex. In 
order to find it, let us differentiate Eq. (86) over time: 



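$$\left.\frac{d}{dt}\mathbf r\right|_{\rm in\ lab} = \left.\frac{d}{dt}\mathbf r_O\right|_{\rm in\ lab} + \left.\frac{d}{dt}\mathbf r\right|_{\rm in\ lab}. \qquad (6.87)$$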



The left-hand part of this relation is evidently the particle's velocity as measured in the lab frame, and the first term in the right-hand part of Eq. (87) is the velocity of point O, as measured in the same frame. The last term is more complex: we need to differentiate vector $\mathbf r$ that connects point O with the particle (Fig. 12), considering how its evolution looks from the lab frame. Due to the possible mutual rotation of frames O and O', that term may not be zero even if the particle does not move relative to frame O.

Fortunately, we have already derived the general Eq. (8) to analyze situations exactly like this 
one. Taking A = r, we may apply it to the last term of Eq. (87), to get 



$$\mathbf v\big|_{\rm in\ lab} = \mathbf v_O\big|_{\rm in\ lab} + (\mathbf v + \boldsymbol\omega\times\mathbf r), \qquad (6.88)$$

where $\boldsymbol\omega$ is the instantaneous angular velocity of an imaginary rigid body connected to the moving reference frame (or, we may say, of the frame as such), and $\mathbf v$ is $d\mathbf r/dt$ as measured in the moving frame O. (Here and later in this section, all vectors without indices imply their observation from the moving frame.) Relation (88), on the one hand, is a natural generalization of Eq. (10) for $\mathbf v \neq 0$; on the other hand, if $\boldsymbol\omega = 0$, it is reduced to the simple Eq. (1.8) for the translational motion of frame O.

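Now, in order to calculate the acceleration, we may just repeat the trick: differentiate Eq. (88) over time, and then use Eq. (8) again, now for vector $\mathbf A = \mathbf v + \boldsymbol\omega\times\mathbf r$. The result is

$$\mathbf a\big|_{\rm in\ lab} = \mathbf a_O\big|_{\rm in\ lab} + \frac{d}{dt}(\mathbf v + \boldsymbol\omega\times\mathbf r) + \boldsymbol\omega\times(\mathbf v + \boldsymbol\omega\times\mathbf r). \qquad (6.89)$$

Carrying out the differentiation in the second term, $d(\mathbf v + \boldsymbol\omega\times\mathbf r)/dt = \mathbf a + \dot{\boldsymbol\omega}\times\mathbf r + \boldsymbol\omega\times\mathbf v$, we finally get the target equation,

$$\mathbf a\big|_{\rm in\ lab} = \mathbf a_O\big|_{\rm in\ lab} + \mathbf a + \dot{\boldsymbol\omega}\times\mathbf r + 2\boldsymbol\omega\times\mathbf v + \boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r), \qquad (6.90)$$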

where $\mathbf a$ is the particle's acceleration, as measured in the moving frame. Evidently, Eq. (90) is a natural generalization of the simple Eq. (1.9) to the rotating frame case.
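Eqs. (88) and (90) can be checked with a simple numerical experiment. In the sketch below (Python with NumPy; all numbers are arbitrary illustrative assumptions), a particle moves uniformly along a straight line in the inertial lab frame, while frame O, with the same origin, rotates with a constant angular velocity about the z axis. Finite differences of the particle's coordinates in frame O give $\mathbf v$ and $\mathbf a$; with $\mathbf v_O = \mathbf a_O = 0$ and $\dot{\boldsymbol\omega} = 0$, Eq. (88) should reproduce the lab velocity, and, since the lab acceleration is zero, the right-hand part of Eq. (90) should vanish, up to the finite-difference error.

```python
import numpy as np

W = 0.7                                   # hypothetical constant angular velocity (rad/s) of frame O about z
OMEGA = np.array([0.0, 0.0, W])
R0, U = np.array([1.0, -0.5, 0.2]), np.array([0.3, 0.8, -0.1])   # uniform lab motion r = R0 + U t


def rot_z(angle):
    """Matrix of rotation by `angle` about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def r_moving(t):
    """Components of the particle's radius-vector in the rotating frame O (origins coincide)."""
    return rot_z(-W * t) @ (R0 + U * t)


t, h = 1.3, 1e-3
r = r_moving(t)
v = (r_moving(t + h) - r_moving(t - h)) / (2 * h)              # velocity in frame O
a = (r_moving(t + h) - 2 * r + r_moving(t - h)) / h**2          # acceleration in frame O

# Eq. (6.88) with v_O = 0: should equal the lab velocity U, expressed in the rotating axes
print(v + np.cross(OMEGA, r) - rot_z(-W * t) @ U)
# Eq. (6.90) with a_O = 0, d(omega)/dt = 0, and zero lab acceleration
print(a + 2 * np.cross(OMEGA, v) + np.cross(OMEGA, np.cross(OMEGA, r)))
```

Both printed vectors should be close to zero; dropping the factor 2 in front of the Coriolis term would make the second one visibly nonzero.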



Now let the lab frame O' be inertial; then the 2nd Newton law for a particle of mass $m$ is

$$m\mathbf a\big|_{\rm in\ lab} = \mathbf F, \qquad (6.91)$$



where $\mathbf F$ is the vector sum of all forces acting on the particle. This is simple and clear; however, in many cases it is much more convenient to work in a non-inertial reference frame. For example, when describing most phenomena on the Earth's surface, it is rather inconvenient to use a reference frame resting on the Sun (or in the galactic center, etc.). In order to understand what we should pay for the convenience of using the moving frame, we may combine Eqs. (90) and (91) to write

$$m\mathbf a = \mathbf F - m\mathbf a_O\big|_{\rm in\ lab} - m\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r) - 2m\boldsymbol\omega\times\mathbf v - m\dot{\boldsymbol\omega}\times\mathbf r. \qquad (6.92)$$



This result may be interpreted in the following way: if we want to use the 2nd Newton law's analog in a non-inertial reference frame, we have to add, to the real net force $\mathbf F$ acting on a particle, four pseudo-force terms, called inertial forces, all proportional to the particle's mass. Let us analyze them, while always remembering that these are just mathematical terms, not real forces. (In particular, it would be futile to seek the 3rd Newton law's counterpart for an inertial force.)

The first term, $-m\mathbf a_O|_{\rm in\ lab}$, is the only one not related to rotation, and is well known from undergraduate mechanics. (Let me hope the reader remembers all these weight-in-the-moving-elevator problems.) Despite its simplicity, this term has subtle and interesting consequences. As an example, let us consider a planet, such as our Earth, orbiting a star and also rotating about its own axis - see Fig. 13.



Fig. 6.13. Axial precession of a planet (with the equatorial bulge and the force line offset strongly exaggerated). [Figure labels: direction toward the star; gravity force $\mathbf F_g = M\mathbf a_O$; inertial force $-M\mathbf a_O$; equator; polar axis in "summer" and in "winter".]
The bulk-distributed gravity forces acting on a planet from its star are not quite uniform (because they obey the $1/r^2$ gravity law), and hence are equivalent to a single force applied to a point A slightly offset from the planet's center of mass O toward the star. For a spherically-symmetric planet, points O and A would be exactly aligned with the direction toward the star. However, real planets are not absolutely rigid, so that, due to the centrifugal "force" (to be discussed shortly), their rotation about their own axis makes them slightly elliptic - see Fig. 13. (For our Earth, this equatorial bulge is about 10 km in each direction.) As a result, the net gravity force does create a small torque relative to the center of mass O. On the other hand, repeating all the arguments of this section for a body (rather than a point), we may see that, in the reference frame moving with the planet, the inertial "force" $-M\mathbf a_O$ (which is of course equal in magnitude to the total gravity force and directed away from the star) is applied exactly to the center of mass and does not create a torque. As a result, this pair of forces creates a torque $\boldsymbol\tau$ perpendicular to both the direction toward the star and the vector connecting points O and A. (In Fig. 13, the torque vector is perpendicular to the plane of drawing.) If angle $\theta$ between the planet's "polar" axis of rotation and the direction toward the star were fixed, then, as we have seen in the previous section, this torque would induce a slow axis precession about that direction. However, as a result of the orbital motion, angle $\theta$ oscillates in time much faster (once a year) between the values $(\pi/2 + \varepsilon)$ and $(\pi/2 - \varepsilon)$, where $\varepsilon$ is the axis tilt, i.e. the angle between the polar axis (the direction of vectors $\mathbf L$ and $\boldsymbol\omega_{\rm rot}$) and the normal to the ecliptic plane of the planet's orbit. (For the Earth, $\varepsilon \approx 23.4^\circ$.) A straightforward averaging over these fast oscillations¹⁶ shows that the torque leads to the polar axis precession about the axis perpendicular to the ecliptic plane, keeping angle $\varepsilon$ constant. For the Earth, the period $\mathcal T_{\rm pre} = 2\pi/\omega_{\rm pre}$ of this precession of the equinoxes (or "precession of the equator"), corrected for the substantial effect of the Moon's gravity, is close to 26,000 years.

Returning to Eq. (92), the second term of its right-hand part, $\mathbf F_{\rm cf} = -m\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r)$, called the centrifugal force, is always perpendicular to, and directed away from, the instantaneous rotation axis - see Fig. 14. Indeed, vector $\boldsymbol\omega\times\mathbf r$ is perpendicular to both $\boldsymbol\omega$ and $\mathbf r$ (in Fig. 14, normal to the picture plane and directed away from the reader) and has magnitude $\omega r\sin\theta = \omega\rho$, where $\rho$ is the distance of the particle from the rotation axis. Hence the outer vector product, with the account of the minus sign, is normal to the rotation axis $\boldsymbol\omega$, directed away from the axis, and equal to $\omega^2 r\sin\theta = \omega^2\rho$. The centrifugal "force" is of course just the result of the fact that the centripetal acceleration $\omega^2\rho$, explicit in the inertial reference frame, disappears in the rotating frame. For a typical location on the Earth ($\rho \sim R_E \sim 6\times10^6$ m), with its angular velocity $\omega_E \approx 7.3\times10^{-5}\ \text{s}^{-1}$, the acceleration is rather considerable, of the order of 3 cm/s², i.e. $\sim0.003\,g$, and is responsible, in particular, for the largest part of the equatorial bulge mentioned above.
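The estimate above is easy to reproduce. The snippet below takes $\omega_E = 2\pi/(1\ \text{day})$ and $\rho = 6\times10^6$ m, both rough values assumed only for illustration.

```python
import math

g = 9.81                              # m/s^2
omega_E = 2 * math.pi / 86400.0       # Earth's angular velocity, 2*pi/(1 day), rad/s
rho = 6.0e6                           # distance from the rotation axis, ~R_E, m

a_cf = omega_E**2 * rho               # magnitude of the centrifugal acceleration, omega^2 * rho
print(f"omega_E = {omega_E:.2e} 1/s, a_cf = {100 * a_cf:.1f} cm/s^2 = {a_cf / g:.4f} g")
```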




Fig. 6.14. Centrifugal "force".



16 Details of this calculation may be found, e.g., in Sec. 5.8 of the textbook by H. Goldstein, C. Poole, and J. Safko, Classical Mechanics, 3rd ed., Addison Wesley, 2002.

As an example of using the centrifugal "force" concept, let us return again to our "testbed" problem of the bead sliding along a rotating ring - see Figs. 1.5 and 2.1. In the non-inertial reference frame attached to the ring, we have to add, to the real forces $m\mathbf g$ and $\mathbf N$ acting on the bead, the horizontal centrifugal "force"¹⁷ directed away from the rotation axis, with magnitude $m\omega^2\rho$. In the notation of Fig. 2.1, its component tangential to the ring equals $m\omega^2\rho\cos\theta = m\omega^2R\sin\theta\cos\theta$, and hence the component of Eq. (92) along this direction is
component of Eq. (92) along this direction is 

ma = -mg sin 0 + ma> 2 R sin 0 cos 0 . (6.93) 

With a = R0 , this gives us the equation of motion equivalent to Eq. (2.25), which had been derived in 
Sec. 2.2 (in the inertial frame) using the Lagrangian formalism. 
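Setting $a = R\ddot\theta$ in Eq. (93) gives $\ddot\theta = \sin\theta\,(\omega^2\cos\theta - g/R)$. A minimal sketch, with arbitrarily chosen $R$, $g$, and $\omega$ and hypothetical function names, that locates the off-axis equilibrium $\cos\theta_0 = g/\omega^2R$ (which exists only for $\omega^2 > g/R$) and checks its stability through the slope of the right-hand side; for this equilibrium, linearization gives the small-oscillation frequency $\omega\sin\theta_0$.

```python
import math

def bead_angular_acceleration(theta, omega, g, R):
    """theta_ddot from Eq. (93) with a = R * theta_ddot; theta is counted from the bottom."""
    return math.sin(theta) * (omega**2 * math.cos(theta) - g / R)

def stable_equilibrium(omega, g, R):
    """Stable equilibrium angle: 0 for slow rotation, arccos(g / (omega^2 R)) otherwise."""
    if omega**2 <= g / R:
        return 0.0
    return math.acos(g / (omega**2 * R))

R, g = 0.5, 9.81                      # ring radius (m) and g (m/s^2), illustrative
omega = 2.0 * math.sqrt(g / R)        # twice the critical angular velocity
theta0 = stable_equilibrium(omega, g, R)
print(math.degrees(theta0))           # about 75.5 degrees, since cos(theta0) = 1/4

# Stability check: the slope of the right-hand side at theta0 must be negative.
h = 1e-6
slope = (bead_angular_acceleration(theta0 + h, omega, g, R)
         - bead_angular_acceleration(theta0 - h, omega, g, R)) / (2 * h)
print(slope < 0, math.sqrt(-slope), omega * math.sin(theta0))
```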

The third term in the right-hand side of Eq. (92) is the so-called Coriolis force,¹⁸ which exists only if the particle moves in the rotating reference frame. Its physical sense may be understood by considering a projectile fired horizontally, say, from the North Pole. From the point of view of the Earth-based observer, it will be subject to an additional Coriolis force $\mathbf F_{\rm C} = -2m\boldsymbol\omega\times\mathbf v$, directed westward, with magnitude $2m\omega_{\rm E}v$, where $v$ is the main, southward component of the velocity. This force would cause the westward acceleration $a = 2\omega_{\rm E}v$, and the resulting westward deviation growing with time as $d = at^2/2 = \omega_{\rm E}vt^2$; see Fig. 15. (This formula is accurate only if $d$ is much smaller than the distance $r = vt$ passed by the projectile.) On the other hand, from the point of view of the inertial-frame observer, the projectile trajectory in the horizontal plane is a straight line, but during the flight time $t$, the Earth's surface slips eastward from under the trajectory by the distance $d = r\varphi = (vt)(\omega_{\rm E}t) = \omega_{\rm E}vt^2$, where $\varphi = \omega_{\rm E}t$ is the azimuthal angle of the Earth's rotation during the flight. Thus, both approaches give the same result.
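The equivalence of the two pictures can be checked numerically without any Coriolis term at all: take the inertial-frame straight-line trajectory and re-express it in coordinates attached to the rotating Earth. In this sketch, the projectile speed and flight time are illustrative assumptions, the motion near the pole is treated as planar, and the frame turns counterclockwise as seen from above the North Pole.

```python
import math

def position_in_rotating_frame(v, omega, t):
    """Coordinates, in a frame rotating with angular velocity omega, of a point
    that moves in the inertial frame along X with speed v, starting at the pole."""
    X, Y = v * t, 0.0
    phi = omega * t                       # angle turned by the Earth during the flight
    x_rot = X * math.cos(phi) + Y * math.sin(phi)
    y_rot = -X * math.sin(phi) + Y * math.cos(phi)
    return x_rot, y_rot

OMEGA_E = 7.29e-5      # Earth's angular velocity, 1/s
v, t = 500.0, 60.0     # illustrative projectile speed (m/s) and flight time (s)

x_rot, y_rot = position_in_rotating_frame(v, OMEGA_E, t)
d_estimate = OMEGA_E * v * t**2           # d = omega_E v t^2 from the text
print(x_rot, y_rot, d_estimate)           # y_rot < 0: deviation to the right, i.e. westward
```

Here $d \approx 130$ m is much smaller than $r = vt = 30$ km, so the small-angle formula applies.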



Fig. 6.15. Trajectory of a projectile fired 
horizontally from the North Pole, from the 
point of view of an Earth-bound observer 
looking down. Circles show parallels, 
straight lines mark meridians. 



Hence, the Coriolis "force" is just a fancy (but frequently very convenient) way of describing a purely geometric effect pertinent to rotation, from the point of view of an observer participating in it. This force is responsible, in particular, for the higher right banks of rivers in the Northern hemisphere, regardless of the direction of their flow; see Fig. 16. Despite the smallness of the Coriolis force (for a typical velocity of the water in a river, $v \sim 1$ m/s, it is equivalent to the acceleration $a_{\rm C} \sim 10^{-2}$ cm/s² $\sim 10^{-5}\,g$), its multi-century effects may be rather prominent.¹⁹
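The river estimate follows from $|\mathbf F_{\rm C}|/m = 2\omega v\sin\beta$, where $\beta$ is the angle between $\boldsymbol\omega$ and $\mathbf v$. The sketch below assumes $\beta = 90°$, which gives an upper bound, together with the sidereal value of $\omega_{\rm E}$; the function name is hypothetical.

```python
import math

OMEGA_E = 2 * math.pi / 86164.0   # Earth's angular velocity, 1/s
G = 9.81                          # m/s^2

def coriolis_acceleration(omega, v, beta=math.pi / 2):
    """Magnitude of -2 omega x v per unit mass; beta is the angle between omega and v."""
    return 2 * omega * v * math.sin(beta)

a_c = coriolis_acceleration(OMEGA_E, 1.0)   # river flow speed v ~ 1 m/s
print(a_c * 100, "cm/s^2")                   # of the order of 1e-2 cm/s^2
print(a_c / G)                               # of the order of 1e-5
```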




17 For this problem, all other inertial "forces" besides the Coriolis force (see below) vanish, while the latter force 
is directed perpendicular to the ring and does not affect the bead's motion along it. 

18 Named after G.-G. Coriolis (1792-1843), who is also credited with the first unambiguous definitions of 
mechanical work and kinetic energy. 

19 The same force also causes the counter-clockwise circulation inside our infamous "Nor'easter" storms, in 
which velocity v is caused by the lower atmospheric pressure in the middle of the cyclone and is directed toward it. 
Fig. 6.16. Coriolis "forces" due to 
Earth's rotation, in the Northern 
hemisphere. 



The last, fourth term of Eq. (92), $-m\dot{\boldsymbol\omega}\times\mathbf r$, exists only when the rotation frequency changes in time, and may be interpreted as a local-position-specific addition to the first term.

Equation (92), derived above from the Newton equation (91), may alternatively be obtained from the Lagrangian approach, which also gives some important insights into the energy at rotation. Let us use Eq. (88) to express the kinetic energy of the particle in an inertial frame in terms of $\mathbf v$ and $\mathbf r$ measured in a rotating frame:

$$ T = \frac{m}{2}\left[\mathbf v_0|_{\rm in\,lab} + (\mathbf v + \boldsymbol\omega\times\mathbf r)\right]^2 , \tag{6.94} $$

and use this expression to calculate the Lagrangian function. For the relatively simple case of particle motion in the field of potential forces, measured from a reference frame that performs pure rotation (so that $\mathbf v_0|_{\rm in\,lab} = 0$) with a constant angular velocity $\boldsymbol\omega$, the result is

$$ L = T - U = \frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + \frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2 - U \equiv \frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) - U_{\rm ef} , \tag{6.95} $$

where the effective potential energy,

$$ U_{\rm ef} \equiv U - \frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2 , \tag{6.96} $$

is just the sum of the real potential energy $U$ of the particle and the so-called centrifugal potential energy associated with the centrifugal "force":



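$$ \mathbf F_{\rm c} = -m\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r) = -\nabla\left[-\frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2\right]. \tag{6.97} $$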
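Eq. (97) can be spot-checked numerically: the negative gradient of the centrifugal potential energy $-\tfrac{m}{2}(\boldsymbol\omega\times\mathbf r)^2$, taken by central differences, should reproduce $-m\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r)$. The mass and the vectors below are arbitrary test values, and the helper names are hypothetical.

```python
def cross(a, b):
    """Vector product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centrifugal_potential(m, omega, r):
    """-(m/2) (omega x r)^2, the bracket in Eq. (97)."""
    w = cross(omega, r)
    return -0.5 * m * dot(w, w)

def numerical_gradient(f, r, h=1e-6):
    """Central-difference gradient of a scalar function of a 3-vector."""
    grad = []
    for k in range(3):
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        grad.append((f(rp) - f(rm)) / (2 * h))
    return tuple(grad)

m = 2.0
omega = (0.3, -0.5, 1.1)     # arbitrary test values
r = (1.2, 0.7, -0.4)

force = tuple(-m * c for c in cross(omega, cross(omega, r)))
minus_grad = tuple(-g for g in numerical_gradient(
    lambda x: centrifugal_potential(m, omega, x), r))
print(force)
print(minus_grad)   # should coincide with the line above
```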



Of course, the Lagrangian equations of motion derived from Eq. (95), considering the Cartesian components of $\mathbf r$ and $\mathbf v$ as generalized coordinates and velocities, coincide with Eq. (92) (with $\mathbf a_0|_{\rm in\,lab} = \dot{\boldsymbol\omega} = 0$ and $\mathbf F = -\nabla U$), but it is very informative to have a look at a by-product of this derivation, the generalized momentum corresponding to the particle's coordinate $\mathbf r$ as measured in the rotating reference frame:²⁰

$$ \mathbf p \equiv \frac{\partial L}{\partial\mathbf v} = m(\mathbf v + \boldsymbol\omega\times\mathbf r). \tag{6.98} $$



According to Eq. (88), with $\mathbf v_0|_{\rm in\,lab} = 0$, the expression in parentheses is just $\mathbf v|_{\rm in\,lab}$, so that $\mathbf p = m\mathbf v|_{\rm in\,lab}$. However, from the point of view of the moving frame, i.e. not knowing about the physical sense of vector $\mathbf p = m\mathbf v|_{\rm in\,lab}$, we would have a reason to speak about two different momenta of the same particle, the so-called kinetic momentum $\mathbf p_{\rm k} = m\mathbf v$ and the canonical momentum $\mathbf p = \mathbf p_{\rm k} + m\boldsymbol\omega\times\mathbf r$.²¹

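Now let us calculate the Hamiltonian function $H$ and energy $E$ as functions of the same moving-frame variables:

$$ H = \sum_{j=1}^{3}\frac{\partial L}{\partial v_j}v_j - L = \mathbf p\cdot\mathbf v - L = m\mathbf v\cdot(\mathbf v + \boldsymbol\omega\times\mathbf r) - \left[\frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) - U_{\rm ef}\right] = \frac{m}{2}v^2 + U_{\rm ef} , \tag{6.99} $$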



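$$ E = T + U = \frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + \frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2 + U = \frac{m}{2}v^2 + U_{\rm ef} + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + m(\boldsymbol\omega\times\mathbf r)^2 . \tag{6.100} $$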

These expressions clearly show that E and H are not equal. In hindsight, this is not surprising, because 
the kinetic energy (94), expressed in the moving-frame variables, includes a term linear in v, and hence 
is not a quadratic-homogeneous function of this generalized velocity. The difference of these functions 
may be presented as 

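$$ E - H = m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + m(\boldsymbol\omega\times\mathbf r)^2 = m(\mathbf v + \boldsymbol\omega\times\mathbf r)\cdot(\boldsymbol\omega\times\mathbf r) = m\mathbf v|_{\rm in\,lab}\cdot(\boldsymbol\omega\times\mathbf r) . \tag{6.101} $$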

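Now using the operand rotation rule again, we may transform this expression into an even simpler form:²²

$$ E - H = \boldsymbol\omega\cdot(\mathbf r\times m\mathbf v|_{\rm in\,lab}) = \boldsymbol\omega\cdot(\mathbf r\times\mathbf p) = \boldsymbol\omega\cdot\mathbf L|_{\rm in\,lab} . \tag{6.102} $$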



Let us evaluate this difference for our testbed problem; see Fig. 2.1. In this case, vector $\boldsymbol\omega$ is aligned with axis $z$, so that of all Cartesian components of vector $\mathbf L$, only component $L_z$ is important for the scalar product (102). This component evidently equals $I_z\omega = m\rho^2\omega = m\omega R^2\sin^2\theta$, so that

$$ E - H = m\omega^2R^2\sin^2\theta , \tag{6.103} $$

i.e. the same result that follows from the direct subtraction of Eqs. (2.40) and (2.41).
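The identity (102) holds for any instantaneous state, which a quick numerical check confirms: with arbitrary $m$, $\boldsymbol\omega$, $\mathbf r$, $\mathbf v$, and $U$ (all assumed test values), $E$ from Eq. (100) minus $H$ from Eq. (99) should equal $\boldsymbol\omega\cdot\mathbf L|_{\rm in\,lab}$. Helper names are hypothetical.

```python
def cross(a, b):
    """Vector product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

m = 1.7
omega = (0.2, -0.4, 0.9)     # arbitrary test values
r = (0.5, 1.3, -0.8)
v = (-0.6, 0.1, 0.7)         # velocity measured in the rotating frame
U = 3.0                      # any value of the true potential energy

w_x_r = cross(omega, r)
v_lab = tuple(a + b for a, b in zip(v, w_x_r))   # Eq. (88) with v_0 = 0
U_ef = U - 0.5 * m * dot(w_x_r, w_x_r)            # Eq. (96)
H = 0.5 * m * dot(v, v) + U_ef                    # Eq. (99)
E = 0.5 * m * dot(v_lab, v_lab) + U               # E = T + U
L_lab = tuple(m * c for c in cross(r, v_lab))     # lab-frame angular momentum
print(E - H, dot(omega, L_lab))                   # Eq. (102): the two numbers coincide
```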

The last form of Eq. (99) shows that in the rotating frame, the Hamiltonian function of a particle has a very simple physical sense: the kinetic energy of motion relative to the rotating frame plus the effective potential energy (96). It is conserved, and hence may serve as an integral of motion, in many important situations when $\mathbf L|_{\rm in\,lab}$, and hence $E$, are not; our testbed problem is again a very good example.
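For the testbed problem, this can be seen in a direct simulation. The sketch below integrates $\ddot\theta = \sin\theta\,(\omega^2\cos\theta - g/R)$, which follows from Eq. (93), with a classical fourth-order Runge-Kutta step, and tracks $H = \tfrac{m}{2}R^2\dot\theta^2 + U_{\rm ef}$ and $E = \tfrac{m}{2}R^2(\dot\theta^2 + \omega^2\sin^2\theta) + U$, with $U = -mgR\cos\theta$ ($\theta$ counted from the lowest point). The parameter values, the integrator, and the function names are illustrative choices, not taken from the text.

```python
import math

def rhs(state, omega, g, R):
    """Time derivatives of (theta, theta_dot) from Eq. (93) with a = R * theta_ddot."""
    theta, theta_dot = state
    return (theta_dot, math.sin(theta) * (omega**2 * math.cos(theta) - g / R))

def rk4_step(state, dt, omega, g, R):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state, omega, g, R)
    k2 = rhs((state[0] + 0.5 * dt * k1[0], state[1] + 0.5 * dt * k1[1]), omega, g, R)
    k3 = rhs((state[0] + 0.5 * dt * k2[0], state[1] + 0.5 * dt * k2[1]), omega, g, R)
    k4 = rhs((state[0] + dt * k3[0], state[1] + dt * k3[1]), omega, g, R)
    return (state[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            state[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

def h_and_e(state, m, omega, g, R):
    """Hamiltonian (99) and energy (100) of the bead."""
    theta, theta_dot = state
    rho = R * math.sin(theta)
    T_rel = 0.5 * m * (R * theta_dot) ** 2        # kinetic energy in the ring's frame
    U = -m * g * R * math.cos(theta)              # gravity potential energy
    H = T_rel + U - 0.5 * m * (omega * rho) ** 2  # (m/2)v^2 + U_ef
    E = T_rel + 0.5 * m * (omega * rho) ** 2 + U  # lab-frame kinetic plus potential energy
    return H, E

m, g, R, omega = 0.1, 9.81, 0.5, 6.0   # illustrative values with omega^2 > g/R
state, dt = (0.3, 0.0), 1e-4
H0, E0 = h_and_e(state, m, omega, g, R)
H_vals, E_vals = [H0], [E0]
for _ in range(20000):                  # 2 s of motion
    state = rk4_step(state, dt, omega, g, R)
    H, E = h_and_e(state, m, omega, g, R)
    H_vals.append(H)
    E_vals.append(E)

print(max(abs(h - H0) for h in H_vals))   # tiny: H is an integral of motion
print(max(E_vals) - min(E_vals))          # not small: E is not conserved
```

The variation of $E$ is just $m\omega^2R^2\sin^2\theta$ of Eq. (103) changing as the bead oscillates; physically, it is the work done by the motor that keeps $\omega$ constant.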



20 $\partial L/\partial\mathbf v$ is just a shorthand for a vector with Cartesian components $\partial L/\partial v_j$. In a different language, this is the 
gradient of $L$ in the velocity space. 

21 A very similar situation arises at the motion of a particle with electric charge $q$ in a magnetic field $\mathbf B$. In that case 
the role of the additional term $\mathbf p - \mathbf p_{\rm k} = m\boldsymbol\omega\times\mathbf r$ is played by the product $q\mathbf A$, where $\mathbf A$ is the vector-potential of the field 
($\mathbf B = \nabla\times\mathbf A$); see, e.g., EM Sec. 9.7, and in particular Eqs. (9.183) and (9.192). 

22 Note that by definition (1.36), angular momenta $\mathbf L$ of particles merely add up. As a result, Eq. (102) is valid for 
an arbitrary system of particles. 
6.7. Exercise problems 

6.1. Calculate the principal moments of inertia for the following rigid bodies (see Fig. below): 

(i) an equilateral triangle made of thin rods with a uniform linear mass density $\mu$, 

(ii) a thin plate in the shape of an equilateral triangle, with a uniform areal mass density $\sigma$, and 

(iii) a tetrahedral pyramid made of a heavy material with a uniform bulk mass density $\rho$. 

Assuming that the total mass of the three objects is the same, compare the results and give an 
interpretation of their difference. 

6.2. Prove that Eqs. (34)-(36) are valid for rotation about a fixed axis, even if it does not pass 
through the center of mass, if all distances $\rho_k$ are measured from that axis. 



6.3. A uniform, round disk of radius $R$ can rotate, without friction, in the 
vertical plane, about a fixed point A at the disk's edge; see Fig. on the right. Find 
the eigenfrequency of small oscillations of the disk near its equilibrium position 
in a uniform gravity field. 




6.4. A thin uniform bar of mass $M$ and length $l$ is hung on a light thread of 
length $l'$ (like a "chime" bell; see Fig. on the right). Find: 

(i) the equations of motion of the system (within the plane of drawing); 

(ii) the eigenfrequencies of small oscillations near the equilibrium; 

(iii) the distribution coefficients for each oscillation mode. 

Sketch the oscillation modes for the particular case $l = l'$. 



6.5. A solid, uniform, round cylinder of mass $M$ can roll, 
without slipping, over a concave, round cylindrical surface of a block 
of mass $M'$, in a uniform gravity field; see Fig. on the right. The 
block can slide without friction on a horizontal surface. Using the 
Lagrangian formalism, 

(i) find the frequency of small oscillations of the system near the equilibrium, and 

(ii) sketch the oscillation mode for the particular case $M' = M$, $R' = 2R$. 

6.6. For the "sliding ladder" problem started in Sec. 4 (see Fig. 7), find the critical value $\alpha_{\rm c}$ of 
angle $\alpha$ at which the ladder loses contact with the vertical wall, assuming that it starts sliding from the 
vertical position with a negligible initial velocity. 

6.7. A small body is dropped down to the surface of Earth from height $h \ll R_{\rm E}$, without initial 
velocity. Calculate the magnitude and direction of its deviation from the vertical, due to the Earth's 
rotation. Estimate the effect's magnitude for a body dropped from the Empire State Building. 

6.8. Use Eq. (94) to derive the generalized momentum and the Lagrange equation of motion of a 
particle, considering $L$ as a function of $\mathbf r$ and $\mathbf v$ measured in a non-inertial but non-rotating reference 
frame. 
Chapter 7. Deformations and Elasticity 

The objective of this chapter is a brief discussion of small deformations of 3D continuous media, with a 
focus on elastic properties of solids. The reader will see that deformation of solids is nontrivial even in 
the absence of motion, so that several key problems of statics will need to be discussed before 
proceeding to such dynamic phenomena as elastic waves in infinite media and thin rods. 

7.1. Strain 

Rigid bodies discussed in the previous chapter are just a particular case of continuous media. As has already been mentioned, these are systems of particles so close to each other that the system's discreteness may be neglected, so that the particle displacement $\mathbf q$ may be considered as a continuous function of space and time. The subject of this chapter is small deviations from the rigid-body approximation discussed in Chapter 6, i.e. small deformations. The deformation smallness allows one to consider the displacement vector $\mathbf q$ as a function of the initial (pre-deformation) position of the particle, $\mathbf r$, and time $t$, just as was done in Secs. 5.3-5.5 for 1D waves.

The first task of the deformation theory is to exclude from consideration the types of motion considered in Chapter 6, namely the translation and rotation unrelated to deformations. This means, first of all, that the variables describing deformations should not depend on the part of displacement $\mathbf q$ that does not depend on position $\mathbf r$ (i.e. is common for the whole medium), because that part corresponds to a translational shift rather than to a deformation (Fig. 1a). Moreover, even certain non-uniform displacements do not contribute to deformation. For example, Eq. (6.7) (with $d\mathbf r$ replaced with $d\mathbf q$ to comply with our current notation) shows that a small displacement of the type

$$ d\mathbf q\big|_{\rm rotation} = d\boldsymbol\varphi\times\mathbf r , \tag{7.1} $$

where $d\boldsymbol\varphi = \boldsymbol\omega\,dt$ is an infinitesimal vector common for the whole continuum, corresponds to its rotation about the direction of that vector, and has nothing to do with the body deformation (Fig. 1b).




This is why, in order to develop an adequate quantitative characterization of deformation, we should start by finding suitable functions of the spatial distribution of displacements, $\mathbf q(\mathbf r)$, that exist only due to deformations. One such measure is the change of the distance $dl = |d\mathbf r|$ between two close points:

$$ (dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = \sum_{j=1}^{3}(dr_j + dq_j)^2 - \sum_{j=1}^{3}(dr_j)^2 , \tag{7.2} $$



where $dq_j$ is the $j$-th Cartesian component of the difference $d\mathbf q$ between the displacements $\mathbf q$ of the two points. If the deformation is small in the sense $|d\mathbf q| \ll |d\mathbf r| = dl$, we may keep in Eq. (2) only the terms proportional to the first power of the infinitesimal vector $d\mathbf q$:

$$ (dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = \sum_{j=1}^{3}\left[2\,dr_j\,dq_j + (dq_j)^2\right] \approx 2\sum_{j=1}^{3}dr_j\,dq_j . \tag{7.3} $$

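Since $q_j$ is a function of 3 independent scalar arguments $r_{j'}$, its differential may be presented as

$$ dq_j = \sum_{j'=1}^{3}\frac{\partial q_j}{\partial r_{j'}}\,dr_{j'} . \tag{7.4} $$

Coefficients $\partial q_j/\partial r_{j'}$ may be considered as elements of a tensor¹ providing a linear relation between vectors $d\mathbf r$ and $d\mathbf q$. Plugging Eq. (4) into Eq. (3), we get

$$ (dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = 2\sum_{j,j'=1}^{3}\frac{\partial q_j}{\partial r_{j'}}\,dr_j\,dr_{j'} . \tag{7.5} $$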



A convenience of the tensor $\partial q_j/\partial r_{j'}$ for characterizing deformations is that it automatically excludes the translational displacement (Fig. 1a), which is independent of $r_{j'}$. Its drawback is that its particular components are still affected by the rotation of the body (though the sum (5) is not). Indeed, according to the vector product definition, Eq. (1) may be presented in Cartesian coordinates as

$$ dq_j\big|_{\rm rotation} = \sum_{j',j''=1}^{3}\varepsilon_{jj'j''}\,d\varphi_{j'}\,r_{j''} , \tag{7.6} $$



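where $\varepsilon_{jj'j''}$ is the Levi-Civita symbol,² equal to (+1) if all three indices are different and run in a "right" order ({1, 2, 3}, {2, 3, 1}, or {3, 1, 2}), to (-1) if they are different but run in the opposite order, and to 0 if any two of them coincide; as a result, the symbol changes sign at any swap of two indices, e.g., $\varepsilon_{jj'j''} = -\varepsilon_{jj''j'}$. Differentiating Eq. (6) over a particular Cartesian coordinate of vector $\mathbf r$, and taking into account that this partial differentiation ($\partial$) is independent of (and hence may be swapped with) the differentiation ($d$) over the rotation angle $\varphi$, we get the amounts

$$ d\left(\frac{\partial q_j}{\partial r_{j'}}\right)_{\rm rotation} = \sum_{j''=1}^{3}\varepsilon_{jj''j'}\,d\varphi_{j''} , \qquad d\left(\frac{\partial q_{j'}}{\partial r_j}\right)_{\rm rotation} = \sum_{j''=1}^{3}\varepsilon_{j'j''j}\,d\varphi_{j''} , \tag{7.7} $$

which may differ from 0. However, notice that the sum of these two differentials equals zero for any $d\boldsymbol\varphi$ (because $\varepsilon_{j'j''j} = -\varepsilon_{jj''j'}$), which is possible only if

$$ \left(\frac{\partial q_j}{\partial r_{j'}} + \frac{\partial q_{j'}}{\partial r_j}\right)_{\rm rotation} = 0, \quad\text{for } j \neq j' , \tag{7.8} $$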



1 Since both $d\mathbf q$ and $d\mathbf r$ are legitimate physical vectors (whose Cartesian components are properly transformed at 
the transfer between reference frames), the 3×3 matrix with elements $\partial q_j/\partial r_{j'}$ is indeed a legitimate physical tensor; 
see the discussion in Sec. 6.2. 

2 See, e.g., MA Eq. (13.2). 
so that the full sum (5), which includes 3 such partial sums, is not affected by rotation, as we already know. This is why it is convenient to rewrite Eq. (5) in the mathematically equivalent form

$$ (dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = 2\sum_{j,j'=1}^{3}s_{jj'}\,dr_j\,dr_{j'} , \tag{7.9a} $$

where $s_{jj'}$ are the elements of the so-called symmetrized strain tensor, defined as

$$ s_{jj'} \equiv \frac{1}{2}\left(\frac{\partial q_j}{\partial r_{j'}} + \frac{\partial q_{j'}}{\partial r_j}\right). \tag{7.9b} $$



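(Note that this modification does not affect the diagonal elements: $s_{jj} = \partial q_j/\partial r_j$.) The advantage of the symmetrized tensor (9b) over the initial tensor $\partial q_j/\partial r_{j'}$ is that, according to Eq. (8), at pure rotation all elements of the symmetrized strain tensor vanish. (For the diagonal elements, this follows from Eq. (7) with $j = j'$, because the Levi-Civita symbol vanishes when two of its indices coincide.)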
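The claim is easy to test numerically. The sketch below builds the gradient tensor $\partial q_j/\partial r_{j'}$ of a rigid-body displacement (a uniform translation plus the small rotation (1)) by central differences and then symmetrizes it as in Eq. (9b). The rotation vector, translation, and observation point are arbitrary test values, and the helper names are hypothetical.

```python
def cross(a, b):
    """Vector product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def displacement_gradient(q, r, h=1e-6):
    """Matrix G[j][k] = dq_j/dr_k of a displacement field q(r), by central differences."""
    G = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        qp, qm = q(rp), q(rm)
        for j in range(3):
            G[j][k] = (qp[j] - qm[j]) / (2 * h)
    return G

def symmetrized_strain(G):
    """Eq. (9b): s_jk = (dq_j/dr_k + dq_k/dr_j) / 2."""
    return [[0.5 * (G[j][k] + G[k][j]) for k in range(3)] for j in range(3)]

D_PHI = (1e-3, -2e-3, 5e-4)      # small rotation vector d(phi), arbitrary
SHIFT = (0.01, -0.02, 0.03)      # uniform translation, arbitrary

def rigid_motion(r):
    """Translation plus the rotation of Eq. (1): no deformation at all."""
    rot = cross(D_PHI, r)
    return tuple(a + b for a, b in zip(rot, SHIFT))

G = displacement_gradient(rigid_motion, (0.4, -1.1, 0.7))
s = symmetrized_strain(G)
g_max = max(abs(x) for row in G for x in row)
s_max = max(abs(x) for row in s for x in row)
print(g_max)   # of the order of |d(phi)|: the raw gradient "sees" the rotation
print(s_max)   # zero up to rounding errors: the strain tensor does not
```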

Now let us discuss the physical meaning of this tensor. As was already mentioned in Sec. 6.2, any symmetric tensor may be diagonalized by an appropriate selection of the reference frame axes. In such principal axes, $s_{jj'} = s_{jj}\,\delta_{jj'}$, so that Eq. (4) takes the simple form

$$ dq_j = \frac{\partial q_j}{\partial r_j}\,dr_j = s_{jj}\,dr_j . \tag{7.10} $$



We may use this expression to calculate the change of each side of an infinitesimal cuboid (parallelepiped) with sides $dr_j$ parallel to the principal axes:

$$ dr_j\big|_{\text{after deformation}} - dr_j\big|_{\text{before deformation}} = dq_j = s_{jj}\,dr_j , \tag{7.11} $$

and of the cuboid's volume $dV = dr_1\,dr_2\,dr_3$:

$$ dV\big|_{\text{after deformation}} - dV\big|_{\text{before deformation}} = \prod_{j=1}^{3}\left(dr_j + s_{jj}\,dr_j\right) - \prod_{j=1}^{3}dr_j = dV\left[\prod_{j=1}^{3}(1 + s_{jj}) - 1\right]. \tag{7.12} $$



Since all our analysis is only valid in the linear approximation in small $s_{jj}$, Eq. (12) is reduced to

$$ dV\big|_{\text{after deformation}} - dV\big|_{\text{before deformation}} = dV\sum_{j=1}^{3}s_{jj} \equiv dV\,\mathrm{Tr}(s) , \tag{7.13} $$



where Tr (trace)³ of any matrix (in particular, tensor) is the sum of its diagonal elements; in our case⁴

$$ \mathrm{Tr}(s) \equiv \sum_{j=1}^{3}s_{jj} . \tag{7.14} $$
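The quality of the linearization leading from Eq. (12) to Eq. (13) may be seen on a numerical example: for principal strains of the order of $10^{-3}$ (illustrative values), the exact relative volume change and $\mathrm{Tr}(s)$ differ only by terms quadratic in $s_{jj}$. Function names are hypothetical.

```python
def relative_volume_change_exact(s_diag):
    """Eq. (12) divided by dV: the product of (1 + s_jj), minus 1."""
    product = 1.0
    for s_jj in s_diag:
        product *= 1.0 + s_jj
    return product - 1.0

def relative_volume_change_linear(s_diag):
    """Eq. (13) divided by dV: the trace of the strain tensor."""
    return sum(s_diag)

s_diag = (1e-3, -4e-4, 2.5e-4)   # illustrative principal strains
exact = relative_volume_change_exact(s_diag)
linear = relative_volume_change_linear(s_diag)
print(exact, linear, exact - linear)   # the difference is second order in s
```

Here the difference, $s_{11}s_{22} + s_{11}s_{33} + s_{22}s_{33} + s_{11}s_{22}s_{33} \approx -2.5\times10^{-7}$, is about 0.03% of $\mathrm{Tr}(s) = 8.5\times10^{-4}$.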



So, the diagonal components of the tensor characterize the medium's compression or extension; then what is the meaning of the off-diagonal components of the tensor? It may be illustrated by the simplest example of a purely shear deformation, shown in Fig. 2 (the geometry is assumed to be uniform along axis $z$). In this case, all displacements (assumed small) have just one Cartesian component, in Fig. 2 



3 The traditional European notation for Tr is Sp (from German Spur meaning "trace" or "track"). 

4 Actually, the tensor theory shows that the trace does not depend on the particular choice of the coordinate axes. 
along axis $x$: $\mathbf q = \mathbf n_x\alpha y$ (with $\alpha \ll 1$), so that the only nonvanishing component of the initial strain tensor $\partial q_j/\partial r_{j'}$ is $\partial q_x/\partial y = \alpha$, and the symmetrized tensor (9b) is

$$ s = \begin{pmatrix} 0 & \alpha/2 & 0 \\ \alpha/2 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} . \tag{7.15} $$



Evidently, the change (13) of volume vanishes in this case. Thus, off-diagonal elements of tensor s 
characterize shear deformations. 
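The contrast between shear and extension can be made explicit for uniform deformations, where the exact volume ratio is $\det(\mathrm I + \partial q_j/\partial r_{j'})$. The sketch below, with an illustrative $\alpha$ and hypothetical helper names, compares the shear $\mathbf q = \mathbf n_x\alpha y$ of Fig. 2 with the uniaxial stretch $\mathbf q = \mathbf n_x\alpha x$.

```python
def symmetrized(G):
    """Eq. (9b) applied to a displacement-gradient matrix G."""
    return [[0.5 * (G[j][k] + G[k][j]) for k in range(3)] for j in range(3)]

def trace(M):
    return sum(M[j][j] for j in range(3))

def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def volume_ratio(G):
    """V_after / V_before = det(I + G) for a uniform displacement gradient G."""
    return det3([[(1.0 if j == k else 0.0) + G[j][k] for k in range(3)] for j in range(3)])

alpha = 1e-3
shear = [[0.0, alpha, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]     # q = n_x * alpha * y
stretch = [[alpha, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # q = n_x * alpha * x

s_shear = symmetrized(shear)
print(s_shear)                                              # alpha/2 in the xy and yx slots, Eq. (15)
print(trace(s_shear), volume_ratio(shear))                  # 0.0 and 1.0: no volume change
print(trace(symmetrized(stretch)), volume_ratio(stretch))   # alpha and 1 + alpha
```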




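To conclude this section, let me note that Eq. (9) is only valid in Cartesian coordinates. For the solution of some important problems, especially those with a spherical or axial symmetry, it is frequently convenient to express the six different components of the symmetric strain tensor via the three components of the displacement vector $\mathbf q$ in either spherical or cylindrical coordinates. A straightforward differentiation, using the definition of such coordinates,⁵ yields, in particular, the following formulas for the diagonal elements of the tensor in the local mutually orthogonal coordinates that are directed along the unit vectors, either $\{\mathbf n_r, \mathbf n_\theta, \mathbf n_\varphi\}$ or $\{\mathbf n_\rho, \mathbf n_\varphi, \mathbf n_z\}$, at the given point:

(i) in the spherical coordinates:

$$ s_{rr} = \frac{\partial q_r}{\partial r}, \quad s_{\theta\theta} = \frac{q_r}{r} + \frac{1}{r}\frac{\partial q_\theta}{\partial\theta}, \quad s_{\varphi\varphi} = \frac{q_r}{r} + \frac{q_\theta}{r}\cot\theta + \frac{1}{r\sin\theta}\frac{\partial q_\varphi}{\partial\varphi} ; \tag{7.16} $$

(ii) in the cylindrical coordinates:

$$ s_{\rho\rho} = \frac{\partial q_\rho}{\partial\rho}, \quad s_{\varphi\varphi} = \frac{q_\rho}{\rho} + \frac{1}{\rho}\frac{\partial q_\varphi}{\partial\varphi}, \quad s_{zz} = \frac{\partial q_z}{\partial z} . \tag{7.17} $$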

These expressions, which will be used below for the solution of some problems with symmetric geometries, may be a bit counter-intuitive. Indeed, Eq. (16) shows that even for a purely radial, spherically-symmetric deformation, $\mathbf q = \mathbf n_r q(r)$, the diagonal angular components of strain do not vanish: $s_{\theta\theta} = s_{\varphi\varphi} = q/r$. (According to Eq. (17), in cylindrical coordinates, the same effect is exhibited by the only angular component of the tensor.) Note, however, that these relations describe a very simple geometric effect: the change of the lateral distance $r\,d\psi \ll r$ between two close points with the same distance $r$ from a central point, at a small change of $r$ that keeps the angle $d\psi$ between their radius-vectors constant.
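The nonvanishing angular strain can be confirmed by going back to Cartesian coordinates. The sketch below takes an assumed radial field $q(r) = A/r^2$ (with a hypothetical amplitude $A$), computes the Cartesian tensor (9b) by central differences at an arbitrary point, and projects it onto $\mathbf n_r$, $\mathbf n_\theta$, $\mathbf n_\varphi$. The projections should reproduce $s_{rr} = dq/dr = -2A/r^3$ and $s_{\theta\theta} = s_{\varphi\varphi} = q/r = A/r^3$ from Eq. (16).

```python
import math

A = 1e-3   # hypothetical amplitude of the radial displacement q(r) = A / r**2

def q_field(p):
    """Cartesian components of q = n_r * A / r**2."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    return (A * x / r**3, A * y / r**3, A * z / r**3)

def strain_tensor(q, p, h=1e-5):
    """Symmetrized strain tensor (9b), by central differences in Cartesian axes."""
    G = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        pp, pm = list(p), list(p)
        pp[k] += h
        pm[k] -= h
        qp, qm = q(pp), q(pm)
        for j in range(3):
            G[j][k] = (qp[j] - qm[j]) / (2 * h)
    return [[0.5 * (G[j][k] + G[k][j]) for k in range(3)] for j in range(3)]

def component(s, n):
    """Diagonal element n . s . n along a unit vector n."""
    return sum(n[j] * s[j][k] * n[k] for j in range(3) for k in range(3))

r, theta, phi = 2.0, 0.7, 0.4
n_r = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
n_theta = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
n_phi = (-math.sin(phi), math.cos(phi), 0.0)

s = strain_tensor(q_field, tuple(r * c for c in n_r))
s_rr, s_tt, s_pp = (component(s, n) for n in (n_r, n_theta, n_phi))
print(s_rr, -2 * A / r**3)   # s_rr = dq/dr
print(s_tt, A / r**3)        # s_theta_theta = q/r
print(s_pp, A / r**3)        # s_phi_phi = q/r
```

For this particular field, $\mathrm{Tr}(s) = 0$: a radial displacement falling off as $1/r^2$ changes shapes but not volumes.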



5 See, e.g., MA Eqs. (10.1) and (10.7). 
7.2. Stress 



Now let us discuss the forces that cause deformations. Internal forces acting inside (i.e. between arbitrarily defined parts of) a continuous medium may also be characterized by a tensor. This stress tensor,⁶ with elements $\sigma_{jj'}$, relates the components of the elementary vector $d\mathbf F$ of the force acting on an elementary area $dA$ of a (possibly imaginary) interface between two parts of a continuous medium to the components of the elementary vector $d\mathbf A = \mathbf n\,dA$ normal to the area (Fig. 3):

$$ dF_j = \sum_{j'=1}^{3}\sigma_{jj'}\,dA_{j'} . \tag{7.18} $$

The usual sign convention here is to take the outer normal $\mathbf n$, i.e. to direct $d\mathbf A$ out of "our" part of the continuum, i.e. the part on which the calculated force $d\mathbf F$ is exerted.





In some cases the stress tensor's structure is very simple. For example, as will be discussed in detail in the next chapter, static or frictionless fluids may only provide a force normal to any surface and usually directed toward "our" part of the body, so that

$$ d\mathbf F = -P\,d\mathbf A, \quad\text{i.e.}\quad \sigma_{jj'} = -P\,\delta_{jj'} , \tag{7.19} $$



where scalar P (in most cases positive) is called pressure, and generally depends on both the spatial 
position and time. This type of stress, with P > 0, is frequently called the hydrostatic compression - even 
if it takes place in solids. 

However, in the general case the stress tensor also has off-diagonal terms, which characterize shear stress. For example, if the shear strain shown in Fig. 2 is caused by a pair of forces $\pm\mathbf F$, they create internal forces $F_x\mathbf n_x$, with $F_x > 0$ if we speak about the force acting upon the part of the sample below the imaginary horizontal interface we are discussing. In order to avoid horizontal acceleration of each horizontal slice of the sample, the forces should not depend on $y$, i.e. $F_x = {\rm const} = F$. Superficially, it may look like the only nonvanishing component of the stress tensor is $dF_x/dA_y = F/A = {\rm const}$, so that the tensor is asymmetric, in contrast to the strain tensor (15) of the same system. Note, however, that the pair of forces $\pm\mathbf F$ creates not only the shear stress, but also a nonvanishing rotating torque $\boldsymbol\tau = -Fh\,\mathbf n_z = -(dF_x/dA_y)Ah\,\mathbf n_z = -(dF_x/dA_y)V\,\mathbf n_z$, where $V = Ah$ is the sample's volume. So, if we want to perform a static stress experiment, i.e. avoid the sample's rotation, we need to apply some other forces, e.g., a pair of vertical forces creating an equal and opposite torque $\boldsymbol\tau' = (dF_y/dA_x)V\,\mathbf n_z$, implying that $dF_y/dA_x = dF_x/dA_y = F/A$. As a result, the stress tensor becomes symmetric, and similar in structure to the symmetrized strain tensor (15):



6 It is frequently called the Cauchy stress tensor, partly to honor A.-L. Cauchy (1789-1857) who introduced it, 
and partly to distinguish it from other possible definitions of the stress tensor, including the 1st and 2nd 
Piola-Kirchhoff tensors. (For the infinitesimal deformations discussed in this course, all these notions coincide.) 
$$ \sigma = \begin{pmatrix} 0 & F/A & 0 \\ F/A & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} . \tag{7.20} $$



In many situations, the body may be stressed not only by forces applied to its surfaces, but also by some volume-distributed (bulk) forces $d\mathbf F = \mathbf f\,dV$, with a certain effective bulk density $\mathbf f$. (The most evident example of such forces is gravity. If its field is uniform as described by Eq. (16), then $\mathbf f = \rho\mathbf g$, where $\rho$ is the mass density.) Let us derive the key formula describing the correct summation of the surface and bulk forces. For that, consider again an infinitesimal cuboid with sides $dr_j$ parallel to the corresponding coordinate axes (Fig. 4), now not necessarily the principal axes of the stress tensor.



Fig. 7.4. Deriving Eq. (23). 



If elements $\sigma_{jj'}$ of the tensor do not depend on position, the force $d\mathbf F^{(j')}$ acting on the $j'$-th face of the cuboid is exactly balanced by the equal and opposite force acting on the opposite face, because vectors $d\mathbf A^{(j')}$ of these faces are equal and opposite. However, if $\sigma_{jj'}$ is a function of $\mathbf r$, then the net force $d(d\mathbf F^{(j')})$ does not vanish. Using the expression for the $j'$-th contribution to sum (18), in the first order in $d\mathbf r$, the $j$-th component of this vector is

$$ d\big(dF_j^{(j')}\big) = d\big(\sigma_{jj'}\,dA_{j'}\big) = \frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,dr_{j'}\,dA_{j'} = \frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,dV , \tag{7.21} $$



where the cuboid's volume $dV = dr_{j'}\,dA_{j'}$ does not depend on $j'$. The addition of these force components for all three pairs of cuboid faces, i.e. the summation of Eqs. (21) over all 3 values of the upper index $j'$, yields the following relation for the $j$-th component of the net force exerted on the cuboid:

$$ d(dF_j) = \sum_{j'=1}^{3}d\big(dF_j^{(j')}\big) = \sum_{j'=1}^{3}\frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,dV . \tag{7.22} $$

Since any volume may be broken into such infinitesimal cuboids, Eq. (22) shows that the space-varying stress is equivalent to a volume-distributed force $d\mathbf F_{\rm ef} = \mathbf f_{\rm ef}\,dV$, whose effective (not real!) bulk density $\mathbf f_{\rm ef}$ has the following Cartesian components:

$$ (f_{\rm ef})_j = \sum_{j'=1}^{3}\frac{\partial\sigma_{jj'}}{\partial r_{j'}} , \tag{7.23} $$

so that in the presence of genuinely bulk forces $d\mathbf F = \mathbf f\,dV$, densities $\mathbf f_{\rm ef}$ and $\mathbf f$ just add up.

Let us use this addition rule to spell out the 2nd Newton law for a unit volume of a continuous medium:

$$ \rho\,\frac{\partial^2\mathbf q}{\partial t^2} = \mathbf f_{\rm ef} + \mathbf f . \tag{7.24} $$

Using Eq. (23), the $j$-th Cartesian component of Eq. (24) may be presented as

$$ \rho\,\frac{\partial^2 q_j}{\partial t^2} = \sum_{j'=1}^{3}\frac{\partial\sigma_{jj'}}{\partial r_{j'}} + f_j . \tag{7.25} $$

This is the key equation of the medium's dynamics, which will be repeatedly used below.
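As a sanity check of Eqs. (23)-(25), consider a fluid at rest in a uniform gravity field, with $\sigma_{jj'} = -P\delta_{jj'}$ (Eq. (19)) and $P$ growing linearly with depth. The sketch below assumes an illustrative 10-m water column with its free surface at $z = H$ and hypothetical helper names; it evaluates $\mathbf f_{\rm ef}$ by central differences and adds the gravity force density $\mathbf f = \rho\mathbf g$ directed along $-z$. Their sum should vanish, so that Eq. (25) gives $\partial^2 q_j/\partial t^2 = 0$.

```python
RHO = 1000.0      # water density, kg/m^3
G_ACC = 9.81      # free-fall acceleration, m/s^2
P0 = 1.0e5        # pressure at the free surface, Pa (1 bar)
H = 10.0          # height of the free surface, m (illustrative)

def stress(r):
    """Hydrostatic stress tensor, Eq. (19): sigma_jk = -P(z) delta_jk."""
    P = P0 + RHO * G_ACC * (H - r[2])
    return [[-P if j == k else 0.0 for k in range(3)] for j in range(3)]

def effective_force_density(sigma, r, h=1e-4):
    """Eq. (23): (f_ef)_j = sum_k d(sigma_jk)/d(r_k), by central differences."""
    f = [0.0, 0.0, 0.0]
    for k in range(3):
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        sp, sm = sigma(rp), sigma(rm)
        for j in range(3):
            f[j] += (sp[j][k] - sm[j][k]) / (2 * h)
    return f

r = (0.3, -0.2, 4.0)                       # a point inside the water column
f_ef = effective_force_density(stress, r)
f_gravity = (0.0, 0.0, -RHO * G_ACC)       # f = rho * g, with g along -z
net = [a + b for a, b in zip(f_ef, f_gravity)]
print(f_ef)   # only the z-component, equal to +rho*g, is nonzero
print(net)    # ~0: by Eq. (25), the fluid stays at rest
```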

For the solution of some problems, it is also convenient to have a general expression for the work $\delta W$ of the stress forces at a virtual deformation $\delta\mathbf q$, understood in the same variational sense as the virtual displacement $\delta\mathbf r$ in Sec. 2.1. Using the equivalence between the stress forces and the effective bulk forces with density $\mathbf f_{\rm ef}$, for any volume $V$ of the medium we may write

$$ \delta W = \int_V \mathbf f_{\rm ef}\cdot\delta\mathbf q\,d^3r = \sum_{j=1}^{3}\int_V (f_{\rm ef})_j\,\delta q_j\,d^3r = \sum_{j,j'=1}^{3}\int_V \frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,\delta q_j\,d^3r . \tag{7.28} $$



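Let us take this integral by parts for a volume so large that deformations $\delta q_j$ on its surface are negligible. Then, swapping the operations of variation and spatial differentiation (just as was done with the time derivative in Sec. 2.1), we get

$$ \delta W = -\sum_{j,j'=1}^{3}\int_V \sigma_{jj'}\,\frac{\partial(\delta q_j)}{\partial r_{j'}}\,d^3r = -\sum_{j,j'=1}^{3}\int_V \sigma_{jj'}\,\delta\!\left(\frac{\partial q_j}{\partial r_{j'}}\right)d^3r . \tag{7.29} $$

Assuming that tensor $\sigma_{jj'}$ is symmetric, we may rewrite this expression as

$$ \delta W = -\frac{1}{2}\sum_{j,j'=1}^{3}\int_V \left[\sigma_{jj'}\,\delta\!\left(\frac{\partial q_j}{\partial r_{j'}}\right) + \sigma_{j'j}\,\delta\!\left(\frac{\partial q_j}{\partial r_{j'}}\right)\right]d^3r . \tag{7.30} $$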



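Now, swapping indices $j$ and $j'$ in the second term, we finally get

$$ \delta W = -\sum_{j,j'=1}^{3}\int_V \sigma_{jj'}\,\delta\!\left[\frac{1}{2}\left(\frac{\partial q_j}{\partial r_{j'}} + \frac{\partial q_{j'}}{\partial r_j}\right)\right]d^3r = -\sum_{j,j'=1}^{3}\int_V \sigma_{jj'}\,\delta s_{jj'}\,d^3r , \tag{7.31} $$

where $s_{jj'}$ are the components of the strain tensor (9b). It is natural to rewrite this important formula as

$$ \delta W = \int_V \delta w(\mathbf r)\,d^3r, \quad\text{where}\quad \delta w(\mathbf r) \equiv -\sum_{j,j'=1}^{3}\sigma_{jj'}\,\delta s_{jj'} , \tag{7.32} $$

and interpret the locally-defined scalar function $\delta w(\mathbf r)$ as the work of stress forces per unit volume, due to the small variation of the deformation.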
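The integration by parts leading from Eq. (28) to Eq. (29) can be checked on a one-dimensional analogue, in which the boundary term $\sigma\,\delta q$ vanishes because $\delta q = 0$ at both ends of the segment. In the sketch below, the stress profile $\sigma(x)$ and the virtual displacement $\delta q(x)$ on $[0, 1]$ are arbitrary assumed functions, and the integrals are evaluated by the composite Simpson rule.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

EPS = 1e-3   # amplitude of the virtual displacement (arbitrary)

def sigma(x):            # assumed 1D stress profile
    return math.sin(3 * x) + x ** 2

def dsigma_dx(x):
    return 3 * math.cos(3 * x) + 2 * x

def delta_q(x):          # virtual displacement, zero at both ends of [0, 1]
    return EPS * x ** 2 * (1 - x) ** 2

def d_delta_q_dx(x):
    return EPS * (2 * x * (1 - x) ** 2 - 2 * x ** 2 * (1 - x))

# 1D analogue of Eq. (28): work of the effective bulk force density d(sigma)/dx
W_bulk = simpson(lambda x: dsigma_dx(x) * delta_q(x), 0.0, 1.0)
# 1D analogue of Eq. (29): minus stress times the variation of the strain dq/dx
W_strain = -simpson(lambda x: sigma(x) * d_delta_q_dx(x), 0.0, 1.0)
print(W_bulk, W_strain)   # equal up to quadrature error
```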



7.3. Hooke's law 

In order to form a complete system of equations describing media dynamics, one needs to complement Eq. (25) with an appropriate material equation describing the relation between the stress tensor $\sigma_{jj'}$ and the deformation $\mathbf q$ described (in the small deformation limit) by the strain tensor $s_{jj'}$. This relation depends on the medium, and generally may be rather complex. Even leaving aside various anisotropic solids (e.g., crystals) and macroscopically inhomogeneous materials (like ceramics or sand), strain typically depends not only on the current value of stress (possibly in a nonlinear way), but also on the previous history of stress application. Indeed, if strain exceeds a certain plasticity threshold, atoms (or nanocrystals) may slip to their new positions and never come back even if the stress is reduced. As a result, deformations become irreversible; see Fig. 5.
Fig. 7.5. Typical relation between stress 
and strain in solids (schematically). 



Strain is nearly proportional to stress, i.e. obeys the famous Hooke's law,⁷ only below the thresholds of nonlinearity and plasticity (which are typically close to each other). However, even in this elastic range the law is not quite simple, and even for an isotropic medium it is described not by one but by two constants, called elastic moduli. The reason is that most elastic materials resist the strain accompanied by a volume change (say, the hydrostatic compression) differently from how they resist the shear deformation. In order to describe this difference, let us first present the symmetrized strain tensor (9b) in the mathematically equivalent form

$$ s_{jj'} = \left(s_{jj'} - \frac{1}{3}\mathrm{Tr}(s)\,\delta_{jj'}\right) + \left(\frac{1}{3}\mathrm{Tr}(s)\,\delta_{jj'}\right). \tag{7.33} $$



According to Eq. (13), the traceless tensor in the first parentheses of Eq. (33) does not give any contribution to the volume change, i.e. it may be used to characterize purely shear deformation, while the second one describes the hydrostatic compression alone. Hence we may expect that the stress tensor may be presented (again, in the elastic deformation range only!) as

$$ \sigma_{jj'} = 2\mu\left(s_{jj'} - \frac{1}{3}\mathrm{Tr}(s)\,\delta_{jj'}\right) + 3K\left(\frac{1}{3}\mathrm{Tr}(s)\,\delta_{jj'}\right), \tag{7.34} $$



where $K$ and $\mu$ are some constants.⁸ Indeed, experiments show that Hooke's law in this form is followed, at small strain, by all isotropic elastic materials. In accordance with the above discussion, constant $\mu$ (in some texts, denoted as $G$) is called the shear modulus, while constant $K$ (sometimes called $B$) is called the bulk modulus. Two columns of Table 1 below show the approximate values of these moduli for typical representatives of several major classes of materials.⁹



7 Named after R. Hooke (1635-1703) who was the first to describe the law in its simplest, 1D version. 

8 The inclusion of coefficients 2 and 3 into Eq. (34) is justified by the simplicity of some of its corollaries; see, 
e.g., Eqs. (38) and (43) below. 

9 Since the strain tensor elements, defined by Eq. (9b), are dimensionless, while the stress defined by Eq. (18) has 
the dimensionality of pressure (force per unit area), so do the elastic moduli $K$ and $\mu$. 
To better appreciate these values, let us first discuss the physical meaning of $K$ and $\mu$, using two simple examples of elastic deformation. For that, it is convenient first to solve the set of 9 (or rather 6 different) linear equations (34) for $s_{jj'}$. This is easy to do, due to the simple structure of these equations: they relate components $\sigma_{jj'}$ and $s_{jj'}$ with the same indices, besides the involvement of the tensor's trace. This slight complication may be readily overcome by noticing that according to Eq. (34),

$$ \mathrm{Tr}(\sigma) \equiv \sum_{j=1}^{3}\sigma_{jj} = 3K\,\mathrm{Tr}(s), \quad\text{i.e.}\quad \mathrm{Tr}(s) = \frac{1}{3K}\mathrm{Tr}(\sigma) . \tag{7.35} $$

Plugging this result into Eq. (34) and solving it for $s_{jj'}$, we readily get the reciprocal relation, which may be presented in a similar form:

$$ s_{jj'} = \frac{1}{2\mu}\left(\sigma_{jj'} - \frac{1}{3}\mathrm{Tr}(\sigma)\,\delta_{jj'}\right) + \frac{1}{3K}\left(\frac{1}{3}\mathrm{Tr}(\sigma)\,\delta_{jj'}\right). \tag{7.36} $$
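Equations (34) and (36) are easy to code up and check against each other. The sketch below (hypothetical function names) uses the hardened-steel moduli from Table 1, converted to pascals, and an arbitrary small symmetric strain; applying (34) and then (36) should return the original strain, and the traces should obey Eq. (35).

```python
def trace(m):
    return sum(m[j][j] for j in range(3))

def stress_from_strain(s, K, mu):
    """Hooke's law, Eq. (34)."""
    tr = trace(s)
    return [[2 * mu * (s[j][k] - (tr / 3 if j == k else 0.0))
             + (K * tr if j == k else 0.0)        # 3K * (Tr(s)/3) on the diagonal
             for k in range(3)] for j in range(3)]

def strain_from_stress(sigma, K, mu):
    """Reciprocal relation, Eq. (36)."""
    tr = trace(sigma)
    return [[(sigma[j][k] - (tr / 3 if j == k else 0.0)) / (2 * mu)
             + (tr / (9 * K) if j == k else 0.0)  # (1/3K) * (Tr(sigma)/3) on the diagonal
             for k in range(3)] for j in range(3)]

K, MU = 170e9, 75e9   # hardened steel from Table 1, in Pa
s = [[1.0e-4, 2.0e-5, 0.0],
     [2.0e-5, -3.0e-5, 5.0e-6],
     [0.0, 5.0e-6, 4.0e-5]]   # an arbitrary small symmetric strain

sigma = stress_from_strain(s, K, MU)
s_back = strain_from_stress(sigma, K, MU)
err = max(abs(s_back[j][k] - s[j][k]) for j in range(3) for k in range(3))
print(sigma[0][0] / 1e6, "MPa")
print(err)                                 # rounding-level
print(trace(sigma) / (3 * K * trace(s)))   # Eq. (35): close to 1
```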



Table 7.1. Elastic moduli, density, and sound velocities of a few representative materials (approximate values) 

| Material | K (GPa) | μ (GPa) | E (GPa) | σ | ρ (kg/m³) | v_l (m/s) | v_t (m/s) |
|---|---|---|---|---|---|---|---|
| Diamond (a) | 600 | 450 | 1,100 | 0.20 | 3,500 | 1,830 | 1,200 |
| Hardened steel | 170 | 75 | 200 | 0.30 | 7,800 | 5,870 | 3,180 |
| Water (b) | 2.1 | 0 | 0 | 0.5 | 1,000 | 1,480 | 0 |
| Air (b) | 0.00010 | 0 | 0 | 0.5 | 1.2 | 332 | 0 |

(a) Averages over crystallographic directions (~10% anisotropy). 

(b) At the so-called ambient conditions (T = 20°C, P = 1 bar = 10⁵ Pa). 



Now let us apply Hooke's law, in the form of Eqs. (34) or (36), to two simple situations in which the strain and stress tensors may be found without formulating the exact differential equations of the elasticity theory and the boundary conditions for them. (That will be the subject of the next section.) The first experiment is the hydrostatic compression, when the stress tensor is diagonal and all its diagonal components are equal; see Eq. (19).¹⁰ For this case, Eq. (36) yields

$$ s_{jj'} = -\frac{P}{3K}\,\delta_{jj'} , \tag{7.37} $$



which means that regardless of the shear modulus, the strain tensor is also diagonal, with all diagonal components equal. According to Eqs. (11) and (13), this means that all linear dimensions of the body are reduced by the same fraction, so that its shape is preserved, while its volume changes by

$$ \frac{\Delta V}{V} = \mathrm{Tr}(s) = -\frac{P}{K} . \tag{7.38} $$


10 It may be proved that such a situation may be implemented not only in a fluid with pressure P, but also by placing a solid sample of an arbitrary shape into a compressed fluid.
This equation clearly shows the physical sense of the bulk modulus $K$ as the reciprocal compressibility.

As Table 1 shows, the values of $K$ may be dramatically different for various materials, and even for such "soft stuff" as water this modulus is actually rather high. For example, even at the bottom of the deepest, 10-km ocean well ($P \approx 10^3$ bar $\approx 0.1$ GPa), the water density increases by just about 5%. As a result, in most human-scale experiments, water may be treated as incompressible - a condition that will be widely used in the next chapter. Many solids are even much less compressible - see the first two rows of Table 1.
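The ocean-bottom estimate above is simple arithmetic based on Eq. (38). The short sketch below redoes it under a few assumptions: g = 9.8 m/s², the water parameters of Table 1, and neglect of the small compression itself when the hydrostatic pressure ρgh is computed.

```python
# Ocean-bottom estimate from Eq. (7.38): Delta V / V = -P / K.
# Assumptions: g = 9.8 m/s^2; water density and K from Table 7.1; the small
# compression itself is neglected when computing the hydrostatic pressure.
rho_water = 1.0e3      # kg/m^3
g = 9.8                # m/s^2
depth = 10.0e3         # m
K_water = 2.1e9        # Pa

P = rho_water * g * depth        # hydrostatic pressure, close to 0.1 GPa
dV_over_V = -P / K_water         # Eq. (7.38)
print(f"P = {P / 1e9:.3f} GPa, Delta V / V = {100 * dV_over_V:.1f}%")
```

The volume shrinks by roughly 4.7%, which corresponds to the ~5% density increase quoted above.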

The most compressible media are gases. For a gas, a certain background pressure $P$ is necessary just for containing it within a certain volume $V$, so that Eq. (38) is only valid for small increments of pressure, $\Delta P$:

$$\frac{\Delta V}{V} = -\frac{\Delta P}{K}. \qquad (7.39)$$

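Moreover, gas compression also depends on thermodynamic conditions. (For most condensed media, the temperature effects are very small.) For example, at ambient conditions most gases are reasonably well described by the equation of state of the model called the ideal classical gas:

$$PV = Nk_BT, \quad\text{i.e.}\quad P = \frac{Nk_BT}{V}, \qquad (7.40)$$

where $N$ is the number of molecules in volume $V$, and $k_B \approx 1.38\times10^{-23}$ J/K is the Boltzmann constant.¹¹ For a small volume change $\Delta V$ at constant temperature, this equation gives

$$\Delta P\big|_{T=\text{const}} = -\frac{Nk_BT}{V^2}\,\Delta V \equiv -P\,\frac{\Delta V}{V}, \quad\text{i.e.}\quad \frac{\Delta V}{V}\bigg|_{T=\text{const}} = -\frac{\Delta P}{P}. \qquad (7.41)$$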

Comparing this expression with Eq. (39), we get a remarkably simple result for the isothermal compression of gases,

$$K\big|_{T=\text{const}} = P, \qquad (7.42)$$

which means, in particular, that the bulk modulus listed in Table 1 for air is actually valid, at the ambient conditions, for almost any gas. Note, however, that a change of thermodynamic conditions (say, from isothermal to adiabatic¹²) may affect the gas compressibility.
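Equation (42) can be checked numerically. The sketch below is an illustration that assumes one cubic metre of ideal gas at T = 293 K and P = 1 bar. It estimates the isothermal bulk modulus as K = -V dP/dV, which is just Eq. (39) rewritten, using a central finite difference of Eq. (40).

```python
# Isothermal bulk modulus of an ideal classical gas, K = -V (dP/dV) at constant T,
# estimated by a central finite difference of Eq. (7.40) and compared with Eq. (7.42).
# Assumption: 1 m^3 of gas at T = 293 K and P = 1 bar (ambient conditions).
k_B = 1.380649e-23      # J/K
T = 293.0               # K
V = 1.0                 # m^3
P0 = 1.0e5              # Pa
N = P0 * V / (k_B * T)  # number of molecules that gives P = P0

def pressure(volume):
    """Eq. (7.40): P = N k_B T / V at fixed N and T."""
    return N * k_B * T / volume

dV = 1.0e-6 * V
K_iso = -V * (pressure(V + dV) - pressure(V - dV)) / (2 * dV)
print(f"K = {K_iso:.1f} Pa, P = {pressure(V):.1f} Pa")
```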

Now let us consider the second, rather different, fundamental experiment: the pure shear deformation shown in Fig. 2. Since the traces of matrices (15) and (20), which describe this experiment, are equal to 0, for their off-diagonal elements Eq. (34) gives simply $\sigma_{jj'} = 2\mu s_{jj'}$, so that the deformation angle $\alpha$ (see Fig. 2) is just

$$\alpha = \frac{1}{\mu}\,\frac{F}{A}. \qquad (7.43)$$



11 For the derivation and detailed discussion of Eq. (40) see, e.g., SM Sec. 3.1.

12 See, e.g., SM Sec. 1.3. 
Notice that the angle does not depend on the thickness $h$ of the sample, though of course the maximal linear deformation $q_x = \alpha h$ is proportional to the thickness. Naturally, as Table 1 shows, for all fluids (liquids and gases) $\mu = 0$, because they cannot resist static shear stress.

However, not all experiments, even the apparently simple ones, involve just either $K$ or $\mu$. Let us consider stretching a long elastic rod of a small and uniform cross-section of area $A$ - the so-called tensile stress experiment shown in Fig. 6.¹³

Fig. 7.6. Tensile stress experiment (a rod of length L stretched along axis z).


Though the deformation of the rod near its clamped ends depends on the exact way the forces $F$ are applied (we will discuss this issue later on), we may expect that over most of its length the tension forces are directed virtually along the rod, $d\mathbf{F} = \mathbf{n}_z\,dF_z$, and hence, with the coordinate choice shown in Fig. 6, $\sigma_{xj} = \sigma_{yj} = 0$ for all $j$, including the diagonal elements $\sigma_{xx}$ and $\sigma_{yy}$. Moreover, due to the open lateral surfaces, on which, evidently, $dF_x = dF_y = 0$, there cannot be an internal stress force of any direction acting on any elementary internal boundary parallel to these surfaces. This means that $\sigma_{zx} = \sigma_{zy} = 0$. So, of all components of the stress tensor only one, $\sigma_{zz}$, is not equal to zero, and for a uniform sample, $\sigma_{zz} = \text{const} = F/A$. For this case, Eq. (36) shows that the strain tensor is also diagonal, but with different diagonal elements:

$$s_{zz} = \left(\frac{1}{9K} + \frac{1}{3\mu}\right)\sigma_{zz}, \qquad (7.44)$$

$$s_{xx} = s_{yy} = -\left(\frac{1}{6\mu} - \frac{1}{9K}\right)\sigma_{zz}. \qquad (7.45)$$



Since the tensile stress is most common in engineering (and physical experiment) practice, both combinations of the elastic moduli participating in these two relations have deserved their own names. In particular, the constant in Eq. (44) is usually denoted as $1/E$ (but in many texts, as $1/Y$), where $E$ is called the Young's modulus:

$$\frac{1}{E} \equiv \frac{1}{9K} + \frac{1}{3\mu}, \quad\text{i.e.}\quad E = \frac{9K\mu}{3K + \mu}. \qquad (7.46)$$



As Fig. 6 shows, in the tensile stress geometry $s_{zz} \equiv dq_z/dz = \Delta L/L$, so that the Young's modulus scales the linear relation between the relative extension of the rod and the force applied per unit area:¹⁴

$$\frac{\Delta L}{L} = \frac{1}{E}\,\frac{F}{A}. \qquad (7.47)$$



13 Though the analysis of compression in this situation gives similar results, in practical experiments a strong 
compression may lead to the loss of horizontal stability - the so-called buckling - of the rod. 

14 According to Eq. (47), E may be thought of as the force per unit area that would double the sample's length, if only our theory were valid for deformations that large.
The third column of Table 1 shows the values of this modulus for two well-known solids: diamond (with the highest known value of $E$ of all bulk materials¹⁵) and the steel (physically, a solid solution of ~10% of carbon in iron) used in construction. Again, for fluids the Young's modulus vanishes, as follows from Eq. (46) with $\mu = 0$.

I am confident that the reader of these notes is familiar with Eq. (44), in the form of Eq. (47), from his or her undergraduate studies. However, most probably this cannot be said about its counterpart, Eq. (45), which shows that at the tensile stress, the rod's cross-section dimensions also change. This effect is usually characterized by the following dimensionless Poisson's ratio:¹⁶

$$\sigma \equiv -\frac{s_{xx}}{s_{zz}} = \frac{\dfrac{1}{6\mu} - \dfrac{1}{9K}}{\dfrac{1}{9K} + \dfrac{1}{3\mu}} = \frac{1}{2}\,\frac{3K - 2\mu}{3K + \mu}. \qquad (7.48)$$



According to this formula, for realistic materials with $K > 0$, $\mu > 0$, the values of $\sigma$ may vary from $(-1)$ to $(+1/2)$, but for most known materials they are between 0 and 1/2 - see Table 1. The lower limit is reached in porous materials like cork, whose lateral dimensions almost do not change at the tensile stress. Some soft materials like rubber present the opposite case: $\sigma \approx 1/2$. Since according to Eqs. (13), (44), and (45), the volume change is

$$\frac{\Delta V}{V} = s_{xx} + s_{yy} + s_{zz} = \frac{1}{E}\,\frac{F}{A}\,(1 - 2\sigma), \qquad (7.49)$$



such materials virtually do not change their volume at the tensile stress. The ultimate limit of this trend, $\Delta V/V = 0$, is provided by fluids and gases, because their Poisson's ratio $\sigma$ is exactly equal to 1/2. (This follows from Eq. (48) with $\mu = 0$.) However, for most practicable construction materials such as steel (see Table 1) the change (49) of volume is as high as ~40% of that of the length.
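As a consistency check of Eqs. (44)-(49) against Table 1, the sketch below computes $E$ and $\sigma$ from the tabulated $K$ and $\mu$ of the two solids. It also recovers $\sigma$ as the strain ratio $-s_{xx}/s_{zz}$. It is an illustrative snippet, and the helper names are hypothetical.

```python
# Young's modulus and Poisson's ratio computed from K and mu, Eqs. (7.46) and (7.48),
# for the two solids of Table 7.1, plus the ratio (1 - 2*sigma) of the relative volume
# change to the relative length change in the tensile experiment, Eq. (7.49).

def young_and_poisson(K, mu):
    E = 9 * K * mu / (3 * K + mu)                    # Eq. (7.46)
    sigma = 0.5 * (3 * K - 2 * mu) / (3 * K + mu)    # Eq. (7.48)
    return E, sigma

def poisson_from_strains(K, mu):
    """sigma as -s_xx/s_zz, with the tensile strains (7.44)-(7.45) per unit sigma_zz."""
    s_zz = 1 / (9 * K) + 1 / (3 * mu)
    s_xx = -(1 / (6 * mu) - 1 / (9 * K))
    return -s_xx / s_zz

materials = {"diamond": (600.0, 450.0), "hardened steel": (170.0, 75.0)}  # K, mu in GPa
results = {name: young_and_poisson(K, mu) for name, (K, mu) in materials.items()}

for name, (E, sigma) in results.items():
    print(f"{name:15s} E = {E:6.0f} GPa  sigma = {sigma:.2f}  (dV/V)/(dL/L) = {1 - 2 * sigma:.2f}")
```

For diamond it gives E = 1,080 GPa and σ = 0.20. For steel it gives E ≈ 196 GPa and σ ≈ 0.31. Both are consistent with the rounded table entries. For steel, 1 - 2σ ≈ 0.38, which is the "~40%" quoted above.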

Due to the clear physical sense of coefficients $E$ and $\sigma$, they are frequently used as a pair of independent elastic moduli, instead of $K$ and $\mu$. Solving Eqs. (46) and (48) for $K$ and $\mu$, we get

$$K = \frac{E}{3(1 - 2\sigma)}, \qquad \mu = \frac{E}{2(1 + \sigma)}. \qquad (7.50)$$
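Because Eq. (50) is just the inverse of Eqs. (46) and (48), the round trip (K, μ) → (E, σ) → (K, μ) must be exact. The sketch below verifies this with Python's exact `fractions.Fraction` arithmetic, assuming the steel moduli of Table 1. The function names are illustrative.

```python
from fractions import Fraction

# Round trip (K, mu) -> (E, sigma) -> (K, mu) through Eqs. (7.46), (7.48) and (7.50).
# Exact rational arithmetic is used, so the identity holds without round-off.

def young_poisson(K, mu):
    E = 9 * K * mu / (3 * K + mu)                             # Eq. (7.46)
    sigma = Fraction(1, 2) * (3 * K - 2 * mu) / (3 * K + mu)  # Eq. (7.48)
    return E, sigma

def bulk_shear(E, sigma):
    return E / (3 * (1 - 2 * sigma)), E / (2 * (1 + sigma))   # Eq. (7.50)

K, mu = Fraction(170), Fraction(75)     # GPa, hardened steel (Table 7.1)
E, sigma = young_poisson(K, mu)
print(E, sigma, bulk_shear(E, sigma))
```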



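Using these formulas, the two (equivalent) formulations of Hooke's law, expressed by Eqs. (34) and (36), may be rewritten as

$$\sigma_{jj'} = \frac{E}{1 + \sigma}\left[s_{jj'} + \frac{\sigma}{1 - 2\sigma}\,\mathrm{Tr}(s)\,\delta_{jj'}\right], \qquad (7.51a)$$

$$s_{jj'} = \frac{1}{E}\left[(1 + \sigma)\,\sigma_{jj'} - \sigma\,\mathrm{Tr}(\sigma)\,\delta_{jj'}\right]. \qquad (7.51b)$$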
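The two forms of Hooke's law should agree term by term. The sketch below compares Eq. (51a) with Eq. (34), using $K$ and $\mu$ from Eq. (50), and checks that Eq. (51b) inverts Eq. (51a). It is illustrative: steel-like $E$ and $\sigma$ are assumed, and the Poisson ratio is named `nu` in the code only to avoid a clash with the stress tensor.

```python
# Hooke's law in the (E, sigma) form, Eqs. (7.51a) and (7.51b), compared with the
# (K, mu) form (7.34), with K and mu obtained from Eq. (7.50). Illustrative values.

E, nu = 200.0, 0.30                       # GPa, steel-like (assumed); nu = Poisson ratio
K = E / (3 * (1 - 2 * nu))                # Eq. (7.50)
mu = E / (2 * (1 + nu))

def trace(t):
    return sum(t[j][j] for j in range(3))

def stress_51a(s):
    tr_s = trace(s)
    return [[E / (1 + nu) * (s[j][k] + nu / (1 - 2 * nu) * tr_s * (j == k))
             for k in range(3)] for j in range(3)]

def strain_51b(sig):
    tr_sig = trace(sig)
    return [[((1 + nu) * sig[j][k] - nu * tr_sig * (j == k)) / E
             for k in range(3)] for j in range(3)]

def stress_34(s):
    t3 = trace(s) / 3
    return [[3 * K * t3 * (j == k) + 2 * mu * (s[j][k] - t3 * (j == k))
             for k in range(3)] for j in range(3)]

def max_diff(a, b):
    return max(abs(a[j][k] - b[j][k]) for j in range(3) for k in range(3))

s = [[2.0e-4, -1.0e-5, 3.0e-5],
     [-1.0e-5, 5.0e-5, 0.0],
     [3.0e-5, 0.0, -7.0e-5]]              # arbitrary symmetric strain (illustrative)

print(max_diff(stress_51a(s), stress_34(s)))      # (7.51a) vs (7.34)
print(max_diff(strain_51b(stress_51a(s)), s))     # (7.51b) inverts (7.51a)
```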



The linear relation between the strain and stress tensors allows one to calculate the potential energy $U$ of an elastic medium due to its elastic deformation. Indeed, to each infinitesimal part of this



15 It is probably somewhat higher (up to 2,000 GPa) in such nanostructures as carbon nanotubes and monoatomic sheets (graphene), though there is still a substantial uncertainty in experimental values of elastic moduli of these structures - see, e.g., C. Lee et al., Science 321, 5887 (2008) and J.-U. Lee et al., Nano Lett. 12, 4444 (2012).

16 Unfortunately, the dominating tradition is to use for the Poisson's ratio the same letter ($\sigma$) as for the stress tensor components, but they may always be distinguished by the presence or absence of component indices.
strain increase, we may apply Eq. (32), with the work $\delta W$ of the surface forces equal to $-\delta U$. Let us slowly increase the deformation from a completely unstrained state (in which we may take $U = 0$) to a certain strained state, in the absence of bulk forces $\mathbf{f}$, keeping the deformation type, i.e. the relation between the elements of the stress tensor, intact. In this case, all elements of tensor $\sigma_{jj'}$ are proportional to the same single parameter characterizing the stress (say, the total applied force), and according to Hooke's law, all elements of tensor $s_{jj'}$ are proportional to that parameter as well. In this case, integration over the variation yields the final value¹⁷

$$U = \int u(\mathbf{r})\,d^3r, \qquad u(\mathbf{r}) = \frac{1}{2}\sum_{j,j'=1}^{3}\sigma_{jj'}s_{jj'}. \qquad (7.52)$$

Evidently, $u(\mathbf{r})$ may be interpreted as the volume density of the potential energy of the elastic deformation.
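For the tensile stress of Fig. 6, Eq. (52) should reduce to $u = \sigma_{zz}^2/2E$, the continuum analogue of the spring energy in footnote 17. The sketch below checks this numerically, assuming steel-like moduli and an arbitrary load. The nonzero lateral strains (45) do not contribute, because the corresponding stress components vanish.

```python
# Elastic energy density (7.52), u = (1/2) sum sigma_jj' s_jj', for the tensile stress
# of Fig. 7.6 (only sigma_zz = F/A is nonzero), with s_jj' taken from Eq. (7.51b).
# Illustrative (assumed) values: a steel-like rod loaded with 10 kN over 1 cm^2.
E, nu = 200e9, 0.30        # Pa; nu denotes the Poisson ratio
F, A = 1.0e4, 1.0e-4       # N, m^2
sigma_zz = F / A           # Pa

stress = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, sigma_zz]]
tr_stress = sum(stress[j][j] for j in range(3))
strain = [[((1 + nu) * stress[j][k] - nu * tr_stress * (j == k)) / E for k in range(3)]
          for j in range(3)]                                     # Eq. (7.51b)

u = 0.5 * sum(stress[j][k] * strain[j][k] for j in range(3) for k in range(3))
print(f"u = {u:.1f} J/m^3, sigma_zz^2/(2E) = {sigma_zz**2 / (2 * E):.1f} J/m^3")
print(f"lateral strain s_xx = {strain[0][0]:.2e} (nonzero, but sigma_xx = 0)")
```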



7.4. Equilibrium 

Now we are fully equipped to discuss the dynamics of elastic deformations, but let us start with statics. The static (equilibrium) state may be described by requiring the right-hand side of Eq. (25) to vanish. In order to find the elastic deformation, we need to plug $\sigma_{jj'}$ from Hooke's law (51a), and then express the elements $s_{jj'}$ via the displacement distribution - see Eq. (9). For a uniform material, the result is¹⁸

$$\frac{E}{2(1 + \sigma)}\sum_{j'=1}^{3}\frac{\partial^2 q_j}{\partial r_{j'}^2} + \frac{E}{2(1 + \sigma)(1 - 2\sigma)}\sum_{j'=1}^{3}\frac{\partial^2 q_{j'}}{\partial r_j\,\partial r_{j'}} + f_j = 0. \qquad (7.53)$$



Taking into account that the first sum in Eq. (53) is just the $j$-th component of $\nabla^2\mathbf{q}$, while the second sum is the $j$-th component of $\nabla(\nabla\cdot\mathbf{q})$, we see that all three equations (53) for the three Cartesian components ($j$ = 1, 2, and 3) of the deformation vector $\mathbf{q}$ may be conveniently merged into one vector equation

$$\frac{E}{2(1 + \sigma)}\nabla^2\mathbf{q} + \frac{E}{2(1 + \sigma)(1 - 2\sigma)}\nabla(\nabla\cdot\mathbf{q}) + \mathbf{f} = 0. \qquad (7.54)$$



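For some applications, it is more convenient to recast this equation into another form, using the vector identity¹⁹ $\nabla^2\mathbf{q} = \nabla(\nabla\cdot\mathbf{q}) - \nabla\times(\nabla\times\mathbf{q})$. The result is the following equation of elastic equilibrium:

$$\frac{E(1 - \sigma)}{(1 + \sigma)(1 - 2\sigma)}\nabla(\nabla\cdot\mathbf{q}) - \frac{E}{2(1 + \sigma)}\nabla\times(\nabla\times\mathbf{q}) + \mathbf{f} = 0. \qquad (7.55)$$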



It is interesting that in problems without volume-distributed forces ($\mathbf{f} = 0$), the Young's modulus $E$ cancels! Even more fascinating, in this case the equation may be rewritten in a form not involving the Poisson's ratio $\sigma$ either. Indeed, acting by the divergence operator $\nabla\cdot$ on the remaining terms of Eq. (55), and noting that the divergence of any curl vanishes, we get a surprisingly simple equation



17 For clarity, let me reproduce a similar integration for the 1D motion of a particle on a spring. In this case, $\delta U = -\delta W = -F\,dx$, and if the spring's force is elastic, $F = -\kappa x$, the integration yields $U = \kappa x^2/2 = -Fx/2$.

18 As follows from Eqs. (50), the coefficient before the first sum in Eq. (53) is just the shear modulus $\mu$, while that before the second sum is equal to $(K + \mu/3)$.

19 See, e.g., MA Eq. (11.3).
$$\nabla^2(\nabla\cdot\mathbf{q}) = 0. \qquad (7.56)$$

A natural question here is how the elastic moduli affect the deformation distribution if they do not participate in the differential equation describing it. The answer is two-fold. If what is fixed at the
body boundary are deformations, then the moduli are irrelevant, because the deformation distribution 
through the body does not depend on them. On the other hand, if the boundary conditions fix stress (or a 
combination of stress and strain), then the elastic constants creep into the solution via the recalculation 
of these conditions into the strain. 

As a simple but representative example, let us find the deformation distribution in a (generally, 
thick) spherical shell under the effect of different pressures fixed inside and outside it (Fig. 7a). 




Due to the spherical symmetry of the problem, the deformation is obviously spherically symmetric and radial, $\mathbf{q} = q(r)\mathbf{n}_r$, i.e. it is completely described by one scalar function $q(r)$. Since the curl of such a radial vector field is zero,²⁰ Eq. (55) with $\mathbf{f} = 0$ is reduced to

$$\nabla(\nabla\cdot\mathbf{q}) = 0. \qquad (7.57)$$

This equation means that the divergence of the vector field $\mathbf{q}$ is constant within the shell. In spherical coordinates this means²¹

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2 q\right) = \text{const}. \qquad (7.58)$$

Naming this constant $3a$ (with the numerical factor chosen for later notation convenience), and integrating Eq. (58), we get its solution,

$$q(r) = ar + \frac{b}{r^2}, \qquad (7.59)$$

that also includes another integration constant, $b$.
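The claim that the solution (59) satisfies Eq. (57) is easy to confirm numerically. The sketch below assumes arbitrary values of $a$ and $b$. It evaluates the divergence of $\mathbf{q} = q(r)\mathbf{n}_r$ in Cartesian coordinates by central finite differences and checks that it equals $3a$ at every sample point. In other words, the $b/r^2$ term is divergence-free away from $r = 0$.

```python
import math

# Numerical check that the radial field q = q(r) n_r, with q(r) = a r + b / r**2 from
# Eq. (7.59), has the constant divergence 3a required by Eqs. (7.57)-(7.58).
# The values of a, b and the sample points are arbitrary assumptions.
a, b = 2.0e-3, 5.0e-4

def q_vec(x, y, z):
    r = math.sqrt(x * x + y * y + z * z)
    q_r = a * r + b / r**2
    return (q_r * x / r, q_r * y / r, q_r * z / r)

def divergence(field, point, h=1e-5):
    """Central-difference estimate of sum_i d(field_i)/d(x_i) at the given point."""
    total = 0.0
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        total += (field(*plus)[i] - field(*minus)[i]) / (2 * h)
    return total

points = [(1.0, 0.3, -0.2), (0.5, 0.5, 0.5), (2.0, -1.0, 0.7)]
divs = [divergence(q_vec, p) for p in points]
print(divs, "expected", 3 * a)
```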

To complete the analysis, we have to determine the constants $a$ and $b$ from the boundary conditions. According to Eq. (19),

$$\sigma_{rr} = \begin{cases} -P_1, & r = R_1, \\ -P_2, & r = R_2. \end{cases} \qquad (7.60)$$



20 If this is not immediately evident, have a look at MA Eq. (10.11) with $\mathbf{f} = f_r(r)\mathbf{n}_r$.

21 See, e.g., MA Eq. (10.10) with $\mathbf{f} = q(r)\mathbf{n}_r$.
In order to relate this stress to strain, let us use Hooke's law, but for that, we first need to calculate the strain tensor components for the deformation distribution (59). Using Eqs. (16), we get

$$s_{rr} = \frac{dq}{dr} = a - 2\frac{b}{r^3}, \qquad s_{\theta\theta} = s_{\varphi\varphi} = \frac{q}{r} = a + \frac{b}{r^3}, \qquad (7.61)$$

so that $\mathrm{Tr}(s) = 3a$. Plugging these relations into Eq. (51a) for $\sigma_{rr}$, we get



$$\sigma_{rr} = \frac{E}{1 + \sigma}\left[a - 2\frac{b}{r^3} + \frac{\sigma}{1 - 2\sigma}\,3a\right]. \qquad (7.62)$$



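Now plugging this relation into Eqs. (60), we get a system of two linear equations for the coefficients $a$ and $b$. Solving this system, we get:

$$a = \frac{1 - 2\sigma}{E}\,\frac{P_1R_1^3 - P_2R_2^3}{R_2^3 - R_1^3}, \qquad b = \frac{1 + \sigma}{2E}\,\frac{(P_1 - P_2)R_1^3R_2^3}{R_2^3 - R_1^3}. \qquad (7.63)$$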
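The algebra leading from Eqs. (60) and (62) to Eq. (63) can be checked mechanically. The sketch below uses illustrative values in arbitrary units and exact rational arithmetic via `fractions.Fraction`. It writes $\sigma_{rr}(R_i) = -P_i$ as a 2×2 linear system, solves it by Cramer's rule, and compares the result with Eq. (63).

```python
from fractions import Fraction as Fr

# Boundary conditions (7.60), with sigma_rr from Eq. (7.62), solved as a 2x2 linear
# system for a and b by Cramer's rule and compared with the closed form (7.63).
# All numerical values (arbitrary units) are assumptions.
E, nu = Fr(200), Fr(3, 10)          # nu denotes the Poisson ratio
R1, R2 = Fr(1), Fr(3, 2)
P1, P2 = Fr(5), Fr(1)

# Eq. (7.62) rewritten as sigma_rr(r) = alpha * a + beta(r) * b
alpha = E / (1 + nu) * (1 + 3 * nu / (1 - 2 * nu))

def beta(r):
    return -2 * E / ((1 + nu) * r**3)

# alpha*a + beta(R1)*b = -P1,  alpha*a + beta(R2)*b = -P2
det = alpha * beta(R2) - beta(R1) * alpha
a = (-P1 * beta(R2) + P2 * beta(R1)) / det
b = (alpha * (-P2) - (-P1) * alpha) / det

# Closed-form solution, Eq. (7.63)
a_63 = (1 - 2 * nu) / E * (P1 * R1**3 - P2 * R2**3) / (R2**3 - R1**3)
b_63 = (1 + nu) / (2 * E) * (P1 - P2) * R1**3 * R2**3 / (R2**3 - R1**3)
print(a, a_63, b, b_63)
```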

Formulas (59) and (63) give a complete solution of our problem. It is rich in content and deserves at least some analysis. First of all, note that according to Eq. (50), the coefficient $(1 - 2\sigma)/E$ in the expression for $a$ is just $1/3K$, so that the first term in Eq. (59) for the deformation is just the hydrostatic compression. In particular, the second of Eqs. (63) shows that if $R_1 = 0$, then $b = 0$. Thus for a solid sphere we have only the hydrostatic compression that was discussed in the previous section. Perhaps less intuitively, making the two pressures equal gives the same result (hydrostatic compression) for arbitrary $R_2 > R_1$.

However, in the general case $b \neq 0$, so that the second term in the deformation distribution (59), which describes the shear deformation,²² is also substantial. In particular, let us consider the important thin-shell limit $R_2 - R_1 \equiv t \ll R_{1,2} \approx R$ - see Fig. 7b. In this case, $q(R_1) \approx q(R_2)$ is just the change of the shell radius $R$, for which Eqs. (59) and (63) (with $R_2^3 - R_1^3 \approx 3R^2t$) give

$$\Delta R = q(R) \approx aR + \frac{b}{R^2} \approx \frac{(P_1 - P_2)R^2}{3t}\cdot\frac{2(1 - 2\sigma) + (1 + \sigma)}{2E} = \frac{R^2}{2Et}(1 - \sigma)(P_1 - P_2). \qquad (7.64)$$



Naively, one could think that at least in this limit the problem could be analyzed by elementary means. For example, the total force exerted by the pressure difference $(P_1 - P_2)$ on the diametrical cross-section of the shell (see, e.g., the dashed line in Fig. 7b) is $F = \pi R^2(P_1 - P_2)$. Spread over the wall's cross-section area $2\pi Rt$, it gives the stress

$$\sigma_{\theta\theta} = \sigma_{\varphi\varphi} \approx \frac{F}{2\pi Rt} = \frac{(P_1 - P_2)R}{2t}, \qquad (7.65)$$

directed along the shell's walls. One can check that this simple formula may indeed be obtained, in this limit, from the strict expressions for $\sigma_{\theta\theta}$ and $\sigma_{\varphi\varphi}$ following from the general treatment carried out above. However, if we now try to continue this approach by using the simple relation (47) to find the small change $Rs_{zz}$ of the sphere's radius, we would arrive at a result with the structure of Eq. (64), but without the factor $(1 - \sigma) < 1$ in the numerator. The reason for this error (which may be as significant as ~30% for typical construction materials - see Table 1) is that Eq. (47), while being valid for thin rods of arbitrary cross-section, is invalid for thin broad sheets, and in particular for the thin shell in our problem.



22 Indeed, according to Eq. (50), the material-dependent factor in the second of Eqs. (63) is just $1/4\mu$.






Indeed, while at the tensile stress both lateral dimensions of a thin rod may contract freely, in our problem all dimensions of the shell are under stress - actually, under much more tangential stress than the radial one.²³
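A numerical comparison makes the thin-shell discussion concrete. The sketch below assumes a steel-like shell (E = 200 GPa, σ = 0.3) of 1 m radius and 1 mm wall, with 1 bar of overpressure inside and vacuum outside. It evaluates three quantities: the exact radius change from Eqs. (59) and (63), the thin-shell limit (64), and the naive estimate $R^2(P_1 - P_2)/2Et$ that follows from Eqs. (47) and (65).

```python
# Thin spherical shell: exact radius change from Eqs. (7.59) and (7.63), compared with
# the thin-shell formula (7.64) and with the naive estimate obtained from the rod
# relation (7.47) and the wall stress (7.65). All parameter values are assumptions.
E, nu = 200e9, 0.30            # Pa; nu denotes the Poisson ratio
R1, t = 1.0, 1.0e-3            # inner radius and wall thickness, m
R2 = R1 + t
P1, P2 = 1.0e5, 0.0            # Pa: 1 bar inside, vacuum outside

den = R2**3 - R1**3
a = (1 - 2 * nu) / E * (P1 * R1**3 - P2 * R2**3) / den          # Eq. (7.63)
b = (1 + nu) / (2 * E) * (P1 - P2) * R1**3 * R2**3 / den
R = 0.5 * (R1 + R2)

dR_exact = a * R + b / R**2                                     # Eq. (7.59)
dR_thin = R**2 * (1 - nu) * (P1 - P2) / (2 * E * t)             # Eq. (7.64)
dR_naive = R**2 * (P1 - P2) / (2 * E * t)                       # rod formula (7.47)
print(f"exact {dR_exact:.4e} m, thin-shell {dR_thin:.4e} m, naive {dR_naive:.4e} m")
```

The exact and thin-shell values agree to a fraction of a percent. The naive estimate is larger by the factor $1/(1 - \sigma)$.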



7.5. Rod bending 

The general approach to the static deformation analysis, outlined in the beginning of the previous section, may be simplified not only for symmetric geometries, but also for uniform thin structures such as thin plates ("membranes" or "sheets") and thin rods. Due to the shortage of time, in this course I will demonstrate typical approaches to such systems only on the example of thin rods. (The theory of membrane deformation is very similar.) Besides the tensile stress analyzed in Sec. 3, two other major deformations of rods are bending and torsion. Let us start from a "local" analysis of bending caused by a pair of equal and opposite external torques $\boldsymbol{\tau} = \pm\mathbf{n}_y\tau_y$ perpendicular to the rod axis $z$ (Fig. 8), assuming that the rod is "quasi-uniform", i.e. that on the scale of this analysis (comparable with the linear scale $a$ of the cross-section) its material parameters and cross-section $A$ do not change substantially.



Fig. 7.8. Thin rod bending (panels (a) and (b), axes x and z), in a local reference frame (specific for each cross-section).

Just as in the tensile stress experiment (Fig. 6), at bending the components of the stress forces $d\mathbf{F}$ normal to the rod length have to equal zero on the surface of the rod. Repeating the arguments made for the tensile stress discussion, we arrive at the conclusion that only one diagonal component of the tensor (in Fig. 8, $\sigma_{zz}$) may differ from zero:

$$\sigma_{jj'} = \delta_{jz}\,\delta_{j'z}\,\sigma_{zz}. \qquad (7.66)$$



However, in contrast to the tensile stress, at pure static bending the net force along the rod has to vanish:

$$F_z = \int_A \sigma_{zz}\,d^2r = 0, \qquad (7.67)$$

so that $\sigma_{zz}$ has to change sign at some point of axis $x$ (in Fig. 8, selected to lie in the plane of the bent rod). Thus, the bending deformation may be viewed as a combination of stretching some layers of the rod (bottom layers in Fig. 8) with compression of other (top) layers.

Since it is hard to find more about the stress distribution from these general considerations, let us turn to strain, assuming that the rod's cross-section is virtually constant on the length of the order of its cross-section size. From the above presentation of bending as a combination of stretching and compression, it is evident that the longitudinal deformation $q_z$ has to vanish along some neutral line on the



23 Strictly speaking, this is only true if the pressure difference is not too small, namely, if $|P_1 - P_2| \gg P_{1,2}t/R$.
rod's cross-section - in Fig. 8, represented by the dashed line.²⁴ Selecting the origin of coordinate $x$ on this line, and expanding the relative deformation in the Taylor series in $x$, due to the cross-section smallness we may limit ourselves to the linear term:

$$s_{zz} \equiv \frac{dq_z}{dz} = -\frac{x}{R}. \qquad (7.68)$$

Here the constant $R$ has the sense of the curvature radius of the bent rod. Indeed, on a small segment $dz$ the cross-section turns by a small angle $d\varphi_y = -dq_z/x$ (Fig. 8b). Using Eq. (68), we get $d\varphi_y = dz/R$, which is the usual definition of the curvature radius $R$ in differential geometry, for our special choice of the coordinate axes.²⁵

Expressions for other components of the strain tensor are harder to guess (as at the tensile stress, not all of them are equal to zero!), but what we already know about $\sigma_{zz}$ and $s_{zz}$ is sufficient to start formal calculations. Indeed, plugging Eq. (66) into Hooke's law in the form (51b), and comparing the result for $s_{zz}$ with Eq. (68), we find

$$\sigma_{zz} = -E\,\frac{x}{R}. \qquad (7.69)$$

From the same Eq. (51b), we could also find the transverse components of the strain tensor, and see that they are related to $s_{zz}$ exactly as at the tensile stress:

$$s_{xx} = s_{yy} = -\sigma s_{zz}, \qquad (7.70)$$

and then, integrating these relations along the cross-section of the rod, find the deformation of the cross-section shape. More important for us, however, is the calculation of the relation between the rod's curvature and the net torque acting on a given cross-section (of area $A$ and orientation $dA_z > 0$):

$$\tau_y \equiv \int_A (\mathbf{r}\times d\mathbf{F})_y = -\int_A x\,\sigma_{zz}\,d^2r = \frac{E}{R}\int_A x^2\,d^2r \equiv \frac{EI_y}{R}, \qquad (7.71)$$

where $I_y$ is a geometric constant defined as

$$I_y \equiv \int_A x^2\,dx\,dy. \qquad (7.72)$$

In these expressions, $x$ has to be counted from the neutral line. Let us see where exactly this line passes through the rod's cross-section. Plugging result (69) into Eq. (67), we get the condition defining the neutral line:

$$\int_A x\,dx\,dy = 0. \qquad (7.73)$$

This condition allows a simple interpretation. Imagine a thin sheet of some material, with a constant mass density $\sigma$ per unit area, cut in the form of the rod's cross-section. If we place a reference frame into its center of mass, then, by its definition,



24 Strictly speaking, that dashed line is the intersection of the neutral surface (the continuous set of such neutral 
lines for all cross-sections of the rod) with the plane of drawing. 

25 Indeed, for $(dx/dz)^2 \ll 1$, the general formula MA Eq. (4.3) for curvature (with the appropriate replacements $y \to x$ and $x \to z$) is reduced to $1/R = d^2x/dz^2 = d(dx/dz)/dz = d(\tan\varphi_y)/dz \approx d\varphi_y/dz$.
$$\sigma\int_A x\,dx\,dy = 0. \qquad (7.74)$$



Comparing this condition with Eq. (73), we see that one of the neutral lines has to pass through the center of mass of the sheet, which may be called the "center of mass of the cross-section". Using the same analogy, we see that integral $I_y$ (72) may be interpreted as the moment of inertia of the same imaginary sheet of material, with $\sigma$ formally equal to 1, for its rotation about the neutral line - see Eq. (6.24). This analogy is so convenient that the integral is usually called the moment of inertia of the cross-section and denoted similarly - just as has been done above. So, our basic result (71) may be rewritten as

$$\frac{1}{R} = \frac{\tau_y}{EI_y}. \qquad (7.75)$$
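Equations (72) and (73) reduce the cross-section geometry to two numbers: the position of the neutral line (the centroid) and $I_y$. The sketch below computes both by a midpoint-rule sum over a fine grid. Several choices are illustrative assumptions: the two test shapes, the grid size, and the final curvature estimate from Eq. (75). The first shape is a rectangle, with $I_y = wh^3/12$. The second is a half-disk of radius 1, with its centroid at $4/3\pi$ from the flat side and $I_y = \pi/8 - 8/9\pi$ about it.

```python
import math

# Cross-section geometry for Eqs. (7.72)-(7.75): locate the neutral line (through the
# centroid of the cross-section, Eq. (7.73)) and compute I_y about it, Eq. (7.72), by a
# midpoint-rule sum over a grid. Shapes and grid size are illustrative assumptions.

def section_properties(inside, xlim, ylim, n=400):
    (x0, x1), (y0, y1) = xlim, ylim
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    xs = [x0 + (i + 0.5) * dx
          for i in range(n) for k in range(n)
          if inside(x0 + (i + 0.5) * dx, y0 + (k + 0.5) * dy)]
    area = len(xs) * dx * dy
    x_c = sum(xs) * dx * dy / area                      # neutral line, Eq. (7.73)
    I_y = sum((x - x_c) ** 2 for x in xs) * dx * dy     # Eq. (7.72), x counted from x_c
    return area, x_c, I_y

# Rectangle: height h along x, width w along y; expected I_y = w h^3 / 12.
h, w = 0.02, 0.01
_, xc_rect, Iy_rect = section_properties(lambda x, y: True, (0.01, 0.01 + h), (0.0, w))

# Half-disk of radius 1 with its flat side on the y axis.
_, xc_half, Iy_half = section_properties(lambda x, y: x * x + y * y <= 1.0,
                                         (0.0, 1.0), (-1.0, 1.0))

print(Iy_rect, w * h**3 / 12)
print(xc_half, 4 / (3 * math.pi), Iy_half, math.pi / 8 - 8 / (9 * math.pi))

# Eq. (7.75): curvature radius of a steel-like rectangular rod (E = 200 GPa) under 1 N*m.
E, tau_y = 200e9, 1.0
print("R =", E * Iy_rect / tau_y, "m")
```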



This relation is only valid if the deformation is small in the sense $R \gg a$. Still, since the deviations of the rod from its unstrained shape may accumulate along its length, Eq. (75) may be used for calculations of global deviations that are arbitrary on the scale of $a$. In order to describe such deformations, this equation has to be complemented by conditions of balance of the bending forces and torques. Unfortunately, this requires a bit more differential geometry than I have time for, and I will only discuss this procedure for the simplest case of relatively small deviations $q = q_x$ of the rod from its initial straight shape, which will be used for axis $z$ (Fig. 9a), by some bulk-distributed force $\mathbf{f} = \mathbf{n}_xf_x(z)$. (The simplest example is the uniform gravity field near the Earth's surface, for which $f_x = -\rho g = \text{const}$.) Note that in the forthcoming discussion the reference frame will be global, i.e. common for the whole rod, rather than local (pertaining to each cross-section) as in the previous analysis - cf. Fig. 8.



Fig. 7.9. Global picture of rod bending: (a) forces acting on a small fragment of a rod and (b) two bending problem examples, each with two typical, different boundary conditions.



First of all, we may write an evident differential equation for the average vertical force $\mathbf{F} = \mathbf{n}_xF_x(z)$ acting on the part of the rod located to the left of its cross-section at point $z$. This equation expresses the balance of vertical forces acting on a small fragment $dz$ of the rod (Fig. 9a), necessary for the absence of its linear acceleration: $F_x(z + dz) - F_x(z) + f_x(z)A\,dz = 0$, giving

$$\frac{dF_x}{dz} = -f_xA. \qquad (7.76)$$

Note that this vertical component of the internal forces has been neglected in our derivation of Eq. (75), and hence our final results will be valid only if the ratio $F_x/A$ is much less than the magnitude of $\sigma_{zz}$ described by Eq. (69). However, these lateral forces create the very torque $\boldsymbol{\tau} = \mathbf{n}_y\tau_y$ that causes the bending, and thus have to be taken into account in the analysis of the global picture. This recalculation is expressed by the balance of torque components acting on the same rod fragment of length $dz$, necessary for the absence of its angular acceleration:

$$\frac{d\tau_y}{dz} = -F_x. \qquad (7.77)$$

These two equations of dynamics (or rather statics) should be complemented by two geometric relations. The first of them is $d\varphi_y/dz = 1/R$, which has already been discussed. We may immediately combine it with the basic result (75) of the local analysis, getting:

$$\frac{d\varphi_y}{dz} = \frac{\tau_y}{EI_y}. \qquad (7.78)$$

The final equation is the geometric relation evident from Fig. 9a:

$$\frac{dq_x}{dz} = \varphi_y, \qquad (7.79)$$

which (as all expressions of our simple analysis) is only valid for small bending angles, $|\varphi_y| \ll 1$.

Four differential equations (76)-(79) are sufficient for the full solution of the weak bending problem, if complemented by appropriate boundary conditions. Figure 9b shows the four most frequently met conditions. Let us solve, for example, the problem shown on the top panel of Fig. 9b: bending of a rod, clamped in a wall on one end, under its own weight. Considering, for the sake of simplicity, a uniform rod,²⁶ we may integrate equations (76)-(79) one by one, each time using the appropriate boundary conditions. To start, Eq. (76), with $f_x = -\rho g$, yields

$$F_x = \rho gAz + \text{const} = \rho gA(z - L), \qquad (7.80)$$

where the integration constant has been selected to satisfy the right-end boundary condition: $F_x = 0$ at $z = L$. As a sanity check, at the left wall ($z = 0$), $F_x = -\rho gAL = -mg$, meaning that the whole weight of the rod is exerted on the wall - fine.



Next, plugging Eq. (80) into Eq. (77) and integrating, we get

$$\tau_y = -\rho gA\left(\frac{z^2}{2} - Lz\right) + \text{const} = -\rho gA\left(\frac{z^2}{2} - Lz + \frac{L^2}{2}\right) \equiv -\frac{\rho gA}{2}(z - L)^2, \qquad (7.81)$$

where the integration constant's choice ensures the second right-boundary condition: $\tau_y = 0$ at $z = L$. Proceeding in the same fashion to Eq. (78), we get

$$\varphi_y = -\frac{\rho gA}{2EI_y}\,\frac{(z - L)^3}{3} + \text{const} = -\frac{\rho gA}{6EI_y}\left[(z - L)^3 + L^3\right], \qquad (7.82)$$



26 As is clear from their derivation, Eqs. (76)-(79) are valid for any distribution of the parameters $A$, $E$, $I_y$, and $\rho$ over the rod's length, provided that the rod is quasi-uniform, i.e. its parameters change so slowly that the local relation (78) is still valid at any point.
where the integration constant is selected to satisfy the clamping condition at the left end of the rod: $\varphi_y = 0$ at $z = 0$. (Note that this is different from the support condition, illustrated on the lower panel of Fig. 9b, which allows the angle at $z = 0$ to be finite but requires the torque to vanish.) Finally, integrating Eq. (79) with $\varphi_y$ given by Eq. (82), we get the rod's global deformation law,

$$q_x(z) = -\frac{\rho gA}{6EI_y}\left[\frac{(z - L)^4}{4} + L^3z\right] + \text{const} = -\frac{\rho gA}{24EI_y}\left[(z - L)^4 + 4L^3z - L^4\right], \qquad (7.83)$$

where the integration constant has been selected, again, to satisfy the second left-boundary condition: $q_x = 0$ at $z = 0$.

One can see that the bending law (83) is rather complex (polynomial, but not parabolic!) even in this very simple problem. It is also remarkable how fast the end's displacement grows with the increase of the rod's length:

$$q_x(L) = -\frac{\rho gAL^4}{8EI_y}. \qquad (7.84)$$
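The chain of integrations (80)-(84) can also be done numerically, which is the natural route for a non-uniform rod (cf. footnote 26). The sketch below is a minimal illustration rather than a general solver. It integrates Eqs. (76)-(79) with the trapezoidal rule, imposing the free-end conditions at $z = L$ and the clamping conditions at $z = 0$. It then compares the result with Eqs. (83) and (84). The steel rod with a 1 cm square cross-section and a length of 1 m is an assumed example.

```python
# Numerical integration of the weak-bending equations (7.76)-(7.79) for a uniform rod
# clamped at z = 0 and free at z = L, sagging under its own weight (top panel of
# Fig. 9b). Assumed parameters: steel rod, 1 cm x 1 cm square cross-section, L = 1 m.
rho, g, E = 7.8e3, 9.8, 200e9
side, L = 0.01, 1.0
A, I_y = side**2, side**4 / 12          # I_y of a square about its central line, Eq. (7.72)
f_x = -rho * g                          # gravity force per unit volume
n = 2000
h = L / n

def integrate_from_left(deriv, start):
    """Trapezoidal antiderivative on the grid z_i = i*h, with value `start` at z = 0."""
    out = [start]
    for i in range(n):
        out.append(out[-1] + 0.5 * h * (deriv[i] + deriv[i + 1]))
    return out

def integrate_from_right(deriv, end):
    """Trapezoidal antiderivative on the grid z_i = i*h, with value `end` at z = L."""
    out = [end]
    for i in range(n, 0, -1):
        out.append(out[-1] - 0.5 * h * (deriv[i] + deriv[i - 1]))
    return out[::-1]

F_x = integrate_from_right([-f_x * A] * (n + 1), 0.0)              # (7.76), F_x(L) = 0
tau_y = integrate_from_right([-F for F in F_x], 0.0)               # (7.77), tau_y(L) = 0
phi_y = integrate_from_left([t / (E * I_y) for t in tau_y], 0.0)   # (7.78), phi_y(0) = 0
q_x = integrate_from_left(phi_y, 0.0)                              # (7.79), q_x(0) = 0

def q_x_exact(z):                                                  # Eq. (7.83)
    return -rho * g * A / (24 * E * I_y) * ((z - L)**4 + 4 * L**3 * z - L**4)

q_end_exact = -rho * g * A * L**4 / (8 * E * I_y)                  # Eq. (7.84)
print(f"q_x(L): numerical {q_x[-1] * 1e3:.4f} mm, Eq. (7.84) {q_end_exact * 1e3:.4f} mm")
```

For these parameters the end sags by about 5.7 mm. This is much smaller than $L$, so the validity conditions discussed below are satisfied.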



To conclude the solution, let us discuss the validity of this result. First, the geometric relation (79) is only valid if $|\varphi_y(L)| \ll 1$, and hence if $|q_x(L)| \ll L$. Next, the local formula (78) is valid if $1/R = |\tau_y|/EI_y \ll 1/a$ everywhere, in particular at the clamp, $z = 0$, where $|\tau_y|$ is largest. Using results (81) and (84), we see that the latter condition is equivalent to $|q_x(L)| \ll L^2/a$, i.e. it is weaker, because all our analysis has been based on the assumption that $L \gg a$.

Another point of concern may be that the off-diagonal stress component $\sigma_{xz} \sim F_x/A$, created by the vertical gravity forces, has been ignored in our local analysis. For that approximation to be valid, this component must be much smaller than the diagonal component $\sigma_{zz} \sim aE/R = a\tau/I_y$ taken into account in that analysis. Using Eqs. (80) and (81), we get the following estimates: $\sigma_{xz} \sim \rho gL$, $\sigma_{zz} \sim a\rho gAL^2/I_y \sim a^3\rho gL^2/I_y$. According to its definition (72), $I_y$ may be crudely estimated as $a^4$, so that we finally get the following simple condition: $a \ll L$, which has been assumed from the very beginning.



7.6. Rod torsion 

One more class of analytically solvable elasticity problems is the torsion of quasi-uniform, straight rods by a couple of axially-oriented torques $\boldsymbol{\tau} = \mathbf{n}_z\tau$ (Fig. 10).
If the deformation is elastic and small (in the sense $\kappa a \ll 1$, where $a$ is again the characteristic size of the rod's cross-section), $\kappa$ is proportional to $\tau$, and their ratio,

$$C \equiv \frac{\tau}{\kappa} \equiv \frac{\tau}{d\varphi_z/dz}, \qquad (7.86)$$

is called the torsional rigidity of the rod. Our task is to calculate the rigidity.

As the first guess (as we will see below, of limited validity), one may assume that the torsion does not change the shape or size of the cross-section, but leads just to the mutual rotation of cross-sections about a certain central line. Using a reference frame with the origin on that line, this assumption immediately allows the calculation of the components of the displacement vector $d\mathbf{q}$, by using Eq. (6) with $d\boldsymbol{\varphi} = \mathbf{n}_z\,d\varphi_z$:

$$dq_x = -y\,d\varphi_z = -\kappa y\,dz, \qquad dq_y = x\,d\varphi_z = \kappa x\,dz, \qquad dq_z = 0. \qquad (7.87)$$
From here, we can calculate all Cartesian components (9) of the strain tensor:

$$s_{xx} = s_{yy} = s_{zz} = 0, \qquad s_{xy} = s_{yx} = 0, \qquad s_{xz} = s_{zx} = -\frac{\kappa}{2}y, \qquad s_{yz} = s_{zy} = \frac{\kappa}{2}x. \qquad (7.88)$$

The first of these equalities means that the volume does not change, i.e. we are dealing with a pure shear deformation. As a result, all nonvanishing components of the stress tensor, calculated from Eqs. (34),²⁷ are proportional to the shear modulus alone:

$$\sigma_{xx} = \sigma_{yy} = \sigma_{zz} = 0, \qquad \sigma_{xy} = \sigma_{yx} = 0, \qquad \sigma_{xz} = \sigma_{zx} = -\mu\kappa y, \qquad \sigma_{yz} = \sigma_{zy} = \mu\kappa x. \qquad (7.89)$$

Now it is straightforward to use this result to calculate the full torque as an integral over the cross-section area $A$:

$$\tau_z = \int_A (\mathbf{r}\times d\mathbf{F})_z = \int_A (x\,dF_y - y\,dF_x) = \int_A (x\sigma_{yz} - y\sigma_{xz})\,dx\,dy. \qquad (7.90)$$

Using Eq. (89), we get $\tau_z = \mu\kappa I_z$, i.e.

$$C = \mu I_z, \qquad \text{where} \qquad I_z \equiv \int_A (x^2 + y^2)\,dx\,dy. \qquad (7.91)$$



Again, just as in the case of thin rod bending, we have got an integral similar to a moment of inertia, this time for rotation about axis $z$ passing through a certain point of the cross-section. For any axially-symmetric cross-section, this evidently should be the central point. Then, for example, for the practically important case of a round pipe with internal radius $R_1$ and external radius $R_2$, Eq. (91) yields

$$C = \mu\int_{R_1}^{R_2} 2\pi\rho^3\,d\rho = \frac{\pi}{2}\mu\left(R_2^4 - R_1^4\right). \qquad (7.92)$$



In particular, for a solid rod of radius $R$ this gives the torsional rigidity $C = (\pi/2)\mu R^4$, while for a hollow pipe of small thickness $t \ll R$, Eq. (92) is reduced to



27 For this problem, with purely shear deformation, using the alternative elastic moduli $E$ and $\sigma$ would be rather unnatural. If needed, we may always use the second of Eqs. (50): $\mu = E/2(1 + \sigma)$.
$$C = 2\pi\mu R^3t. \qquad (7.93)$$



Note that per unit cross-section area $A$ (and hence per unit mass) this rigidity is twice as high as that of a solid rod:

$$\frac{C}{A}\bigg|_{\text{thin round pipe}} = \mu R^2 > \frac{C}{A}\bigg|_{\text{solid round rod}} = \frac{1}{2}\mu R^2. \qquad (7.94)$$



This fact is the basis of a broad use of thin pipes in construction. 
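The torsional rigidities (92)-(94) follow from a one-dimensional polar integral, which the sketch below evaluates by the midpoint rule. The steel-like μ = 80 GPa and the pipe dimensions are illustrative assumptions. The ratio of $C/A$ for a thin pipe to that of a solid rod of the same radius comes out close to 2, as Eq. (94) states.

```python
import math

# Torsional rigidity C = mu * I_z of round cross-sections, Eqs. (7.91)-(7.94), with I_z
# evaluated as the polar integral of Eq. (7.92) by the midpoint rule.
# The shear modulus and the dimensions below are illustrative assumptions.

def polar_I_z(R_in, R_out, n=10_000):
    """Midpoint-rule value of I_z = integral of 2*pi*rho**3 d(rho) from R_in to R_out."""
    d = (R_out - R_in) / n
    return sum(2 * math.pi * (R_in + (i + 0.5) * d) ** 3 for i in range(n)) * d

mu = 80e9                   # Pa, steel-like
R, t = 0.05, 0.5e-3         # mean radius and wall thickness of the thin pipe, m

C_pipe = mu * polar_I_z(R - t / 2, R + t / 2)
C_pipe_thin = 2 * math.pi * mu * R**3 * t              # Eq. (7.93)
C_solid = mu * polar_I_z(0.0, R)
C_solid_exact = 0.5 * math.pi * mu * R**4              # Eq. (7.92) with R_1 = 0

A_pipe, A_solid = 2 * math.pi * R * t, math.pi * R**2  # cross-section areas
ratio = (C_pipe / A_pipe) / (C_solid / A_solid)        # Eq. (7.94) predicts 2
print(f"C_pipe = {C_pipe:.4e} N*m^2 (thin-wall formula {C_pipe_thin:.4e}), ratio = {ratio:.4f}")
```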

However, for rods with axially-asymmetric cross-sections, Eq. (91) gives wrong results. For example, for a narrow rectangle of area $A = wt$ with $t \ll w$, it yields $C = \mu tw^3/12$ [WRONG!], which is even functionally different from the correct result - cf. Eq. (106) below. The reason for the failure of the above analysis is that it does not describe the possible bending $q_z$ of the rod's cross-section in the direction along the rod. (For axially-symmetric rods, such bending is evidently forbidden by the symmetry, so that Eq. (91) is valid, and results (92)-(94) are absolutely correct.) Let us describe²⁸ this rather counter-intuitive effect by taking

$$q_z = \kappa\psi(x, y), \qquad (7.95)$$

(where $\psi$ is some function to be determined), but still keeping Eq. (87) for the two other components of the displacement vector. The addition of $\psi$ does not change the fact that the diagonal components of the strain tensor, as well as $s_{xy} = s_{yx}$, are equal to zero, but it contributes to the other off-diagonal components:

$$s_{xz} = s_{zx} = \frac{\kappa}{2}\left(-y + \frac{\partial\psi}{\partial x}\right), \qquad s_{yz} = s_{zy} = \frac{\kappa}{2}\left(x + \frac{\partial\psi}{\partial y}\right), \qquad (7.96)$$



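and hence to the corresponding elements of the stress tensor:

$$\sigma_{xz} = \sigma_{zx} = \mu\kappa\left(-y + \frac{\partial\psi}{\partial x}\right), \qquad \sigma_{yz} = \sigma_{zy} = \mu\kappa\left(x + \frac{\partial\psi}{\partial y}\right). \qquad (7.97)$$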



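Now let us find the requirement imposed on function $\psi(x,y)$ by the fact that the stress force component parallel to the rod's axis,

$$dF_z = \sigma_{zx}\,dA_x + \sigma_{zy}\,dA_y = \mu\kappa\left[\left(-y + \frac{\partial\psi}{\partial x}\right)\frac{dA_x}{dA} + \left(x + \frac{\partial\psi}{\partial y}\right)\frac{dA_y}{dA}\right]dA, \qquad (7.98)$$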



has to vanish at the rod's surface(s), i.e. at each border of its cross-section. Coordinates $\{x, y\}$ of points at a border may be considered functions of the arc length $l$ of that line - see Fig. 11. As this figure shows, the elementary area ratios participating in Eq. (98) may be readily expressed via derivatives of functions $x(l)$ and $y(l)$: $dA_x/dA = \sin\alpha = dy/dl$ and $dA_y/dA = \cos\alpha = -dx/dl$. Hence we may write

$$\left[\left(-y + \frac{\partial\psi}{\partial x}\right)\frac{dy}{dl} - \left(x + \frac{\partial\psi}{\partial y}\right)\frac{dx}{dl}\right]_{\text{border}} = 0. \qquad (7.99)$$



28 I would not be terribly shocked if the reader skipped the balance of this section at the first reading. Though the following calculation is very elegant, its results will not be used in the rest of these notes.

Introducing, instead of $\psi$, a new function $\chi(x,y)$, defined by its derivatives as

$$\frac{\partial\chi}{\partial x} \equiv -\frac{1}{2}\left(x + \frac{\partial\psi}{\partial y}\right), \qquad \frac{\partial\chi}{\partial y} \equiv \frac{1}{2}\left(-y + \frac{\partial\psi}{\partial x}\right), \qquad (7.100)$$



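we may rewrite condition (99) as

$$2\left(\frac{\partial\chi}{\partial y}\frac{dy}{dl} + \frac{\partial\chi}{\partial x}\frac{dx}{dl}\right)_{\text{border}} = 2\left.\frac{d\chi}{dl}\right|_{\text{border}} = 0, \qquad (7.101)$$

so that function $\chi$ should be constant at each border of the cross-section.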




Fig. 7.11. Deriving Eq. (101). 



In particular, for a singly-connected cross-section, limited by just one continuous border line, the constant is arbitrary. According to Eqs. (100), its choice does not affect the longitudinal deformation function $\psi(x,y)$, and hence the deformation as a whole. Now let us use the definition (100) of function $\chi$ to calculate the 2D Laplace operator of this function:

$$\frac{\partial^2\chi}{\partial x^2} + \frac{\partial^2\chi}{\partial y^2} = -\frac{1}{2}\frac{\partial}{\partial x}\left(x + \frac{\partial\psi}{\partial y}\right) + \frac{1}{2}\frac{\partial}{\partial y}\left(-y + \frac{\partial\psi}{\partial x}\right) = -1. \qquad (7.102)$$



This is a 2D Poisson equation (frequently met, for example, in electrostatics), but with a very simple, constant right-hand part. Plugging Eqs. (100) into Eqs. (97), and those into Eq. (90), we may express the torque $\tau_z$, and hence the torsional rigidity $C$, via the same function:

$$C = -2\mu\int_A\left(x\frac{\partial\chi}{\partial x} + y\frac{\partial\chi}{\partial y}\right)dx\,dy \qquad \text{(for an arbitrary cross-section)}. \qquad (7.103a)$$



Sometimes it is easier to use this result in one of two other forms. The first of them may be readily obtained from Eq. (103a) using integration by parts:

$$\begin{aligned} C &= -2\mu\left(\int dy\int x\,d\chi + \int dx\int y\,d\chi\right) = -2\mu\left[\int dy\left(x\chi\big|_{\text{border}} - \int\chi\,dx\right) + \int dx\left(y\chi\big|_{\text{border}} - \int\chi\,dy\right)\right] \\ &= 4\mu\left[\int_A\chi\,dx\,dy - \chi_{\text{border}}\int_A dx\,dy\right], \end{aligned} \qquad (7.103b)$$

while the proof of one more form,

$$C = 4\mu\int_A\left(\nabla_{x,y}\chi\right)^2 dx\,dy, \qquad (7.103c)$$

is left for the reader's exercise.

Thus, if we need to know the rod's rigidity alone, it is sufficient to calculate function $\chi(x,y)$ from Eq. (102) with boundary condition (101), and plug it into any of Eqs. (103). Only if we are also curious about the longitudinal deformation (95) of the cross-section do we need to continue, using Eq. (100) to find function $\psi(x,y)$. Let us see how this general result works for the two examples discussed above. For the round cross-section of radius $R$, both the Poisson equation (102) and the boundary condition, $\chi = \text{const}$ at $x^2 + y^2 = R^2$, are evidently satisfied by the axially-symmetric function

$$\chi = -\frac{1}{4}\left(x^2 + y^2\right) + \text{const}. \qquad (7.104)$$



For this case, either of Eqs. (103) yields

$$C = 4\mu\int_A\left[\left(\frac{x}{2}\right)^2 + \left(\frac{y}{2}\right)^2\right]dx\,dy = \mu\int_A\left(x^2 + y^2\right)d^2r, \qquad (7.105)$$

i.e. the same result (91) that we had for $\psi = 0$. Indeed, plugging Eq. (104) into Eqs. (100), we see that in this case $\partial\psi/\partial x = \partial\psi/\partial y = 0$, so that $\psi(x,y) = \text{const}$, i.e. the cross-section is not bent. (As we have discussed in Sec. 1, a uniform translation $q_z = \kappa\psi = \text{const}$ does not give any deformation.)

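Now, turning to a rod with a narrow rectangular cross-section $wt$ with $t \ll w$, we may use this strong inequality to solve the Poisson equation (102) approximately, neglecting the derivative along the wider dimension (say, $y$). The remaining 1D differential equation $d^2\chi/dx^2 = -1$, with boundary conditions $\chi|_{x=+t/2} = \chi|_{x=-t/2}$, has an evident solution $\chi = -x^2/2 + \text{const}$. This approximate $\chi$ violates the boundary condition (101) near the short sides of the rectangle. It should therefore be plugged into form (103b) or (103c), which are insensitive to these small regions. Form (103a) would miss their contribution and give only half of the result. Plugging it into Eq. (103b) or (103c), we get the correct result for the torsional rigidity:

$$C = \frac{1}{3}\mu w t^3. \qquad (7.106)$$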
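For cross-sections without a simple analytic $\chi$, the recipe of Eqs. (101), (102), and (103b) is easy to carry out numerically. The sketch below uses hypothetical helpers, not taken from these notes. It solves the Poisson equation (102) on a $w \times t$ rectangle with the standard five-point finite-difference scheme, choosing the border constant $\chi_{\text{border}} = 0$. It then compares the result with the classical Fourier-series solution for a rectangle, which is quoted here as a known result and is not derived in this section.

```python
import numpy as np

def second_difference(m, h):
    """1D second-derivative matrix on m interior nodes, with chi = 0 just outside both ends."""
    off = np.ones(m - 1)
    return (np.diag(-2.0 * np.ones(m)) + np.diag(off, 1) + np.diag(off, -1)) / h**2

def rigidity_fd(w, t, mu=1.0, n=40):
    """Torsional rigidity of a w-by-t rectangle from Eqs. (102) and (103b), with chi_border = 0."""
    hx, hy = w / (n + 1), t / (n + 1)
    # 2D Laplacian on the n*n interior nodes, as a Kronecker sum of the 1D operators
    lap = (np.kron(second_difference(n, hx), np.eye(n))
           + np.kron(np.eye(n), second_difference(n, hy)))
    chi = np.linalg.solve(lap, -np.ones(n * n))   # Laplacian of chi = -1, Eq. (102)
    return 4.0 * mu * chi.sum() * hx * hy          # Eq. (103b); chi vanishes on the border

def rigidity_series(w, t, mu=1.0, terms=50):
    """Classical Fourier-series result for the same rectangle, assuming w >= t."""
    s = sum(np.tanh(k * np.pi * w / (2 * t)) / k**5 for k in range(1, 2 * terms, 2))
    return mu * w * t**3 / 3.0 * (1.0 - 192.0 / np.pi**5 * (t / w) * s)

print(rigidity_fd(1.0, 1.0), rigidity_series(1.0, 1.0))   # square: both close to 0.1406
print(rigidity_series(100.0, 1.0) / (100.0 / 3.0))        # narrow strip: approaches Eq. (106)
```

For the narrow strip the printed ratio is slightly below 1, and the remaining difference is of order $t/w$. A dense solver is used only for brevity; larger grids would call for a sparse or iterative method.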

Now let us have a look at the cross-section bending law (95) for this particular case. Using Eqs. (100), we get

$$\frac{\partial\psi}{\partial y} = -x - 2\frac{\partial\chi}{\partial x} = x, \qquad \frac{\partial\psi}{\partial x} = y + 2\frac{\partial\chi}{\partial y} = y. \qquad (7.107)$$

Integrating these differential equations over the cross-section, and setting the integration constant (again, not contributing to the deformation) to zero, we get a beautifully simple result:

$$\psi = xy, \qquad \text{i.e.} \quad q_z = \kappa xy. \qquad (7.108)$$

This means that the longitudinal deformation of the rod has a "propeller bending" form. The regions near the opposite corners sitting on one diagonal of the cross-section bend toward one direction of axis $z$, while the corners on the other diagonal bend in the opposite direction. (This qualitative conclusion remains valid for rectangular cross-sections with any aspect ratio $t/w$.)

For rods with several surfaces, i.e. with cross-sections limited by several boundaries (say, hollow pipes), the boundary conditions for function $\chi(x, y)$ require a bit more care, and Eq. (103b) has to be modified, because the function may be equal to a different constant at each boundary. Let me leave the calculation of the torsional rigidity for this case as an exercise for the reader.



Chapter 7 



Page 24 of 36 



Essential Graduate Physics 



CM: Classical Mechanics 



7.7. 3D acoustic waves 



Now moving to elastic dynamics, we may start with Eq. (24), which may be transformed into the vector form exactly as was done for the static case at the beginning of Sec. 4. Comparing Eqs. (24) and (54), we immediately see that the result may be presented as

$$\rho\frac{\partial^2\mathbf{q}}{\partial t^2} = \frac{E}{2(1+\sigma)}\nabla^2\mathbf{q} + \frac{E}{2(1+\sigma)(1-2\sigma)}\nabla(\nabla\cdot\mathbf{q}) + \mathbf{f}(\mathbf{r},t) \qquad \text{(elastic medium dynamics equation)}. \qquad (7.109)$$




Let us use this general equation for the analysis of probably the most important type of time-dependent deformations: elastic waves. First, let us address the simplest case of a virtually infinite, uniform elastic medium, without any external forces $\mathbf{f}$. In this case, due to the linearity and homogeneity of the resulting equation of motion, and in clear analogy with the 1D case (see Sec. 5.3), we may look for a particular time-dependent solution in the form of a sinusoidal, linearly-polarized, plane wave

$$\mathbf{q}(\mathbf{r},t) = \text{Re}\left[\mathbf{a}\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\right] \qquad \text{(3D plane, sinusoidal wave)}, \qquad (7.110)$$

where $\mathbf{a}$ is the constant complex amplitude of the wave (now a vector!), and $\mathbf{k}$ is the wave vector, whose magnitude equals the wave number $k$. The directions of these two vectors should be clearly distinguished. Vector $\mathbf{a}$ determines the wave's polarization, i.e. the direction of the particle displacements. Vector $\mathbf{k}$ is directed along the spatial gradient of the full phase of the wave

$$\Psi = \mathbf{k}\cdot\mathbf{r} - \omega t + \arg\mathbf{a}, \qquad (7.111)$$

i.e. along the direction of the wave front propagation.

The importance of the angle between these two vectors may be readily seen from the following simple calculation. Let us point axis $z$ of an (inertial) reference frame along the direction of vector $\mathbf{k}$, and axis $x$ in such a direction that vector $\mathbf{q}$, and hence $\mathbf{a}$, lie within the $\{x, z\}$ plane. In this case, all variables may change only along axis $z$, i.e. $\nabla = \mathbf{n}_z(\partial/\partial z)$, while the amplitude vector may be presented as the sum of just two Cartesian components:

$$\mathbf{a} = a_x\mathbf{n}_x + a_z\mathbf{n}_z. \qquad (7.112)$$

Let us first consider a longitudinal wave,²⁹ with the particle motion along the wave direction: $a_x = 0$, $a_z = a$. Then vector $\mathbf{q}$ in Eq. (109), describing that wave, has only one ($z$) component, so that $\nabla\cdot\mathbf{q} = \partial q_z/\partial z$ and $\nabla(\nabla\cdot\mathbf{q}) = \mathbf{n}_z(\partial^2 q_z/\partial z^2)$. The Laplace operator gives the same expression: $\nabla^2\mathbf{q} = \mathbf{n}_z(\partial^2 q_z/\partial z^2)$.



As a result, Eq. (109), with $\mathbf{f} = 0$, yields

$$\rho\frac{\partial^2 q_z}{\partial t^2} = \left[\frac{E}{2(1+\sigma)} + \frac{E}{2(1+\sigma)(1-2\sigma)}\right]\frac{\partial^2 q_z}{\partial z^2} \equiv \frac{E(1-\sigma)}{(1+\sigma)(1-2\sigma)}\frac{\partial^2 q_z}{\partial z^2}. \qquad (7.113)$$

Plugging the plane-wave solution (110) into this equation, we see that it is indeed satisfied if the wave number and wave frequency are related as

$$\omega = v_l k, \qquad v_l = \left[\frac{E(1-\sigma)}{(1+\sigma)(1-2\sigma)\rho}\right]^{1/2} = \left[\frac{K + (4/3)\mu}{\rho}\right]^{1/2} \qquad \text{(longitudinal waves: velocity)}. \qquad (7.114)$$

29 In geophysics, the longitudinal waves are known as P-waves (with letter P standing for "primary"), because due to their higher velocity (see below) they arrive at the detection site (from a distant earthquake or explosion) before waves of other types.



This expression allows a simple interpretation. Let us consider a static experiment similar to the tensile test shown in Fig. 6, but with a sample much wider than $L$ in both directions perpendicular to the force. Then lateral contraction is impossible, so $s_{zz}$ is the only nonvanishing component of the strain tensor. We can therefore calculate the longitudinal stress component $\sigma_{zz}$ directly from Eq. (34) with $\text{Tr}(s) = s_{zz}$:

$$\sigma_{zz} = 2\mu\left(s_{zz} - \frac{1}{3}s_{zz}\right) + 3K\cdot\frac{1}{3}s_{zz} = \left(K + \frac{4}{3}\mu\right)s_{zz}. \qquad (7.115)$$



We see that the numerator in Eq. (114) is nothing more than the static elastic modulus for such a uniaxial deformation. It is recalculated into the velocity exactly as the spring constant was for the 1D waves considered in Sec. 5.3 - cf. Eq. (5.32).³⁰ Thus, the longitudinal acoustic waves are just simple waves of uniaxial extension/compression along the propagation axis. Formula (114) becomes especially simple in fluids, where $\mu = 0$, and the wave velocity is described by the well-known expression

$$v_l = \left(\frac{K}{\rho}\right)^{1/2} \qquad \text{(longitudinal waves: velocity in fluids)}. \qquad (7.116)$$

Note, however, that for gases, with their high compressibility and temperature sensitivity, the value of $K$ participating in this formula may differ at high frequencies from that given by Eq. (42). The reason is that fast compressions/extensions of a gas are nearly adiabatic rather than isothermal. This difference is noticeable in Table 1, which, in particular, lists the values of $v_l$ for some representative materials.

Now let us consider the opposite case of transverse waves with $a_x = a$, $a_z = 0$. In such a wave, the displacement vector is perpendicular to $z$, so that $\nabla\cdot\mathbf{q} = 0$, and the second term in the right-hand part of Eq. (109) vanishes. By contrast, the Laplace operator acting on such a vector still gives a nonzero contribution to Eq. (109), $\nabla^2\mathbf{q} = \mathbf{n}_x(\partial^2 q_x/\partial z^2)$, so that the equation yields

$$\rho\frac{\partial^2 q_x}{\partial t^2} = \frac{E}{2(1+\sigma)}\frac{\partial^2 q_x}{\partial z^2}, \qquad (7.117)$$

and instead of Eq. (114) we now get

$$\omega = v_t k, \qquad v_t = \left[\frac{E}{2(1+\sigma)\rho}\right]^{1/2} = \left(\frac{\mu}{\rho}\right)^{1/2} \qquad \text{(transverse waves: velocity)}. \qquad (7.118)$$



We see that the speed of transverse waves depends exclusively on the shear modulus $\mu$ of the medium.³¹ This is also very natural. In such waves, the particle displacements $\mathbf{q} = \mathbf{n}_x q$ are perpendicular to the normal $\mathbf{n}_z$ of the planes across which the elastic forces are transferred, so only one component of the stress tensor, $\sigma_{xz}$, is involved. Also, the strain tensor $s_{jj'}$ has no diagonal components, $\text{Tr}(s) = 0$, so that $\mu$ is the only elastic modulus actively participating in Hooke's law (34).

30 Actually, we can identify these results even qualitatively, if we consider a medium consisting of $n$ parallel, independent 1D chains per unit area. Extension of each chain fragment, of length $d$, by $\Delta d \ll d$ gives force $F = \kappa\Delta d$. Hence the total longitudinal stress, $\sigma_{zz} = Fn$, is related to strain $s_{zz} = \Delta d/d$ as $\sigma_{zz}/s_{zz} = \kappa nd$. Multiplying both parts of Eq. (5.33a) by $n/d$, and noticing that $(mn/d)$ is nothing more than the average mass density $\rho$, we make that equation absolutely similar to Eq. (113), just with a different notation for the longitudinal rigidity $\sigma_{zz}/s_{zz}$.

31 Because of that, one frequently meets the term shear waves. In geophysics, they are also known as S-waves, S standing for "secondary", again in the sense of arrival time.

In particular, fluids cannot carry transverse waves at all (formally, their velocity (118) vanishes), because they do not resist shear deformations. For all other materials, longitudinal waves are faster than the transverse ones. Indeed, for practically all known materials Poisson's ratio is positive, so that the velocity ratio that follows from Eqs. (114) and (118),

$$\frac{v_l}{v_t} = \left(\frac{2 - 2\sigma}{1 - 2\sigma}\right)^{1/2}, \qquad (7.119)$$



is above $\sqrt{2} \approx 1.4$. For the most popular construction materials, with $\sigma \approx 0.3$, the ratio is about 2 - see Table 1.
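Eqs. (114), (118), and (119) are easy to evaluate numerically. The Python sketch below does so for illustrative, roughly steel-like inputs ($E = 200$ GPa, $\sigma = 0.3$, $\rho = 7800$ kg/m³). These values are assumed here, not taken from Table 1. The sketch also checks the alternative form of $v_l$ via $K$ and $\mu$, using the standard relations $K = E/[3(1 - 2\sigma)]$ and $\mu = E/[2(1 + \sigma)]$.

```python
import math

def bulk_wave_speeds(E, sigma, rho):
    """Longitudinal and transverse velocities, Eqs. (114) and (118)."""
    v_l = math.sqrt(E * (1 - sigma) / ((1 + sigma) * (1 - 2 * sigma) * rho))
    v_t = math.sqrt(E / (2 * (1 + sigma) * rho))
    return v_l, v_t

E, sigma, rho = 200e9, 0.3, 7800.0       # illustrative SI inputs (assumed, roughly steel)
v_l, v_t = bulk_wave_speeds(E, sigma, rho)

K = E / (3 * (1 - 2 * sigma))            # bulk modulus
mu = E / (2 * (1 + sigma))               # shear modulus
v_l_alt = math.sqrt((K + 4 * mu / 3) / rho)   # second form of Eq. (114)

print(f"v_l = {v_l:.0f} m/s, v_t = {v_t:.0f} m/s, v_l/v_t = {v_l / v_t:.3f}")
print(math.sqrt((2 - 2 * sigma) / (1 - 2 * sigma)))   # Eq. (119): the same ratio
```

With these inputs the velocities come out near 5.9 km/s and 3.1 km/s, and the ratio is about 1.87. This agrees with the rough estimate "about 2" above.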

Let me emphasize again that for both longitudinal and transverse waves the relation between the wave number and frequency is linear: $\omega = vk$. As has already been discussed in Sec. 5.3, in this case of acoustic waves (or just "sound") there is no dispersion. A transverse or longitudinal wave of more complex form, consisting of several (or many) Fourier components of the type (110), therefore preserves its form during propagation:³²

$$q(z,t) = q(z - vt, 0). \qquad (7.120)$$



As one may infer from the analysis in Sec. 5.3, dispersion returns at very high (hypersound) frequencies. There the wave number $k$ becomes of the order of the reciprocal distance between the particles of the medium (e.g., atoms or molecules), and the approximation of the medium as a continuum, used throughout this chapter, becomes invalid.

As we already know from Sec. 5.3, besides the velocity, an important parameter characterizing waves of each type is the wave impedance $Z$ of the medium, which for acoustic waves is frequently called the acoustic impedance. Generalizing Eq. (5.44) to the 3D case, we may define the impedance as the ratio of the force per unit area exerted by the wave (i.e. the corresponding component of the stress tensor) to the particles' velocity. For example, for the longitudinal waves propagating in the positive/negative direction along axis $z$,

$$Z_l \equiv \mp\frac{\sigma_{zz}}{\partial q_z/\partial t} = \mp\frac{\sigma_{zz}}{s_{zz}}\,\frac{\partial q_z/\partial z}{\partial q_z/\partial t}. \qquad (7.121)$$

Plugging in Eqs. (110), (114), and (115), we get

$$Z_l = \left[\frac{E(1-\sigma)\rho}{(1+\sigma)(1-2\sigma)}\right]^{1/2} = \left[\left(K + \frac{4}{3}\mu\right)\rho\right]^{1/2} \qquad \text{(longitudinal waves: impedance)}, \qquad (7.122)$$

in a clear analogy with Eq. (5.45). Similarly, for the transverse wave, the appropriately modified definition, $Z_t \equiv \mp\sigma_{xz}/(\partial q_x/\partial t)$, yields

$$Z_t = (\mu\rho)^{1/2} \qquad \text{(transverse waves: impedance)}. \qquad (7.123)$$



32 However, if the initial wave is an arbitrary mixture (109) of longitudinal and transverse components, these components, propagating with different velocities, will "run away from each other".

Just as in the 1D waves, one role of the impedance is to scale the power carried by the wave. For plane 3D waves in infinite media, with their infinite wave front area, it is more appropriate to speak about the power density, i.e. the power per unit area of the front. We characterize it not only by its magnitude, $p \equiv d\mathcal{P}/dA$, but also by the direction of the energy propagation. For a plane wave in an isotropic medium, this direction coincides with that of the wave vector $\mathbf{k}$: $\mathbf{p} = p\,\mathbf{n}_k$. Using definition (18) of the stress tensor, we may present the Cartesian components of this Umov vector³³ as

$$p_j = -\sum_{j'}\sigma_{jj'}\frac{\partial q_{j'}}{\partial t}. \qquad (7.125)$$

Returning to plane waves propagating along axis $z$, and acting exactly as in Sec. 5.3, for both the longitudinal and transverse waves we arrive at the following 3D analog of Eq. (5.46):

$$\bar{p}_z = \frac{\omega^2 Z}{2}\,aa^*, \qquad (7.126)$$

with $Z$ being the corresponding impedance, either $Z_l$ or $Z_t$.
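Eq. (126) can be checked by brute force. The sketch below uses a hypothetical helper and arbitrary illustrative parameters. It averages $p_z = -\sigma_{zz}\,\partial q_z/\partial t$ of Eq. (125) over one period of a longitudinal wave (110) at a fixed point, with $\sigma_{zz} = (K + 4\mu/3)s_{zz}$ from Eq. (115). It then compares the average with $\omega^2 Z_l aa^*/2$, taking $Z_l$ from Eq. (122).

```python
import cmath
import math

def mean_umov_z(a, omega, M, rho, samples=2000):
    """Period average of p_z = -sigma_zz * dq_z/dt for q_z = Re[a exp(i(kz - omega t))]
    travelling toward +z; M = K + 4 mu / 3 is the uniaxial-strain modulus of Eq. (115)."""
    k = omega / math.sqrt(M / rho)            # omega = v_l k, Eq. (114)
    z = 0.0                                    # any fixed point gives the same average
    total = 0.0
    for n in range(samples):
        t = 2 * math.pi * n / (omega * samples)
        e = cmath.exp(1j * (k * z - omega * t))
        s_zz = (1j * k * a * e).real           # strain dq_z/dz
        q_dot = (-1j * omega * a * e).real     # particle velocity dq_z/dt
        total += -(M * s_zz) * q_dot           # Eq. (125) for this single component
    return total / samples

M, rho = 2.7e11, 7800.0                        # illustrative SI values (assumed)
omega, a = 2 * math.pi * 1e4, 1e-9 * cmath.exp(0.3j)
Z_l = math.sqrt(M * rho)                       # Eq. (122)
ratio = mean_umov_z(a, omega, M, rho) / (omega**2 * Z_l * abs(a)**2 / 2)
print(ratio)                                   # close to 1
```

The phase of the complex amplitude $a$ drops out of the average, as Eq. (126) implies.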

Just as in the 1D case, one more important effect in which the notion of impedance is crucial is wave reflection at an interface between two media. The two boundary conditions necessary for the analysis of these processes may be obtained from the continuity of vectors $\mathbf{q}$ and $d\mathbf{F}$. (The former condition is evident, while the latter one may be obtained by applying Newton's 2nd law to the infinitesimal volume $dV = dA\,dz$, where segment $dz$ straddles the boundary.) Let us start from the simplest case of normal incidence on a plane interface between two media with constant but different elastic moduli and mass densities. By symmetry, an incident longitudinal (or transverse) wave may only excite longitudinal (or transverse) reflected and transmitted waves, but not waves of the other type. Hence we can literally repeat all the calculations of Sec. 5.4, again arriving at the fundamental relations (5.53) and (5.54), with the only replacement of $Z$ and $Z'$ with the corresponding values of either $Z_l$ (122) or $Z_t$ (123). Thus, at normal incidence the wave reflection is determined solely by the acoustic impedances of the media, while the sound velocities are not involved.
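For normal incidence, the two continuity conditions mentioned above give explicit amplitude ratios in a couple of lines. The sketch below uses one possible sign convention for displacement amplitudes, $a_i + a_r = a_t$ and $Z(a_i - a_r) = Z'a_t$, which may differ from that of Eqs. (5.53)-(5.54). The power fractions it returns, computed with Eq. (126), do not depend on that choice. The function name and the impedance values are arbitrary illustrations.

```python
def normal_incidence(Z, Z_prime):
    """Amplitude ratios and power fractions for a wave hitting medium Z' from medium Z.
    Sign convention (assumed): a_i + a_r = a_t and Z (a_i - a_r) = Z' a_t."""
    r = (Z - Z_prime) / (Z + Z_prime)      # a_reflected / a_incident
    tau = 2 * Z / (Z + Z_prime)            # a_transmitted / a_incident
    R = r**2                               # reflected fraction of the power density
    T = (Z_prime / Z) * tau**2             # transmitted fraction, via Eq. (126)
    return r, tau, R, T

r, tau, R, T = normal_incidence(1.0, 3.0)  # arbitrary impedance units
print(r, tau, R, T)                         # -0.5 0.5 0.25 0.75
```

Only the impedances enter, in line with the statement that the sound velocities play no role at normal incidence. The sum $R + T = 1$ expresses energy conservation.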

The situation, however, becomes more involved at a finite angle of incidence (Fig. 12), where the transmitted wave is generally also refracted, i.e. propagates at a different angle, $\theta' \neq \theta_i$. Angles $\theta_r$ and $\theta'$ may be readily found from a "kinematic" condition. The incident, reflected, and refracted waves must have the same spatial distribution along the interface plane, because the same material particles participate in all three processes. From Eq. (110) we see that the necessary boundary condition is the equality of the tangential components (in Fig. 12, $k_x$) of all three wave vectors:

$$k_x = k_i\sin\theta_i = k_r\sin\theta_r = k'\sin\theta'. \qquad (7.127)$$

Since in an isotropic medium $k_i = k_r = k$, and $k'/k = (\omega/v')/(\omega/v) = v/v'$, we immediately get the relations

$$\theta_r = \theta_i \equiv \theta, \qquad \frac{\sin\theta'}{\sin\theta} = \frac{v'}{v} \qquad \text{(reflection and refraction angles)}, \qquad (7.128)$$

which are general for dispersion-free 3D waves of any nature. (In optics, the latter relation is known as the Snell law.³⁴) This means that, just as in optics, the direction of a wave propagating in a medium with lower velocity is closer to the normal (axis $z$). For example, Fig. 12 shows a qualitatively correct picture of refraction if $v' < v$.³⁵

Fig. 7.12. "Kinematic" condition of wave reflection and refraction.

However, the refraction/reflection of acoustic waves is more complex than that of optical (or any electromagnetic) waves, which can only be transverse. Even if the incident acoustic wave is purely longitudinal or purely transverse, it generally excites both longitudinal and transverse reflected and refracted waves. Indeed, at $\theta \neq 0$, the direction of particle motion (vector $\mathbf{q}$) in the incident wave is neither exactly parallel nor exactly perpendicular to the interface, and thus serves as an actuator for waves of both types. This is why the expressions for the amplitudes of the reflected and refracted waves via the amplitude of the incident wave are much bulkier than those in electromagnetic wave theory (where they are called the Fresnel formulas³⁶). Though they are straightforward to derive (again, from the continuity of vectors $\mathbf{q}$ and $d\mathbf{F}$), I do not have time/space for spelling them out. Let me only note that, in contrast to the case of normal incidence, these relations involve both the impedances $Z$, $Z'$ and the velocities $v$, $v'$ of the media on both sides of the interface, for both the longitudinal and transverse waves.

There is another factor that makes boundary acoustic effects more complex. Within a certain frequency range, interfaces (and in particular surfaces) of elastic solids may sustain the so-called surface acoustic waves (the term is frequently abbreviated as SAW), in particular the Rayleigh waves and Love waves.³⁷ The main feature that distinguishes such waves from their bulk (longitudinal and transverse) counterparts is that the particle displacement amplitude is maximal at the interface and decays exponentially into the bulk of both adjacent media. The characteristic depth of this penetration is of the order of (though not exactly equal to) the wavelength.

33 Named after N. Umov, who introduced this concept in 1874. Ten years later, a similar concept for electromagnetic waves (see, e.g., EM Sec. 6.4) was suggested by J. Poynting, so that some textbooks use the term "Umov-Poynting vector". In a dissipation-free elastic medium, the Umov vector obeys the continuity equation $\partial(\rho v^2/2 + u)/\partial t + \nabla\cdot\mathbf{p} = 0$, with $u$ given by Eq. (52). This equation expresses the conservation of the total (kinetic plus potential) energy of elastic deformation.

34 Named after W. Snellius (1580-1626), who rediscovered the fact that had been described as early as 984 by Abu Saad al-Ala ibn Sahl.

35 In particular, this means that if $v' > v$, acoustic waves at larger angles of incidence may exhibit the effect of total internal reflection, so well known from optics - see, e.g., EM Sec. 7.5.

36 Their discussion may also be found in EM Sec. 7.5.

37 Named, respectively, after Lord Rayleigh (born J. Strutt, 1842-1919), who theoretically predicted the very existence of surface acoustic waves, and A. Love (1863-1940).

In the Rayleigh waves, the particle displacement vector $\mathbf{q}$ has two components. One is longitudinal, and hence parallel to the interface along which the wave propagates. The other is transverse, perpendicular to the interface. In contrast to the bulk waves discussed above, the components are coupled (via their interaction with the interface) and hence propagate with a single velocity $v_R$. As a result, the trajectory of each particle in the Rayleigh wave is an ellipse in the plane perpendicular to the interface and containing the propagation direction. A straightforward analysis³⁸ of the Rayleigh waves on the surface of an elastic solid (i.e. its interface with vacuum) yields the following equation for $v_R$:

$$\left(2 - \frac{v_R^2}{v_t^2}\right)^4 = 16\left(1 - \frac{v_R^2}{v_t^2}\right)\left(1 - \frac{v_R^2}{v_l^2}\right). \qquad (7.129)$$



According to this formula, for realistic materials with $0 < \sigma < 1/2$, the Rayleigh waves are slightly (by 4 to 13%) slower than the bulk transverse waves - and hence are substantially slower than the bulk longitudinal waves.
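The 4 to 13% estimate can be reproduced by solving Eq. (129) numerically. In the sketch below (a hypothetical helper, not from these notes), we set $x = v_R^2/v_t^2$ and $\gamma = v_t^2/v_l^2 = (1 - 2\sigma)/(2 - 2\sigma)$, the latter from Eqs. (114) and (118). After dividing out the trivial root $x = 0$, Eq. (129) becomes the cubic $x^3 - 8x^2 + 8(3 - 2\gamma)x - 16(1 - \gamma) = 0$. Its root in the interval $0 < x < 1$ is found by bisection.

```python
import math

def rayleigh_ratio(sigma, tol=1e-13):
    """v_R / v_t from Eq. (129) for Poisson's ratio sigma, found by bisection."""
    gamma = (1 - 2 * sigma) / (2 - 2 * sigma)          # (v_t / v_l)^2

    def f(x):   # Eq. (129) with x = (v_R / v_t)^2, the trivial root x = 0 divided out
        return x**3 - 8 * x**2 + 8 * (3 - 2 * gamma) * x - 16 * (1 - gamma)

    lo, hi = 0.0, 1.0                                  # f(0) < 0 while f(1) = 1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))

for s in (0.0, 0.25, 0.3, 0.45):
    print(s, round(rayleigh_ratio(s), 3))
```

For $\sigma = 0$ the ratio is about 0.874, and for $\sigma = 0.25$ about 0.919; the latter matches the closed-form root $x = 2 - 2/\sqrt{3}$. As $\sigma \to 1/2$, the ratio approaches about 0.955.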

In contrast, the Love waves are purely transverse, with vector $\mathbf{q}$ oriented parallel to the interface. However, interaction of the waves with the interface reduces their velocity $v_L$ in comparison with that ($v_t$) of the bulk transverse waves, keeping it in the narrow interval between $v_R$ and $v_t$:

$$v_R < v_L < v_t < v_l. \qquad (7.130)$$



The practical importance of surface acoustic waves is that their amplitude decays very slowly with the distance $r$ from their point-like source, as $a \propto 1/r^{1/2}$, while that of any bulk wave decays much faster, as $a \propto 1/r$. (Indeed, for bulk waves the power emitted by the source is distributed over a hemisphere whose surface area is proportional to $r^2$. For surface waves, all the power goes into a thin surface layer of circular form, whose front area scales as $r$.) At least two areas of application of the surface acoustic waves have to be mentioned: geophysics (for earthquake detection and Earth crust seismology) and electronics (for signal processing, with the focus on frequency filtering). Unfortunately, I cannot dwell on these (very interesting) topics, and have to refer the reader to special literature.³⁹



38 See, e.g., Sec. 24 in L. Landau and E. Lifshitz, Theory of Elasticity, 3rd ed., Butterworth-Heinemann, 1986.

39 See, for example, K. Aki and P. G. Richards, Quantitative Seismology, 2nd ed., University Science Books, 2002, and D. Morgan, Surface Acoustic Waves, 2nd ed., Academic Press, 2007.

7.8. Elastic waves in restricted geometries

From what we have discussed at the end of the last section, it should be pretty clear that the propagation of acoustic waves in elastic bodies of finite size may generally be very complicated. There is, however, one important limit in which several important results may be readily obtained. This is the limit of (relatively) low frequencies, where the wavelength is much larger than at least one dimension of the system. Let us consider, for example, various waves that may propagate along thin rods. Here "thin" means that the characteristic size $a$ of the rod's cross-section is much smaller than not only the length of the rod, but also the wavelength $\lambda = 2\pi/k$. In this case there is a considerable range of distances $\Delta z$ along the rod,

$$a \ll \Delta z \ll \lambda, \qquad (7.131)$$

in which we can neglect the dynamic effects due to the medium's inertia, and apply the results of our earlier static analyses.

For example, for a longitudinal wave of stress, which is essentially a wave of periodic tensile extensions and compressions of the rod, within range (131) we can use the static relation (44):

$$\sigma_{zz} = E s_{zz}. \qquad (7.132)$$

For what follows, it is easier to use the general equation of elastic dynamics not in its vector form (109), but rather in the precursor, Cartesian-component form (25), with $f_j = 0$. For plane waves propagating along axis $z$, only one component (with $j' \to z$) of the sum in the right-hand part of this equation is nonvanishing, and it is reduced to

$$\rho\frac{\partial^2 q_j}{\partial t^2} = \frac{\partial\sigma_{jz}}{\partial z}. \qquad (7.133)$$

In our current case of longitudinal waves, all components of the stress tensor except $\sigma_{zz}$ are equal to zero. With $\sigma_{zz}$ from Eq. (132), and using the definition $s_{zz} = \partial q_z/\partial z$, Eq. (133) is reduced to a very simple wave equation,

$$\rho\frac{\partial^2 q_z}{\partial t^2} = E\frac{\partial^2 q_z}{\partial z^2}, \qquad (7.134)$$

which shows that the velocity of such tensile waves is

$$v = \left(\frac{E}{\rho}\right)^{1/2} \qquad \text{(tensile waves: velocity)}. \qquad (7.135)$$



Comparing this result with Eq. (114), we see that the tensile wave velocity, for any medium with $\sigma > 0$, is lower than the velocity $v_l$ of longitudinal waves in the bulk of the same material. The reason for this difference is simple. In thin rods, the cross-section is free to oscillate (e.g., shrink in the longitudinal extension phase of the passing wave),⁴⁰ so that the effective force resisting the longitudinal deformation is smaller than in a border-free space. Since (as is clearly visible from the wave equation) the scale of the force gives the scale of $v^2$, this difference translates into slower waves in rods. Of course, as the wave frequency is increased, at $ka \sim 1$ there is a (rather complex and cross-section-dependent) crossover from Eq. (135) to Eq. (114).
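Dividing Eq. (135) by Eq. (114) gives the ratio $v/v_l = [(1 + \sigma)(1 - 2\sigma)/(1 - \sigma)]^{1/2}$, which depends on Poisson's ratio alone. A minimal Python sketch (the helper name is hypothetical):

```python
import math

def tensile_to_bulk_ratio(sigma):
    """(E/rho)^(1/2) of Eq. (135) divided by v_l of Eq. (114)."""
    return math.sqrt((1 + sigma) * (1 - 2 * sigma) / (1 - sigma))

for s in (0.0, 0.2, 0.3, 0.4):
    print(s, round(tensile_to_bulk_ratio(s), 3))
```

The ratio equals 1 only at $\sigma = 0$. For $\sigma = 0.3$, the tensile waves are about 14% slower than the bulk longitudinal ones.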

Proceeding to transverse waves in rods, let us first have a look at long bending waves. In these, vector $\mathbf{q} = \mathbf{n}_x q_x$ (with axis $x$ along the bending direction - see Fig. 8) is approximately constant over the whole cross-section. In this case, the only component of the stress tensor contributing to the net transverse force $F_x$ is $\sigma_{xz}$, so that the integral of Eq. (133) over the cross-section is

$$\rho A\frac{\partial^2 q_x}{\partial t^2} = \frac{\partial F_x}{\partial z}, \qquad F_x = \int_A \sigma_{xz}\,dA. \qquad (7.136)$$

40 For this reason, the tensile waves can be called longitudinal only in a limited sense. While the stress wave is purely longitudinal ($\sigma_{xx} = \sigma_{yy} = 0$), the strain wave is not: $s_{xx} = s_{yy} = -\sigma s_{zz} \neq 0$, i.e. $\mathbf{q}(\mathbf{r}, t) \neq \mathbf{n}_z q_z$.
Now, if Eq. (131) is satisfied, we again may use the static local relations (77)-(79), with all derivatives $d/dz$ duly replaced with their partial form $\partial/\partial z$, to express force $F_x$ via the bending deformation $q_x$. Plugging these relations into each other one by one, we arrive at a very unusual differential equation

$$\rho A\frac{\partial^2 q_x}{\partial t^2} = -EI_y\frac{\partial^4 q_x}{\partial z^4}. \qquad (7.137)$$

Looking for its solution in the form of a sinusoidal wave (110), we get a nonlinear dispersion relation:⁴¹

$$\omega = \left(\frac{EI_y}{\rho A}\right)^{1/2}k^2. \qquad (7.138)$$



Such a relation means that the bending waves are not acoustic at any frequency. They cannot be characterized by a single velocity valid for all wave numbers $k$, i.e. for all spatial Fourier components of a waveform. According to our discussion in Sec. 5.3, such strongly dispersive systems cannot pass non-sinusoidal waveforms too far without changing their form very considerably.
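The strong dispersion of Eq. (138) shows up clearly in the difference between the phase velocity $\omega/k$ and the group velocity $d\omega/dk$. The sketch below uses hypothetical helper names and $EI_y = \rho A = 1$ in arbitrary units, and estimates $d\omega/dk$ by a central difference.

```python
import math

def bending_omega(k, EI, rhoA):
    """Dispersion relation of free bending waves, Eq. (138)."""
    return math.sqrt(EI / rhoA) * k**2

def phase_and_group_velocity(k, EI, rhoA, dk=1e-6):
    v_phase = bending_omega(k, EI, rhoA) / k
    # central-difference estimate of d(omega)/dk
    v_group = (bending_omega(k + dk, EI, rhoA) - bending_omega(k - dk, EI, rhoA)) / (2 * dk)
    return v_phase, v_group

for k in (0.5, 1.0, 2.0, 4.0):              # arbitrary units (illustrative)
    v_ph, v_gr = phase_and_group_velocity(k, EI=1.0, rhoA=1.0)
    print(k, round(v_ph, 6), round(v_gr, 6))
```

Both velocities grow in proportion to $k$, with the group velocity twice the phase velocity. The different Fourier components of a pulse therefore move apart.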

This situation changes, however, if the rod has an initial uniform longitudinal stress $\sigma_{zz} = T/A$ (where force $T$ is usually called tension), on whose background the transverse waves propagate. To analyze its effect, let us redraw Fig. 6, for a minute neglecting the bending stress - see Fig. 13.

Fig. 7.13. Additional forces in a thin rod ("string"), due to background tension T.



Still sticking to the limit of small angles $\varphi$, the additional vertical component $dF_x$ of the net force acting on a small rod fragment of length $dz$ is $T_x(z + dz) - T_x(z) = T\varphi_y(z + dz) - T\varphi_y(z) \approx T(\partial\varphi_y/\partial z)dz$, so that $\partial F_x/\partial z = T(\partial\varphi_y/\partial z)$. With the geometric relation (79) in its partial-derivative form $\partial q_x/\partial z = \varphi_y$, this additional term becomes $T(\partial^2 q_x/\partial z^2)$. Adding it to the right-hand part of Eq. (137), we get the following dispersion relation:

$$\omega^2 = \frac{EI_y}{\rho A}k^4 + \frac{T}{\rho A}k^2 \qquad \text{(bending waves: dispersion relation)}. \qquad (7.139)$$

At low $k$ (and hence low frequencies), it describes acoustic waves with the "guitar string" velocity, which should be well known to the reader from undergraduate courses:

$$v = \left(\frac{T}{\rho A}\right)^{1/2} \qquad \text{(waves on a string: velocity)}, \qquad (7.140)$$

where the denominator is nothing else than the linear mass density. However, as the frequency grows, Eq. (139) describes a crossover to the highly-dispersive bending waves (138).

41 Note that since the "moment of inertia" $I_y$, defined by Eq. (72), may depend on the bending direction (unless the cross-section is sufficiently symmetric), the dispersion relation (138) may give different results for different directions of the bending wave polarization.
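Eq. (139) interpolates between the two limits. Equating its two terms gives a crossover wave number $k_c = (T/EI_y)^{1/2}$, a simple consequence of Eq. (139) rather than a result quoted in these notes. It also shows that the phase velocity equals the string velocity (140) times $[1 + (k/k_c)^2]^{1/2}$. The helper name and parameter values in the sketch below are arbitrary illustrations.

```python
import math

def omega_tensioned(k, EI, T, rhoA):
    """Dispersion relation (139) for a rod under tension T."""
    return math.sqrt((EI * k**4 + T * k**2) / rhoA)

EI, T, rhoA = 1e-2, 100.0, 0.5           # arbitrary illustrative units (assumed)
k_c = math.sqrt(T / EI)                  # bending and tension terms of Eq. (139) are equal here
v_string = math.sqrt(T / rhoA)           # Eq. (140)

for k in (1e-2 * k_c, k_c, 1e2 * k_c):
    v_phase = omega_tensioned(k, EI, T, rhoA) / k
    print(f"k/k_c = {k / k_c:g}: v_phase / v_string = {v_phase / v_string:.4f}")
```

Well below $k_c$ the waves are acoustic, with velocity (140). Well above it, $\omega \approx (EI_y/\rho A)^{1/2}k^2$, as in Eq. (138).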

Now let us consider the so-called torsional waves, which are essentially the dynamic propagation of the torsional deformation discussed in Sec. 6. The easiest way to describe these waves, again within the limits given by Eq. (131), is to write the equation of rotation of a small segment $dz$ of the rod about axis $z$. This axis passes through the "center of mass" of the cross-section, and the segment rotates under the difference of torques $\boldsymbol{\tau} = \mathbf{n}_z\tau_z$ applied to its ends - see Fig. 10:

$$\rho I_z\,dz\,\frac{\partial^2\varphi}{\partial t^2} = d\tau_z, \qquad (7.141)$$

where $I_z$ is the "moment of inertia" defined by Eq. (91). After its multiplication by $\rho\,dz$, i.e. by the mass per unit area, it has turned into the real moment of inertia of a $dz$-thick slice of the rod. Dividing both parts by $dz$, and using the static local relation (86), $\tau_z = C\kappa = C(\partial\varphi/\partial z)$, we get the following differential equation:

$$\rho I_z\frac{\partial^2\varphi}{\partial t^2} = C\frac{\partial^2\varphi}{\partial z^2}. \qquad (7.142)$$



Just as Eqs. (114), (118), (135), and (140), this equation describes an acoustic (dispersion-free) wave that propagates with the frequency-independent velocity

$$v = \left(\frac{C}{\rho I_z}\right)^{1/2}. \qquad (7.143)$$



As we have seen in Sec. 6, for rods with axially-symmetric cross-sections, the torsional rigidity C is described by the simple equation (91), C = μI_z, so that expression (143) is reduced to Eq. (118) for the transverse waves in infinite media, v = (μ/ρ)^{1/2}. The reason for this similarity is simple: in a torsional wave, particles oscillate along small arcs (Fig. 14a), so that if the rod's cross-section is round, the stress-free surface does not perturb or modify the motion in any way, and hence does not affect the transverse wave velocity.
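The reduction of Eq. (143) to the bulk transverse velocity can be checked numerically. The sketch below is only an illustration: it assumes the standard isotropic relation μ = E/2(1 + σ) between the shear modulus, Young's modulus and Poisson's ratio, the steel parameters quoted in Problem 7.9 below, a hypothetical rod radius, and I_z = πr⁴/2 for a round cross-section.

```python
import math

def shear_modulus(E, sigma):
    """Shear modulus of an isotropic material, mu = E / (2 * (1 + sigma))."""
    return E / (2.0 * (1.0 + sigma))

def torsional_wave_velocity(C, rho, I_z):
    """Eq. (7.143): v = (C / (rho * I_z))**(1/2)."""
    return math.sqrt(C / (rho * I_z))

# Assumed steel parameters (the values accepted in Problem 7.9), SI units
E, sigma, rho = 170e9, 0.30, 7.8e3
r = 1.5e-3                          # rod radius, m (hypothetical)
I_z = math.pi * r**4 / 2            # "moment of inertia" of a round cross-section
mu = shear_modulus(E, sigma)
C = mu * I_z                        # torsional rigidity of a round rod, C = mu * I_z

v_torsional = torsional_wave_velocity(C, rho, I_z)
v_transverse = math.sqrt(mu / rho)  # bulk transverse wave velocity, Eq. (7.118)
print(f"torsional: {v_torsional:.0f} m/s, bulk transverse: {v_transverse:.0f} m/s")
```

Since I_z cancels, the rod radius drops out of the result, as it should for a dispersion-free wave in a round rod.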




Fig. 7.14. Particle trajectories in two different transverse waves with the same velocity: (a) torsional waves in a thin round rod and (b) circularly-polarized waves in an infinite (or very broad) sample.



This fact raises an interesting issue of the relation between the torsional and circularly-polarized waves. Indeed, in Sec. 7, I have not emphasized enough that Eq. (118) is valid for a transverse wave polarized in any direction perpendicular to vector k (in our notation, directed along axis z). In particular, this means that such waves are doubly degenerate: any isotropic elastic medium can simultaneously carry two non-interacting transverse waves propagating in the same direction with the same velocity (118), with mutually perpendicular linear polarizations (directions of vector a), for example, directed along axes x and y. If both waves are sinusoidal (110), with the same frequency, each point of the medium participates in two simultaneous sinusoidal motions within the [x, y] plane:



$$q_x = \mathrm{Re}\left[a_x e^{i(kz-\omega t)}\right] = A_x\cos\Psi, \qquad q_y = \mathrm{Re}\left[a_y e^{i(kz-\omega t)}\right] = A_y\cos(\Psi + \varphi), \qquad (7.144)$$



where Ψ ≡ kz - ωt + φ_x and φ ≡ φ_y - φ_x, with A_x,y and φ_x,y being the moduli and arguments of the complex amplitudes a_x,y. Trigonometry tells us that the trajectory of such motion on the [x, y] plane is an ellipse (Fig. 15), so that such waves are called elliptically polarized. The most important particular cases of such polarization are:

(i) φ = 0 or π: a linearly polarized wave, with vector a turned from axis x by angle θ = arctan(A_y/A_x) for φ = 0, or θ = -arctan(A_y/A_x) for φ = π; and

(ii) φ = ±π/2 and A_x = A_y: circularly polarized waves, with the right or left polarization, respectively.
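These particular cases can be checked numerically. The sketch below samples Eq. (144) over one period at a fixed point z and computes the signed area enclosed by the particle's trajectory (the shoelace formula). For the ellipse this area should approach πA_xA_y sin φ, so it vanishes for linear polarization and changes sign between φ = +π/2 and φ = -π/2. The amplitudes and the number of samples are arbitrary assumptions, and the sign is reported only as counterclockwise or clockwise motion in the [x, y] plane, without tying it to the right/left naming convention.

```python
import math

def trajectory(A_x, A_y, phi, n=360, omega=1.0, k=1.0, z=0.0, phi_x=0.0):
    """Sample Eq. (7.144) at a fixed point z over one period 2*pi/omega."""
    points = []
    for i in range(n):
        t = 2 * math.pi * i / (n * omega)
        psi = k * z - omega * t + phi_x
        points.append((A_x * math.cos(psi), A_y * math.cos(psi + phi)))
    return points

def signed_area(points):
    """Shoelace formula: > 0 for counterclockwise motion in the [x, y] plane,
    < 0 for clockwise motion, and ~0 for motion along a straight line."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        s += x1 * y2 - x2 * y1
    return 0.5 * s

for label, (A_x, A_y, phi) in {
    "linear (phi = 0)": (1.0, 0.5, 0.0),
    "circular (phi = +pi/2)": (1.0, 1.0, math.pi / 2),
    "circular (phi = -pi/2)": (1.0, 1.0, -math.pi / 2),
    "elliptical (phi = pi/3)": (1.0, 0.5, math.pi / 3),
}.items():
    pts = trajectory(A_x, A_y, phi)
    radii = [math.hypot(x, y) for x, y in pts]
    print(f"{label:24s} area = {signed_area(pts):+.4f}, "
          f"radius range = [{min(radii):.3f}, {max(radii):.3f}]")
```

For the circular cases the distance from the origin stays constant, while for the linear case the enclosed area is zero, in agreement with cases (i) and (ii).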

The circularly polarized waves play an important role in quantum mechanics, where such waves may be most naturally quantized, with elementary excitations (in the case of the mechanical waves we are discussing, called phonons) having either positive or negative angular momentum L_z = ±ħ.

Now comparing the trajectories of particles in the torsional wave in a thin round rod (or pipe) and the circularly-polarized wave in a broad sample (Fig. 14), we see that, despite the same propagation velocity, these transverse waves are rather different. In the former case (Fig. 14a) each particle moves back and forth along an arc, with the arc length different for different particles (and vanishing at the rod's center). On the other hand, in a circularly-polarized plane wave all particles move along similar, circular trajectories.




Fig. 7.15. Trajectory of a particle of an infinite 
medium with elliptically-polarized transverse 
wave, within the plane perpendicular to the 
direction of wave propagation. 



In conclusion, let me briefly mention the opposite limit, when the size of the body, from whose boundaries the waves are completely reflected,42 is much larger than the wavelength. In this case, the waves propagate almost as in an infinite 3D medium (Sec. 7), and the most important new effect is the finite number of wave modes within any finite range of wave vectors. Repeating the 1D analysis of Sec. 5.4 for each dimension of a 3D cuboid of volume V = L_1L_2L_3 (for example, using the Born-Karman boundary conditions in each



42 For acoustic waves, such a condition is easy to implement. Indeed, from Sec. 7 we already know that a strong inequality of the wave impedances Z is sufficient for such reflection. The numbers in Table 7.1 show that, for example, the impedance of a longitudinal wave in a typical metal (say, steel) is roughly five orders of magnitude higher than that in air, ensuring its virtually full reflection from the surface.
dimension), we obtain Eq. (5.59) for the spectrum of components of wave vector k along each side. This means that all possible wave vectors are located in the nodes of a rectangular 3D mesh with steps 2π/L_j in each direction, and hence with the k-space ("reciprocal space") volume



$$V_k = \frac{2\pi}{L_1}\,\frac{2\pi}{L_2}\,\frac{2\pi}{L_3} = \frac{(2\pi)^3}{V} \qquad (7.145)$$



per wave vector. It is possible (though not quite as straightforward as it is sometimes assumed) to prove that this relation is valid regardless of the shape of volume V. Hence the number of different wave vectors within a reciprocal-space volume d³k >> V_k is

$$dN = \frac{d^3k}{V_k} = \frac{V}{(2\pi)^3}\,d^3k. \qquad (7.146a)$$



In quantum mechanics, this relation takes the form of the density of quantum states in k-space:

$$\frac{dN}{d^3k} = \frac{gV}{(2\pi)^3}, \qquad (7.146b)$$



where g is the number of possible different quantum states with the same de Broglie wave vector k. In 
this form, Eq. (146) is ubiquitous in physics. 43 For phonons, formed from quantization of one 
longitudinal mode, and two transverse modes with different polarizations, g = 3. 
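The statement that V_k = (2π)³/V can be tested by direct counting. The sketch below counts the Born-Karman wave vectors k = 2π(n_1/L_1, n_2/L_2, n_3/L_3) inside a sphere |k| ≤ K and compares the count with Eq. (146a) integrated over that sphere, N = V(4π/3)K³/(2π)³. The cuboid sides and the radius K are hypothetical values; the agreement should improve as K grows, and the check also compares two cuboids of equal volume but different shapes.

```python
import math

def count_modes(K, L1, L2, L3):
    """Count Born-Karman wave vectors k = 2*pi*(n1/L1, n2/L2, n3/L3), with integer
    n_j, that lie inside the sphere |k| <= K."""
    count = 0
    n1_max = int(K * L1 / (2 * math.pi))
    n2_max = int(K * L2 / (2 * math.pi))
    for n1 in range(-n1_max, n1_max + 1):
        k1 = 2 * math.pi * n1 / L1
        for n2 in range(-n2_max, n2_max + 1):
            k2 = 2 * math.pi * n2 / L2
            rest = K * K - k1 * k1 - k2 * k2
            if rest < 0:
                continue
            # number of integers n3 with |2*pi*n3/L3| <= sqrt(rest)
            n3_lim = int(math.sqrt(rest) * L3 / (2 * math.pi))
            count += 2 * n3_lim + 1
    return count

def predicted_modes(K, V):
    """Eq. (7.146a) integrated over the sphere |k| <= K: N = V/(2*pi)**3 * (4/3)*pi*K**3."""
    return V / (2 * math.pi) ** 3 * (4.0 / 3.0) * math.pi * K ** 3

L1, L2, L3 = 1.0, 1.5, 2.0      # hypothetical cuboid sides, arbitrary length units
K = 120.0                       # radius of the k-space sphere, in inverse length units
N_count = count_modes(K, L1, L2, L3)
N_pred = predicted_modes(K, L1 * L2 * L3)
print(N_count, round(N_pred), N_count / N_pred)
```

For phonons, each counted wave vector carries g = 3 modes, so the number of phonon modes would be 3N.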



7.9. Exercise problems 

7.1. A uniform thin sheet of an isotropic, elastic material is compressed, along its thickness t, by two plane, parallel, broad (of area A >> t²) rigid surfaces - see Fig. on the right. Assuming no slippage between the sheet and the surfaces, calculate the relative compression (-Δt/t) as a function of the compressing force. Compare the result with that for the tensile stress, given by Eq. (47).



7.2. A thin, wide sheet of an isotropic, elastic material is clamped in two rigid, plane, parallel surfaces that are pulled apart with force F. Find the relative extension ΔL/L of the sheet in the direction of the force, and its relative compression Δt/t in the perpendicular direction, and compare the results with Eqs. (47)-(48) for the tensile stress, and the solution of Problem 1.



7.3. Calculate the radial extension ΔR of a thin (wall thickness t << R), long, round cylindrical pipe under the effect of its rotation with a constant angular velocity ω about its symmetry axis (see Fig. on the right), in terms of the elastic moduli E and σ, assuming that pressure both inside and outside the pipe is negligible.



43 See, e.g., EM Secs. 7.7 and 7.9, and QM Sec. 1.5.
7.4. A long, uniform rail with the cross-section shown in Fig. on the right is being bent with the same (small) torque twice: first within plane xz and then within plane yz. Assuming that t << L, find the ratio of rail deformations in these two cases.



7.5. Calculate the law q_x(z) of weak bending of a thin, heavy, elastic rod supported at both ends (Fig. on the right), by its own weight. Compare the maximum deflection q_max of the rod with that calculated in Sec. 7.5 of the lecture notes for a similar rod with one end clamped in the wall.







7.6. Calculate the spring constant dF/dL of a coil spring made of a uniform, elastic wire, with circular cross-section of diameter d, wound as a dense round spiral of N >> 1 turns of diameter D >> d - see Fig. on the right. Comment on the type of the material's deformation.




7.7. Use Eqs. (101) and (102) to recast Eq. (103b) for the torsional rigidity C into the form given by Eq. (103c).



7.8. Generalize Eq. (103b) to the case of rods with more than one cross-section boundary. Use the result to calculate the torsional rigidity of a thin round pipe, and compare it with Eq. (93).



7.9. A steel wire with the circular cross-section of a 3-mm diameter is stretched with a constant force of 10 N and excited at frequency 1 kHz by an actuator that excites all modes of longitudinal and transverse waves. Which wave has the highest group velocity? Accept the following parameters for steel (see Table 7.1): E = 170 GPa, σ = 0.30, ρ = 7.8 g/cm³.



7.10. Define and calculate appropriate wave impedances for (i) tensile and (ii) torsional waves in a thin rod. Use the results to calculate what fraction of each wave's power is reflected from the connection of a long rod with round cross-section to a similar rod, but with twice the diameter - see Fig. on the right.



Chapter 8. Fluid Dynamics



This chapter describes the basic notions of fluid dynamics, discusses solutions of a few simple problems of the dynamics of ideal and viscous fluids, and gives a very brief review of more complex phenomena such as turbulence. On the margins, I will discuss numerical methods of solving partial differential equations - whose importance extends well beyond fluid dynamics.



8.1. Hydrostatics

The mechanics of fluids (the class of materials that includes both liquids and gases) is both simpler and more complex than that of elastic solids, with the simplicity falling squarely into the domain of statics - often called hydrostatics, because water has always been the main fluid for the human race, and hence for science and engineering. Indeed, fluids are, by definition, the media that cannot resist static shear deformations. There are two ways to express this fact. First, we can formally take the shear modulus μ, describing this resistance, to be equal to zero. Then Hooke's law (7.34) shows that the stress tensor is diagonal:

$$\sigma_{jj'} = \sigma_{jj}\,\delta_{jj'}. \qquad (8.1)$$

Alternatively, the same conclusion may be reached by looking at the stress tensor definition (7.19) and saying that in the absence of shear stress, the elementary interface force dF has to be perpendicular to the area element dA, i.e. parallel to vector dA.

Moreover, in fluids at equilibrium, all three diagonal components σ_jj of the stress tensor have to be equal. To prove that, it is sufficient to single out (mentally rather than physically) from a fluid a small volume in the shape of a right prism, with mutually perpendicular faces normal to the two directions we are interested in (Fig. 1, along axes x and y).

Fig. 8.1. Proving the pressure isotropy.

The prism is in equilibrium if each Cartesian component of the total force acting on all its faces nets to zero. For the x-component, this balance is σ_xx dA_x - (σ_αα dA) cos α = 0. However, from the geometry (Fig. 1), dA_x = dA cos α, and the above balance condition yields σ_αα = σ_xx. A similar argument for the vertical forces gives σ_αα = σ_yy, so that σ_xx = σ_yy. Since such equality holds for any pair of the diagonal components σ_jj of the stress tensor, all three of them have to be equal. This common component is usually represented as (-P), because in the vast majority of cases the parameter P, called pressure, is positive. Thus we arrive at the key relation (which has already been mentioned in Ch. 7):

$$\sigma_{jj'} = -P\,\delta_{jj'}. \qquad (8.2)$$
In the absence of bulk forces, pressure should be constant throughout the volume of the fluid, due to symmetry. Let us see how this result is affected by bulk forces. With the simple stress tensor (2), the general condition of equilibrium of a continuous medium, expressed by Eq. (7.25) with zero left-hand part, becomes just

$$-\frac{\partial P}{\partial r_j} + f_j = 0, \qquad (8.3)$$

and may be rewritten in a convenient vector form:

$$-\nabla P + \mathbf{f} = 0. \qquad (8.4)$$

In the simplest case of a heavy fluid, with mass density ρ, in a uniform gravity field, f = ρg, and the equation of equilibrium becomes

$$-\nabla P + \rho\mathbf{g} = 0, \qquad (8.5)$$

with only one nonvanishing component (vertical, near the Earth's surface). If, in addition, the fluid may be considered incompressible, with its density ρ constant,1 this equation may be readily integrated to give the so-called Pascal equation:2

$$P + \rho g y = \text{const}, \qquad (8.6)$$

where y is the vertical coordinate, with the direction opposite to that of vector g.

This equation, and its application examples, should be familiar to the reader from his or her undergraduate physics courses. Note, however, that the integration of Eq. (4) may be more complex if the bulk forces f depend on position,3 and/or if the fluid is substantially compressible. In the latter case, Eq. (4) should be solved together with the media-specific equation of state ρ = ρ(P) describing the compressibility law - whose example is given by Eq. (7.40) for ideal gases: ρ = mN/V = mP/k_BT, where m is the mass of one gas molecule.
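As an illustration of the compressible case, here is a minimal sketch that integrates the vertical component of Eq. (5), dP/dy = -ρ(P)g, with the ideal-gas equation of state ρ = mP/k_BT. It assumes an isothermal gas (a simplification: the real atmosphere is not isothermal) and a hypothetical "average" air molecule of mass 29 amu. Under these assumptions the exact solution is the exponential P_0 exp(-mgy/k_BT), which the numerical result is compared with, together with the incompressible Pascal estimate (6).

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
AMU = 1.66053907e-27   # atomic mass unit, kg
G = 9.81               # free-fall acceleration, m/s^2

def density_ideal_gas(P, m, T):
    """Equation of state rho = m*P/(k_B*T) of an ideal gas, cf. Eq. (7.40)."""
    return m * P / (K_B * T)

def pressure_at_height(P0, height, m, T, n_steps=10_000):
    """Integrate dP/dy = -rho(P)*g, the vertical component of Eq. (8.5),
    from y = 0 to y = height with the classical 4th-order Runge-Kutta method."""
    dy = height / n_steps

    def dP_dy(P):
        return -density_ideal_gas(P, m, T) * G

    P = P0
    for _ in range(n_steps):
        k1 = dP_dy(P)
        k2 = dP_dy(P + 0.5 * dy * k1)
        k3 = dP_dy(P + 0.5 * dy * k2)
        k4 = dP_dy(P + dy * k3)
        P += dy * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return P

m_air, T, P0 = 29 * AMU, 288.0, 1.013e5    # assumed: isothermal air at 15 C
h = 5000.0                                  # height, m
P_numeric = pressure_at_height(P0, h, m_air, T)
P_exact = P0 * math.exp(-m_air * G * h / (K_B * T))        # isothermal solution
P_pascal = P0 - density_ideal_gas(P0, m_air, T) * G * h    # Eq. (8.6), constant rho
print(f"RK4: {P_numeric:.1f} Pa, exact: {P_exact:.1f} Pa, Pascal: {P_pascal:.1f} Pa")
```

At heights much smaller than k_BT/mg (about 8 km for these assumptions) the compressible and incompressible results coincide, while at 5 km the Pascal equation is already badly off.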



8.2. Surface tension effects 



Besides the bulk (volume-distributed) forces, one more possible source of pressure is surface tension. This effect results from the fact that the potential energy of atomic interactions at the interface between two different fluids is different from that in their bulks. The effect may be described by an additional potential energy

$$U_s = \gamma A, \qquad (8.7)$$



1 As was discussed in Sec. 7.3 in the context of Table 7.1, this is an excellent approximation, for example, for human-scale experiments with water.

2 The equation, and the SI unit of pressure 1 Pa = 1 N/m², are named after B. Pascal (1623-1662), who not only pioneered hydrostatics, but also invented the first mechanical calculator and made several other important contributions to mathematics - and Christian philosophy!

3 An example of such a problem is given by fluid equilibrium in coordinate systems rotating with a constant angular velocity. Here the real bulk forces should be complemented by the centrifugal "force" - the only inertial force which does not vanish at constant ω and r - see Eq. (6.92).
where A is the interface area, and γ is called the surface tension constant (or just the "surface tension"), evidently of the dimensionality of J/m², i.e. N/m. For a stable interface of any two fluids, γ is always positive.4 In the absence of other forces, surface tension makes a liquid drop spherical, to minimize its surface area at fixed volume.

For the analysis of surface tension effects in the presence of other forces, it is convenient to reduce them to a certain additional effective pressure drop ΔP_ef at the interface. In order to calculate ΔP_ef, let us consider the condition of equilibrium of a small part dA of a smooth interface between two fluids (Fig. 2), in the absence of bulk forces.



Fig. 8.2. Deriving the Young-Laplace formula (10).



If the pressures on the two sides of the interface are different, the work of stress forces on fluid 1 at a small virtual displacement δr = nδr of the interface (where n is the unit vector normal to the interface, i.e. the area element vector dA divided by its magnitude dA) is5

$$\delta W = dA\,\delta r\,(P_1 - P_2). \qquad (8.8)$$

For equilibrium, this work has to be compensated by an equal change of the interface energy, δU_s = γδ(dA). Differential geometry tells us that, in the linear approximation in δr, the relative change of the elementary surface area, corresponding to a fixed solid angle dΩ, may be expressed as

$$\frac{\delta(dA)}{dA} = \delta r\left(\frac{1}{R_1} + \frac{1}{R_2}\right), \qquad (8.9)$$

where R_1,2 are the so-called principal radii of the interface curvature.6 Combining Eqs. (7)-(9), we get the Young-Laplace formula7



4 If γ of the interface of certain two fluids is negative, the system self-reconfigures to decrease U_s by increasing the interface area, i.e. fragments into a solution.

5 This formula follows from the general Eq. (7.32), with the stress tensor elements expressed by Eq. (2), but in this simple case of the net stress force dF = (P_1 - P_2)dA parallel to the interface element vector dA, it may be more readily obtained just from the definition of work δW = dF·δr at the virtual displacement δr = nδr.

6 This general formula may be verified by elementary means for a sphere of radius r (for which R_1 = R_2 = r and dA = r²dΩ, so that δ(dA)/dA = δ(r²)/r² = 2δr/r), and for a round cylindrical interface of radius r (for which R_1 = r, R_2 = ∞, and dA = r dφ dz, so that δ(dA)/dA = δr/r).

7 This formula (not to be confused with Eq. (12), called Young's equation) was derived in 1806 by P.-S. Laplace (of the Laplace operator/equation fame) on the basis of the first analysis of surface tension effects by T. Young a year earlier.
$$P_1 - P_2 = \gamma\left(\frac{1}{R_1} + \frac{1}{R_2}\right). \qquad (8.10)$$

In particular, this formula shows that the additional pressure created by surface tension inside a spherical drop of a liquid, of radius R, equals 2γ/R, i.e. decreases with R. In contrast, according to Eqs. (5)-(6), the effects of bulk forces, for example gravity, grow as ρgR. The comparison of these two pressure components shows that if the drop radius (or, more generally, the characteristic linear size of a fluid sample) is much larger than the so-called capillary length

$$a_c \equiv \left(\frac{2\gamma}{\rho g}\right)^{1/2}, \qquad (8.11)$$

the surface tension may be safely ignored - as will be done in the following sections of this chapter, except for a brief discussion of Eq. (48). For the water surface, or more exactly its interface with air at ambient conditions, γ ≈ 0.073 N/m, while ρ ≈ 1,000 kg/m³, so that a_c ≈ 4 mm.
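The numbers above can be reproduced with a few lines of arithmetic. The sketch below evaluates Eq. (11) for water and compares the surface-tension pressure 2γ/R with the gravity scale ρgR for drops of different sizes; their ratio is exactly (a_c/R)², so the two effects are comparable at R = a_c. The value g = 9.81 m/s² is an assumption.

```python
import math

G = 9.81  # free-fall acceleration, m/s^2

def capillary_length(gamma, rho, g=G):
    """Eq. (8.11): a_c = (2*gamma/(rho*g))**(1/2)."""
    return math.sqrt(2 * gamma / (rho * g))

def laplace_pressure(gamma, R):
    """Young-Laplace excess pressure (8.10) inside a spherical drop, R1 = R2 = R."""
    return 2 * gamma / R

def gravity_pressure(rho, R, g=G):
    """Scale of the hydrostatic pressure difference across a drop of size R."""
    return rho * g * R

gamma_water, rho_water = 0.073, 1000.0   # water/air interface at ambient conditions
a_c = capillary_length(gamma_water, rho_water)
print(f"a_c = {a_c * 1e3:.2f} mm")
for R in (0.1 * a_c, a_c, 10 * a_c):
    ratio = laplace_pressure(gamma_water, R) / gravity_pressure(rho_water, R)
    print(f"R = {R * 1e3:6.2f} mm: surface/gravity pressure ratio = {ratio:.3g}")
```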

On the other hand, in very narrow tubes, such as blood capillary vessels with radius a ~ 1 μm, i.e. a << a_c, the surface tension effects are very important. The key notion for the analysis of these effects is the equilibrium contact angle θ_c (also called the "wetting angle") at the edge of a liquid wetting a solid - see Fig. 3.




Fig. 8.3. Contact angles 
for (a) hydrophilic and 
(b) hydrophobic surfaces. 



According to its definition (7), constant γ may be interpreted as a force (per unit length of the interface boundary) directed along the interface and trying to reduce its area. As a result, the balance of the horizontal components of the three such forces, shown in Fig. 3, immediately yields Young's equation

$$\gamma_{sl} + \gamma_{lg}\cos\theta_c = \gamma_{sg}, \qquad (8.12)$$

where the indices of the constants γ correspond to the three possible interfaces between the liquid, solid and gas. For the so-called hydrophilic surfaces that "like to be wet" by this particular liquid (not necessarily water), meaning that γ_sl < γ_sg, this relation yields cos θ_c > 0, i.e. θ_c < π/2 - the situation shown in Fig. 3a. On the other hand, for hydrophobic surfaces with γ_sl > γ_sg, Young's equation (12) yields larger contact angles, θ_c > π/2 - see Fig. 3b.
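Young's equation (12) is easy to solve for the contact angle. In the sketch below, the surface energies are hypothetical numbers chosen only to show the two cases of Fig. 3; the function also flags the situation when |cos θ_c| would exceed 1, in which case Eq. (12) has no solution and no equilibrium contact angle exists (the liquid then either spreads over the solid or fully dewets it).

```python
import math

def contact_angle(gamma_sg, gamma_sl, gamma_lg):
    """Solve Young's equation (8.12), gamma_sl + gamma_lg*cos(theta_c) = gamma_sg,
    for the equilibrium contact angle theta_c (in radians)."""
    cos_theta = (gamma_sg - gamma_sl) / gamma_lg
    if not -1.0 <= cos_theta <= 1.0:
        # Eq. (8.12) cannot be satisfied: no equilibrium contact angle exists
        raise ValueError(f"|cos(theta_c)| = {abs(cos_theta):.3f} > 1")
    return math.acos(cos_theta)

def classify(theta_c):
    """Hydrophilic if theta_c < pi/2, hydrophobic otherwise (cf. Fig. 8.3)."""
    return "hydrophilic" if theta_c < math.pi / 2 else "hydrophobic"

# Hypothetical surface energies, N/m; gamma_lg = 0.073 N/m as for water/air
for g_sg, g_sl in [(0.05, 0.02), (0.02, 0.05)]:
    theta = contact_angle(g_sg, g_sl, 0.073)
    print(f"gamma_sg = {g_sg}, gamma_sl = {g_sl}: "
          f"theta_c = {math.degrees(theta):.1f} deg, {classify(theta)}")
```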

Let us use this notion to solve the simplest but perhaps the most important problem of this field - find the height h of the fluid column in a narrow vertical tube made of a hydrophilic material, lifted by the surface tension forces, assuming its internal surface to be a round cylinder of radius a - see Fig. 4. Inside an incompressible fluid, pressure drops with height according to the Pascal equation (6), so that just below the surface, P ≈ P_0 - ρgh, where P_0 is the background (e.g., atmospheric) pressure. This means that at a << h the pressure variation along the concave surface (called the meniscus) of the liquid is negligible, so that according to the Young-Laplace formula (10) the sum (1/R_1 + 1/R_2) has to be virtually constant along the surface. Due to the axial symmetry of the problem, this means that the surface has to be a part of a sphere.8 From the contact angle definition, the radius R of the sphere is equal to a/cos θ_c - see Fig. 4.




Fig. 8.4. Liquid rise in a vertical capillary tube. 



Plugging this relation into Eq. (10) with P_1 - P_2 = ρgh, we get the following equation for h:

$$\rho g h = \frac{2\gamma\cos\theta_c}{a}. \qquad (8.13a)$$



In hindsight, this result might be obtained more directly - by requiring the total weight ρgV = ρg(πa²h) of the lifted liquid column to be equal to the vertical component F cos θ_c of the full surface tension force F = γp acting on the perimeter p = 2πa of the meniscus. Using the definition (11) of the capillary length a_c, Eq. (13a) may be presented as the so-called Jurin rule:

$$h = \frac{a_c^2}{a}\cos\theta_c \le \frac{a_c^2}{a}. \qquad (8.13b)$$

According to our initial assumption h >> a, Eq. (13) is only valid for narrow tubes, with radius a << a_c. This capillary rise is the basic mechanism of lifting water with nutrients from the roots to the branches and leaves of plants, so that the tallest tree height is practically established by the Jurin rule (13), with cos θ_c ≈ 1 and the pore radius a limited from below by a few microns, because of the viscosity effects restricting the fluid discharge - see Sec. 5 below, and in particular the Poiseuille formula (60).
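A short numerical illustration of Eq. (13): the sketch below evaluates the capillary rise of water for a few hypothetical tube radii well below a_c, and cross-checks it against the force balance mentioned above (weight of the column equal to the vertical component of the surface tension force). The value g = 9.81 m/s² and the chosen radii are assumptions.

```python
import math

G = 9.81  # free-fall acceleration, m/s^2

def jurin_height(gamma, rho, a, theta_c=0.0, g=G):
    """Eq. (8.13a) solved for h: h = 2*gamma*cos(theta_c)/(rho*g*a).
    Only meaningful for narrow tubes, a << a_c (and hence h >> a)."""
    return 2 * gamma * math.cos(theta_c) / (rho * g * a)

gamma, rho = 0.073, 1000.0                   # water at ambient conditions
a_c = math.sqrt(2 * gamma / (rho * G))       # capillary length, Eq. (8.11)
for a in (1e-4, 1e-5, 1e-6):                 # tube radii well below a_c ~ 4 mm
    h = jurin_height(gamma, rho, a)
    print(f"a = {a:.0e} m: h = {h:.3g} m, a_c^2/a = {a_c**2 / a:.3g} m")
```

The height scales as 1/a, so each tenfold reduction of the tube radius raises the column tenfold.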



8.3. Kinematics 

In contrast to the stress tensor, which is useful and simple - see Eq. (2), the strain tensor is not a very useful notion in fluid mechanics. Indeed, apart from a very few situations,9 typical problems of this field involve fluid flow, i.e. a state in which the velocity of fluid particles has some nonzero time average. This



8 Note that this is not true for tubes with different shapes of their cross-section. 

9 One of them is sound propagation, where particle displacements q are typically small, so that the results of Sec. 7.7 are applicable. As a reminder, they show that in fluids, with μ = 0, the transverse sound cannot propagate (formally, it has zero velocity and impedance), while the longitudinal sound's velocity is finite - see Eq. (7.116).
means that the trajectory of each individual particle is a long line, and the notion of its displacement q becomes impracticable. However, the particle's velocity v = dq/dt is a much more useful notion, especially if it is considered as a function of the observation point r and (generally) time t. In an important class of fluid dynamics problems, the so-called stationary (or "steady", or "static") flow, the velocity defined in this way does not depend on time, v = v(r).

There is, however, a price to pay for the convenience of this notion: namely, due to the difference between vectors q and r, the particle's acceleration a = d²q/dt² (which participates, in particular, in the 2nd Newton law) cannot be calculated just as the time derivative of velocity v(r, t). This fact is evident, for example, for the static flow case, in which the acceleration of individual fluid particles may be very significant even if v(r) does not depend on time - just think about a drop of water flowing over the Niagara Falls rim, first accelerating fast and then virtually stopping below, while the water velocity v at every particular point, as measured from a bank-based reference frame, is nearly constant. Thus the main task of fluid kinematics is to express a via v(r, t); let us do this.

Since each Cartesian component v_j of the velocity has to be considered as a function of four independent scalar variables, the three Cartesian components r_j' of vector r and time t, its full time derivative may be presented as

$$\frac{dv_j}{dt} = \frac{\partial v_j}{\partial t} + \sum_{j'=1}^{3}\frac{\partial v_j}{\partial r_{j'}}\,\frac{dr_{j'}}{dt}. \qquad (8.14)$$



Let us apply this general relation to a specific set of infinitesimal changes {dr_1, dr_2, dr_3} that follows a small displacement dq of a certain particular particle of the fluid, dr = dq = v dt, i.e.

$$dr_{j'} = v_{j'}\,dt. \qquad (8.15)$$



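In this case dv_j/dt is the j-th component a_j of the particle's acceleration a, so that Eq. (14) yields the following key relation of fluid kinematics:10

$$a_j = \frac{\partial v_j}{\partial t} + \sum_{j'=1}^{3} v_{j'}\,\frac{\partial v_j}{\partial r_{j'}}. \qquad (8.16a)$$

Using operator ∇, this result may be rewritten in the following compact vector form:

$$\mathbf{a} = \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}. \qquad (8.16b)$$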



This relation already signals the main technical problem of fluid dynamics: many equations involving the particle's acceleration are nonlinear in velocity, which excludes such a powerful tool as the linear superposition principle from the applicable mathematical arsenal.
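The Niagara example can be mimicked with a simple steady flow. The sketch below assumes a hypothetical 2D stagnation-point flow v = (αx, -αy), for which ∂v/∂t = 0 at every point, yet particles do accelerate. It evaluates the convective term (v·∇)v of Eq. (16b) by central differences and compares it with the second time derivative of an exact particle trajectory x = x_0 e^{αt}, y = y_0 e^{-αt}; both should give a = α²(x, y).

```python
import math

ALPHA = 0.5  # hypothetical strain rate of the flow, 1/s

def velocity(x, y):
    """Steady stagnation-point flow v = (alpha*x, -alpha*y); dv/dt = 0 at every point."""
    return (ALPHA * x, -ALPHA * y)

def convective_acceleration(x, y, h=1e-5):
    """(v . grad) v from Eq. (8.16b), with partial derivatives by central differences."""
    vx, vy = velocity(x, y)
    d_dx = [(a - b) / (2 * h) for a, b in zip(velocity(x + h, y), velocity(x - h, y))]
    d_dy = [(a - b) / (2 * h) for a, b in zip(velocity(x, y + h), velocity(x, y - h))]
    # component i: v_x * dv_i/dx + v_y * dv_i/dy
    return tuple(vx * gx + vy * gy for gx, gy in zip(d_dx, d_dy))

def particle_position(x0, y0, t):
    """Exact trajectory q(t) of the particle that is at (x0, y0) at t = 0."""
    return (x0 * math.exp(ALPHA * t), y0 * math.exp(-ALPHA * t))

def particle_acceleration(x0, y0, dt=1e-4):
    """d^2 q/dt^2 at t = 0, by a central second difference along the trajectory."""
    p_plus = particle_position(x0, y0, dt)
    p_zero = particle_position(x0, y0, 0.0)
    p_minus = particle_position(x0, y0, -dt)
    return tuple((a - 2 * b + c) / dt**2 for a, b, c in zip(p_plus, p_zero, p_minus))

print("(v.grad)v at (1, 2):      ", convective_acceleration(1.0, 2.0))
print("particle acceleration:    ", particle_acceleration(1.0, 2.0))
```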

One more basic relation of fluid kinematics is the so-called continuity equation, which is essentially just the differential version of the mass conservation law. Let us mark, inside a fluid flow, an



10 The operator relation d/dt = ∂/∂t + (v·∇), applicable to an arbitrary (scalar or vector) function, is frequently called the convective derivative. (Alternative adjectives, such as "Lagrangian", "substantial", or "Stokes", are sometimes used for this derivative as well.) The relation has numerous applications well beyond fluid dynamics - see, e.g., EM Sec. 9.3.
arbitrary volume V limited by a stationary (time-independent) surface S. The total mass of the fluid inside the volume may change only due to its flow through the boundary:

$$\frac{d}{dt}\int_V \rho\,d^3r = -\oint_S \rho v_n\,d^2r \equiv -\oint_S \rho\mathbf{v}\cdot d\mathbf{A}, \qquad (8.17a)$$

where the elementary area vector dA is defined just as in Sec. 7.2 - see Fig. 5.




Using the same divergence theorem that has been used several times in this course,11 the surface integral in Eq. (17a) may be transformed into the integral of ∇·(ρv) over volume V, so that this relation may be rewritten as

$$\int_V\left(\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j}\right)d^3r = 0, \qquad (8.17b)$$

where vector j ≡ ρv is called either the mass flux density or the mass current. Since Eq. (17b) is valid for an arbitrary volume, the function under the integral has to vanish at any point:

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0. \qquad (8.18)$$



Note that such a continuity equation is valid not only for mass, but also for other conserved physical quantities (e.g., the electric charge, the quantum-mechanical probability, etc.), with the proper redefinition of ρ and j.12
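The integral form (17a) is also the natural starting point for numerical methods. The sketch below is a hypothetical 1D finite-volume discretization (first-order upwind fluxes on a periodic grid, an illustrative choice rather than a recommended scheme): the mass of each cell changes only through the fluxes j = ρv at its two faces, so the total mass is conserved to rounding accuracy, however crude the flux estimate is. The initial density profile and the nonuniform velocity field are arbitrary assumptions.

```python
import math

def step_upwind(rho, v_face, dx, dt):
    """One explicit finite-volume step of d(rho)/dt + d(j)/dx = 0 on a periodic grid.

    rho[i] is the mean density in cell i; v_face[i] is the velocity at the face
    between cells i-1 and i. The flux j = rho*v at each face is taken from the
    upwind cell (first-order upwind scheme)."""
    n = len(rho)
    flux = []
    for i in range(n):
        v = v_face[i]
        upwind = rho[i - 1] if v > 0 else rho[i]   # rho[-1] wraps around: periodic grid
        flux.append(v * upwind)
    # cell i gains the flux through its left face i and loses it through its right face i+1
    return [rho[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]

n_cells, length = 200, 1.0
dx = length / n_cells
x_c = [(i + 0.5) * dx for i in range(n_cells)]                  # cell centers
rho = [1.0 + 0.5 * math.exp(-(x - 0.3) ** 2 / 0.005) for x in x_c]
v_face = [1.0 + 0.5 * math.sin(2 * math.pi * i * dx) for i in range(n_cells)]
dt = 0.4 * dx / max(v_face)                                      # stability-limited step
mass_initial = sum(rho) * dx
for _ in range(500):
    rho = step_upwind(rho, v_face, dx, dt)
mass_final = sum(rho) * dx
print(f"total mass: {mass_initial:.15f} -> {mass_final:.15f}")
```

Here the flow is compressible (v varies in space), so the density profile changes shape as it moves, while the total mass stays fixed, exactly as Eq. (17a) requires for a periodic domain with no net boundary flux.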



8.4. Dynamics: Ideal fluids 

Let us start our discussion of fluid dynamics from the simplest case, when the stress tensor obeys the simple expression (2) even during the fluid's motion. Physically, this means that fluid viscosity effects, including the mechanical energy loss, are negligible. (We will discuss the conditions of validity of this assumption in the next section.) Then the equation of motion of such an ideal fluid (essentially the 2nd Newton law for its unit volume) may be obtained from Eq. (7.25) using the simplifications of its right-hand part discussed in Sec. 1:

$$\rho\mathbf{a} = -\nabla P + \mathbf{f}. \qquad (8.19)$$



11 If the reader still needs a reminder, see MA Eq. (12.1). 

12 See, e.g., EM Sec. 4.1 and QM Sec. 1.4. 
Now using the basic kinematic relation (16), we arrive at the following Euler equation:13

$$\rho\frac{\partial\mathbf{v}}{\partial t} + \rho(\mathbf{v}\cdot\nabla)\mathbf{v} = -\nabla P + \mathbf{f}. \qquad (8.20)$$



Generally, this equation has to be solved together with the continuity equation (18) and the equation of state of the particular fluid, ρ = ρ(P). However, as we have already discussed, in many situations the compressibility of water and other important fluids is very low and may be ignored, so that ρ may be treated as a given constant. Moreover, in many cases the bulk forces f are conservative and may be presented as the gradient of a certain potential function u(r) - the potential energy per unit volume:

$$\mathbf{f} = -\nabla u; \qquad (8.21)$$

for example, for a uniform gravity field, u = ρgh. In this case the right-hand part of Eq. (20) becomes -∇(P + u). For these cases, it is beneficial to recast the left-hand part of that equation as well, using the following well-known identity of vector algebra:14

$$(\mathbf{v}\cdot\nabla)\mathbf{v} = \nabla\left(\frac{v^2}{2}\right) - \mathbf{v}\times(\nabla\times\mathbf{v}). \qquad (8.22)$$
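Identity (22) can be spot-checked numerically for any smooth field. The sketch below uses a hypothetical 3D velocity field with nonzero curl (chosen only for illustration), computes all derivatives by central differences, and evaluates both sides of Eq. (22) at one point; they should agree to within the finite-difference error.

```python
import math

def v(r):
    """A hypothetical smooth velocity field with nonzero curl (illustration only)."""
    x, y, z = r
    return [y * y + z, x * z, math.sin(x) + y]

def partial(f, r, j, h=1e-5):
    """Central-difference derivative of the vector-valued f along axis j at point r."""
    r_plus, r_minus = list(r), list(r)
    r_plus[j] += h
    r_minus[j] -= h
    return [(a - b) / (2 * h) for a, b in zip(f(r_plus), f(r_minus))]

def half_v_squared(r):
    """v^2/2, wrapped in a one-element list so that partial() can differentiate it."""
    return [0.5 * sum(c * c for c in v(r))]

def lhs(r):
    """(v . grad) v: component i equals sum_j v_j * dv_i/dr_j."""
    vel = v(r)
    d = [partial(v, r, j) for j in range(3)]          # d[j][i] = dv_i/dr_j
    return [sum(vel[j] * d[j][i] for j in range(3)) for i in range(3)]

def rhs(r):
    """grad(v^2/2) - v x (curl v), the right-hand part of Eq. (8.22)."""
    vel = v(r)
    d = [partial(v, r, j) for j in range(3)]
    curl = [d[1][2] - d[2][1], d[2][0] - d[0][2], d[0][1] - d[1][0]]
    grad = [partial(half_v_squared, r, j)[0] for j in range(3)]
    cross = [vel[1] * curl[2] - vel[2] * curl[1],
             vel[2] * curl[0] - vel[0] * curl[2],
             vel[0] * curl[1] - vel[1] * curl[0]]
    return [g - c for g, c in zip(grad, cross)]

point = (0.3, -1.2, 0.7)
print("(v.grad)v          :", lhs(point))
print("grad(v^2/2) - v x w:", rhs(point))
```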



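As a result, the Euler equation takes the form

$$\rho\frac{\partial\mathbf{v}}{\partial t} - \rho\,\mathbf{v}\times(\nabla\times\mathbf{v}) + \nabla\left(P + u + \frac{\rho v^2}{2}\right) = 0. \qquad (8.23)$$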



In a stationary flow, the first term of this equation vanishes. If the second term, describing the fluid's vorticity, is zero as well, then Eq. (23) has the first integral of motion,

$$P + u + \frac{\rho}{2}v^2 = \text{const}, \qquad (8.24)$$

called the Bernoulli equation. Numerous examples of the application of Eq. (24) to simple problems of stationary flow in pipes, in the Earth's gravity field (giving u = ρgh), should be well known to the reader, so I hope I can skip their discussion without much harm.
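For completeness, here is a minimal sketch of one such textbook example, a pipe constriction (Venturi tube). It assumes, hypothetically, a uniform velocity over each cross-section, so that the continuity equation reduces to v_1A_1 = v_2A_2 for an incompressible fluid, and then applies Eq. (24) with u = ρgy between two sections of the pipe.

```python
RHO_WATER = 1000.0  # kg/m^3
G = 9.81            # m/s^2

def bernoulli_section2(P1, v1, A1, A2, rho=RHO_WATER, y1=0.0, y2=0.0, g=G):
    """Pressure and velocity at section 2 of a pipe, for a stationary, vorticity-free
    flow of an ideal incompressible fluid. Assumes a uniform velocity over each
    cross-section, so that continuity gives v1*A1 = v2*A2; then applies
    Eq. (8.24) with u = rho*g*y."""
    v2 = v1 * A1 / A2
    P2 = P1 + rho * g * (y1 - y2) + 0.5 * rho * (v1**2 - v2**2)
    return P2, v2

def bernoulli_constant(P, v, y, rho=RHO_WATER, g=G):
    """Left-hand side of Eq. (8.24)."""
    return P + rho * g * y + 0.5 * rho * v**2

# A horizontal constriction halving the cross-section area
P2, v2 = bernoulli_section2(P1=2.0e5, v1=1.0, A1=2.0e-3, A2=1.0e-3)
print(f"v2 = {v2:.2f} m/s, P2 = {P2:.0f} Pa")
```

In the narrow section the velocity doubles and the pressure drops by (ρ/2)(v_2² - v_1²) = 1.5 kPa for these numbers, which is the principle of the Venturi flow meter.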

In the general case an ideal fluid may have vorticity, so that Eq. (24) is not always valid. Moreover, due to the absence of viscosity in an ideal fluid, the vorticity, once created, does not decrease along the streamline - the fluid particle's trajectory, to which the velocity is tangential at every point. Mathematically, this fact is expressed by the following Kelvin theorem: (∇×v)·dA = const along any small contiguous group of streamlines crossing an elementary area dA.15

In many important cases the vorticity of the fluid is negligible. For example, if a solid body of arbitrary shape is embedded into an ideal fluid that is uniform (meaning, by definition, that v(r, t) = v_0 = const) at large distances, its vorticity is zero everywhere. (Indeed, since ∇×v = 0 in the uniform flow, the vorticity is zero at distant points of any streamline, and according to the Kelvin theorem, should equal



13 It was derived in 1755 by the same L. Euler whose name has already been (reverently) mentioned several times 
in this course. 

14 It readily follows, for example, from MA Eq. (11.6) with g = f = v. 

15 Its proof may be found, e.g., in Sec. 8 of L. Landau and E. Lifshitz, Fluid Mechanics, 2nd ed., Butterworth-Heinemann, 1987.
zero everywhere.) In this case the velocity, as any curl-free vector field, may be presented as the gradient of some effective potential function,

$$\mathbf{v} = -\nabla\phi. \qquad (8.25)$$

Such potential flow may be described by a simple differential equation. Indeed, the continuity equation (18) for a steady flow of an incompressible fluid is reduced to ∇·v = 0. Plugging Eq. (25) into this relation, we get the scalar Laplace equation,

$$\nabla^2\phi = 0, \qquad (8.26)$$



which should be solved with appropriate boundary conditions. For example, the fluid flow may be limited by solid bodies which the fluid cannot penetrate. Then the fluid velocity at these boundaries should not have a normal component:

$$\frac{\partial\phi}{\partial n} = 0. \qquad (8.27)$$

On the other hand, at large distances from the body in question the fluid flow is known, e.g., uniform:

$$\nabla\phi = -\mathbf{v}_0, \quad \text{at } r \to \infty. \qquad (8.28)$$

As the reader may already know (for example, from a course of electrodynamics16), the Laplace equation (26) is readily solvable analytically in several simple (symmetric) but important situations. Let us consider, for example, the case of a round cylinder of radius R, immersed into a flow with the initial velocity v_0 perpendicular to the cylinder's axis (Fig. 6).17




Fig. 8.6. Flow of ideal, incompressible fluid around a round cylinder: (a) equipotential surfaces and 
(b) streamlines. 



16 See, e.g., EM Secs. 2.3 and 2.4.

17 Evidently, the motion of the cylinder, with constant velocity (-v_0), in the otherwise stationary fluid leads to exactly the same problem - in the reference frame bound to the moving body.
For this problem, it is natural to use cylindrical coordinates with axis z parallel to the cylinder's axis. In this case the velocity distribution is evidently independent of z, so that we may simplify the general expression of the Laplace operator in cylindrical coordinates18 by taking ∂/∂z = 0. As a result, Eq. (26) is reduced to

$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\phi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\phi}{\partial\varphi^2} = 0, \quad \text{at } \rho > R. \qquad (8.29)$$

(Let me hope that the letter $\rho$, used here for the magnitude of the 2D radius-vector $\boldsymbol{\rho} = \{x, y\}$, will not be confused with the fluid's density - which does not participate in this boundary problem.) The general solution of this equation may be obtained using the variable separation method:19



$$\phi = a_0 + b_0\ln\rho + \sum_{n=1}^{\infty}\left(c_n\cos n\varphi + s_n\sin n\varphi\right)\left(a_n\rho^n + b_n\rho^{-n}\right), \qquad (8.30)$$



where the coefficients $a_n$ and $b_n$ have to be found from the boundary conditions (27) and (28). Choosing axis $x = \rho\cos\varphi$ to be parallel to vector $\mathbf{v}_0$ (Fig. 6a), we may rewrite these conditions in the form

$$\frac{\partial\phi}{\partial\rho} = 0, \quad \text{at } \rho = R, \qquad (8.31)$$

$$\phi \to -v_0\rho\cos\varphi + \phi_0, \quad \text{at } \rho \gg R, \qquad (8.32)$$



where φ₀ is an arbitrary constant, which does not affect the velocity distribution and may be taken to be 
zero. The latter condition is incompatible with all terms of Eq. (30) except the one with n = 1 (with s₁ = 0 
and c₁a₁ = -v₀), so that the solution is reduced to 



$$\phi = \left(-v_0\rho + \frac{c_1 b_1}{\rho}\right)\cos\varphi. \qquad (8.33)$$



Now, plugging this solution into Eq. (31), we get c₁b₁ = -v₀R², so that, finally, 



$$\phi = -v_0\left(\rho + \frac{R^2}{\rho}\right)\cos\varphi. \qquad (8.34)$$



Figure 6a shows the surfaces of constant velocity potential φ. To find the fluid velocity, it 
is easier to rewrite result (34) in the Cartesian coordinates x = ρ cos φ, y = ρ sin φ: 



$$\phi = -v_0 x\left(1 + \frac{R^2}{\rho^2}\right) = -v_0 x\left(1 + \frac{R^2}{x^2 + y^2}\right). \qquad (8.35)$$



From this equation, we may readily calculate the Cartesian components v_x = -∂φ/∂x and v_y = -∂φ/∂y of 
the fluid velocity. Figure 6b shows the particle streamlines. 20 One can see that the largest potential gradient, 
and hence the maximum speed, is reached at the ends of the vertical diameter (ρ = R, φ = ±π/2), where 



18 See, e.g., MA Eq. (10.3). 

19 See, e.g., EM Eq. (2.112). Note that the most general solution of Eq. (29) also includes a term proportional to 
the polar angle φ, but this term should be zero for such a single-valued function as the velocity potential. 
$$v_{\max} = v_x\Big|_{\rho=R,\;x=0} = -\frac{\partial\phi}{\partial x}\Big|_{\rho=R,\;x=0} = 2v_0. \qquad (8.36)$$



The pressure distribution may now be found from the Bernoulli equation (24). For u(r) = 0, 
it shows that the pressure reaches its maximum at the ends of the longitudinal diameter y = 0, while at the 
ends of the transverse diameter x = 0, where the velocity is largest, it is lower by 2ρv₀² (where ρ is the 
fluid density again - sorry for the notation jitters!). Note that the distributions of both velocity and 
pressure are symmetric about the transverse axis x = 0, so that the fluid flow does not create any net 
drag force in its direction. This result, which stems from the conservation of the mechanical energy of 
an ideal fluid, remains valid for a solid body of arbitrary shape moving inside an infinite volume of such 
an ideal fluid: the so-called d'Alembert paradox. However, if a body moves near the surface of an ideal fluid, its 
energy may be transformed into that of surface waves, and drag becomes possible. 
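The statements of this paragraph and of footnote 20 may be checked numerically. Below is a minimal Python sketch, assuming arbitrary illustrative units v₀ = R = ρ = 1 (these values are not from the text). It differentiates the potential (35) by central differences and checks that the velocity is tangential at the cylinder's surface, that the speed at the point (x, y) = (0, R) is 2v₀, that the Bernoulli pressure drop between the stagnation point and that point is 2ρv₀², that the pressure forces on the cylinder have no net x-component (the d'Alembert paradox), and that y[1 - R²/(x² + y²)] does not change along the flow.

```python
import math

V0, R, RHO = 1.0, 1.0, 1.0   # illustrative units (an assumption, not from the text)
H = 1e-6                     # step for central differences

def phi(x, y):
    """Velocity potential of Eq. (35)."""
    return -V0 * x * (1.0 + R**2 / (x**2 + y**2))

def velocity(x, y):
    """v = -grad(phi), evaluated by central differences."""
    vx = -(phi(x + H, y) - phi(x - H, y)) / (2 * H)
    vy = -(phi(x, y + H) - phi(x, y - H)) / (2 * H)
    return vx, vy

def psi(x, y):
    """Streamline invariant quoted in footnote 20."""
    return y * (1.0 - R**2 / (x**2 + y**2))

N = 720
angles = [2 * math.pi * (k + 0.5) / N for k in range(N)]   # midpoints on the circle

max_normal = 0.0   # largest |v . n| on the surface rho = R
drag_x = 0.0       # x-component of the pressure force per unit length of the cylinder
for t in angles:
    vx, vy = velocity(R * math.cos(t), R * math.sin(t))
    max_normal = max(max_normal, abs(vx * math.cos(t) + vy * math.sin(t)))
    pressure = RHO * (V0**2 - vx**2 - vy**2) / 2       # Bernoulli, relative to P at infinity
    drag_x -= pressure * math.cos(t) * R * (2 * math.pi / N)

speed_top = math.hypot(*velocity(0.0, R))
p_stag = RHO * (V0**2 - sum(c**2 for c in velocity(R, 0.0))) / 2
p_top = RHO * (V0**2 - speed_top**2) / 2

# v . grad(psi) should vanish everywhere outside the cylinder
max_dot = 0.0
for x, y in [(1.5, 0.3), (-2.0, 1.1), (0.4, -1.7), (3.0, 2.5)]:
    vx, vy = velocity(x, y)
    gx = (psi(x + H, y) - psi(x - H, y)) / (2 * H)
    gy = (psi(x, y + H) - psi(x, y - H)) / (2 * H)
    max_dot = max(max_dot, abs(vx * gx + vy * gy))

print(f"max |v.n| on the surface: {max_normal:.1e}")
print(f"speed at (0, R): {speed_top:.6f} (in units of v0)")
print(f"pressure drop: {p_stag - p_top:.6f} (in units of rho*v0^2)")
print(f"net x-force from pressure: {drag_x:.1e}")
print(f"max |v . grad psi|: {max_dot:.1e}")
```

The symmetric midpoint placement of the angles makes the cancellation in the drag sum exact up to round-off, mirroring the symmetry argument of the text.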

Speaking about surface waves in a gravity field 21, their description is one more classical 
problem of ideal fluid dynamics. Let us consider an open surface of an ideal fluid of density ρ in a 
uniform gravity field f = ρg = -ρg n_y - see Fig. 7. If the wave amplitude A is sufficiently small, we can 
neglect the nonlinear term (v·∇)v ∝ A² in the Euler equation (13) in comparison with the first term, 
∂v/∂t, which is linear in A. For a wave with frequency ω and wavenumber k, the particle's velocity v = dq/dt is 
of the order of ωA, so that this approximation is legitimate if ω²A ≫ k(ωA)², i.e. when 



$$kA \ll 1, \qquad (8.37)$$



i.e. when the wave amplitude is much smaller than its wavelength λ = 2π/k. Under this assumption, we may 
neglect the fluid vorticity effects, and again use Eq. (25) and (for an incompressible fluid) Eq. (26). 




Fig. 8.7. Small 1D waves on the surface of 
a deep fluid. Dashed lines show fluid 
particle trajectories. (For clarity, the 
wave amplitude A is strongly 
exaggerated.) 



Looking for the solution of the Laplace equation (26) in the natural form of a 1D sinusoidal 
wave, 22 



20 They may be found by integrating the evident equation dy/dx = v_y(x, y)/v_x(x, y). For our simple problem, this 
integration may be done analytically, giving the relation y[1 - R²/(x² + y²)] = const, where the constant is specific 
to each streamline. 

21 The alternative, historic term "gravity waves" for this phenomenon may nowadays lead to a confusion with the 
relativistic effect of gravity waves (which may propagate in vacuum), whose direct detection is a focus of so 
much current experimental effort. 

22 Such a wave is "plane" only in direction x (perpendicular to the propagation direction z, see Fig. 4). 
$$\phi = \Phi(y)\,e^{i(kz - \omega t)}, \qquad (8.38)$$



we get a simple equation 

$$\frac{d^2\Phi}{dy^2} - k^2\Phi = 0, \qquad (8.39)$$



with an exponential solution (decaying, as it has to, at y → -∞) Φ = Φ₀ exp{ky}, so that Eq. (38) 
becomes 

$$\phi = \Phi_0 e^{ky}e^{i(kz - \omega t)}. \qquad (8.40)$$

Note that the rate of the wave decay with depth is exactly equal to the wavenumber of its 
propagation along the surface. Because of that, the trajectories of fluid particles are exactly circular. 
Indeed, using Eqs. (25) and (40) to calculate velocity components, 

$$v_y = -\frac{\partial\phi}{\partial y} = -k\Phi_0 e^{ky}e^{i(kz - \omega t)}, \qquad v_z = -\frac{\partial\phi}{\partial z} = -ik\Phi_0 e^{ky}e^{i(kz - \omega t)}, \qquad (8.41)$$

we see that they have equal real amplitudes and are phase-shifted by π/2. This result may be spelled out 
even more clearly if we use the velocity definition v = dq/dt to integrate Eqs. (41) over time and recover the 
particle displacement law q(t). Due to the strong inequality (37), the integration may be done at fixed y 
and z: 



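$$q_y = -i\frac{k}{\omega}\Phi_0 e^{ky}e^{i(kz - \omega t)} \equiv A e^{ky}e^{i(kz - \omega t)}, \qquad q_z = \frac{k}{\omega}\Phi_0 e^{ky}e^{i(kz - \omega t)} \equiv iA e^{ky}e^{i(kz - \omega t)}, \qquad \text{where } A \equiv -i\frac{k}{\omega}\Phi_0. \qquad (8.42)$$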



Note that the phase of oscillations of v_z coincides with that of q_y (at the same point). This means, in 
particular, that at the wave's top (the "crest"), the fluid is moving in the direction of wave propagation - see 
the dashed lines in Fig. 7. 
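Equations (42) can also be checked directly. The sketch below assumes that the amplitude A in Eq. (42) is real and uses hypothetical values A = 0.05, k = 1, ω = 1 (any ω would do, since the trajectory shape does not depend on it). It takes the real parts of q_y and q_z, confirms that each fluid particle moves along a circle of radius A exp{ky}, and shows that v_z oscillates in phase with q_y, so that at a crest the particle moves along the propagation direction.

```python
import cmath
import math

A, K, OMEGA = 0.05, 1.0, 1.0   # hypothetical values with kA << 1; OMEGA only sets the time scale

def displacement(y, z, t):
    """Real parts of Eq. (42), assuming the amplitude A is real: returns (q_y, q_z)."""
    wave = A * math.exp(K * y) * cmath.exp(1j * (K * z - OMEGA * t))
    return wave.real, (1j * wave).real

def v_z(y, z, t, dt=1e-6):
    """dq_z/dt, computed by a central difference in time."""
    return (displacement(y, z, t + dt)[1] - displacement(y, z, t - dt)[1]) / (2 * dt)

for depth in (0.0, -1.0, -3.0):
    radii = [math.hypot(*displacement(depth, 0.3, 0.1 * n)) for n in range(63)]
    print(f"y = {depth:5.1f}: radius from {min(radii):.6f} to {max(radii):.6f}, "
          f"A*exp(ky) = {A * math.exp(K * depth):.6f}")

# At a crest (kz - wt = 0, where q_y is maximal) the fluid moves along +z
print("v_z at the crest:", v_z(0.0, 0.0, 0.0))
```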

It is remarkable that this whole picture follows from the Laplace equation alone! The "only" 
remaining feature to calculate is the dispersion law ω(k), and for that we need to combine Eq. (40) with 
what remains, in our linear approximation, of the Euler equation (23). Using Eq. (25) for vortex-free 
motion, and the bulk force potential u = ρgy, we may present Eq. (23) as 



$$\nabla\left(-\rho\frac{\partial\phi}{\partial t} + P + \rho g y\right) = 0. \qquad (8.43)$$



This equation means that the function in the parentheses is constant in space; at the surface, it should 
equal the pressure P₀ above the surface (say, the atmospheric pressure), which we assume to be constant. 
This means that on the surface, the contributions to P that come from the first and the third terms in 
Eq. (43) should compensate each other. Let us take the average surface position for y = 0; then the surface 
with waves is described by the relation y = q_y - see Fig. 7. Due to the strong inequality (37), which means 
k|q_y| ≪ 1, we can use Eqs. (40) and (42) with y = 0, so that the above compensation condition is 

$$-\rho\left(-i\omega\Phi_0\right)e^{i(kz - \omega t)} + \rho g\left(-i\frac{k}{\omega}\Phi_0\right)e^{i(kz - \omega t)} = 0. \qquad (8.44)$$

This condition is identically satisfied on the whole surface as soon as 



$$\omega^2 = gk, \qquad (8.45)$$

giving the dispersion relation we were looking for. 

Looking at this surprisingly simple result (which includes just one constant, g), note, first of all, 
that it does not involve the fluid's density. This is not too surprising, because due to the weak 
equivalence principle, particle masses always drop out of the results of problems involving gravitational 
forces alone. Second, the dispersion law (45) is strongly nonlinear, and in particular does not have the 
acoustic wave limit. This means that surface wave propagation is strongly dispersive, with the phase 
velocity ω/k ∝ 1/ω diverging at ω → 0. This divergence is an artifact of our assumption of infinite 
fluid thickness. A rather straightforward generalization of the above calculations to a layer of finite 
thickness h, using the additional boundary condition v_y|_{y=-h} = 0, yields the following modified dispersion 
relation, 

$$\omega^2 = gk\tanh kh. \qquad (8.46)$$

It shows that relatively long waves, with λ ≫ h, i.e. with kh ≪ 1, propagate without dispersion (i.e. 
have ω/k = const ≡ v), with the velocity 

$$v = (gh)^{1/2}. \qquad (8.47)$$

For the Earth's oceans, this velocity is rather high, approaching 300 m/s (!) for h = 10 km. This result 
explains, in particular, the very fast propagation of tsunami waves. 
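A short numerical sketch of the dispersion relation (46): it compares the phase velocity ω/k for a 10-km-deep ocean with the infinite-depth result (45) and with (gh)^{1/2}. The wavelengths are hypothetical examples, and g = 9.81 m/s² is the standard free-fall acceleration.

```python
import math

G = 9.81  # free-fall acceleration, m/s^2

def omega(k, h=math.inf):
    """Eq. (46): omega^2 = g k tanh(kh); h = inf gives tanh = 1, i.e. Eq. (45)."""
    return math.sqrt(G * k * math.tanh(k * h))

def phase_velocity(k, h=math.inf):
    return omega(k, h) / k

h_ocean = 10e3                               # 10 km, as in the text
for wavelength in (1e6, 1e5, 1e3, 1e2):      # hypothetical wavelengths, m
    k = 2 * math.pi / wavelength
    print(f"lambda = {wavelength:9.0f} m: v = {phase_velocity(k, h_ocean):7.2f} m/s "
          f"(infinite-depth value {phase_velocity(k):8.2f} m/s)")
print(f"(gh)^(1/2) = {math.sqrt(G * h_ocean):.1f} m/s")
```

The longest wave travels at practically (gh)^{1/2}, while the infinite-depth formula would give an unphysically large speed for it, illustrating the divergence discussed above.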

In the opposite limit of very short waves (large k), Eq. (45) also fails to describe experimental data 
well, due to the effects of surface tension (see Sec. 2 above). It may be shown that taking them into 
account leads (at kh ≫ 1) to the following modification of Eq. (45): 

$$\omega^2 = gk + \frac{\gamma k^3}{\rho}, \qquad (8.48)$$

where γ is the surface tension coefficient. 

According to this formula, the surface tension is important at wavelengths smaller than the capillary 
constant a_c given by Eq. (11). Much shorter waves, for which Eq. (48) yields ω ∝ k^{3/2}, are called 
capillary waves, or just "ripples". 
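To see where the surface tension term in Eq. (48) takes over, the sketch below evaluates the wavenumber k_c at which both terms are equal, and the local exponent d(ln ω)/d(ln k) on both sides of it. It assumes water, with ρ from Table 1 and a surface tension γ = 0.072 N/m, a typical room-temperature value that is not given in this text.

```python
import math

G = 9.81        # m/s^2
RHO = 1000.0    # water density, kg/m^3 (Table 1)
GAMMA = 0.072   # surface tension of water near room temperature, N/m (assumed value)

def omega(k):
    """Eq. (48), deep-fluid limit kh >> 1."""
    return math.sqrt(G * k + GAMMA * k**3 / RHO)

def log_slope(k, factor=1.001):
    """Local exponent d(ln omega)/d(ln k), by a finite difference."""
    return math.log(omega(k * factor) / omega(k)) / math.log(factor)

k_c = math.sqrt(RHO * G / GAMMA)   # wavenumber at which both terms of Eq. (48) are equal
print(f"crossover wavelength 2*pi/k_c = {2 * math.pi / k_c * 100:.2f} cm")
for ratio in (1e-3, 1.0, 1e3):
    print(f"k = {ratio:g} k_c: omega ~ k^{log_slope(ratio * k_c):.3f}")
```

The exponent goes from 1/2 (gravity waves) to 3/2 (ripples), with the crossover at centimeter-scale wavelengths for water.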

All these generalizations are still limited to potential forces, and do not allow one to describe 
energy loss, in particular the attenuation of either bulk or surface waves in fluids. For that, as well as for 
the drag force description, we need to proceed to the effects of viscosity. 



8.5. Dynamics: Viscous fluids 

The viscosity of many fluids, at not too high velocities, may be described surprisingly well by 
adding, to the static stress tensor (2), additional components σ'_jj' that depend on the velocity v = dq/dt: 

$$\sigma_{jj'} = -P\delta_{jj'} + \sigma'_{jj'}(\mathbf{v}). \qquad (8.49)$$

Since the Hooke law (7.34) has taught us the natural structure of such a tensor in the case of stress 
proportional to the displacement q, we may expect a similar expression with the replacement q → v = dq/dt: 
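$$\sigma'_{jj'} = 2\eta\left[e_{jj'} - \frac{1}{3}\delta_{jj'}\mathrm{Tr}(e)\right] + \zeta\,\delta_{jj'}\mathrm{Tr}(e), \qquad (8.50a)$$

where e_jj' are the elements of the symmetrized strain derivative tensor: 

$$e_{jj'} \equiv \frac{ds_{jj'}}{dt} = \frac{1}{2}\left(\frac{\partial v_j}{\partial r_{j'}} + \frac{\partial v_{j'}}{\partial r_j}\right). \qquad (8.50b)$$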



Experiment confirms that Eq. (50) gives a good description of the viscosity effects in a broad range of 
isotropic fluids. The coefficient η is called either the shear viscosity, or the dynamic viscosity, or just the 
viscosity, while ζ is called the second (or bulk) viscosity. 
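The structure of Eq. (50) may be made concrete with a small NumPy function. It assumes that Eq. (50a) has the standard form σ'_jj' = 2η[e_jj' - δ_jj' Tr(e)/3] + ζδ_jj' Tr(e), with e given by Eq. (50b), and evaluates it for a simple shear flow, where only η matters, and for a uniform compression, where only ζ survives. The value of ζ and the flow parameters are hypothetical; η is the water value from Table 1.

```python
import numpy as np

def viscous_stress(grad_v, eta, zeta):
    """Viscous stress, assuming sigma' = 2*eta*[e - I*Tr(e)/3] + zeta*I*Tr(e),
    with e = (grad_v + grad_v^T)/2 and grad_v[j, j'] = dv_j/dr_j' (indices 0, 1, 2 = x, y, z)."""
    e = 0.5 * (grad_v + grad_v.T)          # symmetrized strain-rate tensor, Eq. (50b)
    tr = np.trace(e)
    identity = np.eye(3)
    return 2.0 * eta * (e - identity * tr / 3.0) + zeta * identity * tr

eta, zeta = 0.9e-3, 2.5e-3   # eta for water (Table 1), Pa*s; zeta is a hypothetical value

# Shear flow v = n_z (v0/d) y: the only nonzero gradient is dv_z/dy
v0, d = 0.1, 0.01            # hypothetical plate speed (m/s) and gap (m)
grad_shear = np.zeros((3, 3))
grad_shear[2, 1] = v0 / d
print("sigma'_zy for shear flow:", viscous_stress(grad_shear, eta, zeta)[2, 1], "Pa")

# Uniform compression v = -a r: Tr(e) != 0, and only the zeta term contributes
a = 1.0
print(viscous_stress(-a * np.eye(3), eta, zeta))
```

For the shear flow the result is η v₀/d, which anticipates the Couette-flow drag discussed later in this section.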

In the most frequent case of a virtually incompressible fluid, Tr(e) = d[Tr(s)]/dt = (dV/dt)/V = 0, 
so that the term proportional to ζ vanishes, and η is the only important viscosity parameter. 23 Table 1 
shows the approximate values of the viscosity, together with the mass density ρ, for several common 
fluids. One can see that η may vary within extremely broad limits; the extreme cases are glasses (somewhat 
counter-intuitively, these amorphous materials are not stable solids even at room temperature, but rather 
may "flow", though extremely slowly, until they eventually crystallize) and liquid helium. 24 



Table 8.1. Important parameters of several representative fluids (approximate values) 

Fluid | η (mPa·s) | ρ (kg/m³) 
Glasses (at 300 K) | 10²¹-10²⁴ | 2,200-2,500 
Machine oils SAE 10W - 40W (at 300 K) | 65-320 | 900 
Water (at 300 K) | 0.9 | 1,000 
Air (at 300 K, 10⁵ Pa) | 0.018 | 1.3 
Liquid helium-4 (at 4.2 K, 10⁵ Pa) | 0.019 | 130 



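Incorporating the additional components σ'_jj' into the equation of fluid motion (20), exactly as was 
done in the derivation of Eq. (7.109) of elasticity theory, and taking Eq. (16) into account, we arrive at 
the famous Navier-Stokes equation: 25 

$$\rho\left[\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}\right] = -\nabla P + \mathbf{f} + \eta\nabla^2\mathbf{v} + \left(\zeta + \frac{\eta}{3}\right)\nabla(\nabla\cdot\mathbf{v}). \qquad (8.51)$$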



The apparent simplicity of this equation should not mask the enormous range of phenomena, 
notably including turbulence (see the next section), that it describes, and the complexity of its 
solutions even for some simple geometries. In most problems of practical interest, the only option is 



23 Probably the most important effect we miss by neglecting ζ is the attenuation of (longitudinal) acoustic waves, 
to which the second viscosity makes a major (and in some cases, the main) contribution. 

24 Actually, at even lower temperatures (for He-4, T < T_λ ≈ 2.17 K), helium becomes a superfluid, i.e. loses 
viscosity completely, as a result of Bose-Einstein condensation - see, e.g., SM Sec. 3.4. 

25 Named after C.-L. Navier (1785-1836), who suggested the equation, and G. Stokes (1819-1903), who 
demonstrated its relevance by solving it for several key situations. 
to use numerical methods, but due to the large number of parameters (ρ, η, ζ, plus geometrical 
parameters of the involved bodies, plus the distribution of bulk forces f, plus boundary conditions), this 
way is strongly plagued by the "curse of dimensionality" that was discussed in Sec. 4.8. 

Let us see how the Navier-Stokes equation works in several simple examples. As the 
simplest case, let us consider the so-called Couette flow, caused in an incompressible fluid layer between 
two wide, horizontal plates (Fig. 8) by the mutual sliding of the plates with a constant relative velocity v₀. 




Fig. 8.8. The simplest problem of 
viscous fluid flow. 



Let us assume a laminar (turbulence-free) fluid flow. (As will be discussed in the next section, this 
assumption is only valid within certain limits.) Then we may use the evident symmetry of the problem 
to take, in the reference frame shown in Fig. 8, v = n_z v(y). Let the bulk forces be vertical, f = n_y f, so 
that they do not give an additional drive to the fluid flow. Then for the stationary flow (∂v/∂t = 0), the vertical, 
y-component of the Navier-Stokes equation is reduced to the static Pascal equation (3), showing that the 
pressure distribution is not affected by the plate (and fluid) motion. In the horizontal, z-component of the 
equation, only one term, η∇²v, survives, so that for the only Cartesian component of velocity we get the 
1D Laplace equation 



$$\frac{d^2 v}{dy^2} = 0. \qquad (8.52)$$



In contrast to the ideal fluid (see, e.g., Fig. 6b), the velocity of a viscous fluid relative to a solid 
wall it flows by should approach zero at the wall, 26 so that Eq. (52) should be solved with the boundary 
conditions 



$$v = \begin{cases} 0, & \text{at } y = 0, \\ v_0, & \text{at } y = d. \end{cases} \qquad (8.53)$$



Using the evident solution of this boundary problem, v(y) = (y/d)v₀, illustrated by arrows in Fig. 8, we 
can now calculate the horizontal drag force acting on a unit area of each plate. For the bottom plate, 



$$\frac{F_z}{A} = \sigma_{zy}\Big|_{y=0} = \eta\frac{dv}{dy}\Big|_{y=0} = \eta\frac{v_0}{d}. \qquad (8.54)$$



(For the top plate, the derivative dv/dy has the same value, but the sign of dA_y has to be changed to 
reflect the direction of the outer normal to the solid surface, so that we get a similar force but with the 


26 This is essentially an additional experimental fact, but it may be readily understood as follows. A solid may be 
considered as an ultimate case of a fluid (with infinite viscosity), and the tangential component of velocity should 
be continuous at an interface between two fluids, in order to avoid infinite stress - see Eq. (50). 
negative sign.) The well-known result (54) is often used, in undergraduate courses, as a definition of 
the dynamic viscosity η, and indeed shows its physical meaning very well. 

As the next, slightly less trivial example, let us consider the so-called Poiseuille problem 27 of the 
relation between the constant external pressure gradient χ ≡ -dP/dz applied along a round pipe with 
internal radius R (Fig. 9), and the so-called discharge Q, defined as the mass of fluid flowing through 
the pipe's cross-section per unit time. 



Fig. 8.9. The Poiseuille problem. 



Again assuming a laminar flow, we can invoke the problem's uniformity along the z-axis and its 
axial symmetry to infer that v = n_z v(ρ) and P = -χz + f(ρ, φ) + const (where ρ = {ρ, φ} is the 2D radius-vector 
rather than the fluid density), so that the Navier-Stokes equation (51) for an incompressible fluid 
(with ∇·v = 0) is reduced to a 2D Poisson equation 

$$\eta\nabla_2^2 v = -\chi. \qquad (8.55)$$

After spelling out the 2D Laplace operator in polar coordinates for our axially symmetric case (∂/∂φ = 0), 
Eq. (55) becomes a simple ordinary differential equation, 

$$\frac{\eta}{\rho}\frac{d}{d\rho}\left(\rho\frac{dv}{d\rho}\right) = -\chi, \qquad (8.56)$$

that has to be solved on the segment 0 ≤ ρ ≤ R, with the following boundary conditions: 

$$v = 0, \quad \text{at } \rho = R; \qquad \frac{dv}{d\rho} = 0, \quad \text{at } \rho = 0. \qquad (8.57)$$

(The latter condition is required by the axial symmetry.) A straightforward double integration yields: 

$$v = \frac{\chi}{4\eta}\left(R^2 - \rho^2\right), \qquad (8.58)$$

so that the integration of the mass flow density over the cross-section of the pipe, 

$$Q = \int_A \rho v\, d^2r = 2\pi\rho\frac{\chi}{4\eta}\int_0^R\left(R^2 - \rho'^2\right)\rho'\, d\rho', \qquad (8.59)$$

immediately gives us the so-called Poiseuille (or "Hagen-Poiseuille") law for the fluid discharge: 



27 It was solved theoretically by G. Stokes in 1845 in order to explain Eq. (60), which had been formulated by 
J. Poiseuille in 1840 on the basis of his experimental results. 
$$Q = \frac{\pi\rho R^4}{8\eta}\chi, \qquad (8.60)$$

where (sorry!) ρ is the mass density again. 
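The Poiseuille result may also be cross-checked numerically, which is a useful warm-up for the finite-difference method discussed below. The sketch solves Eq. (56) with the boundary conditions (57) on a uniform radial grid, using a conservative (flux-form) discretization that is our own choice rather than anything prescribed by the text, and then evaluates the integral (59) by the trapezoidal rule. The pipe radius and pressure gradient are hypothetical; η and ρ are the water values from Table 1.

```python
import numpy as np

def poiseuille_profile(R, chi, eta, n=200):
    """Finite-difference solution of Eq. (56) with boundary conditions (57) on n radial steps."""
    h = R / n
    r = np.linspace(0.0, R, n + 1)
    A = np.zeros((n + 1, n + 1))
    b = np.full(n + 1, -chi / eta)
    # At rho = 0 the axially symmetric 2D Laplacian equals 2 v''(0); the symmetry
    # condition dv/drho = 0 gives v''(0) ~ 2 (v_1 - v_0) / h^2.
    A[0, 0], A[0, 1] = -4.0 / h**2, 4.0 / h**2
    for i in range(1, n):
        r_out, r_in = r[i] + h / 2, r[i] - h / 2      # radii of the cell faces
        A[i, i - 1] = r_in / (r[i] * h**2)
        A[i, i] = -(r_out + r_in) / (r[i] * h**2)
        A[i, i + 1] = r_out / (r[i] * h**2)
    A[n, n], b[n] = 1.0, 0.0                          # no-slip condition at the wall
    return r, np.linalg.solve(A, b)

def discharge(r, v, density):
    """Eq. (59): Q = 2*pi*density * integral of v*rho d(rho), by the trapezoidal rule."""
    f = v * r
    return 2.0 * np.pi * density * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))

# Hypothetical example: water (Table 1) in a pipe of radius 5 mm under chi = 10 Pa/m
R, chi, eta, density = 5e-3, 10.0, 0.9e-3, 1000.0
r, v = poiseuille_profile(R, chi, eta)
print("v(0): numerical", v[0], "  analytical", chi * R**2 / (4 * eta))
print("Q:    numerical", discharge(r, v, density),
      "  analytical", np.pi * density * R**4 * chi / (8 * eta))
```

Because the exact profile (58) is quadratic, this particular discretization reproduces it at the grid nodes up to round-off; the small remaining discrepancy in Q comes from the trapezoidal rule.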



Fig. 8.10. Application of the finite- 
difference method with a very coarse 
mesh (with step h = a/2) to the 
problem of viscous fluid flow in a 
pipe with a square cross-section. 



Of course, the 2D Poisson equation (55) is not so readily solvable for every cross-section shape. 
For example, consider a very simple, square cross-section with side a (Fig. 10). For it, it is natural 
to use Cartesian coordinates, so that Eq. (55) becomes 

$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = -\frac{\chi}{\eta} = \text{const}, \qquad \text{for } 0 < x, y < a, \qquad (8.61)$$

and has to be solved with the boundary conditions 

$$v = 0, \qquad \text{at } x, y = 0, a. \qquad (8.62)$$



For this boundary problem, analytical methods 28 give answers in the form of infinite series 
that ultimately require computers for their plotting and comprehension. Let me use this pretext to 
discuss how explicitly numerical methods may be used for such problems, or for any partial differential 
equations involving the Laplace operator. The simplest of them is the finite-difference method 29, in 
which the function to be calculated, f(r₁, r₂, ...), is represented by its values at discrete points of a 
rectangular grid (frequently called a mesh) of the corresponding dimensionality (Fig. 11). 



Fig. 8.11. Idea of the finite- 
difference method in (a) one and 
(b) two dimensions. 



28 For example, the Green's function method (see, e.g., EM Sec. 2.7). 

29 For more details see, e.g., R. J. LeVeque, Finite Difference Methods for Ordinary and Partial Differential 
Equations, SIAM, 2007. 
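In Sec. 4.7, we have already discussed how to use such a grid to approximate the first derivative - 
see Eq. (4.98). Its extension to the second derivative is straightforward - see Fig. 11a: 

$$\frac{\partial^2 f}{\partial r_1^2} = \frac{\partial}{\partial r_1}\left(\frac{\partial f}{\partial r_1}\right) \approx \frac{1}{h}\left(\frac{\partial f}{\partial r_1}\Big|_{+h/2} - \frac{\partial f}{\partial r_1}\Big|_{-h/2}\right) \approx \frac{1}{h}\left(\frac{f_{\rightarrow} - f}{h} - \frac{f - f_{\leftarrow}}{h}\right) = \frac{f_{\rightarrow} + f_{\leftarrow} - 2f}{h^2}, \qquad (8.63)$$

where f_→ and f_← are the function's values at the two neighboring mesh points. 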



The relative error of this approximation is of the order of h²∂²/∂r₁², which is quite acceptable in many cases. As a 
result, the left-hand side of Eq. (61), treated on a square mesh with step h (Fig. 11b), may be presented 
as the so-called 5-point scheme: 



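$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} \approx \frac{v_{\rightarrow} + v_{\leftarrow} - 2v}{h^2} + \frac{v_{\uparrow} + v_{\downarrow} - 2v}{h^2} = \frac{v_{\rightarrow} + v_{\leftarrow} + v_{\uparrow} + v_{\downarrow} - 4v}{h^2}, \qquad (8.64)$$

where the arrow subscripts denote the values of v at the four nearest neighbors of the central point (Fig. 11b). 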



(The generalization to the 7-point scheme, appropriate for 3D problems, is straightforward.) 
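The h² scaling of the error of Eq. (63), and the exactness of the 5-point scheme (64) for quadratic functions, are easy to observe numerically. In this minimal sketch the test functions (sin x, and a quadratic in x and y) are arbitrary illustrative choices.

```python
import math

def second_derivative(f, x, h):
    """Eq. (63): (f(x + h) + f(x - h) - 2 f(x)) / h^2."""
    return (f(x + h) + f(x - h) - 2.0 * f(x)) / h**2

def laplacian_5pt(v, x, y, h):
    """Eq. (64): five-point approximation of the 2D Laplacian."""
    return (v(x + h, y) + v(x - h, y) + v(x, y + h) + v(x, y - h) - 4.0 * v(x, y)) / h**2

def quadratic(x, y):
    """A quadratic function whose exact Laplacian is -8."""
    return 1.0 - x**2 - 3.0 * y**2

# Error of Eq. (63) for f = sin at x = 1: halving h should reduce it about 4 times
exact = -math.sin(1.0)
steps = (0.2, 0.1, 0.05, 0.025)
errors = [abs(second_derivative(math.sin, 1.0, h) - exact) for h in steps]
for h, err, nxt in zip(steps, errors, errors[1:] + [None]):
    note = f"  ratio to next: {err / nxt:.3f}" if nxt else ""
    print(f"h = {h:<6} error = {err:.3e}{note}")

# For a quadratic function, whose higher derivatives vanish, Eq. (64) is exact
print("5-point Laplacian of the quadratic:", laplacian_5pt(quadratic, 0.3, -0.2, 0.1))
```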

Let us apply this scheme to the pipe with the square cross-section, using an extremely coarse 
mesh with step h = a/2 (Fig. 10). In this case the fluid velocity v should equal zero on the walls, i.e. at 
all points of the five-point scheme (Fig. 11b) except for the central point (where the velocity is evidently 
largest), so that Eqs. (61) and (64) yield 30 



$$\frac{0 + 0 + 0 + 0 - 4v_{\max}}{(a/2)^2} = -\frac{\chi}{\eta}, \qquad \text{i.e.} \quad v_{\max} = \frac{\chi a^2}{16\eta}. \qquad (8.65)$$



The resulting expression for the maximal velocity differs from the exact value by only ~15%. 
Using a slightly finer mesh with h = a/4, which gives a readily solvable system of 3 linear equations for 
3 different velocity values (an exercise highly recommended to the reader), brings us within ~5% of the 
exact result. This shows that such "numerical" methods may be more efficient in practice than the 
"analytical" ones, even if the only available tool is a calculator app on your smartphone rather than an 
advanced computer. 
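The calculation with h = a/2 and the recommended exercise with h = a/4 can both be automated. The sketch below assembles the 5-point scheme (64) for Eqs. (61)-(62) on an n × n mesh and solves it with a dense linear solver, which is adequate only for small meshes (a sparse solver would be the natural choice for fine ones). It measures lengths in units of a and velocities in units of χa²/η; n = 2 reproduces Eq. (65), n = 4 corresponds to the 3-equation exercise, and larger n show the convergence toward the exact value.

```python
import numpy as np

def square_duct_center_velocity(n):
    """Solve Eqs. (61)-(62) with the 5-point scheme (64) on a mesh with step h = a/n.

    Lengths are in units of a and velocities in units of chi*a^2/eta, so the
    right-hand side of Eq. (61) equals -1. n must be even, so that the pipe's center
    is a mesh point; wall points carry v = 0 and are not included as unknowns.
    """
    h = 1.0 / n
    m = n - 1                                  # interior points per side

    def index(i, j):
        return i * m + j

    A = np.zeros((m * m, m * m))
    b = np.full(m * m, -h**2)                  # Eq. (64) multiplied by h^2
    for i in range(m):
        for j in range(m):
            A[index(i, j), index(i, j)] = -4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < m and 0 <= jj < m:    # neighbors on the wall contribute zero
                    A[index(i, j), index(ii, jj)] = 1.0
    v = np.linalg.solve(A, b)
    center = n // 2 - 1                        # interior index of the central point
    return v[index(center, center)]

for n in (2, 4, 8, 16, 32):
    print(f"h = a/{n:<2}: v_max = {square_duct_center_velocity(n):.5f} chi*a^2/eta")
```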

Of course, many practical problems of fluid dynamics do require high-performance computing, 
especially in conditions of turbulence (see the next section) with its complex, irregular spatio-temporal 
structure. In these conditions, the finite-difference approach may become unsatisfactory, because it 
implies the same accuracy of derivative approximation throughout the whole volume. A more powerful (but 
also much more complex to implement) approach is the finite-element method, in which the 
discrete point mesh is based on triangles with uneven sides, and is (in most cases, automatically) 
generated in accordance with the system geometry - see Fig. 12. Unfortunately, I do not have time to 
go into the details of that method, so the reader is referred to the special literature on this subject. 31 

Before proceeding to our next topic, let me note one more important problem that is analytically 
solvable using the Navier-Stokes equation (51): the slow motion of a solid sphere of radius R, with a 
constant velocity v₀, through an incompressible viscous fluid - or equivalently, a slow flow of the fluid 



30 Note that value (65) is exactly the same as that given for v_max = v|_{ρ=0} by the analytical formula (58) for the round 
cross-section with radius R = a/2. This is not an accidental coincidence. The velocity distribution given by Eq. (58) is 
a quadratic function of both x and y. For such functions, with all derivatives higher than ∂²/∂r² equal to 
zero, Eq. (64) is exact rather than approximate. 

31 See, e.g., C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method, 
Dover, 2009, or T. J. R. Hughes, The Finite Element Method, Dover, 2000. 
(uniform at large distances) around an immobile sphere. Indeed, in the limit v → 0, the second term on 
the left-hand side of this equation is negligible (just as in the surface wave analysis in Sec. 4), and for a 
stationary flow the equation takes the form 

$$-\nabla P + \eta\nabla^2\mathbf{v} = 0, \qquad (8.66)$$

which should be complemented with the incompressibility condition ∇·v = 0 and the boundary conditions 

$$\mathbf{v} = 0, \quad \text{at } r = R; \qquad \mathbf{v} \to \mathbf{v}_0, \quad \text{at } r \to \infty. \qquad (8.67)$$



In spherical coordinates, with the polar axis directed along the vector v₀, this boundary problem has 
axial symmetry (so that ∂v/∂φ = 0 and v_φ = 0), and allows the following analytical solution: 

$$v_r = v_0\cos\theta\left(1 - \frac{3R}{2r} + \frac{R^3}{2r^3}\right), \qquad v_\theta = -v_0\sin\theta\left(1 - \frac{3R}{4r} - \frac{R^3}{4r^3}\right). \qquad (8.68)$$
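Solution (68) may be verified against the boundary conditions (67) and the incompressibility condition with a few lines of Python. This sketch, assuming illustrative units v₀ = R = 1, evaluates the velocity on the sphere and far from it, and the divergence of v (computed by central differences in spherical coordinates) at several points in the fluid.

```python
import math

V0, R = 1.0, 1.0   # illustrative units (an assumption)

def stokes_velocity(r, theta):
    """Eq. (68): (v_r, v_theta) around a sphere at rest, with flow velocity V0 at infinity."""
    vr = V0 * math.cos(theta) * (1 - 1.5 * R / r + 0.5 * R**3 / r**3)
    vt = -V0 * math.sin(theta) * (1 - 0.75 * R / r - 0.25 * R**3 / r**3)
    return vr, vt

def divergence(r, theta, h=1e-5):
    """div v in spherical coordinates for an axially symmetric field (v_phi = 0),
    evaluated with central differences."""
    d_radial = ((r + h)**2 * stokes_velocity(r + h, theta)[0]
                - (r - h)**2 * stokes_velocity(r - h, theta)[0]) / (2 * h * r**2)
    d_polar = (math.sin(theta + h) * stokes_velocity(r, theta + h)[1]
               - math.sin(theta - h) * stokes_velocity(r, theta - h)[1]) / (2 * h * r * math.sin(theta))
    return d_radial + d_polar

thetas = [0.1 + 0.3 * k for k in range(10)]          # polar angles inside (0, pi)
print("max |v| on the sphere:", max(math.hypot(*stokes_velocity(R, t)) for t in thetas))
print("max |div v|:", max(abs(divergence(r, t)) for r in (1.2, 2.0, 5.0) for t in thetas))
far = [stokes_velocity(1e6, t) for t in thetas]
print("max deviation from uniform flow at r = 1e6 R:",
      max(abs(vr - V0 * math.cos(t)) + abs(vt + V0 * math.sin(t)) for (vr, vt), t in zip(far, thetas)))
```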




Fig. 8.12. Typical finite-element mesh generated 
automatically for an object of complex geometry - 
in this case, a plane wing's cross-section. (Figure 
adapted from www.mathworks.com .) 



Calculating the pressure from Eq. (66), and integrating the total stress (the pressure together with the 
viscous stress (50)) over the surface of the sphere, it is now straightforward to obtain the famous Stokes 
formula for the drag force acting on the sphere: 

$$F = 6\pi\eta R v_0. \qquad (8.69)$$



Historically, this formula played a pivotal role in the determination of the fundamental electric charge e 
from R. Millikan's experiments with charged oil drops. 



8.6. Turbulence 

The Stokes formula (69), whose derivation is limited to low velocities at which the nonlinear term 
(v·∇)v may be neglected, becomes invalid if the fluid velocity is increased. For example, Fig. 13 shows 
the drag coefficient, defined as 

$$C_D \equiv \frac{F}{\rho v_0^2 A/2}, \qquad (8.70)$$

where A is the cross-section of the body as seen from the fluid flow direction, for a sphere of radius R 
(so that A = πR²), as a function of the so-called Reynolds number 32, defined for this particular geometry 
as 

$$\mathrm{Re} \equiv \frac{\rho v_0 (2R)}{\eta} \equiv \frac{\rho v_0 D}{\eta}, \qquad (8.71)$$

where D = 2R is the sphere's diameter. 
Fig. 8.13. The drag coefficient for a sphere and a round thin disk as functions of the Reynolds number. 
Adapted from F. Eisner, Das Widerstandsproblem, Proc. 3rd Int. Cong. on Appl. Mech., Stockholm, 1931. 



In this notation, the Stokes formula (69) reads C_D = 24/Re. One can see that this formula is only valid 
at Re ≪ 1, while at larger velocities the drag force becomes substantially higher than this prediction, 
and its dependence on velocity becomes very complicated, so that only its general, semi-quantitative features 
may be readily understood from simple arguments. 33 
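The statement C_D = 24/Re follows from combining Eqs. (69)-(71) with A = πR², and the sketch below confirms it numerically for a few hypothetical small spheres in water (η and ρ from Table 1). It also prints Re, which tells whether Eq. (69) itself can be trusted: the algebraic identity holds for any input, but the Stokes formula is only valid at Re ≪ 1.

```python
import math

def stokes_drag(eta, R, v0):
    """Eq. (69)."""
    return 6.0 * math.pi * eta * R * v0

def drag_coefficient(F, density, v0, R):
    """Eq. (70) with the sphere's cross-section A = pi R^2."""
    return F / (density * v0**2 * math.pi * R**2 / 2.0)

def reynolds(density, v0, R, eta):
    """Eq. (71) with D = 2R."""
    return density * v0 * 2.0 * R / eta

# Hypothetical spheres in water (eta and rho from Table 1); the last case has Re >> 1,
# where Eq. (69) itself no longer applies
eta, density = 0.9e-3, 1000.0
for R, v0 in ((1e-5, 1e-3), (1e-4, 1e-3), (1e-3, 1e-2)):
    re = reynolds(density, v0, R, eta)
    cd = drag_coefficient(stokes_drag(eta, R, v0), density, v0, R)
    print(f"R = {R:g} m, v0 = {v0:g} m/s: Re = {re:.3g}, C_D = {cd:.4g}, 24/Re = {24 / re:.4g}")
```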

The reason for this complexity is the gradual development of very intricate, time-dependent fluid 
patterns, called turbulence, rich with vortices - for an example, see Fig. 14. These vortices are 
especially pronounced in the region behind the moving body (the so-called wake), while the region in front of 



32 This notion was introduced in 1851 by the same G. Stokes, but eventually named after O. Reynolds who 
popularized it three decades later. 

33 For example, Fig. 13 shows that, within a very broad range of Reynolds numbers, from ~10² to ~3×10⁵, C_D for a 
sphere is of the order of (and for a flat disk, remarkably close to) unity. This level, i.e., the approximate equality 
F ≈ ρv₀²A/2, may be understood (in the picture where the object is moved by an external force F with velocity v₀ 
through a fluid which is initially at rest) as the equality of the force's power Fv₀ and the fluid's kinetic energy (ρv₀²/2)V 
created in the volume V = v₀A per unit time. This relation would be exact if the object gave velocity v₀ to each and 
every fluid particle its cross-section runs into, for example by dragging all such particles behind itself. In reality, 
much of this kinetic energy goes into vortices - see Fig. 14 and its discussion below. 
the body is virtually unperturbed. Figure 13 indicates that turbulence exhibits rather different behaviors 
in an extremely broad range of velocities (i.e. values of Re), and sometimes changes rather abruptly - 
see, for example, the significant drag drop at Re ~ 5×10⁵. 




Fig. 8.14. Snapshot of the turbulent tail (wake) behind a sphere moving in a fluid with a 
high Reynolds number, showing the so-called von Kármán vortex street. A nice animation 
of such a pattern may be found at http://en.wikipedia.org/wiki/Reynolds_number. 



In order to understand the conditions of this phenomenon, let us estimate the scale of various 
terms in the Navier-Stokes equation (51) for the generic case of a body with characteristic size l moving 
with velocity v in an otherwise static, incompressible fluid. In this case the characteristic time scale of 
possible non-stationary phenomena is given by the ratio l/v, 34 so that we arrive at the following 
estimates: 

$$\begin{array}{lcccc} \text{Equation term:} & \rho\,\partial\mathbf{v}/\partial t & \rho(\mathbf{v}\cdot\nabla)\mathbf{v} & \mathbf{f} & \eta\nabla^2\mathbf{v} \\ \text{Order of magnitude:} & \rho v^2/l & \rho v^2/l & \rho g & \eta v/l^2 \end{array} \qquad (8.72)$$

(I have skipped the term ∇P because, as we saw in the previous section, in typical fluid flow problems it 
balances the viscosity term, and hence is of the same order of magnitude.) This table shows that the relative 
importance of the terms may be characterized by two dimensionless ratios. 35 

The first of them is the so-called Froude number 



34 The time scale of some problems may be different from l/v; for example, for forced oscillations of a fluid flow 
it is given by the reciprocal oscillation frequency 1/f. For such problems, the ratio S ≡ fl/v serves as another, 
independent dimensionless parameter, commonly called either the Strouhal number or the reduced frequency. 

35 For substantially compressible fluids (e.g., gases), the most important additional dimensionless parameter is the 
Mach number M ≡ v/v_l, where v_l = (K/ρ)^{1/2} is the velocity of longitudinal sound, which is, as we already 
know, the only wave mode possible in an infinite fluid. Especially significant for practice are supersonic effects 
(including the shock wave in the form of the famous Mach cone with half-angle θ_M = arcsin M⁻¹), which arise at 
M > 1. For a more thorough discussion of these issues, I have to refer the reader to more specialized texts - e.g., 
Chapter IX of the Landau and Lifshitz volume cited above, or Chapter 15 in I. M. Cohen and P. K. Kundu, Fluid 
Mechanics, 4th ed., Academic Press, 2007, which is generally a good book on the subject. Another popular, rather 
simple textbook is R. A. Granger, Fluid Mechanics, Dover, 1995. 
$$\mathrm{F} \equiv \frac{\rho v^2/l}{\rho g} = \frac{v^2}{gl}, \qquad (8.73)$$

which characterizes the relative importance of bulk gravity or, upon an appropriate modification, of other 
bulk forces. In most practical problems (with the important exception of surface waves, see Sec. 4 
above) F ≫ 1, so that gravity effects may be neglected. 

Much more important is another ratio, the Reynolds number (71), in the general case defined as 

$$\mathrm{Re} \equiv \frac{\rho v^2/l}{\eta v/l^2} = \frac{\rho v l}{\eta}, \qquad (8.74)$$



which is a measure of the relative importance of the fluid particles' inertia in comparison with the 
viscosity effects. 36 Thus, it is not surprising that for a sphere, the role of the vorticity-creating term 
(v·∇)v becomes noticeable already at Re ~ 1 - see Fig. 13. Much more surprising is the onset of 
turbulence in systems where the laminar (turbulence-free) flow is formally an exact solution of the 
Navier-Stokes equation for any Re. For example, at Re > Re_t ≈ 2,100 (with l = 2R and v = v_max), the 
laminar flow in a round pipe, described by Eq. (58), becomes unstable, and the resulting turbulence 
decreases the fluid discharge Q in comparison with the Poiseuille law (60). Even more strikingly, the 
critical value of Re is rather insensitive to the pipe wall roughness. 
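The estimates of this section are easy to reproduce. The sketch below computes the kinematic viscosity η/ρ from Table 1, the Reynolds number (74) for the few-meter object of footnote 37, and Re for a hypothetical pipe flow, which it compares with the approximate threshold Re_t ≈ 2,100 quoted above; it also evaluates the Froude number (73) for a hypothetical small object. All geometries and speeds are illustrative assumptions.

```python
import math

# Table 1 values (SI units): dynamic viscosity eta (Pa*s) and density rho (kg/m^3)
FLUIDS = {"water": (0.9e-3, 1000.0), "air": (0.018e-3, 1.3)}
G = 9.81
RE_PIPE = 2100   # approximate threshold for a round pipe, with l = 2R and v = v_max

def reynolds(fluid, v, l):
    """Eq. (74): Re = rho v l / eta."""
    eta, rho = FLUIDS[fluid]
    return rho * v * l / eta

def froude(v, l):
    """Eq. (73): F = v^2 / (g l)."""
    return v**2 / (G * l)

for name, (eta, rho) in FLUIDS.items():
    print(f"{name}: kinematic viscosity eta/rho = {eta / rho:.2e} m^2/s")

# A few-meter object in water moving at just 1 mm/s (cf. footnote 37)
print("Re, 3 m object at 1 mm/s:", round(reynolds("water", 1e-3, 3.0)))

# Hypothetical pipe flow: water, 2R = 1 cm, v_max = 0.1 m/s
re_pipe = reynolds("water", 0.1, 0.01)
print(f"pipe: Re = {re_pipe:.0f} ->", "laminar expected" if re_pipe < RE_PIPE else "may be turbulent")

# Hypothetical 1 cm object moving at 1 m/s
print(f"Froude number: {froude(1.0, 0.01):.1f}")
```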

Since Re ≫ 1 in many real-life situations, 37 turbulence is very important in practice. However, 
despite nearly a century of intensive research, there is no general, quantitative analytical theory of this 
phenomenon, 38 and most results are still obtained either by rather approximate analytical treatments, or 
by the numerical solution of the Navier-Stokes equation using the approaches discussed in the previous 
section, or in experiments (e.g., on scaled models 39 in wind tunnels). 

Unfortunately, due to time/space restrictions, for a more detailed discussion of these results I 
have to refer the reader to more specialized literature, 40 and will conclude the chapter with a brief 
discussion of just one issue: can turbulence be "explained by a single mechanism"? (In other words, 
can it be reduced, at least on a semi-quantitative level, to a set of simpler phenomena that are commonly 
considered "well understood"?) Apparently the answer is no, 41 though the nonlinear dynamics of simpler 
systems may provide some useful insights. 



36 Note that the "dynamic" viscosity η participates in this number (and in many other problems of fluid dynamics) 
only in the combination η/ρ, which has thereby earned the special name of kinematic viscosity. 

37 For example, the values of η and ρ for water listed in Table 1 imply that for a few-meter object, Re > 1,000 at 
any speed above just ~1 mm/s. 

38 A rare exception is the relatively recent theoretical result by S. Orszag (1971) for the turbulence threshold in a 
flow of an incompressible fluid through a gap of thickness t between two parallel plane walls: Re_t ≈ 5,772 (for l = 
t/2, v = v_max). However, this result does not predict the turbulence patterns at Re > Re_t. 

39 The crucial condition of correct modeling is the equality of the Reynolds numbers (74) (and if relevant, also of 
the Froude numbers and/or the Mach numbers) of the object of interest and its model. 

40 See, e.g., P. A. Davidson, Turbulence, Oxford U. Press, 2004. 

41 The following famous quote is attributed to W. Heisenberg on his deathbed: "When I meet God, I will ask him 
two questions: Why relativity? And why turbulence? I think he will have an answer for the first question." 
Though probably inaccurate, this story reflects rather well the understandable frustration of the fundamental 
physics community, notable for their reductionist mentality, with the enormous complexity of phenomena which 
obey simple (e.g., Navier-Stokes) equations. 
At the middle of the past century, the most popular qualitative explanation of turbulence was
the formation of an "energy cascade" that would transfer energy from larger to smaller vortices.
With our background, it is easier to retell that story in the time-domain language (with velocity v serving
as the conversion factor), using the fact that in a rotating vortex each component of the particle radius-vector
oscillates in time, so that to some extent the vortex plays the role of an oscillatory motion mode.
Let us consider the passage of a solid body between two initially close, small parts of fluid. The
body pushes them apart, but after its passage these partial volumes are free to return to their initial
positions. However, the domination of inertia effects at motion with Re ≫ 1 means that the volumes
continue to "oscillate" for a while about those equilibrium positions. (Since elementary volumes of an
incompressible fluid cannot merge, these oscillations take the form of rotating vortices.)

Now, from Sec. 4.8 we know that intensive oscillations in a system with quadratic nonlinearity,
in this case provided by the convective term $(\mathbf{v}\cdot\nabla)\mathbf{v}$, are equivalent, for small perturbations, to the
oscillation of the system parameters at the corresponding frequency. On the other hand, the discussion in
Sec. 5.5 shows that in a system with two oscillatory degrees of freedom, a periodic parameter change
with frequency $\omega_p$ may lead to non-degenerate parametric excitation of oscillations with frequencies
$\omega_{1,2}$ satisfying the relation $\omega_1 + \omega_2 = \omega_p$. Moreover, the spectrum of oscillations in such a system also has
higher combinational frequencies such as $(\omega_p + \omega_1)$, thus pushing the oscillation energy up the
frequency scale. In the presence of other oscillatory modes, these oscillations may in turn produce, via
the same nonlinearity, even higher frequencies, etc. In a fluid, the spectrum of these "oscillatory modes"
(actually, vortex structures) is essentially continuous, so that the above arguments make very plausible a
sequential transfer of energy to a broad spectrum of modes, whose frequency spectrum is limited from
above by the energy dissipation due to viscosity. When excited, these modes interact (in particular,
phase-lock) through the system's nonlinearity, creating the complex motion we call turbulence.
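The frequency-mixing step of this argument is easy to check numerically. The Python sketch below (an illustration added here, not part of the original reasoning) passes a two-tone signal $\cos\omega_1 t + \cos\omega_2 t$ through a purely quadratic nonlinearity, $x \to x^2$, and evaluates the Fourier amplitudes of the result; the integer frequencies $\omega_1 = 3$ and $\omega_2 = 5$ (in units of an arbitrary base frequency) and the sampling density are assumptions made for convenience.

```python
import math


def fourier_amplitude(samples, k):
    """Amplitude of the k-th harmonic of a signal sampled uniformly over one base period."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * j / n) for j, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * j / n) for j, s in enumerate(samples))
    scale = 1.0 if k == 0 else 2.0  # the constant (k = 0) term is not doubled
    return scale * math.hypot(re, im) / n


def quadratic_mixing_spectrum(w1=3, w2=5, n_samples=256, k_max=12):
    """Harmonic amplitudes of y = x**2 for the two-tone input x = cos(w1*t) + cos(w2*t)."""
    t = [2 * math.pi * j / n_samples for j in range(n_samples)]
    y = [(math.cos(w1 * tj) + math.cos(w2 * tj)) ** 2 for tj in t]  # quadratic nonlinearity
    return {k: fourier_amplitude(y, k) for k in range(k_max + 1)}


for k, amp in quadratic_mixing_spectrum().items():
    if amp > 1e-9:  # skip harmonics that vanish up to rounding errors
        print(f"frequency {k:2d}: amplitude {amp:.3f}")
```

The nonzero amplitudes appear only at the frequencies 0, $2 = \omega_2 - \omega_1$, $6 = 2\omega_1$, $8 = \omega_1 + \omega_2$, and $10 = 2\omega_2$, i.e. the quadratic term alone generates the combinational frequencies invoked above, while the input frequencies themselves are absent from $x^2$.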

Though not having much quantitative predictive power, such handwaving explanations, which
are essentially based on the excitation of a large number of effective degrees of freedom,
dominated the fluid dynamics reviews until the mid-1960s. At that point, the discovery (or rather
rediscovery) of quasi-random motion in classical dynamic systems with just a few degrees of freedom
altered the discussion substantially. Since this phenomenon, called the deterministic chaos, extends well
beyond fluid dynamics, I will devote a separate (albeit short) next chapter to it, and at its end
briefly return to the discussion of turbulence.



8.7. Exercise problems 

8.1. Pressure P under a free water surface crudely obeys the Pascal law, Eq. (6). Find the first-order
corrections to this result, due to the small compressibility of water.



8.2. Find the stationary shape of the open surface of an
incompressible, heavy fluid rotated about a vertical axis with a constant
angular velocity ω - see Fig. on the right.
8.3. Find the shape of the surface of an incompressible fluid of density
ρ near a vertical plane wall, in a uniform gravity field - see Fig. on the right. In
particular, calculate the height h of the liquid's rise at the wall surface as a function
of the contact angle $\theta_c$.




8.4. A solid sphere of radius R is kept in a steady, vorticity-free flow of an ideal incompressible
fluid, with velocity $v_0$. Find the spatial distribution of velocity and pressure, and in particular their
extremal values. Compare the results with those obtained in Sec. 4 for a round cylinder.

8.5. Use the finite-difference approximation of the Laplace operator, with mesh step h = a/4, to
find the maximum velocity and total mass flow Q of a viscous incompressible fluid through a long pipe
with a square-shaped cross-section of side a. Compare the results with those described in Sec. 4 for:

(i) the same problem solved numerically with mesh h = a/2, and

(ii) a pipe with circular cross-section of the same area.



8.6. A layer, of thickness h, of a heavy, viscous,
incompressible fluid flows down a long and wide inclined plane,
under its own weight - see Fig. on the right. Find the stationary
velocity distribution profile, and the total fluid discharge (per unit
width).




8.7. A massive barge of mass M, with a flat
bottom of area A, floats in shallow water, with
clearance $h \ll A^{1/2}$ (see Fig. on the right).
Calculate the time dependence of the barge's
velocity V(t), and the water velocity profile,
after the barge's engine has been turned off.
Discuss the limits of large and small values of
the dimensionless parameter $M/\rho A h$.
Chapter 9. Deterministic Chaos

This chapter gives a very brief review of chaotic phenomena in deterministic maps and dynamic systems 
with and without dissipation, and an even shorter discussion of the possible role of chaos in fluid 
turbulence. 






9.1. Chaos in maps 

Chaotic behavior of dynamic systems 1 (sometimes called the deterministic chaos) has become 
broadly recognized 2 after the publication of a 1963 paper by E. Lorenz who was examining numerical 
solutions of the following system of three nonlinear, ordinary differential equations, 



$$
\begin{aligned}
\dot q_1 &= a_1 (q_2 - q_1),\\
\dot q_2 &= a_2 q_1 - q_2 - q_1 q_3,\\
\dot q_3 &= q_1 q_2 - a_3 q_3,
\end{aligned}
\tag{9.1}
$$



as a rudimentary model for heat transfer through a horizontal liquid layer between two solid plates.
(Experiment shows that if the bottom plate is kept hotter than the top one, the liquid may exhibit
turbulent convection.) He found that within a certain range of constants $a_{1,2,3}$, the solutions of Eq. (1)
follow complex, unpredictable, non-repeating trajectories in the 3D q-space. Moreover, the resulting
functions $q_j(t)$ (where j = 1, 2, 3) are so sensitive to initial conditions $q_j(0)$ that at sufficiently large times
t, solutions corresponding to slightly different initial conditions are completely different.
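This sensitivity is easy to reproduce. The Python sketch below integrates Eqs. (1) with a fixed-step fourth-order Runge-Kutta scheme and follows two solutions whose initial values of $q_1$ differ by $10^{-9}$. The constants $a_1 = 10$, $a_2 = 28$, $a_3 = 8/3$ are the values Lorenz used (assumed here, since the text does not list them); the initial point, step size, and run length are arbitrary choices.

```python
import math


def lorenz_rhs(q, a1=10.0, a2=28.0, a3=8.0 / 3.0):
    """Right-hand parts of Eqs. (9.1); the default constants are Lorenz's classic values."""
    q1, q2, q3 = q
    return (a1 * (q2 - q1), a2 * q1 - q2 - q1 * q3, q1 * q2 - a3 * q3)


def rk4_step(rhs, q, dt):
    """One step of the classical 4th-order Runge-Kutta method for dq/dt = rhs(q)."""
    k1 = rhs(q)
    k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(q, k1)))
    k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(q, k2)))
    k4 = rhs(tuple(x + dt * k for x, k in zip(q, k3)))
    return tuple(x + dt / 6.0 * (c1 + 2.0 * c2 + 2.0 * c3 + c4)
                 for x, c1, c2, c3, c4 in zip(q, k1, k2, k3, k4))


def separation_history(q0=(1.0, 1.0, 1.0), dq=1e-9, dt=0.01, t_max=60.0):
    """Distance between two solutions whose initial q1 values differ by dq, after each step."""
    qa = tuple(q0)
    qb = (q0[0] + dq, q0[1], q0[2])
    seps = []
    for _ in range(round(t_max / dt)):
        qa = rk4_step(lorenz_rhs, qa, dt)
        qb = rk4_step(lorenz_rhs, qb, dt)
        seps.append(math.dist(qa, qb))
    return seps


DT = 0.01
seps = separation_history(dt=DT)
for t in (1, 10, 20, 30, 40, 50, 60):
    print(f"t = {t:2d}: |q_a - q_b| = {seps[round(t / DT) - 1]:.2e}")
```

The printed distance grows roughly exponentially until it becomes comparable with the size of the region in q-space visited by the solutions, after which the two trajectories are effectively unrelated.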

Very soon it was realized that such behavior is typical for even simpler mathematical objects
called maps, so that I will start my discussion of chaos with these objects. A 1D map is essentially a rule
for finding the next number $q_{n+1}$ of a discrete series numbered by integer index n, in the simplest case
using only its last known value $q_n$. The most famous example is the so-called logistic map: 3



$$q_{n+1} = f(q_n) = r q_n (1 - q_n). \tag{9.2}$$



The basic properties of this map may be understood using the (hopefully, self-explanatory)
graphical presentation shown in Fig. 1. 4 One can readily see that at r < 1 (Fig. 1a) the map rapidly
converges to the trivial fixed point $q^{(0)} = 0$, because each next value of q is less than the previous one.
However, if r is increased above 1 (as in the example shown in Fig. 1b), the fixed point $q^{(0)}$ becomes
unstable. Indeed, at $q_n \ll 1$, map (2) yields $q_{n+1} \approx r q_n$, so that at r > 1, values $q_n$ grow with each
iteration. Instead of the unstable point $q^{(0)} = 0$, in the range $1 < r < r_1$, where $r_1 = 3$, the map has a stable
fixed point, $q^{(1)}$, that may be found by plugging this value into both parts of Eq. (2):



1 In this context, this term is understood as "systems described by deterministic differential equations". 

2 Actually, the notion of quasi-random dynamics due to the exponential divergence of trajectories may be traced 
back at least to (apparently independent) works by J. H. Poincare in 1892 and by J. Hadamard in 1898. Citing 
Poincare, "...it may happen that small differences in the initial conditions produce very great ones in the final 
phenomena. [...] Prediction becomes impossible." 

3 To my knowledge, it was first discussed in detail in 1976 by R. May, on the basis of simple demographic models
considered as early as 1838 by P. Verhulst.

4 Since the maximum value of function f(q), achieved at q = 1/2, equals r/4, the mapping may be limited to the
segment q ∈ [0, 1], if parameter r is between 0 and 4. Since all interesting properties of the map, including chaos,
may be found within these limits, I will focus on this range.
$$q^{(1)} = r q^{(1)} \left(1 - q^{(1)}\right), \tag{9.3}$$

giving $q^{(1)} = 1 - 1/r$ - see the left branch of the plot shown in Fig. 2.
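These fixed points are easy to confirm numerically. The Python sketch below iterates map (2) from an arbitrarily chosen starting value $q_1 = 0.3$ (the starting value and the number of iterations are assumptions) and compares the result with $q^{(0)} = 0$ for r < 1 and with $q^{(1)} = 1 - 1/r$ for 1 < r < 3.

```python
def iterate_logistic(r, q1=0.3, n_steps=2000):
    """Return the value reached after n_steps iterations of the logistic map (9.2)."""
    q = q1
    for _ in range(n_steps):
        q = r * q * (1.0 - q)
    return q


for r in (0.8, 1.5, 2.5, 2.9):
    predicted = 0.0 if r < 1.0 else 1.0 - 1.0 / r  # q(0) for r < 1, q(1) for 1 < r < 3
    print(f"r = {r}: q after 2000 steps = {iterate_logistic(r):.9f}, predicted {predicted:.9f}")
```

As r approaches 3 the convergence becomes noticeably slower, which is the first hint of the instability discussed next.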
Fig. 9.1. Graphical analysis of the logistic map for: (a) r < 1 and (b) r > 1.
Fig. 9.2. Fixed points and
chaotic regions of the logistic
map. The plot is adapted from
http://en.wikipedia.org/wiki/Logistic_map ; a very nice live
simulation of the map is also
available on this Web site.



At $r > r_1 = 3$, the plot gets thicker: here the fixed point $q^{(1)}$ also becomes unstable. To prove that,
let us take $q_n = q^{(1)} + \tilde q_n$, assume that the deviation $\tilde q_n$ from the fixed point $q^{(1)}$ is small, and linearize map
(2) in $\tilde q_n$, just as we repeatedly did for differential equations earlier in this course. The result is

$$\tilde q_{n+1} = \left.\frac{df}{dq}\right|_{q = q^{(1)}} \tilde q_n = r\left(1 - 2q^{(1)}\right)\tilde q_n = (2 - r)\,\tilde q_n. \tag{9.4}$$



It shows that at 0 < 2 - r < 1, i.e. 1 < r < 2, the deviations $\tilde q_n$ decrease monotonically. At -1 < 2 - r < 0, i.e.
in the range 2 < r < 3, the deviation signs alternate but the magnitude still decreases (as in a stable focus
- see Sec. 4.6). However, at 2 - r < -1, i.e. $r > r_1 = 3$, the deviations grow in magnitude, while
still changing sign, at each step. Since Eq. (2) has no other fixed points, this means that at $n \to \infty$, values
$q_n$ do not converge to one point; rather, within the range $r_1 < r < r_2$, they approach a limit cycle of
alternation of two points, $q_+^{(2)}$ and $q_-^{(2)}$, that satisfy the following system of algebraic equations:

$$q_+^{(2)} = r q_-^{(2)}\left(1 - q_-^{(2)}\right), \qquad q_-^{(2)} = r q_+^{(2)}\left(1 - q_+^{(2)}\right). \tag{9.5}$$



(These points are also plotted in Fig. 2, as functions of parameter r.) What has happened at point $r_1$ is
called the period-doubling bifurcation. The story repeats at $r = r_2 = 1 + \sqrt{6} \approx 3.45$, where the system goes
from the 2-point limit cycle to a 4-point cycle, then at point $r = r_3 \approx 3.54$, at which the limit cycle comes to
consist of 8 alternating points, etc. Most remarkably, the period-doubling bifurcation points $r_n$, at which
the number of points in the limit cycle doubles from $2^{n-1}$ to $2^n$, become closer and closer.
Numerical calculations have shown that these points obey the following asymptotic behavior:

$$r_n \to r_\infty - \frac{C}{\delta^n}, \qquad \text{where } r_\infty = 3.5699\ldots, \quad \delta = 4.6692\ldots, \tag{9.6}$$

and C is a constant.



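Parameter δ is called the Feigenbaum constant. For other maps, and some dynamic systems (see the
next section), period-doubling sequences follow a similar law; the value of δ is the same for all 1D maps with a
single quadratic maximum (this universality is one of Feigenbaum's main findings), but may be different for other
classes of systems.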
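The period-doubling cascade can be observed numerically as well. The Python sketch below discards a long transient and then looks for the smallest p for which $q_{n+p}$ returns to $q_n$; the transient length, tolerance, maximal period, and the test values of r are assumptions chosen for illustration. It also evaluates the analytic solution of Eqs. (5), $q_\pm^{(2)} = \left[(r+1) \pm \sqrt{(r+1)(r-3)}\right]/2r$, obtained by eliminating one unknown and discarding the fixed points $q^{(0)}$ and $q^{(1)}$.

```python
import math


def logistic(r, q):
    """One step of map (9.2)."""
    return r * q * (1.0 - q)


def attractor_period(r, q1=0.3, n_transient=100_000, max_period=64, tol=1e-7):
    """Smallest p <= max_period with |q_{n+p} - q_n| < tol after the transient; None if none found."""
    q = q1
    for _ in range(n_transient):
        q = logistic(r, q)
    orbit = [q]
    for _ in range(max_period):
        orbit.append(logistic(r, orbit[-1]))
    for p in range(1, max_period + 1):
        if abs(orbit[p] - orbit[0]) < tol:
            return p
    return None


def two_cycle(r):
    """Analytic solution (q_plus, q_minus) of Eqs. (9.5); real only for r >= 3."""
    s = math.sqrt((r + 1.0) * (r - 3.0))
    return ((r + 1.0) + s) / (2.0 * r), ((r + 1.0) - s) / (2.0 * r)


for r in (2.8, 3.2, 3.5, 3.555, 3.5667, 3.9):
    print(f"r = {r}: period = {attractor_period(r)}")
print("2-cycle at r = 3.2:", two_cycle(3.2))
```

For the values of r in the loop one should get the periods 1, 2, 4, 8, and 16 below $r_\infty$, and no period up to 64 at r = 3.9, in the chaotic region. Values of r very close to a bifurcation point would require much longer transients, because the convergence to the limit cycle slows down there.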

More important for us, however, is what happens at $r > r_\infty$. Numerous numerical experiments,
repeated with increasing precision, 5 have confirmed that here the system is fully disordered, with no
reproducible limit cycle, though (as Fig. 2 shows) at $r \approx r_\infty$, all sequential values $q_n$ are still confined to a
few narrow regions. 6 However, as parameter r is increased well beyond $r_\infty$, these regions broaden and
merge. This is the so-called full, or well-developed chaos, with no apparent order at all. 7

The most important feature of chaos (in this and any other system) is the exponential divergence
of trajectories. For a 1D map, this means that even if the initial conditions $q_1$ in two map
implementations differ by a very small amount $\Delta q_1$, the difference $\Delta q_n$ between the corresponding
sequences $q_n$ grows (on average) exponentially with n. Such exponents may be used to
characterize chaos. Indeed, let us assume that $\Delta q_1$ is so small that the first N values $q_n$ are relatively close to
each other. Then an evident generalization of the first equality in Eq. (4) to an arbitrary point $q_n$ is

$$\Delta q_{n+1} = e_n \Delta q_n, \qquad e_n \equiv \left.\frac{df}{dq}\right|_{q = q_n}. \tag{9.7}$$



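Using this result iteratively, we get

$$\Delta q_N = \Delta q_1 \prod_{n=1}^{N-1} e_n, \qquad \text{so that} \qquad \ln\left|\frac{\Delta q_N}{\Delta q_1}\right| = \sum_{n=1}^{N-1} \ln|e_n|. \tag{9.8}$$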



5 The reader should remember that just as the usual ("natural") experiments, numerical experiments also have
limited accuracy, due to unavoidable rounding errors.

6 The geometry of these regions is essentially fractal, i.e. has a dimensionality intermediate between 0 (which
any finite set of geometric points would have) and 1 (pertinent to a 1D continuum). An extensive discussion of
fractal geometries, and their relation to the deterministic chaos, may be found, e.g., in the book by B. B.
Mandelbrot, The Fractal Geometry of Nature, W. H. Freeman, 1983.

7 This does not mean that the chaos development is a monotonic function of r. As Fig. 2 shows, within certain 
intervals of this parameter chaos suddenly disappears, being replaced, typically, with a few-point limit cycle, just 
to resume on the other side of the interval. Sometimes (but not always!) the "route to chaos" on the borders of 
these intervals follows the same Feigenbaum sequence of period-doubling bifurcations. 
Numerical experiments show that in most chaotic regimes, at $N \to \infty$ such a sum fluctuates about
an average, which grows as λN, with the parameter

$$\lambda \equiv \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \ln|e_n| \tag{9.9}$$

called the Lyapunov exponent, 8 being independent of the initial conditions. The bottom panel in Fig. 3
shows it as a function of the parameter r for the logistic map (2).








Fig. 9.3. The Lyapunov exponent for 
the logistic map. Adapted from the 
monograph by Schuster and Just (cited 
below). © Wiley VCH Verlag GmbH & 
Co. KGaA. 



Note that at $r < r_\infty$, λ is negative, indicating the trajectory's stability, except at the points $r_1, r_2, \ldots$, where
λ would become positive if the limit cycle change had not brought it back to the negative territory.
However, at $r > r_\infty$, λ becomes positive, returning to negative values only in limited intervals of stable
limit cycles. It is evident that in numerical experiments (which dominate the studies of the deterministic
chaos) the Lyapunov exponent may be used as a good measure of chaos' "depth". 9
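Eq. (9) translates directly into code for the logistic map, for which $e_n = r(1 - 2q_n)$. In the Python sketch below, the starting value, the transient length, and the number of averaged terms are assumed values. In the stable range 1 < r < 3 the result may be compared with the exact value $\lambda = \ln|2 - r|$ that follows from Eq. (4), and at r = 4 with the known exact value $\lambda = \ln 2$.

```python
import math


def lyapunov_exponent(r, q1=0.3, n_transient=1000, n_terms=100_000):
    """Estimate Eq. (9.9) for the logistic map, for which e_n = f'(q_n) = r (1 - 2 q_n)."""
    q = q1
    for _ in range(n_transient):          # discard the initial transient
        q = r * q * (1.0 - q)
    total = 0.0
    for _ in range(n_terms):
        e_n = abs(r * (1.0 - 2.0 * q))
        total += math.log(max(e_n, 1e-300))  # guard against log(0) if q_n hits 1/2 exactly
        q = r * q * (1.0 - q)
    return total / n_terms


for r in (2.5, 3.2, 3.5, 3.7, 4.0):
    print(f"r = {r}: lambda = {lyapunov_exponent(r):+.4f}")
```

Scanning r over the interval from 2.5 to 4 with such a function should reproduce the overall shape of the curve in Fig. 3, including the dips into negative values within the windows of stable limit cycles.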

Despite all the abundance of results published for particular maps, 10 and several interesting
general observations (like the existence of the Feigenbaum bifurcation sequences), to the best of my
knowledge nobody can yet predict patterns like those shown in Figs. 2 and 3 from just looking at the
map rule itself, i.e. without carrying out actual numerical experiments with it. Unfortunately, the
situation with chaos in other systems is not much better.



8 After A. Lyapunov (1857-1918), famous for his studies of stability of dynamic systems. 

9 N-dimensional maps, which relate N-dimensional vectors rather than scalars, may be characterized by N
Lyapunov exponents rather than one. In order to have chaotic behavior, it is sufficient for just one of them to
become positive. For such systems, another measure of chaos, the Kolmogorov entropy, may be more relevant.
This measure, and its relation with the Lyapunov exponents, are discussed, e.g., in SM Sec. 2.2.

10 See, e.g., Chapters 2-4 in H. G. Schuster and W. Just, Deterministic Chaos, 4th ed., Wiley-VCH, 2005, or
Chapters 8-9 in J. M. T. Thompson and H. B. Stewart, Nonlinear Dynamics and Chaos, 2nd ed., Wiley, 2002.
9.2. Chaos in dynamic systems

Proceeding to the discussion of chaos in dynamic systems, it is more natural, with our
background, to illustrate this discussion not with the Lorenz system (1), but with the system of
equations describing a dissipative pendulum driven by a sinusoidal external force, which was repeatedly
discussed in Chapter 4. Introducing two new variables, the normalized momentum $p = \dot q/\omega_0$ and the
external force's full phase $\psi = \omega t$, we may rewrite Eq. (4.42) describing the pendulum,

$$\ddot q + 2\delta \dot q + \omega_0^2 \sin q = f_0 \cos \omega t, \tag{9.10a}$$

in a form similar to Eq. (1), i.e. as a system of three first-order ordinary differential equations: 

$$
\begin{aligned}
\dot q &= \omega_0 p,\\
\dot p &= -\omega_0 \sin q - 2\delta p + (f_0/\omega_0)\cos\psi,\\
\dot\psi &= \omega.
\end{aligned}
\tag{9.10b}
$$

Figure 4 shows several results of the numerical solution of Eq. (10). 11 In all cases, the internal parameters
δ and $\omega_0$ of the system, and the external force amplitude $f_0$ are fixed, while the external frequency ω is
gradually changed. For the case shown on the top panel, the system still tends to a stable periodic
solution, with low contents of higher harmonics. If the external force frequency is reduced by just a few
percent, the 3rd subharmonic may be excited. (This effect has already been discussed in Sec. 4.8 - see,
e.g., Fig. 4.15.) The next panel shows that just a very small further reduction of frequency leads to a new
tripling of the period, i.e. the generation of a complex waveform with the 9th subharmonic. Finally, even
a minor further change of parameters leads to oscillations without any visible period, i.e., chaos.

In order to trace this transition, direct observation of the oscillation waveforms q(t) is not very
convenient, and trajectories on the phase plane [q, p] also become messy if plotted for many periods of
the external frequency. In situations like this, the Poincare (or "stroboscopic") plane, already discussed
in Sec. 4.6, is much more useful. As a reminder, this is essentially just the phase plane [q, p], but with
the points highlighted only once a period, e.g., at ψ = 2πn, with n = 1, 2, ... On this plane, periodic
oscillations of frequency ω are presented just as one fixed point - see, e.g., the top panel in the right
column of Fig. 4. The beginning of the 3rd subharmonic generation, shown on the next panel, means
tripling of the oscillation period, and is reflected on the Poincare plane by splitting the fixed point into
three. It is evident that this transition is similar to the period-doubling bifurcation in the logistic map,
except for the fact (already discussed in Sec. 4.8) that in systems with a symmetric nonlinearity, such as
the pendulum (10), the 3rd subharmonic is easier to excite. From this point, the 9th subharmonic generation
(shown on the 3rd panel of Fig. 4), i.e. one more splitting of the points on the Poincare plane, may be
understood as one more step on the Feigenbaum-like route to chaos - see the bottom panel of that figure.
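The stroboscopic sampling described above is straightforward to implement. The Python sketch below integrates Eqs. (10b) with a fixed-step Runge-Kutta scheme, in units with $\omega_0 = 1$ and without the small εq term mentioned in footnote 11, and records {q, p} at ψ = 2πn, with q reduced to the interval [-π, π]. The step size, run length, initial state, and the weak-drive parameters in the usage line are assumptions; they are not meant to reproduce Fig. 4.

```python
import math


def rk4_step(rhs, s, dt):
    """One classical 4th-order Runge-Kutta step for ds/dt = rhs(s)."""
    k1 = rhs(s)
    k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(s, k1)))
    k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(s, k2)))
    k4 = rhs(tuple(x + dt * k for x, k in zip(s, k3)))
    return tuple(x + dt / 6.0 * (c1 + 2.0 * c2 + 2.0 * c3 + c4)
                 for x, c1, c2, c3, c4 in zip(s, k1, k2, k3, k4))


def poincare_points(delta, f0, omega, omega0=1.0, n_periods=200, steps_per_period=200):
    """Points {q, p} of Eqs. (9.10b), sampled at psi = 2*pi*n, starting from rest at q = 0."""
    def rhs(s):
        q, p, psi = s
        return (omega0 * p,
                -omega0 * math.sin(q) - 2.0 * delta * p + (f0 / omega0) * math.cos(psi),
                omega)

    dt = 2.0 * math.pi / (omega * steps_per_period)
    s = (0.0, 0.0, 0.0)
    points = []
    for _ in range(n_periods):
        for _ in range(steps_per_period):
            s = rk4_step(rhs, s, dt)
        points.append((math.remainder(s[0], 2.0 * math.pi), s[1]))  # q reduced to [-pi, pi]
    return points


def count_distinct(points, tol=1e-4):
    """Number of distinct Poincare points (points straddling q = +-pi may be counted twice)."""
    distinct = []
    for pt in points:
        if all(math.dist(pt, d) > tol for d in distinct):
            distinct.append(pt)
    return len(distinct)


pts = poincare_points(delta=0.1, f0=0.1, omega=0.5)
print("last point:", pts[-1], " distinct among last 64:", count_distinct(pts[-64:]))
```

With the weak drive used here, the pendulum settles to a single Poincare point (period-1 oscillations). Counting the distinct points while scanning ω at the larger drive amplitude of Fig. 4, $f_0/\omega_0^2 = 1$, is one way to look for the 1 → 3 → 9 → chaos sequence, though the exact frequencies at which it occurs are not given in the text and would have to be found numerically.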

So, the transition to chaos in dynamic systems may be at least qualitatively similar to that in 1D
maps, with the similar law (6) for the critical values of some parameter r of the system (in Fig. 4,
frequency ω), though generally with a different value of the constant δ. Moreover, it is evident that we can
always consider the first two differential equations of system (10b) as a 2D map that relates the vector
$\{q_{n+1}, p_{n+1}\}$ of the coordinate and velocity, measured at ψ = 2π(n + 1), with the previous value $\{q_n, p_n\}$



11 In the actual simulation, a small term εq, with ε ≪ 1, has been added to the left-hand part of this equation. This
term somewhat tames the trend of the solution to spread along the q axis, and makes the presentation of
results easier, without affecting the system dynamics too much.
of that vector (reached at ψ = 2πn). Unfortunately, this similarity also implies that chaos in dynamical
systems is at least as complex, and is as little understood, as in maps.
Fig. 9.4. Oscillations in a pendulum with weak damping, $\delta/\omega_0 = 0.1$, driven by a sinusoidal external
force with a fixed effective amplitude $f_0/\omega_0^2 = 1$, and several close values of the frequency (listed on
the panels). Left column: oscillation waveforms q(t) recorded after certain initial transient intervals.
Right column: representations of the same processes on the Poincare plane of variables [p, q].
For example, Fig. 5 shows (a part of) the state diagram of the externally driven pendulum, with
the red bar marking the route to chaos traced in Fig. 4, and shading/hatching styles marking different
regimes. One can see that the pattern is at least as complex as that shown in Figs. 2 and 3, and, besides a few
features, 12 is equally unpredictable from the form of the equation.






Fig. 9.5. Phase diagram of an externally driven pendulum with
$\delta/\omega_0 = 0.1$. Regions of oscillations with the basic period are not
shaded. The notation for other regions is as follows. Dotted:
subharmonic generation; cross-hatched: chaos; hatched: chaos
or basic period (depending on the initial conditions); hatch-dotted:
basic period or subharmonics. Solid lines show
boundaries of single-regime regions, while dashed lines are
boundaries of regions in which several types of motion are
possible, depending on history. (Figure courtesy V. Kornev.)



Are there any valuable general results concerning chaos in dynamic systems? The most 
important (though an almost evident) result is that this phenomenon is impossible in any system 
described by one or two first-order differential equations with right-hand parts independent of time. 
Indeed, let us start with a single equation 

$$\dot q = f(q), \tag{9.11}$$

where f(q) is any single-valued function. This equation may be directly integrated to give



$$t = \int^{q} \frac{dq'}{f(q')} + \text{const}, \tag{9.12}$$



showing that the relation between q and t is unique and hence leaves no room for chaos.
Now, let us explore the system of two such equations:

$$\dot q_1 = f_1(q_1, q_2), \qquad \dot q_2 = f_2(q_1, q_2). \tag{9.13}$$



Consider its phase plane shown schematically in Fig. 6. In a "usual" system, the trajectories approach 
either some fixed point (Fig. 6a) describing static equilibrium, or a limit cycle (Fig. 6b) describing 
periodic oscillations. (Both notions are united by the term attractor, because they "attract" trajectories 
launched from various initial conditions.) However, phase plane trajectories of a chaotic system of 



12 In some cases, it is possible to predict a parameter region where chaos cannot happen, due to lack of any 
instability-amplification mechanism. Unfortunately, typically the analytically predicted boundaries of such region 
form a rather loose envelope of the actual (numerically simulated) chaotic regions. 
equations that describe real physical variables (which cannot tend to infinity) should be confined to a
limited phase plane area, and simultaneously cannot start repeating each other. (This topology is
frequently called the strange attractor.) For that, 2D trajectories need to cross - see, e.g., points in Fig.
6c.




Fig. 9.6. Attractors in dynamical systems: (a) a fixed point, (b) a limit cycle, and (c) a strange attractor. 



However, in the case described by Eqs. (13), this is clearly impossible, because according to
these equations, the tangent slope on the phase plane is a unique function of the point coordinates $(q_1, q_2)$:

$$\frac{dq_2}{dq_1} = \frac{f_2(q_1, q_2)}{f_1(q_1, q_2)}. \tag{9.14}$$

Thus, in this case the deterministic chaos is impossible. It becomes, however, readily possible if the
right-hand parts of a system similar to Eq. (13) depend either on other variables of the system or on time.
For example, if we consider the first two differential equations of system (10b), in the case $f_0 = 0$ they
have the structure of the system (13) and hence chaos is impossible, even at δ < 0 when (as we know
from Sec. 4.4) the system allows self-excitation of oscillations - leading to a limit-cycle attractor.
However, if $f_0 \neq 0$, this argument does not work any longer and (as we have already seen) the system
may have a strange attractor - which is, for dynamic systems, a synonym for the deterministic chaos.
Thus, chaos is possible in dynamic systems that may be described by three or more differential
equations of the first order. 13

9.3. Chaos in Hamiltonian systems 

The last analysis is of course valid for Hamiltonian systems, which are just a particular type of
dynamic systems. However, one may wonder whether these systems, which feature at least one first
integral of motion, H = const, and hence are more "ordered" than the systems discussed above, can
exhibit chaos at all. The answer is yes, because such systems still can have mechanisms for an
exponential growth of a small initial perturbation.

As the simplest way to show it, let us consider a so-called mathematical billiard, i.e. a ballistic
particle (a "ball") moving freely by inertia on a horizontal plane surface ("table") limited by rigid
impenetrable walls. In this idealized model of the usual game of billiards, the ball's velocity v is conserved



13 Since a typical dynamic system with one degree of freedom is described by two such equations, the number of 
the first-order equations describing a dynamic system is sometimes called the number of half-degrees of freedom. 
This notion is very useful and popular in statistical mechanics - see, e.g., SM Sec. 2.2 and on. 
when it moves on the table, and when it runs into a wall, the ball is elastically reflected from it as from a
mirror, 14 with the reversal of the sign of the normal velocity $v_n$, and conservation of the tangential
velocity $v_\tau$, and hence without any loss of its kinetic (and hence the full) energy

$$E = H = T = \frac{mv^2}{2} = \frac{m}{2}\left(v_n^2 + v_\tau^2\right). \tag{9.15}$$

This model, while being a legitimate 2D dynamic system, 15 allows geometric analyses for several simple
table shapes. The simplest case is a rectangular billiard of area a × b (Fig. 7), whose analysis may be
readily carried out by the replacement of each ball reflection event with the mirror reflection of the table
in that wall - see dashed lines in panel (a).



Fig. 9.7. Ball motion on a rectangular billiard at (a) a commensurate, and (b) an incommensurate
launch angle.



Such analysis (left for the reader's pleasure :-) shows that if the tangent of the ball launching angle φ
is commensurate with the side length ratio,

$$\tan\varphi = \pm\frac{m}{n}\,\frac{b}{a}, \tag{9.16}$$

where n and m are non-negative integers without common integer factors, the ball returns exactly to
the launch point O, after bouncing m times from each wall of length a, and n times from each wall of
length b. (Red lines in Fig. 7a show an example of such a trajectory for n = m = 1, while blue lines, for m
= 3, n = 1.) Thus the larger the sum (m + n), the more complex is such a closed trajectory ("orbit").

Finally, if $(n + m) \to \infty$, i.e. tan φ and b/a are incommensurate (meaning that their ratio is an
irrational number), the trajectory covers all the table area, and the ball never returns exactly to the
launch point. Still, this is not real chaos. Indeed, a small shift of the launch point shifts all the
trajectory fragments by the same displacement. Moreover, at any time t, each of the Cartesian components
$v_j(t)$ of the ball's velocity (with coordinate axes parallel to the table sides) may take only two values,
$\pm v_j(0)$, and hence may vary only as much as the initial velocity is being changed.
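The mirror-image construction can be written out explicitly: on the unfolded plane the ball moves along a straight line, and each coordinate is mapped back onto the table by a "triangle-wave" folding. The Python sketch below assumes hypothetical table sides a = 1, b = 0.6, a hypothetical launch point O = (0.3, 0.2), and unit speed; it takes the launch angle from Eq. (16) with m = 3, n = 1, and checks that the ball is back at O when the unfolded x-displacement reaches 2na.

```python
import math


def fold(u, length):
    """Map an unfolded coordinate back onto the segment [0, length] (triangle-wave folding)."""
    u = u % (2.0 * length)
    return u if u <= length else 2.0 * length - u


def ball_position(x0, y0, phi, t, a, b, v=1.0):
    """Ball position on an a x b table at time t, via the mirror images of the table."""
    return (fold(x0 + v * math.cos(phi) * t, a),
            fold(y0 + v * math.sin(phi) * t, b))


a, b = 1.0, 0.6                      # hypothetical table sides
x0, y0 = 0.3, 0.2                    # hypothetical launch point O
m, n = 3, 1
phi = math.atan(m * b / (n * a))     # commensurate angle, Eq. (9.16) with the '+' sign
T = 2.0 * n * a / math.cos(phi)      # time of the unfolded x-displacement 2na (unit speed)
print(ball_position(x0, y0, phi, T, a, b))   # back at O, up to rounding errors
```

At time T the unfolded y-displacement equals $2na\tan\varphi = 2mb$, so both coordinates fold back to their initial values. Replacing $mb/na$ by an irrational number, e.g. `math.sqrt(2) * b / a`, one finds that the ball never returns exactly, although it eventually passes arbitrarily close to the launch point.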

In 1963, Ya. Sinai showed that the situation changes completely if an additional wall, in the 
shape of a circle, is inserted into the rectangular billiard (Fig. 8). For most initial conditions, ball's 
trajectory eventually runs into the circle (see the red line on panel (a) as an example), and the further 



14 A more scientific-sounding name for such a reflection is specular (from Latin "speculum" meaning a metallic 
mirror). 

15 Indeed, it is fully described by the Lagrangian function $L = mv^2/2 - U(\boldsymbol{\rho})$, with $U(\boldsymbol{\rho}) = 0$ for 2D radius-vectors ρ
belonging to the table area, and $U(\boldsymbol{\rho}) = +\infty$ outside of the area.
trajectory becomes essentially chaotic. Indeed, let us consider the ball's reflection from the circle-shaped
wall - Fig. 8b. Due to the conservation of the tangential velocity, and the sign change of the normal
velocity component, the reflection obeys the mechanical analog of the Snell law (cf. Fig. 7.12 and its
discussion): $\theta_r = \theta_i$. Figure 8b shows that as a result, a small difference δφ between the angles of two
close trajectories (as measured in the lab system) doubles in magnitude at each reflection from the
curved wall. This means that the small deviation grows along the ball trajectory as

$$|\delta\varphi(N)| \sim |\delta\varphi(0)| \times 2^N = |\delta\varphi(0)|\, e^{N \ln 2}, \tag{9.17}$$

where N is the number of reflections from the convex wall. 16 As we already know, such exponential
divergence of trajectories, with a positive Lyapunov exponent, is the sign of deterministic chaos. 17
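This exponential divergence may also be observed in a direct simulation. The Python sketch below assumes a hypothetical geometry (a unit-square table with a disk of radius 0.25 at its center, not the proportions of Fig. 8), unit ball speed, and a launch point (0.1, 0.1); it follows two balls whose launch angles differ by $10^{-10}$ and records the distance between them at t = 1, 2, ...

```python
import math

R = 0.25            # disk radius (hypothetical geometry)
CX, CY = 0.5, 0.5   # disk center; the table is the unit square
EPS = 1e-12         # ignore "collisions" at the point just left


def next_collision(x, y, vx, vy):
    """Time to the next wall or disk hit (unit speed assumed) and its type: 'x', 'y' or 'disk'."""
    best, kind = math.inf, None
    for pos, vel, label in ((x, vx, "x"), (y, vy, "y")):
        if vel > 0:
            t = (1.0 - pos) / vel
        elif vel < 0:
            t = -pos / vel
        else:
            continue
        if EPS < t < best:
            best, kind = t, label
    dx, dy = x - CX, y - CY
    bq = dx * vx + dy * vy               # |d + v t|^2 = R^2 reads t^2 + 2 bq t + cq = 0
    cq = dx * dx + dy * dy - R * R
    disc = bq * bq - cq
    if bq < 0 and disc > 0:              # moving toward the disk, and the line hits it
        t = -bq - math.sqrt(disc)
        if EPS < t < best:
            best, kind = t, "disk"
    return best, kind


def advance(state, duration):
    """Move the ball for the given time, reflecting specularly from the walls and the disk."""
    x, y, vx, vy = state
    while True:
        t, kind = next_collision(x, y, vx, vy)
        if t >= duration:
            return (x + vx * duration, y + vy * duration, vx, vy)
        x, y, duration = x + vx * t, y + vy * t, duration - t
        if kind == "x":
            vx = -vx
        elif kind == "y":
            vy = -vy
        else:                            # disk: reverse the normal velocity component
            nx, ny = (x - CX) / R, (y - CY) / R
            vn = vx * nx + vy * ny
            vx, vy = vx - 2.0 * vn * nx, vy - 2.0 * vn * ny


def separation_history(phi0=0.3, dphi=1e-10, t_max=60):
    """Distance between two balls launched from (0.1, 0.1) at angles phi0 and phi0 + dphi."""
    a = (0.1, 0.1, math.cos(phi0), math.sin(phi0))
    b = (0.1, 0.1, math.cos(phi0 + dphi), math.sin(phi0 + dphi))
    seps = []
    for _ in range(t_max):
        a, b = advance(a, 1.0), advance(b, 1.0)
        seps.append(math.hypot(a[0] - b[0], a[1] - b[1]))
    return seps


seps = separation_history()
for t in (1, 10, 20, 30, 40):
    print(f"t = {t:2d}: distance = {seps[t - 1]:.2e}")
```

The distance grows roughly exponentially until it becomes comparable with the table size. Its growth rate per unit time depends on the geometry and is not given directly by Eq. (17), which only estimates the growth per reflection from the curved wall.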



Fig. 9.8. (a) Motion on a Sinai billiard table, and (b) the mechanism of the exponential
divergence of close trajectories.



The most important new feature of the dynamic chaos in Hamiltonian systems is its dependence
on initial conditions. (In the systems discussed in the previous two sections, which lack
integrals of motion, the initial conditions are rapidly "forgotten", and the chaos is usually characterized
after cutting out the initial transient period - see, e.g., Fig. 4.) Indeed, even a Sinai billiard allows
periodic motion, along closed orbits, at certain initial conditions - see the blue and green lines in Fig. 8a
as examples. Thus the chaos "depth" in such systems may be characterized by the "fraction" 18 of the
phase space of initial parameters (for a 2D billiard, the 3D space of initial values of x, y, and φ) resulting
in chaotic trajectories.

This conclusion is also valid for Hamiltonian systems that are met in experiments more 
frequently than the billiards, for example, coupled nonlinear oscillators without damping. Perhaps, the 



16 Superficially, Eq. (17) is also valid for a plane wall, but as was discussed above, a billiard with such walls
features a full correlation between sequential reflections, so that angle φ always returns to its initial value. In a
Sinai billiard, such correlation disappears. Because of that, concave walls may also make a billiard chaotic. A
famous example is the stadium billiard, suggested by L. Bunimovich, with two straight, parallel walls connecting
two semi-circular, concave walls. Another example, which allows a straightforward analysis, is the Hadamard
billiard: an infinite (or rectangular) table with non-horizontal surface of negative curvature.

17 Billiards are also a convenient platform for a discussion of a conceptually important issue of quantum 
properties of classically chaotic systems (sometimes improperly named "quantum chaos"). 

18 Actually, quantitative characterization of the fraction is not trivial, because it may have fractal dimensionality. 
Unfortunately, due to lack of time I have to refer the reader interested in this issue to special literature, e.g., the 
monograph by B. Mandelbrot (cited above) and references therein. 
earliest and the most popular example is the so-called Henon-Heiles system, 19 which may be described
by the following Lagrangian function:



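$$L = \frac{m_1}{2}\left(\dot q_1^2 - \omega_1^2 q_1^2\right) + \frac{m_2}{2}\left(\dot q_2^2 - \omega_2^2 q_2^2\right) - \varepsilon\left(q_1^2 - \frac{1}{3}q_2^2\right) q_2. \tag{9.18}$$

It is straightforward to use Eq. (18) to derive the Lagrangian equations of motion,

$$
\begin{aligned}
m_1\left(\ddot q_1 + \omega_1^2 q_1\right) &= -2\varepsilon q_1 q_2,\\
m_2\left(\ddot q_2 + \omega_2^2 q_2\right) &= \varepsilon\left(q_2^2 - q_1^2\right),
\end{aligned}
\tag{9.19}
$$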



and find its first integral of motion (physically, the energy conservation law): 



H = E = ^-[q{ +a> x q x )+^-{q 2 +a, 2 q 2 )+s 



1i 



1 2 

-q 2 
3 



q 2 = const . 



(9.20) 



In the context of our discussions in Chapters 4 and 5, Eqs. (19) may be readily interpreted as
describing two oscillators, with small-oscillation eigenfrequencies $\omega_1$ and $\omega_2$, nonlinearly coupled only
via the terms in the right-hand parts of the equations. This means that as the oscillation
amplitudes $A_{1,2}$, and hence the total energy $E$ of the system, tend to zero, the oscillator subsystems are
virtually independent, each performing sinusoidal oscillations at its own frequency. This observation
suggests a convenient way to depict the system motion. 20 Let us consider a Poincaré plane for one of
the oscillators (say, the one with coordinate $q_2$), similar to that discussed in Sec. 2 above, with the only
difference that (because of the absence of an explicit function of time in the system's equations) the
trajectory on the $[q_2,\dot q_2]$ plane is highlighted at the moments when $q_1=0$.
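The construction of this Poincaré plane can be sketched numerically. The snippet below is a minimal illustration under stated assumptions, not the procedure behind Fig. 9: it takes $m_1=m_2=1$, $\omega_1=\omega_2=1$, and $\varepsilon=1$ in Eqs. (19)-(20) (which gives the commonly studied parameter-free form), integrates the equations with a fixed-step fourth-order Runge-Kutta scheme (a standard method swapped in here for illustration), and records $(q_2,\dot q_2)$ at each crossing of $q_1=0$ with $\dot q_1>0$ (a common convention, also assumed). All function names, step sizes, and initial conditions are hypothetical choices.

```python
import math

# Assumed normalized parameters: m1 = m2 = 1, omega1 = omega2 = 1, epsilon = 1.

def rhs(s):
    """Time derivatives for Eqs. (19); state s = (q1, q2, v1, v2), v_j = dq_j/dt."""
    q1, q2, v1, v2 = s
    a1 = -q1 - 2.0 * q1 * q2          # m1(q1'' + w1^2 q1) = -2 eps q1 q2
    a2 = -q2 - (q1 * q1 - q2 * q2)    # m2(q2'' + w2^2 q2) = -eps (q1^2 - q2^2)
    return (v1, v2, a1, a2)

def rk4_step(s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(s)
    k2 = rhs(tuple(x + 0.5 * h * k for x, k in zip(s, k1)))
    k3 = rhs(tuple(x + 0.5 * h * k for x, k in zip(s, k2)))
    k4 = rhs(tuple(x + h * k for x, k in zip(s, k3)))
    return tuple(x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def energy(s):
    """Energy (9.20) for the same normalized parameters."""
    q1, q2, v1, v2 = s
    return (0.5 * (v1 * v1 + q1 * q1) + 0.5 * (v2 * v2 + q2 * q2)
            + (q1 * q1 - q2 * q2 / 3.0) * q2)

def initial_state(E, q2, v2):
    """Point on the plane q1 = 0, with v1 > 0 chosen so that the energy equals E."""
    rest = E - (0.5 * (v2 * v2 + q2 * q2) - q2 ** 3 / 3.0)
    if rest < 0.0:
        raise ValueError("(q2, v2) is not accessible at this energy")
    return (0.0, q2, math.sqrt(2.0 * rest), v2)

def poincare_section(E, q2, v2, t_max=500.0, h=0.01):
    """Points (q2, dq2/dt) recorded at the upward crossings of q1 = 0."""
    s = initial_state(E, q2, v2)
    points = []
    for _ in range(int(t_max / h)):
        s_new = rk4_step(s, h)
        if s[0] < 0.0 <= s_new[0]:
            w = -s[0] / (s_new[0] - s[0])      # linear interpolation weight
            points.append((s[1] + w * (s_new[1] - s[1]),
                           s[3] + w * (s_new[3] - s[3])))
        s = s_new
    return points, s

pts, final = poincare_section(1.0 / 12.0, q2=0.1, v2=0.0, t_max=200.0)
print(len(pts), abs(energy(final) - 1.0 / 12.0))
```

Plotting the returned points for several initial conditions at one energy would give a picture of the type shown in Fig. 9; monitoring `energy` is a cheap check of the integration accuracy.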

Let us start from the limit $A_{1,2}\rightarrow 0$, when oscillations of $q_2$ are virtually sinusoidal. As we
already know (see Fig. 4.9 and its discussion), if the representation point highlighting were perfectly
synchronous with frequency $\omega_2$ of the oscillations, there would be only one point on the Poincaré plane
- see, e.g., the top right panel of Fig. 4. However, at the $q_1$-initiated highlighting, there is no such
synchronism, so that each period a different point of the elliptical (at the proper scaling of the velocity,
circular) trajectory is highlighted, and the resulting points, for certain initial conditions, reside on a
circle of radius $A_2$. If we now vary the initial conditions, i.e. redistribute the initial energy between the
oscillators, but keep the total energy $E$ constant, on the Poincaré plane we get a series of ellipses.

Now, if the initial energy is increased, nonlinear interaction of the oscillations starts to deform
these ellipses, causing also their crossings - see, e.g., the top left panel of Fig. 9. Still, below a certain
threshold value of $E$, all Poincaré points belonging to a certain initial condition sit on a single closed
contour. Moreover, these contours may be calculated approximately, but with a pretty good accuracy,
using a straightforward generalization of the small parameter method discussed in Sec. 4.2. 21

19 It was first studied in 1964 by M. Hénon and C. Heiles as a simple model of star rotation about a galactic
center. Most studies of this equation have been carried out for a particular case of the parameters $m_j$ and $\omega_j$, in
which, introducing new variables $x=\varepsilon q_1$, $y=\varepsilon q_2$, and $\tau=\omega_1t$, it is possible to rewrite Eqs. (18)-(20)
in parameter-free forms. All the results shown in Fig. 9 below are for this case.

20 Generally, the system's motion follows a trajectory in a 4D space, e.g., that of coordinates $q_{1,2}$ and their time
derivatives, although the first integral of motion (20) means that for each fixed energy $E$, the motion is limited to a
3D sub-space. Still, this is too much for a convenient representation of the motion.




[Figure: three panels of Poincaré-plane points for three values of $e$ (one panel title reads $e=1/12$); horizontal axes with ticks from about -0.4 to 0.6.]

Fig. 9.9. Poincaré planes of the Hénon-Heiles system (19), in notation $y=\varepsilon q_2$, for three
values of the dimensionless energy $e=E/E_0$, with $E_0$ a normalization energy. Adapted from M.
Hénon and C. Heiles, The Astron. J. 69, 73 (1964). ©AAS.



However, starting from some value of energy, certain initial conditions lead to series of points
scattered over finite-area parts of the Poincaré plane - see the top right panel of Fig. 9. This means that
the corresponding oscillations $q_2(t)$ do not repeat from one (quasi-) period to the next one - cf. Fig. 4 for
the dissipative, forced pendulum. This is chaos. 22 However, some other initial conditions still lead to
closed contours. This feature is similar to Sinai billiards, and is typical for Hamiltonian systems. As the
energy is increased, a larger and larger part of the Poincaré plane belongs to the chaotic motion,
signifying deeper and deeper chaos.



21 See, e.g., M. V. Berry, in: S. Jorna (ed.), Topics in Nonlinear Dynamics, AIP Conf. Proc. No. 46, AIP, 1978, 
pp. 16-120. 

22 This fact complies with the necessary condition of chaos, discussed at the end of Sec. 2, because Eqs. (19) may
be rewritten as a system of four differential equations of the first order.
9.4. Chaos and turbulence

This extremely short section consists of essentially just one statement, extending the discussion
in Sec. 8.5. The (re-)discovery of deterministic chaos in systems with just a few degrees of freedom
in the 1960s changed the tone of debates concerning the origins of turbulence very considerably. At first, an
extreme point of view that equated the notions of chaos and turbulence became the debate's favorite. 23
However, after the initial excitement, significant evidence of the Landau-style mechanisms, involving
many degrees of freedom, was rediscovered and could not be ignored any longer. To the best
knowledge of this author, who is a very distant albeit interested observer of that field, most experimental
and numerical-simulation data carry features of both mechanisms, so that the debate continues. 24 Due to
the age difference, most readers of these notes have much better chances than the author to see where
this discussion will end (if it ever does :-). 25

9.5. Exercise problems 

9.1. A dynamic system is described by the following system of ordinary differential equations:

$$\dot q_1=-q_1+a_1q_2^3, \qquad \dot q_2=a_2q_2-a_3q_2^3+a_4q_2\left(1-q_1\right).$$

Can it exhibit chaos at some set of constant parameters $a_j$?



9.2. A periodic function of time has been added to the right-hand part of the first equation of the
system considered in the previous problem. Is chaos possible now?



23 An important milestone on that path was the work by S. Newhouse et al., Comm. Math. Phys. 64, 35 (1978),
who proved the existence of a strange attractor in a rather abstract model of fluid flow.

24 See, e.g., U. Frisch, Turbulence: The Legacy of A. N. Kolmogorov, Cambridge U. Press, 1996.

25 The reader interested in deterministic chaos as such may also like to have a look at a very popular book by
S. Strogatz, Nonlinear Dynamics and Chaos, Westview, 2001.
Chapter 10. A Bit More of Analytical Mechanics

This concluding chapter reviews two alternative approaches to analytical mechanics, whose main 
advantage is a closer parallel to quantum mechanics in general and to its quasiclassical (WKB) 
approximation in particular. One of them, the Hamiltonian formalism, is also used to derive an 
important asymptotic result, the adiabatic invariance, for classical systems with slowly changing 
parameters. 



10.1. Hamilton equations 

Throughout this course we have seen how invaluable analytical mechanics, in its Lagrangian
form, can be for solving various particular problems of classical mechanics. Now let us
discuss several alternative formulations 1 that may not be much more useful for this purpose, but shed
light on possible extensions of classical mechanics, most importantly to quantum mechanics.

As was already discussed in Sec. 2.3, the partial derivative $p_j\equiv\partial L/\partial\dot q_j$ participating in the
Lagrange equations (2.19),

$$\frac{d}{dt}\frac{\partial L}{\partial\dot q_j}-\frac{\partial L}{\partial q_j}=0, \qquad (10.1)$$

may be considered as the generalized momentum corresponding to the generalized coordinate $q_j$, and the
full set of these momenta may be used to define the Hamiltonian function (2.32):

$$H\equiv\sum_j p_j\dot q_j-L. \qquad (10.2)$$



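Now let us rewrite the full differential of this function 2 in the following form:

$$dH=d\Big(\sum_j p_j\dot q_j-L\Big)=\sum_j\left[d(p_j)\,\dot q_j+p_j\,d(\dot q_j)\right]-\left\{\frac{\partial L}{\partial t}dt+\sum_j\left[\frac{\partial L}{\partial q_j}d(q_j)+\frac{\partial L}{\partial\dot q_j}d(\dot q_j)\right]\right\}. \qquad (10.3)$$

According to the definition of the generalized momentum, the second terms of each sum over $j$ cancel,
while according to the Lagrange equation (1), the derivative $\partial L/\partial q_j$ is just $\dot p_j$, so that

$$dH=-\frac{\partial L}{\partial t}dt+\sum_j\left(\dot q_j\,dp_j-\dot p_j\,dq_j\right). \qquad (10.4)$$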



1 Due mostly to W. Hamilton (1805-1865) and C. Jacobi (1804-1851).

2 Actually, this differential has already been used in Sec. 2.3 to derive Eq. (2.35).

So far, this is just a universal identity. Now comes the main trick of Hamilton's approach: let us
consider $H$ as a function of the following independent arguments: time $t$, the generalized coordinates $q_j$,
and the generalized momenta $p_j$ (rather than the generalized velocities). With this commitment, the general
rule of differentiation of a function of several arguments gives



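$$dH=\frac{\partial H}{\partial t}dt+\sum_j\left(\frac{\partial H}{\partial q_j}dq_j+\frac{\partial H}{\partial p_j}dp_j\right), \qquad (10.5)$$

where $dt$, $dq_j$, and $dp_j$ are independent differentials. Since Eq. (5) should be valid for any choice of these
argument differentials, it should hold in particular if the differentials correspond to the real law of
motion, for which Eq. (4) is valid as well. The comparison of Eqs. (4) and (5) gives us three relations:

$$\frac{\partial H}{\partial t}=-\frac{\partial L}{\partial t}, \qquad (10.6)$$

$$\dot q_j=\frac{\partial H}{\partial p_j}, \qquad \dot p_j=-\frac{\partial H}{\partial q_j}. \qquad (10.7)$$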



Comparing the first of them with Eq. (2.35), we see that

$$\frac{dH}{dt}=\frac{\partial H}{\partial t}, \qquad (10.8)$$

meaning that function $H(t, q_j, p_j)$ can change in time only via its explicit dependence on $t$. Eqs. (7) are
even more substantial: provided that such a function $H(t, q_j, p_j)$ has been calculated, they give us two first-order
differential equations (called the Hamilton equations) for the time evolution of the generalized
coordinate and generalized momentum of each degree of freedom of the system. 3

Let us have a look at these equations for the simplest case of a system with one degree of
freedom, with the simple Lagrangian function (3.3):

$$L=\frac{m_{\rm ef}}{2}\dot q^2-U_{\rm ef}(q,t). \qquad (10.9)$$



In this case, $p=\partial L/\partial\dot q=m_{\rm ef}\dot q$, and $H=p\dot q-L=m_{\rm ef}\dot q^2/2+U_{\rm ef}(q,t)$. In order to honor our new
commitment, we need to express the Hamiltonian function explicitly via $t$, $q$, and $p$ (rather than $\dot q$):

$$H=\frac{p^2}{2m_{\rm ef}}+U_{\rm ef}(q,t). \qquad (10.10)$$

3 Of course, the right-hand part of each equation (7) generally can include coordinates and momenta of other
degrees of freedom as well, so that the equations of motion for different $j$ are generally coupled.

Now we can spell out Eqs. (7) for this particular case:

$$\dot q=\frac{\partial H}{\partial p}=\frac{p}{m_{\rm ef}}, \qquad (10.11)$$

$$\dot p=-\frac{\partial H}{\partial q}=-\frac{\partial U_{\rm ef}}{\partial q}. \qquad (10.12)$$



While the first of these equations just repeats the definition of the generalized momentum
corresponding to coordinate $q$, the second one gives the equation of momentum change. Differentiating
Eq. (11) over time, and plugging Eq. (12) into the result, we get:

$$\ddot q=-\frac{1}{m_{\rm ef}}\frac{\partial U_{\rm ef}}{\partial q}. \qquad (10.13)$$

So, we have returned to the same equation (3.4) that had been derived from the Lagrangian approach.
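Equations (11)-(12) can also be integrated numerically as they stand, without first reducing them to Eq. (13). The sketch below assumes a pendulum-like example with $m_{\rm ef}=1$ and $U_{\rm ef}(q)=1-\cos q$ (both hypothetical choices) and uses the Störmer-Verlet ("leapfrog") scheme, a standard method for Hamilton equations that is not discussed in these notes: it alternately updates $p$ by Eq. (12) and $q$ by Eq. (11). For such schemes the energy (10) typically stays close to its initial value over long runs.

```python
import math

M_EF = 1.0                      # assumed effective mass

def dU_dq(q):
    """Derivative of the assumed potential U_ef(q) = 1 - cos q."""
    return math.sin(q)

def hamiltonian(q, p):
    """Eq. (10) for the assumed potential."""
    return p * p / (2.0 * M_EF) + 1.0 - math.cos(q)

def leapfrog(q, p, h, n_steps):
    """Stormer-Verlet integration of the Hamilton equations (11)-(12)."""
    for _ in range(n_steps):
        p -= 0.5 * h * dU_dq(q)     # half step of Eq. (12): dp/dt = -dU_ef/dq
        q += h * p / M_EF           # full step of Eq. (11): dq/dt = p/m_ef
        p -= 0.5 * h * dU_dq(q)     # second half step of Eq. (12)
    return q, p

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, h=0.01, n_steps=20000)
print(hamiltonian(q1, p1) - hamiltonian(q0, p0))   # small energy error
```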

Thus, the Hamiltonian formalism does not give much new for the solution of most problems of
classical mechanics. (This is why I have postponed its discussion until the very end of this course.)
Moreover, since the Hamiltonian function $H(t, q_j, p_j)$ does not include generalized velocities explicitly,
the phenomenological introduction of dissipation in this approach is less straightforward than that in the
Lagrangian equations, whose precursor form (2.17) is valid for dissipative forces as well. However, the
Hamilton equations (7), which treat the generalized coordinates and momenta in a manifestly symmetric
way, are aesthetically appealing and heuristically fruitful. This is especially true in the cases where these
arguments participate in $H$ in a similar way. For example, for the very important case of a dissipation-free
harmonic oscillator, for which $U_{\rm ef}=\kappa_{\rm ef}q^2/2$, Eq. (10) gives the famous symmetric form

$$H=\frac{p^2}{2m_{\rm ef}}+\frac{m_{\rm ef}\omega_0^2q^2}{2}, \qquad \text{where} \quad \omega_0=\left(\frac{\kappa_{\rm ef}}{m_{\rm ef}}\right)^{1/2}. \qquad (10.14)$$



The Hamilton equations (7) for this system preserve the symmetry, which is especially evident if we introduce
the normalized momentum $\tilde p\equiv p/m_{\rm ef}\omega_0$ (already used in Secs. 4.3 and 9.2):

$$\frac{dq}{dt}=\omega_0\tilde p, \qquad \frac{d\tilde p}{dt}=-\omega_0q. \qquad (10.15)$$
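A short worked step may make the symmetry of Eqs. (15) more tangible. Assuming the familiar parametrization of Ch. 4 (amplitude $A$ and full phase $\Psi$), their solution is a uniform rotation of the representation point on the $[q,\tilde p]$ plane:

```latex
\begin{aligned}
q(t) &= A\cos\Psi, \qquad \tilde p(t) = -A\sin\Psi, \qquad \Psi = \omega_0 t + \text{const},\\
\frac{dq}{dt} &= -\omega_0 A\sin\Psi = \omega_0\tilde p, \qquad
\frac{d\tilde p}{dt} = -\omega_0 A\cos\Psi = -\omega_0 q,\\
q^2+\tilde p^2 &= A^2 = \frac{2E}{m_{\rm ef}\,\omega_0^2}.
\end{aligned}
```

so that at a fixed energy the phase trajectory in these variables is a circle, as expected from the symmetric form (14).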



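More practically, the Hamilton approach gives additional tools for the search for the integrals of
motion. In order to see that, let us consider the full time derivative of an arbitrary function $f(t, q_j, p_j)$:

$$\frac{df}{dt}=\frac{\partial f}{\partial t}+\sum_j\left(\frac{\partial f}{\partial q_j}\dot q_j+\frac{\partial f}{\partial p_j}\dot p_j\right). \qquad (10.16)$$

Plugging in $\dot q_j$ and $\dot p_j$ from the Hamilton equations (7), we get

$$\frac{df}{dt}=\frac{\partial f}{\partial t}+\sum_j\left(\frac{\partial H}{\partial p_j}\frac{\partial f}{\partial q_j}-\frac{\partial H}{\partial q_j}\frac{\partial f}{\partial p_j}\right)\equiv\frac{\partial f}{\partial t}+\{H,f\}, \qquad (10.17)$$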



where the last term in the right-hand part is the so-called Poisson bracket, 4 which is defined, for two
arbitrary functions $f(t, q_j, p_j)$ and $g(t, q_j, p_j)$, as

$$\{g,f\}\equiv\sum_j\left(\frac{\partial g}{\partial p_j}\frac{\partial f}{\partial q_j}-\frac{\partial g}{\partial q_j}\frac{\partial f}{\partial p_j}\right). \qquad (10.18)$$

4 Named after S. D. Poisson - of the Poisson equation and the Poisson statistical distribution fame.
From this definition, one can readily verify that besides the evident relations $\{f,f\}=0$ and $\{f,g\}=-\{g,f\}$,
the Poisson brackets obey the following important Jacobi identity:

$$\{f,\{g,h\}\}+\{g,\{h,f\}\}+\{h,\{f,g\}\}=0. \qquad (10.19)$$

Now let us use these relations in a search for integrals of motion. First, Eq. (17) shows that
if a function $f$ does not depend on time explicitly, and

$$\{H,f\}=0, \qquad (10.20)$$

then $df/dt=0$, i.e. function $f$ is an integral of motion.

Moreover, if we already know two integrals of motion, say $f$ and $g$, then the function

$$F=\{f,g\} \qquad (10.21)$$

is also an integral of motion - the so-called Poisson theorem. In order to prove it, we may use the Jacobi
identity (19) with $h=H$. Now using Eq. (17) to express the Poisson brackets $\{g,H\}$, $\{H,f\}$, and $\{H,\{f,g\}\}=\{H,F\}$
via the full and partial time derivatives of functions $f$, $g$, and $F$, we get



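$$\left\{f,\frac{\partial g}{\partial t}-\frac{dg}{dt}\right\}+\left\{g,\frac{df}{dt}-\frac{\partial f}{\partial t}\right\}+\frac{dF}{dt}-\frac{\partial F}{\partial t}=0, \qquad (10.22)$$

so that if $f$ and $g$ are indeed integrals of motion, i.e., $df/dt=dg/dt=0$, then

$$\frac{dF}{dt}=\frac{\partial F}{\partial t}-\left\{\frac{\partial f}{\partial t},g\right\}-\left\{f,\frac{\partial g}{\partial t}\right\}. \qquad (10.23)$$

Plugging Eq. (21) into the first term of the right-hand part of this equation, and differentiating the bracket by
parts (i.e. as a product), we get $dF/dt=0$, i.e. $F$ is indeed an integral of motion as well.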
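The Poisson theorem can be checked numerically on a familiar example. The sketch below assumes a particle in a Kepler-type field, $H=\mathbf p^2/2m+U(r)$ with $m=1$ and $U=-1/r$ (illustrative choices), evaluates the bracket (18) by central finite differences (so all results are approximate), and checks that $L_x$ satisfies Eq. (20), that $F=\{L_x,L_y\}$ equals $-L_z$ in the sign convention of Eq. (18), that $\{H,F\}\approx 0$, and that $\{q_1,p_1\}=-1$ as in Eq. (24). All helper names are hypothetical.

```python
import math

def poisson(g, f, state, h=1e-5):
    """Poisson bracket {g, f} of Eq. (18), sum_j (dg/dp_j df/dq_j - dg/dq_j df/dp_j),
    by central differences; state = (q_1..q_n, p_1..p_n); g and f take a state."""
    n = len(state) // 2

    def d(func, k):
        plus, minus = list(state), list(state)
        plus[k] += h
        minus[k] -= h
        return (func(plus) - func(minus)) / (2.0 * h)

    return sum(d(g, n + j) * d(f, j) - d(g, j) * d(f, n + j) for j in range(n))

def H(s):
    """Assumed Kepler-type Hamiltonian: m = 1, U(r) = -1/r."""
    x, y, z, px, py, pz = s
    return 0.5 * (px * px + py * py + pz * pz) - 1.0 / math.sqrt(x * x + y * y + z * z)

def Lx(s):
    x, y, z, px, py, pz = s
    return y * pz - z * py

def Ly(s):
    x, y, z, px, py, pz = s
    return z * px - x * pz

def Lz(s):
    x, y, z, px, py, pz = s
    return x * py - y * px

def F(s):
    """F = {Lx, Ly} of Eq. (21), itself evaluated numerically."""
    return poisson(Lx, Ly, s)

s = (1.0, 0.3, -0.2, 0.1, 0.8, 0.4)
print(poisson(H, Lx, s))                            # ~ 0: Eq. (20) holds for Lx
print(F(s), -Lz(s))                                 # F coincides with -Lz
print(poisson(H, F, s, h=1e-3))                     # ~ 0: F is an integral of motion
print(poisson(lambda u: u[0], lambda u: u[3], s))   # ~ -1, cf. Eq. (24)
```

The larger step in the nested bracket $\{H,F\}$ is a design choice: $F$ is itself computed by differences, so a tiny outer step would amplify its rounding errors.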

Finally, one more important role of the Hamilton formalism is that it allows one to trace the
close connection between classical and quantum mechanics. Indeed, using Eq. (18) to calculate the
Poisson brackets of the generalized coordinates and momenta, we readily get

$$\{q_j,q_{j'}\}=0, \qquad \{p_j,p_{j'}\}=0, \qquad \{q_j,p_{j'}\}=-\delta_{jj'}. \qquad (10.24)$$

In quantum mechanics, 5 the operators of these quantities ("observables") obey the commutation relations

$$[\hat q_j,\hat q_{j'}]=0, \qquad [\hat p_j,\hat p_{j'}]=0, \qquad [\hat q_j,\hat p_{j'}]=i\hbar\delta_{jj'}, \qquad (10.25)$$

where the definition of the commutator, $[\hat g,\hat f]\equiv\hat g\hat f-\hat f\hat g$, is to a certain extent 6 similar to that (18) of
the Poisson bracket. We see that the classical relations (24) are similar to the quantum-mechanical relations
(25) if the following parallel is made:

$$\{g,f\}\leftrightarrow\frac{i}{\hbar}\left[\hat g,\hat f\right]. \qquad (10.26)$$

5 See, e.g., QM Sec. 2.1.

6 There is of course a conceptual difference between the "usual" products of function derivatives participating in
the Poisson brackets, and the operator "products" (meaning their sequential action on a state vector - see, e.g.,
QM Sec. 4.1) forming the commutator.
This analogy extends well beyond Eqs. (24)-(25). For example, making replacement (26) in Eq.
(17), we get

$$\frac{d\hat f}{dt}=\frac{\partial\hat f}{\partial t}+\frac{i}{\hbar}\left[\hat H,\hat f\right], \qquad \text{i.e.} \qquad i\hbar\frac{d\hat f}{dt}=i\hbar\frac{\partial\hat f}{\partial t}+\left[\hat f,\hat H\right], \qquad (10.27)$$

which is the correct equation of operator evolution in the Heisenberg picture of quantum mechanics. 7

This analogy implies, in particular, that the quantum-mechanical operators (and the matrices 
used for their representation in a particular basis) should satisfy the same identities including Eq. (17). 
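As a small illustration of this parallel, the sketch below builds matrix representations of $\hat q$ and $\hat p$ for a harmonic oscillator in a basis truncated to $N$ states. The construction via a "lowering" matrix and the units $\hbar=m=\omega_0=1$ are assumptions borrowed from standard quantum mechanics rather than from this section. The snippet checks that matrix commutators obey the analog of the Jacobi identity (19), and that $[\hat q,\hat p]=i\hbar$ holds on all basis states but the last one; that discrepancy is an unavoidable truncation artifact, because the trace of any commutator of finite matrices vanishes.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def comm(A, B):
    """Commutator [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(AB, BA)]

def oscillator_qp(N, hbar=1.0, m=1.0, w=1.0):
    """N x N truncations of q and p (assumed construction): lowering matrix a with
    a[n-1][n] = sqrt(n); q = sqrt(hbar/2mw)(a + a^+), p = i sqrt(m w hbar/2)(a^+ - a)."""
    a = [[0.0] * N for _ in range(N)]
    for n in range(1, N):
        a[n - 1][n] = math.sqrt(n)
    cq, cp = math.sqrt(hbar / (2 * m * w)), math.sqrt(m * w * hbar / 2)
    # a is real, so its Hermitian conjugate is just the transpose: a^+[i][j] = a[j][i]
    q = [[cq * (a[i][j] + a[j][i]) for j in range(N)] for i in range(N)]
    p = [[1j * cp * (a[j][i] - a[i][j]) for j in range(N)] for i in range(N)]
    return q, p

N = 6
q, p = oscillator_qp(N)
C = comm(q, p)
print([round(C[i][i].imag, 12) for i in range(N)])   # [1.0, 1.0, 1.0, 1.0, 1.0, -5.0]
R = matmul(q, q)
jac = add(add(comm(q, comm(p, R)), comm(p, comm(R, q))), comm(R, comm(q, p)))
print(max(abs(x) for row in jac for x in row))        # zero up to rounding
```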



10.2. Adiabatic invariance 

One more application of the Hamiltonian formalism in classical mechanics is the solution of the
following problem. 8 Earlier in the course, we already studied some effects of time variation of
parameters of a single oscillator (Sec. 4.5) and coupled oscillators (Sec. 5.5). However, those
discussions were focused on the case when the parameter variation frequency is comparable with the
initial oscillation frequency (or frequencies) of the system. Another practically important case is when
some system's parameter (let us call it $\lambda$) is changed much more slowly (adiabatically 9),

$$\left|\frac{\dot\lambda}{\lambda}\right|\ll\frac{1}{T}, \qquad (10.28)$$

where $T$ is a typical time period of oscillations in the system. Let us consider a 1D system whose
Hamiltonian $H(q, p, \lambda)$ depends on time only via the slow (28) evolution of parameter $\lambda=\lambda(t)$, and
whose initial energy restricts the system's motion to a finite coordinate interval - see Fig. 3.2c.

Then, as we know from Sec. 3.3, if parameter $\lambda$ is constant, the system performs a periodic
(though not necessarily sinusoidal) motion back and forth along axis $q$, or, in a different language, along a
closed trajectory on the phase plane $[q, p]$ 10 - see Fig. 1. According to Eq. (8), in this case $H$ is constant
on the trajectory. (In order to distinguish this particular value from the Hamiltonian function as such, I
will assume that this constant coincides with the full mechanical energy $E$, as it does for the Hamiltonian
(10), though this assumption is not necessary for the calculation made below.)

The oscillation period $T$ may be calculated as a contour integral along this closed trajectory:

$$T\equiv\int_0^Tdt=\oint\frac{dt}{dq}\,dq=\oint\frac{dq}{\dot q}. \qquad (10.29)$$

Using the first of the Hamilton equations (7), we may now present this integral as

$$T=\oint\frac{dq}{\partial H/\partial p}. \qquad (10.30)$$

7 See, e.g., QM Sec. 4.6.

8 Various aspects of this problem and its quantum-mechanical extension were first discussed by L. Le Cornu
(1895), Lord Rayleigh (1902), H. Lorentz (1911), P. Ehrenfest (1916), and M. Born and V. Fock (1928).

9 This term has come from thermodynamics and statistical mechanics, where it implies not only a slow parameter
variation, but also the thermal insulation of the system - see, e.g., SM Sec. 1.3. Evidently, the latter condition is
irrelevant in our current context.

10 In Sec. 4.6, we discussed this plane for the particular case of sinusoidal oscillations - see Fig. 4.9.



At each given point $q$, $H=E$ is a function of $p$ alone, so that we may flip the partial derivative in the
denominator just as a full derivative, and rewrite Eq. (30) as

$$T=\oint\frac{\partial p}{\partial E}\,dq. \qquad (10.31)$$



For the particular Hamiltonian (10), this relation is immediately reduced to Eq. (3.27), in the form of a
contour integral:

$$T=\left(\frac{m_{\rm ef}}{2}\right)^{1/2}\oint\frac{dq}{\left[E-U_{\rm ef}(q)\right]^{1/2}}. \qquad (10.32)$$



[Figure: closed phase-plane contours labeled $H(p,q,\lambda)=\ldots$; axes $q$ and $p$.]

Fig. 10.1. Phase-plane representation of periodic
oscillations of a 1D Hamiltonian system, for two
values of energy (schematically).



Superficially, it looks like these formulas may also be used to find the change of the motion period
when parameter $\lambda$ is being changed adiabatically, for example, by plugging known functions $m_{\rm ef}(\lambda)$ and
$U_{\rm ef}(q,\lambda)$ into Eq. (32). However, there is no guarantee that energy $E$ in that integral would stay constant
as the parameter changes, and indeed we will see below that this is not necessarily the case. Even more
interestingly, in the most important case of the harmonic oscillator ($U_{\rm ef}=\kappa_{\rm ef}q^2/2$), whose oscillation
period $T$ does not depend on $E$ (see Eq. (3.29) and its discussion), its variation in the adiabatic limit (28)
may be readily predicted: $T(\lambda)=2\pi/\omega_0(\lambda)=2\pi[m_{\rm ef}(\lambda)/\kappa_{\rm ef}(\lambda)]^{1/2}$, but the dependence of the oscillation
energy $E$ (and hence of the oscillation amplitude) on $\lambda$ is not immediately obvious.

In order to address this issue, let us use Eq. (8) (with $E=H$) to present the energy change with
$\lambda(t)$, i.e. in time, as

$$\frac{dE}{dt}=\frac{\partial H}{\partial t}=\frac{\partial H}{\partial\lambda}\frac{d\lambda}{dt}. \qquad (10.33)$$

Since we are interested in a very slow (adiabatic) time evolution of energy, we can average Eq. (33)
over fast oscillations in the system, for example over one oscillation period $T$, treating $d\lambda/dt$ as a
constant during this averaging. 11 The averaging yields



$$\frac{dE}{dt}=\frac{d\lambda}{dt}\overline{\frac{\partial H}{\partial\lambda}}=\frac{d\lambda}{dt}\frac{1}{T}\int_0^T\frac{\partial H}{\partial\lambda}dt. \qquad (10.34)$$

11 This is the most critical point of this proof, because at any finite rate of parameter change the oscillations are,
strictly speaking, non-periodic. Because of the approximate nature of this conjecture (which is very close to the
assumptions made at the derivation of the RWA equations in Sec. 4.3), new, more strict (but also much more
cumbersome) proofs of Eq. (42) are still being offered in the literature - see, e.g., C. Wells and S. Siklos, Eur. J.
Phys. 28, 105 (2007) and/or A. Lobo et al., Eur. J. Phys. 33, 1063 (2012).

Transforming the time integral to the contour one, just as we did at the transition from Eq. (29) to Eq.
(30), and using Eq. (31) for $T$, we get

$$\frac{dE}{dt}=\frac{d\lambda}{dt}\,\frac{\displaystyle\oint\frac{\partial H/\partial\lambda}{\partial H/\partial p}\,dq}{\displaystyle\oint\frac{\partial p}{\partial E}\,dq}. \qquad (10.35)$$

At each point $q$ of the contour, $H$ is a function of not only $\lambda$, but also of $p$, which may be also $\lambda$-dependent,
so that if $E$ is fixed, the partial differentiation of relation $E=H$ over $\lambda$ yields

$$\frac{\partial H}{\partial\lambda}+\frac{\partial H}{\partial p}\frac{\partial p}{\partial\lambda}=0, \qquad \text{i.e.} \qquad \frac{\partial H/\partial\lambda}{\partial H/\partial p}=-\frac{\partial p}{\partial\lambda}. \qquad (10.36)$$

Plugging the last relation into Eq. (35), we get

$$\frac{dE}{dt}=-\frac{d\lambda}{dt}\,\frac{\displaystyle\oint\frac{\partial p}{\partial\lambda}\,dq}{\displaystyle\oint\frac{\partial p}{\partial E}\,dq}. \qquad (10.37)$$

Since the left-hand part of Eq. (37) and the derivative $d\lambda/dt$ do not depend on $q$, we may move them
into the integrals over $q$ as constants, and rewrite that relation as

$$\oint\left(\frac{\partial p}{\partial E}\frac{dE}{dt}+\frac{\partial p}{\partial\lambda}\frac{d\lambda}{dt}\right)dq=0. \qquad (10.38)$$

Now let us consider the following integral over the same phase-plane contour,

$$J\equiv\frac{1}{2\pi}\oint p\,dq, \qquad (10.39)$$

called the action variable. Just to understand its physical sense, let us calculate $J$ for a harmonic
oscillator (14). As we know very well from Chapter 4, for such an oscillator, $q=A\cos\Psi$, $p=-m_{\rm ef}\omega_0A\sin\Psi$
(with $\Psi=\omega_0t+$ const), so that $J$ may be easily expressed either via the oscillations' amplitude $A$, or via their
energy $E=H=m_{\rm ef}\omega_0^2A^2/2$:

$$J=\frac{1}{2\pi}\oint p\,dq=\frac{1}{2\pi}\int_{\Psi=0}^{2\pi}\left(-m_{\rm ef}\omega_0A\sin\Psi\right)d\left(A\cos\Psi\right)=\frac{m_{\rm ef}\omega_0}{2}A^2=\frac{E}{\omega_0}. \qquad (10.40)$$

Returning to the general oscillator with adiabatically changed parameter $\lambda$, let us use the
definition of $J$, Eq. (39), to calculate its time derivative, again taking into account that at each point $q$ of
the trajectory, $p$ is a function of $E$ and $\lambda$:
$$\frac{dJ}{dt}=\frac{1}{2\pi}\oint\frac{dp}{dt}\,dq=\frac{1}{2\pi}\oint\left(\frac{\partial p}{\partial E}\frac{dE}{dt}+\frac{\partial p}{\partial\lambda}\frac{d\lambda}{dt}\right)dq. \qquad (10.41)$$

Within the accuracy of our approximation, in which the contour integrals (38) and (41) are calculated
along a closed trajectory, factor $dE/dt$ is indistinguishable from its time average, and these integrals
coincide, so that result (38) is applicable to Eq. (41) as well. Hence, we have finally arrived at a very
important result: at a slow parameter variation, $dJ/dt=0$, i.e. the action variable remains constant:

$$J=\text{const}. \qquad (10.42)$$

This is the famous adiabatic invariance. 12 In particular, according to Eq. (40), in a harmonic oscillator,
the energy of oscillations changes proportionally to the (slowly changed) eigenfrequency.
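The conservation of $J=E/\omega_0$ can be checked numerically. The sketch below assumes a harmonic oscillator (14) with $m_{\rm ef}=1$ whose frequency is ramped smoothly from $\omega_0=1$ to $\omega_0=2$ over a time covering many oscillation periods, in the spirit of condition (28). The ramp shape, the step size, and all numbers are illustrative assumptions, and the fourth-order Runge-Kutta integrator is a standard method swapped in for this purpose. The run compares the initial and final values of $E$ and of $E/\omega_0$.

```python
import math

def omega(t, t_ramp, w_i=1.0, w_f=2.0):
    """Assumed smooth ('smoothstep') ramp of the eigenfrequency from w_i to w_f."""
    x = min(max(t / t_ramp, 0.0), 1.0)
    return w_i + (w_f - w_i) * x * x * (3.0 - 2.0 * x)

def rhs(t, q, p, t_ramp):
    """Hamilton equations (7) for H = p^2/2 + omega(t)^2 q^2/2, i.e. m_ef = 1."""
    return p, -omega(t, t_ramp) ** 2 * q

def evolve(q, p, t_ramp, h=0.01):
    """Fourth-order Runge-Kutta integration from t = 0 to t = t_ramp."""
    t = 0.0
    for _ in range(int(round(t_ramp / h))):
        k1 = rhs(t, q, p, t_ramp)
        k2 = rhs(t + h / 2, q + h / 2 * k1[0], p + h / 2 * k1[1], t_ramp)
        k3 = rhs(t + h / 2, q + h / 2 * k2[0], p + h / 2 * k2[1], t_ramp)
        k4 = rhs(t + h, q + h * k3[0], p + h * k3[1], t_ramp)
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return q, p

def energy(q, p, w):
    """Oscillator energy for m_ef = 1 and instantaneous frequency w."""
    return 0.5 * p * p + 0.5 * w * w * q * q

t_ramp = 300.0                      # many oscillation periods: slow parameter change
q0, p0 = 1.0, 0.0
w_start, w_end = omega(0.0, t_ramp), omega(t_ramp, t_ramp)
E_i = energy(q0, p0, w_start)
q1, p1 = evolve(q0, p0, t_ramp)
E_f = energy(q1, p1, w_end)
print("E_f/E_i =", E_f / E_i)                           # expected to be close to 2
print("J_f/J_i =", (E_f / w_end) / (E_i / w_start))     # expected to be close to 1
```

A smooth ramp (with zero $d\omega_0/dt$ at both ends) is chosen so that the final $E/\omega_0$ is not contaminated by the small fast oscillations that an abrupt start or stop of the parameter change would leave behind.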

Before moving on, let me briefly note that the adiabatic invariance is not the only application of
the action variable $J$. Since the initial choice of generalized coordinates and velocities (and hence the
generalized momenta) in analytical mechanics is arbitrary (see Sec. 2.1), it is almost evident that $J$ may
be taken for a new generalized momentum corresponding to a certain new generalized coordinate $\theta$, 13
and that the pair $\{J,\theta\}$ should satisfy the Hamilton equations (7), in particular,

$$\frac{d\theta}{dt}=\frac{\partial H}{\partial J}. \qquad (10.43)$$

Following the commitment of Sec. 1 (made there for the "old" arguments $q_j$, $p_j$), before the
differentiation in the right-hand part of Eq. (43), $H$ should be expressed as a function of $t$, $J$, and $\theta$. For
time-independent Hamiltonian systems, $H$ is uniquely defined by $J$ - see, e.g., Eq. (40). Hence the right-hand
part of Eq. (43) depends on neither $t$ nor $\theta$, so that according to that equation, $\theta$ (called the
angle variable) is a linear function of time:

$$\theta=\frac{\partial H}{\partial J}t+\text{const}. \qquad (10.44)$$

For a harmonic oscillator, according to Eq. (40), the derivative $\partial H/\partial J=\partial E/\partial J=\omega_0\equiv2\pi/T$, so that
$\theta=\omega_0t+$ const. It may be shown that a more general form of this relation,

$$\frac{\partial E}{\partial J}=\frac{2\pi}{T}, \qquad (10.45)$$

is valid for an arbitrary oscillator described by Eq. (10). Thus, Eq. (44) becomes

$$\theta=2\pi\frac{t}{T}+\text{const}. \qquad (10.46)$$



12 For certain particular oscillators, e.g., a mathematical pendulum, Eq. (42) may also be proved directly - an
exercise highly recommended to the reader.

13 This, again, is a plausible argument but not a strict proof. Indeed, though according to its definition (39) $J$ is
nothing more than a sum of several (formally, an infinite number of) values of momentum $p$, these values are not
independent, but have to be selected on the same closed trajectory on the phase plane. For more mathematical
rigor, the reader is referred to Sec. 45 of Mechanics by Landau and Lifshitz (which was repeatedly cited above),
which discusses the general rules of the so-called canonical transformations from one set of Hamiltonian
arguments to another one - say from $\{p, q\}$ to $\{J, \theta\}$.
To summarize, for a harmonic oscillator, the angle variable $\theta$ is just the full phase $\Psi$ that we
used so much in Ch. 4, while for an arbitrary (nonlinear) 1D oscillator, it is a convenient
generalization of that notion. For this reason, variables $J$ and $\theta$ present a convenient tool for the
discussion of certain fine points of the dynamics of strongly nonlinear oscillators - for whose discussion I,
unfortunately, do not have time. 14
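Equations (39) and (45) lend themselves to a quick numerical check. The sketch below (function names and example potentials are illustrative assumptions, with $m_{\rm ef}=1$) computes $J(E)=(1/\pi)\int_{-q_{\max}}^{q_{\max}}[2m_{\rm ef}(E-U_{\rm ef}(q))]^{1/2}dq$ for a symmetric potential well, which is Eq. (39) with the two branches $p=\pm[2m_{\rm ef}(E-U_{\rm ef})]^{1/2}$ of the closed contour, and then obtains the period from Eq. (45), $T=2\pi\,\partial J/\partial E$, by a finite difference. For the harmonic oscillator it reproduces $J=E/\omega_0$ of Eq. (40) and an energy-independent period; for an assumed quartic potential $U_{\rm ef}=q^4/4$, it shows the period decreasing as $E^{-1/4}$.

```python
import math

def action_variable(E, U, q_max, m=1.0, n=20000):
    """J(E) of Eq. (39) for a symmetric well U(q) = U(-q) with turning points +-q_max:
    J = (1/pi) * integral from -q_max to q_max of sqrt(2m(E - U(q))) dq.
    The substitution q = q_max*sin(phi) makes the integrand smooth at the turning
    points; the midpoint rule is then applied in phi."""
    dphi = math.pi / n
    total = 0.0
    for k in range(n):
        phi = -0.5 * math.pi + (k + 0.5) * dphi
        q = q_max * math.sin(phi)
        p = math.sqrt(max(2.0 * m * (E - U(q)), 0.0))   # upper branch of the contour
        total += p * q_max * math.cos(phi) * dphi
    return total / math.pi

def period(E, U, turning_point, rel_step=1e-5):
    """T = 2*pi*dJ/dE, Eq. (45), by a central finite difference in E."""
    dE = rel_step * E
    j_plus = action_variable(E + dE, U, turning_point(E + dE))
    j_minus = action_variable(E - dE, U, turning_point(E - dE))
    return 2.0 * math.pi * (j_plus - j_minus) / (2.0 * dE)

def U_ho(q):
    """Harmonic oscillator with m_ef = kappa_ef = 1, i.e. omega_0 = 1."""
    return 0.5 * q * q

def tp_ho(E):
    return math.sqrt(2.0 * E)

def U_4(q):
    """Assumed quartic potential."""
    return 0.25 * q ** 4

def tp_4(E):
    return (4.0 * E) ** 0.25

print(action_variable(0.7, U_ho, tp_ho(0.7)), period(0.7, U_ho, tp_ho))  # ~0.7, ~2*pi
print(period(1.0, U_4, tp_4) / period(16.0, U_4, tp_4))                 # ~16**(1/4) = 2
```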






10.3. The Hamilton principle 

Now let me show that the Lagrangian equations of motion, which have been derived in Sec. 2.1
from the Newton laws, may also be obtained from the so-called Hamilton principle, namely the
condition of a minimum (or rather an extremum) of the integral called action:

$$S\equiv\int_{t_{\rm ini}}^{t_{\rm fin}}L\,dt, \qquad (10.47)$$

where $t_{\rm ini}$ and $t_{\rm fin}$ are, respectively, the initial and final moments of time, at which moments all
generalized coordinates and velocities are considered fixed (not varied) - see Fig. 2.





Fig. 10.2. Deriving the Hamilton 
principle. 



The proof of that statement is rather simple. Considering, similarly to Sec. 2.1, a possible virtual
variation of the motion, described by infinitesimal deviations $\{\delta q_j(t), \delta\dot q_j(t)\}$ from the real motion, the
necessary condition for $S$ to be minimal is

$$\delta S=\int_{t_{\rm ini}}^{t_{\rm fin}}\delta L\,dt=0, \qquad (10.48)$$

where $\delta S$ and $\delta L$ are the variations of the action and the Lagrange function, corresponding to the set
$\{\delta q_j(t), \delta\dot q_j(t)\}$. As has been already discussed in Sec. 2.1, we can use the operation of variation just
as the usual differentiation (but at fixed time, see Fig. 2.1), swapping these two operations if needed -
see Fig. 2.3 and its discussion. Thus, we may write



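$$\delta L=\sum_j\left(\frac{\partial L}{\partial q_j}\delta q_j+\frac{\partial L}{\partial\dot q_j}\delta\dot q_j\right)=\sum_j\frac{\partial L}{\partial q_j}\delta q_j+\sum_j\frac{\partial L}{\partial\dot q_j}\frac{d}{dt}\delta q_j. \qquad (10.49)$$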



14 See, e.g., Chapter 6 in J. Jose and E. Saletan, Classical Dynamics, Cambridge U. Press, 1998. 
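After plugging the last expression into Eq. (48), we can integrate the second term by parts:

$$\delta S=\sum_j\int_{t_{\rm ini}}^{t_{\rm fin}}\frac{\partial L}{\partial q_j}\delta q_j\,dt+\sum_j\int_{t_{\rm ini}}^{t_{\rm fin}}\frac{\partial L}{\partial\dot q_j}d(\delta q_j)=\sum_j\int_{t_{\rm ini}}^{t_{\rm fin}}\frac{\partial L}{\partial q_j}\delta q_j\,dt+\sum_j\left[\frac{\partial L}{\partial\dot q_j}\delta q_j\right]_{t_{\rm ini}}^{t_{\rm fin}}-\sum_j\int_{t_{\rm ini}}^{t_{\rm fin}}\delta q_j\,d\left(\frac{\partial L}{\partial\dot q_j}\right)=0. \qquad (10.50)$$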



Since the generalized coordinates in the initial and final points are considered fixed (not affected
by the variation), all $\delta q_j(t_{\rm ini})=\delta q_j(t_{\rm fin})=0$, and the second term in the right-hand part of Eq. (50) vanishes.
Multiplying and dividing the last term of that part by $dt$, we finally get

$$\delta S=-\int_{t_{\rm ini}}^{t_{\rm fin}}\sum_j\left[\frac{d}{dt}\frac{\partial L}{\partial\dot q_j}-\frac{\partial L}{\partial q_j}\right]\delta q_j\,dt=0. \qquad (10.51)$$

This relation should hold for an arbitrary set of functions $\delta q_j(t)$, and for any time interval, so that it is
only possible if the expressions in square brackets equal zero for all $j$, giving us the set of Lagrange
equations (2.19). So, the Hamilton principle indeed gives the Lagrange equations of motion.
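The extremum property can be illustrated numerically. In the sketch below, with assumed values $m=1$, $\omega_0=1$, $t_{\rm ini}=0$, and $t_{\rm fin}=1$, the action (47) of a harmonic oscillator is evaluated on a discretized path (the discretization and function names are hypothetical choices). The real trajectory $q(t)=\cos t$ is compared with paths deformed by $\delta q(t)=\delta\,\sin(\pi t/t_{\rm fin})$, which vanish at both ends as the Hamilton principle requires. Because $L$ is quadratic here, the action of a deformed path should exceed that of the real one by $\delta^2(\pi^2-1)/4$ (up to discretization error), and the real action should equal $-\sin 2/4$. For time intervals longer than half an oscillation period, the extremum of this particular action is no longer a minimum, which illustrates the caveat "or rather an extremum".

```python
import math

def action(path, t_fin, m=1.0, w=1.0):
    """Discretized S = integral of L dt, Eq. (47), for L = m*qdot^2/2 - m*w^2*q^2/2;
    path holds q at equally spaced moments from t_ini = 0 to t_fin."""
    n = len(path) - 1
    h = t_fin / n
    s = 0.0
    for k in range(n):
        v = (path[k + 1] - path[k]) / h          # velocity on the k-th interval
        q = 0.5 * (path[k + 1] + path[k])        # coordinate at its midpoint
        s += (0.5 * m * v * v - 0.5 * m * w * w * q * q) * h
    return s

def sample(func, t_fin, n=4000):
    """Values of func(t) at n + 1 equally spaced moments on [0, t_fin]."""
    return [func(t_fin * k / n) for k in range(n + 1)]

t_fin = 1.0                                  # shorter than half a period, pi/w
real = sample(math.cos, t_fin)               # real trajectory with q(0) = 1
S_real = action(real, t_fin)
print(S_real, -math.sin(2.0) / 4.0)          # numerical vs. exact action

for delta in (0.1, -0.05, 0.02):
    # a variation vanishing at both ends of the time interval
    bump = sample(lambda t: delta * math.sin(math.pi * t / t_fin), t_fin)
    S_var = action([a + b for a, b in zip(real, bump)], t_fin)
    print(delta, S_var - S_real, delta ** 2 * (math.pi ** 2 - 1.0) / 4.0)
```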

It is very useful to make the notion of action $S$, defined by Eq. (47), more transparent by
calculating it for the simple case of a single particle moving in a potential field that conserves its energy
$E=T+U$. In this case the Lagrangian function $L=T-U$ may be presented as

$$L=T-U=2T-(T+U)=2T-E=mv^2-E, \qquad (10.52)$$

with $E=$ const, so that

$$S=\int L\,dt=\int mv^2dt-Et+\text{const}. \qquad (10.53)$$

Presenting the expression under the remaining integral as $m\mathbf v\cdot\mathbf v\,dt=\mathbf p\cdot(d\mathbf r/dt)\,dt=\mathbf p\cdot d\mathbf r$, we finally get

$$S=\int\mathbf p\cdot d\mathbf r-Et+\text{const}\equiv S_0-Et+\text{const}, \qquad (10.54)$$

where the time-independent integral

$$S_0\equiv\int\mathbf p\cdot d\mathbf r \qquad (10.55)$$

is frequently called the abbreviated action. 15

This expression may be used to establish one more connection between classical and
quantum mechanics, now in its Schrödinger picture. Indeed, in the quasiclassical (WKB) approximation
of that picture 16 a particle of fixed energy is described by a de Broglie wave

$$\Psi(\mathbf r,t)\propto\exp\left\{i\left(\int\mathbf k\cdot d\mathbf r-\omega t+\text{const}\right)\right\}, \qquad (10.56)$$



15 Please note that despite a close relation between the abbreviated action $S_0$ and the action variable $J$ defined by
Eq. (39), these notions are not identical. Most importantly, $J$ is an integral over a closed trajectory, while $S_0$ is
defined for an arbitrary point of a trajectory.

16 See, e.g., QM Sec. 2.3.
where the wavevector $\mathbf k$ is proportional to the particle's momentum, while the frequency $\omega$ is proportional to its energy:

$$\mathbf k=\frac{\mathbf p}{\hbar}, \qquad \omega=\frac{E}{\hbar}. \qquad (10.57)$$

Plugging these expressions into Eq. (56) and comparing the result with Eq. (54), we see that the WKB
wavefunction may be presented as

$$\Psi\propto\exp\left\{\frac{iS}{\hbar}\right\}. \qquad (10.58)$$

Hence the Hamilton principle (48) means that the total phase of the quasiclassical
wavefunction should be minimal along the particle's real trajectory. But this is exactly the so-called eikonal
minimum principle, well known from optics (though valid for any other waves as well), where it
serves to define the ray paths in the geometric optics limit - similar to the WKB approximation
condition. Thus, the ratio $S/\hbar$ may be considered just as the eikonal, i.e. the total phase accumulation, of
the de Broglie waves. 17

Now, comparing Eq. (55) with Eq. (39), we see that the action variable $J$ is just the change of the
abbreviated action $S_0$ along a single phase-plane contour (divided by $2\pi$). This means that in the WKB
approximation, $J/\hbar$ is the number of de Broglie waves along the classical trajectory of a particle, i.e. an
integer value of the corresponding quantum number. If the system's parameters are changed slowly, the
quantum number has to stay integer, and hence $J$ cannot change, giving a quantum-mechanical
interpretation of the adiabatic invariance. It is really fascinating that a fact of classical mechanics may
be "derived" (or at least understood) more easily from the quantum mechanics' standpoint. 18



10.4. The Hamilton-Jacobi equation

Action S, defined by Eq. (47), may be used for one more formulation of classical mechanics. For 
that, we need one more, different commitment: S to be considered a function of the following 
independent arguments: the final time point ff in (which I will, for brevity, denote as t in this section), and 
the set of generalized coordinates (but not of the generalized velocities!) at that point: 



$$ S \equiv \int_{t_{\rm ini}}^{t} L\,dt = S[t, q_j(t)]. \tag{10.59} $$



Let us calculate the variation of this (essentially, new!) function that results from an arbitrary combination of variations of the final values $q_j(t)$ of the coordinates, while keeping $t$ fixed. Formally, this may be done by repeating the variation calculations described by Eqs. (49)-(52), except that now the variations $\delta q_j$ at the final point do not necessarily equal zero. As a result, we get



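$$ \delta S = \left[\sum_j \frac{\partial L}{\partial \dot q_j}\,\delta q_j\right]_{t} + \int_{t_{\rm ini}}^{t} dt \sum_j \left[\frac{\partial L}{\partial q_j} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_j}\right)\right]\delta q_j. \tag{10.60} $$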



17 Eq. (58) was the starting point for R. Feynman's development of his path-integral formulation of quantum 
mechanics - see, e.g., QM Sec. 5.3. 

18 As a reminder, we have run into a similar situation in our discussion of the non-degenerate parametric excitation in Sec. 5.5.
For the motion along the real trajectory, i.e. satisfying the Lagrange equations of motion, the second
term of this expression equals zero. Hence Eq. (60) shows that, for (any) fixed time t, 



$$ \frac{\partial S}{\partial q_j} = \frac{\partial L}{\partial \dot q_j}. \tag{10.61} $$

But the last derivative is nothing else than the generalized momentum $p_j$ - see Eq. (2.31), so that

$$ \frac{\partial S}{\partial q_j} = p_j. \tag{10.62} $$



(As a reminder, both parts of this relation refer to the final moment $t$ of the trajectory.) As a result, the full derivative of action $S[t, q_j(t)]$ over time takes the form



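$$ \frac{dS}{dt} = \frac{\partial S}{\partial t} + \sum_j \frac{\partial S}{\partial q_j}\,\dot q_j = \frac{\partial S}{\partial t} + \sum_j p_j \dot q_j. \tag{10.63} $$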



Now, by the very definition (59), the full derivative $dS/dt$ is nothing more than the Lagrange function $L$, so that Eq. (63) yields



$$ \frac{\partial S}{\partial t} = L - \sum_j p_j \dot q_j. \tag{10.64} $$



However, according to the definition (2) of the Hamiltonian function $H$, the right-hand part of Eq. (64) is just $(-H)$, so that we get an extremely simple-looking Hamilton-Jacobi equation




$$ \frac{\partial S}{\partial t} = -H. \tag{10.65} $$



This simplicity is, however, rather deceptive, because in order to use this equation for the calculation of function $S(t, q_j)$ for any particular problem, the Hamiltonian function first has to be expressed as a function of time $t$, the generalized coordinates $q_j$, and the generalized momenta $p_j$ (which, according to Eq. (62), may be presented just as the derivatives $\partial S/\partial q_j$). Let us see how this procedure works for the simplest case of a 1D system with the Hamiltonian function given by Eq. (10). In this case, the only generalized momentum is $p = \partial S/\partial q$, so that



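$$ H = \frac{p^2}{2m_{\rm ef}} + U_{\rm ef}(q,t) = \frac{1}{2m_{\rm ef}}\left(\frac{\partial S}{\partial q}\right)^2 + U_{\rm ef}(q,t), \tag{10.66} $$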



and the Hamilton-Jacobi equation (65) is reduced to a partial differential equation,

$$ \frac{\partial S}{\partial t} + \frac{1}{2m_{\rm ef}}\left(\frac{\partial S}{\partial q}\right)^2 + U_{\rm ef}(q,t) = 0. \tag{10.67} $$



Its solution may be readily found in the particular case of a time-independent potential energy, $U_{\rm ef} = U_{\rm ef}(q)$. In this case, Eq. (67) is evidently satisfied by a variable-separated solution



$$ S(t,q) = S_0(q) + {\rm const}\times t. \tag{10.68} $$
Plugging this solution into Eq. (67), we see that since the sum of the two last terms in the left-hand part of that equation represents the full mechanical energy $E$, the constant in Eq. (68) is nothing but $(-E)$. Thus for function $S_0$ we get an ordinary differential equation



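$$ -E + \frac{1}{2m_{\rm ef}}\left(\frac{dS_0}{dq}\right)^2 + U_{\rm ef}(q) = 0. \tag{10.69} $$

Integrating it, we get

$$ S_0 = \int \{2m_{\rm ef}[E - U_{\rm ef}(q)]\}^{1/2}\,dq + {\rm const}, \tag{10.70} $$

so that, finally, the action is equal to

$$ S = \int \{2m_{\rm ef}[E - U_{\rm ef}(q)]\}^{1/2}\,dq - Et + {\rm const}. \tag{10.71} $$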

For the case of 1D motion of a single particle, i.e. for $q = x$, $m_{\rm ef} = m$, $U_{\rm ef}(q) = U(x)$, this solution is just the 1D case of the more general Eqs. (54)-(55), which were obtained in a much simpler way. (In particular, $S_0$ is just the abbreviated action.)
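The chain (68)-(70) can also be checked numerically. The sketch below assumes, for illustration only, $m_{\rm ef} = 1$, $E = 1$, and an oscillator potential $U_{\rm ef}(q) = q^2/2$. It computes $S_0(q)$ from Eq. (70) by Simpson quadrature, taking the positive root and choosing the constant so that $S_0(0) = 0$, and then evaluates the left-hand part of Eq. (67) by central finite differences. The helper names (`simpson`, `make_action`, `hj_residual`) are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def make_action(E, m, U):
    """S(t, q) = S0(q) - E t, Eqs. (68)-(70), with p > 0 and S0(0) = 0."""
    def momentum(q):
        return math.sqrt(2.0 * m * (E - U(q)))
    def S(t, q):
        return simpson(momentum, 0.0, q) - E * t
    return S

def hj_residual(S, U, m, t, q, h=1e-4):
    """Left-hand part of Eq. (67), with derivatives taken by central differences."""
    dS_dt = (S(t + h, q) - S(t - h, q)) / (2.0 * h)
    dS_dq = (S(t, q + h) - S(t, q - h)) / (2.0 * h)
    return dS_dt + dS_dq**2 / (2.0 * m) + U(q)

def U(q):
    return 0.5 * q * q    # illustrative U_ef(q): oscillator with m_ef = 1, omega = 1

S = make_action(E=1.0, m=1.0, U=U)
print(hj_residual(S, U, m=1.0, t=0.3, q=0.5))   # close to zero
```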

This particular case illustrates that the Hamilton-Jacobi equation is not the most efficient way to solve most practical problems. However, it may be rather useful for studies of certain mathematical aspects of dynamics. 19 Moreover, in the 1940s this approach was extended to a completely different field, the optimal control theory, in which the role of action $S$ is played by the so-called cost function: a certain functional of a dynamic system that should be minimized by an optimal choice of a control signal, i.e. a function of time that affects the system's dynamics. From the point of view of this mathematical theory, Eq. (65) is a particular case of a more general Hamilton-Jacobi-Bellman equation. 20



10.5. Exercise problems 



10.1. Derive the Hamilton equations of motion for the system already considered in Problem 2.3: a fixed-length pendulum hanging from a horizontal support whose motion law $x_0(t)$ is fixed. (No vertical plane constraint.) Check that the equations are equivalent to those derived from the Lagrangian formalism.






10.2. After small oscillations had been initiated in a simple pendulum (Fig. on the right), the thread on which the mass is suspended is being pulled up slowly, so that the pendulum length $l$ is being reduced. Neglecting dissipation,




19 See, e.g., Chapters 6-9 in I. C. Percival and D. Richards, Introduction to Dynamics, Cambridge U. Press, 1983. 

20 See, e.g., D. P. Bertsekas, Dynamic Programming and Optimal Control, vols. 1 and 2, Athena Scientific, 2005 and 2007. The reader should not be deceived by the unnatural term "dynamic programming", which was invented by the founding father of this field, R. Bellman, to lure government bureaucrats into funding his research, which had been deemed too theoretical at that time but now has a broad range of important applications.
(i) prove by a direct calculation that the oscillation energy is indeed changing proportionally to the oscillation frequency, as follows from the constancy of the corresponding adiabatic invariant (10.40), and

(ii) find the $l$-dependence of the amplitudes of the angular and linear deviations from the equilibrium.
Chapter 1. Electric Charge Interaction

This brief chapter describes the basics of electrostatics, the study of interactions between static (or 
slowly moving) electric charges. Much of this material should be known to the reader from his or her 
undergraduate studies; because of that, the explanations will be very succinct. 1 






1.1. The Coulomb law 

A serious discussion of the Coulomb law 2 requires a common agreement on the meaning of the 
following notions: 3 

- electric charges $q_k$, as revealed, most explicitly, by experimental observation of electrostatic interaction between the charged particles;

- electric charge conservation, meaning that the algebraic sum of the charges $q_k$ of all particles inside any closed volume is conserved, unless the charged particles cross the volume's border; and

- a point charge, meaning the charge of an ultimately small ("point") particle whose position in space may be completely described (in a given reference frame) by its radius-vector $\mathbf{r} = \mathbf{n}_1 r_1 + \mathbf{n}_2 r_2 + \mathbf{n}_3 r_3$, where $\mathbf{n}_j$ (with $j = 1, 2, 3$) are unit vectors directed along 3 mutually perpendicular axes, and $r_j$ are the corresponding Cartesian components of $\mathbf{r}$.

I will assume that these notions are well known to the reader - though my strong advice is to give 
some thought to their vital importance. Using them, the Coulomb law for the electrostatic interaction of 
two point charges in otherwise free space may be formulated as follows: 

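$$ \mathbf{F}_{kk'} = \kappa\, q_k q_{k'}\,\frac{\mathbf{r}_k - \mathbf{r}_{k'}}{|\mathbf{r}_k - \mathbf{r}_{k'}|^3}, \tag{1.1} $$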

where $\mathbf{F}_{kk'}$ denotes the force exerted on charge number $k$ by charge number $k'$. This law is certainly very familiar to the reader, but several remarks may still be in order:

(i) Flipping indices $k$ and $k'$, we see that Eq. (1) 4 complies with the 3rd Newton law: the reciprocal force is equal in magnitude but opposite in direction: $\mathbf{F}_{k'k} = -\mathbf{F}_{kk'}$.

(ii) According to Eq. (1), the magnitude of the force, $F_{kk'}$, is inversely proportional to the square of the distance between the two charges - the well-known undergraduate-level formulation of the Coulomb law.




1 For remedial reading, virtually any undergraduate text on electricity and magnetism may be used; I can recommend either the classical text by I. Tamm, Fundamentals of Theory of Electricity, Mir, 1979, or the more readily available textbook by D. Griffiths, Introduction to Electrodynamics, 3rd ed., Prentice-Hall, 1999.

2 Discovered experimentally in the early 1780s, and formulated in 1785 by C.-A. de Coulomb. 

3 On top of the more general notions of classical Cartesian space, point particles, and forces, which are used in classical mechanics - see, e.g., CM Sec. 1.1. (Acronyms CM, SM, and QM refer to the other 3 parts of my lecture note series. In those parts, this Classical Electrodynamics part is referred to as EM.)

4 As in all other parts of my lecture notes, chapter numbers are omitted in references to equations, figures, and 
sections within the same chapter. 
(iii) Since vector $(\mathbf{r}_k - \mathbf{r}_{k'})$ is directed from point $\mathbf{r}_{k'}$ toward point $\mathbf{r}_k$ (Fig. 1), Eq. (1) implies that charges of the same sign (i.e. with $q_k q_{k'} > 0$) repel, while those with opposite signs ($q_k q_{k'} < 0$) attract each other.




(iv) Constant $\kappa$ in Eq. (1) depends on the system of units we use. In the Gaussian units, $\kappa$ is set to 1, at the price of introducing a special unit of charge (the statcoulomb) that would fit the experimental data for Eq. (1), with force $F_{kk'}$ measured in Gaussian units (dynes). On the other hand, in the International System ("SI") of units, the charge unit is one coulomb (abbreviated C), 5 close to $3\times10^9$ statcoulombs, and $\kappa$ is different from unity: 6

$$ \kappa = \frac{1}{4\pi\varepsilon_0} \equiv 10^{-7}c^2. \tag{1.2} $$



Unfortunately, the continuing struggle between zealot proponents of these two systems bears all the ugly features of a religious war, with similarly slim chances for either side to win it in any foreseeable future. In my humble view, each of these systems has its advantages and handicaps (to be noted on several occasions below), and every educated physicist should have no problem with using either of them. Following the insistent recommendations of international scientific unions, I will mostly use SI units, but for the readers' convenience, will duplicate the most important formulas in the Gaussian units.

Besides Eq. (1), another key experimental law of electrostatics is the linear superposition 
principle: the electrostatic forces exerted on some point charge (say, q k ) by other charges do not affect 
each other and add up as vectors to form the net force: 

$$ \mathbf{F}_k = \sum_{k' \neq k} \mathbf{F}_{kk'}, \tag{1.3} $$

where the summation is extended over all charges but $q_k$, and the partial force $\mathbf{F}_{kk'}$ is described by Eq. (1). 7 The fact that the sum is restricted to $k' \neq k$ means that a point charge does not interact with itself.



5 In the formal metrology, one coulomb is defined as the charge carried over by a constant current of one ampere 
(see Ch. 5 for its definition) during one second. 

6 Constant $\varepsilon_0$ is called either the electric constant or the free-space permittivity; from Eq. (2), with the free-space speed of light $c \approx 3\times10^8$ m/s, $\varepsilon_0 \approx 8.85\times10^{-12}$ SI units. For more accurate values of the constants, and their brief discussion, see appendix CA: Selected Physical Constants.

7 Physically this is a very strong statement: it means that Eq. (1) is valid for any pair of charges regardless of the presence of other charges, i.e. not only in free space, but also in an arbitrary medium. The apparent modification of this relation by conductors (Ch. 2) and dielectrics (Ch. 3) is just the result of the appearance of additional electric charges within those media.






Chapter 1 



Page 2 of 18 



Essential Graduate Physics 



EM: Classical Electrodynamics 



This fact may look trivial from Eq. (1), whose right-hand part diverges at $\mathbf{r}_k \to \mathbf{r}_{k'}$, but it becomes less evident (though still true) in quantum mechanics, where the charge of even an elementary particle is effectively spread over some volume, together with the particle's wavefunction. 8

Now we may combine Eqs. (1) and (3) to get the following expression for the net force F acting 
on some charge q located at point r: 

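$$ \mathbf{F} = q\,\frac{1}{4\pi\varepsilon_0}\sum_k q_k\,\frac{\mathbf{r} - \mathbf{r}_k}{|\mathbf{r} - \mathbf{r}_k|^3}. \tag{1.4} $$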

This equation implies that it makes sense to introduce the notion of the electric field at point r, as an 
entity independent of the probe charge q, characterized by vector 

$$ \mathbf{E}(\mathbf{r}) \equiv \frac{\mathbf{F}}{q}, \tag{1.5} $$

formally called the electric field strength - but much more frequently, just the "electric field". In these 
terms, Eq. (4) becomes 

$$ \mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\sum_k q_k\,\frac{\mathbf{r} - \mathbf{r}_k}{|\mathbf{r} - \mathbf{r}_k|^3}. \tag{1.6} $$

This concept is so appealing that Eq. (5) is used well beyond the boundaries of free-space electrostatics. Moreover, the notion of field becomes virtually unavoidable for the description of time-dependent phenomena (such as electromagnetic waves), where the electromagnetic field shows up as a specific form of matter, with zero rest mass, and hence different from the usual "material" particles.
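Equations (1), (3), and (6) translate directly into a few lines of code. In the minimal Python sketch below (SI units; the charges, positions, and helper names are hypothetical), the net force on each charge is the vector sum of pairwise Coulomb forces with the self-term excluded. The sum of all internal forces then vanishes, as the 3rd Newton law requires.

```python
import math

EPS0 = 8.8541878128e-12               # electric constant, F/m (CODATA 2018 value)
KAPPA = 1.0 / (4.0 * math.pi * EPS0)  # Coulomb-law constant, Eq. (2)

def coulomb_sum(r, charges, positions, skip=None):
    """KAPPA * sum_k q_k (r - r_k)/|r - r_k|^3, cf. Eq. (6); the term with index `skip` is omitted."""
    E = [0.0, 0.0, 0.0]
    for k, (q, rk) in enumerate(zip(charges, positions)):
        if k == skip:
            continue
        d = [r[i] - rk[i] for i in range(3)]
        dist3 = math.sqrt(d[0]**2 + d[1]**2 + d[2]**2) ** 3
        for i in range(3):
            E[i] += KAPPA * q * d[i] / dist3
    return E

def force_on(k, charges, positions):
    """Net force on charge k, Eqs. (1) and (3): the self-term k' = k is excluded."""
    E = coulomb_sum(positions[k], charges, positions, skip=k)
    return [charges[k] * e for e in E]

charges = [1e-9, -2e-9, 3e-9]                                      # coulombs
positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.05)]   # meters
forces = [force_on(k, charges, positions) for k in range(3)]
print(forces[0])
print([sum(f[i] for f in forces) for i in range(3)])   # ~0 by the 3rd Newton law
```

Computing the force as $q_k$ times the field of all other charges mirrors the route from Eq. (4) to Eqs. (5)-(6).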

Many problems involve many point charges $q_{k'}, q_{k''}, \ldots$, located so closely that it is possible to approximate them with a continuous charge distribution. Indeed, for a group of charges within a very small volume $d^3r'$, with a linear size $dr'$ satisfying the strong condition $dr' \ll |\mathbf{r} - \mathbf{r}_{k'}|$, the geometrical factor in Eq. (6) is essentially the same for all of them. As a result, all these charges may be treated as a single charge $dQ(\mathbf{r}')$. Since this charge is proportional to $d^3r'$, we can define the local (3D) charge density $\rho(\mathbf{r}')$ by the relation 9



$$ \rho(\mathbf{r}')\,d^3r' \equiv dQ(\mathbf{r}') \equiv \sum_{\mathbf{r}_{k'} \in d^3r'} q_{k'}, \tag{1.7} $$

and rewrite Eq. (6) as 



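$$ \mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\sum_{d^3r'} dQ(\mathbf{r}')\,\frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} = \frac{1}{4\pi\varepsilon_0}\sum_{d^3r'} \rho(\mathbf{r}')\,\frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3}\,d^3r', \tag{1.8} $$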



8 Moreover, there are some widely used approximations, e.g., the Kohn-Sham equations in the density functional 
theory of multiparticle systems, which essentially violate this law, thus limiting the accuracy and applicability of 
these approximations - see, e.g., QM Sec. 8.4. 

9 The 2D (areal) charge density $\sigma$ and 1D (linear) density $\lambda$ may be defined absolutely similarly: $dQ = \sigma\,d^2r$, $dQ = \lambda\,dr$. Note that a finite value of $\sigma$ or $\lambda$ means that the volume density $\rho$ is infinite at the charge location points; for example, for a plane $z = 0$ charged with a constant areal density $\sigma$, $\rho = \sigma\delta(z)$.
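i.e. as the integral (over the whole volume containing all essential charges): 10

$$ \mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int \rho(\mathbf{r}')\,\frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3}\,d^3r'. \tag{1.9} $$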



It is very convenient that Eq. (9) may be used even in the case of discrete point charges, employing the notion of Dirac's $\delta$-function, 11 which is a mathematical approximation for a very sharp function equal to zero everywhere but one point, and still having a finite (unit) integral. Indeed, in this formalism, a set of point charges $q_k$ located at points $\mathbf{r}_k$ may be presented by the pseudo-continuous distribution with density



$$ \rho(\mathbf{r}') = \sum_k q_k\,\delta(\mathbf{r}' - \mathbf{r}_k). \tag{1.10} $$



Plugging this expression into Eq. (9), we come back to the discrete version (6) of the Coulomb law. 
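The passage from Eq. (6) to Eqs. (7)-(8) can also be run in reverse. Split a uniformly charged cube into small cells, treat each cell's charge $\rho\,d^3r'$ as a point charge at its center, and sum. The sketch below assumes, for illustration, a 1 nC charge in a 10 cm cube observed from 1 m away; at this distance the summed field should be very close to that of a single point charge $Q$ at the cube's center. The function name is hypothetical.

```python
import math

EPS0 = 8.8541878128e-12
KAPPA = 1.0 / (4.0 * math.pi * EPS0)

def field_of_uniform_cube(r, Q, a, n=10):
    """Field of charge Q spread uniformly over a cube of side a centered at the origin.

    The cube is cut into n**3 cells; each cell carries dQ = rho d^3r' (Eq. (7)),
    treated as a point charge at the cell center, and the contributions are
    summed as in Eq. (8).
    """
    rho = Q / a**3
    h = a / n
    dQ = rho * h**3
    centers = [-a / 2 + (i + 0.5) * h for i in range(n)]
    E = [0.0, 0.0, 0.0]
    for x in centers:
        for y in centers:
            for z in centers:
                d = (r[0] - x, r[1] - y, r[2] - z)
                dist3 = math.sqrt(d[0]**2 + d[1]**2 + d[2]**2) ** 3
                for i in range(3):
                    E[i] += KAPPA * dQ * d[i] / dist3
    return E

Q, a = 1e-9, 0.1                      # 1 nC in a 10-cm cube
E = field_of_uniform_cube((1.0, 0.0, 0.0), Q, a)
print(E[0] / (KAPPA * Q / 1.0**2))    # close to 1: from 1 m away the cube acts as a point charge
```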



1.2. The Gauss law 

Due to the extension to point ("discrete") charges, it may seem that Eqs. (5) and (9) are all we need for solving any problem of electrostatics. In practice, this is not quite true, first of all because the direct use of Eq. (9) frequently leads to complex calculations. Indeed, let us consider a very simple example: the electric field produced by a spherically-symmetric charge distribution with density $\rho(r')$. We may immediately use the problem's symmetry to argue that the electric field should also be spherically-symmetric, with only one component in spherical coordinates: $\mathbf{E}(\mathbf{r}) = E(r)\mathbf{n}_r$, where $\mathbf{n}_r = \mathbf{r}/r$ is the unit vector in the direction of the field observation point $\mathbf{r}$ (Fig. 2).




Fig. 1.2. One of the simplest problems of 
electrostatics: electric field produced by a 
spherically-symmetric charge distribution. 



Taking this direction as the polar axis of a spherical coordinate system, we can use the evident independence of the elementary radial field $dE$, created by the elementary charge $\rho(r')\,d^3r' = \rho(r')\,r'^2\sin\theta'\,dr'\,d\theta'\,d\varphi'$, of the azimuthal angle $\varphi'$, and reduce integral (9) to

$$ E = \frac{2\pi}{4\pi\varepsilon_0}\int_0^\pi \sin\theta'\,d\theta' \int_0^\infty r'^2\,dr'\,\frac{\rho(r')}{r''^2}\cos\theta, \tag{1.11} $$



10 Note that for a continuous, smooth charge distribution, integral (9) does not diverge at $R \equiv |\mathbf{r} - \mathbf{r}'| \to 0$, because in this limit the fraction under the integral increases as $R^{-2}$, i.e. slower than the decrease of the elementary volume $d^3r'$, which is proportional to $R^3$.

11 See, e.g., Sec. 14 of the Selected Mathematical Formulas appendix, referred to below as MA.
where $\theta$ and $r''$ are the geometrical parameters marked in Fig. 2. Since they all may be readily expressed via $r'$ and $\theta'$ using auxiliary parameters $a$ and $h$,

$$ \cos\theta = \frac{r - a}{r''}, \qquad (r'')^2 = h^2 + (r - r'\cos\theta')^2, \qquad a = r'\cos\theta', \qquad h = r'\sin\theta', \tag{1.12} $$

integral (11) may eventually be reduced to an explicit integral over $r'$ and $\theta'$ and worked out analytically, but that would require some effort.
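That effort can be delegated to a computer. The following sketch evaluates the double integral (11), with the geometry (12), by the midpoint rule for a uniformly charged sphere. It compares the result with the closed-form Eqs. (22) and (19), derived below by means of the Gauss law. The grid size `n` and the numerical values are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12
KAPPA = 1.0 / (4.0 * math.pi * EPS0)

def field_direct(r, rho, R, n=200):
    """E(r) from the double integral (11), with the geometry of Eq. (12).

    rho(r') is a spherically symmetric density vanishing for r' > R; the
    integrals over r' in [0, R] and theta' in [0, pi] use the midpoint rule,
    whose nodes never hit the integrable singularity at r'' = 0.
    """
    dr, dth = R / n, math.pi / n
    total = 0.0
    for i in range(n):
        rp = (i + 0.5) * dr
        weight = rp * rp * rho(rp) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            a, h = rp * math.cos(th), rp * math.sin(th)   # auxiliary parameters of Eq. (12)
            r2 = h * h + (r - a) ** 2                      # (r'')^2
            cos_theta = (r - a) / math.sqrt(r2)
            total += weight * math.sin(th) * dth * cos_theta / r2
    return KAPPA * 2.0 * math.pi * total                   # 2*pi from the phi' integration

def field_uniform_sphere(r, Q, R):
    """Uniformly charged sphere: Eq. (22) for r < R, Eq. (19) for r > R."""
    return KAPPA * Q * r / R**3 if r < R else KAPPA * Q / r**2

Q, R = 1e-9, 1.0
rho0 = Q / (4.0 * math.pi * R**3 / 3.0)

def uniform(rp):
    return rho0

for r in (0.5, 2.0):
    print(r, field_direct(r, uniform, R) / field_uniform_sphere(r, Q, R))   # ratios close to 1
```

Even this simple case requires tens of thousands of integrand evaluations per observation point, which is part of the motivation for the Gauss law.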

For more complex problems, integral (8) may be even more complex, defying an analytical solution. One could argue that with the present-day abundance of computers and numerical algorithm libraries, one can always resort to numerical integration. This argument may be enhanced by the fact that numerical integration is based on the replacement of the integral by a sum, and summation is much more robust to (unavoidable) discretization and rounding errors than the finite-difference schemes typical for the numerical solution of differential equations.

These arguments, however, are only partly justified, since in many cases the numerical approach 
runs into a problem sometimes called the curse of dimensionality, in which the last word refers to the 
number of input parameters of the problem to be solved, i.e. the dimensionality of its parameter space. 
Let us discuss this issue, because it is common for most fields of physics and, more generally, any 
quantitative science. 12 

If the number of the parameters of a problem is small, the results of its numerical solution may be of the same (and in some sense higher) value than the analytical ones. For example, if a problem has no parameters, and its result is just one number (say, $\pi^2/4$), this "analytical" answer hardly carries more information than its numerical form 2.4674011... Now, if a problem has one input parameter (say, $a$), the result of an analytical approach in most cases may be presented as an analytical function $f(a)$. If the function is really simple, called elementary, with many properties well known (say, $f(a) = \sin a$), this function gives us virtually everything we want to know. However, if the function is complicated, you would need to tabulate it numerically for a set of values of parameter $a$ and possibly present the result as a plot. The same results (and the same plot) can be calculated numerically, without using analytics at all. This plot may certainly be very valuable, but since the analytical form has the potential of giving you more information (say, the values of $f(a)$ outside the plot range, or the asymptotic behavior of the function), it is hard to say that the numerics completely beat the analytics here.

Now let us assume that you have more input parameters. For two parameters (say, $a$ and $b$), instead of one curve you would need a family of such curves for several (sometimes many) values of $b$. Still, the plots may fit on one page convenient for viewing, so it is still not too bad. Now, if you have three parameters, the full representation of the results may require many pages (maybe a book) full of curves; for four parameters we may speak about several bookshelves, for five parameters something like a library, etc. For a large number of parameters, typical for many scientific problems, the number of points in the parameter space grows exponentially, so that even the volume of calculations necessary for the generation of these data may become impracticable, despite the dirt-cheap CPU time we have now.
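The exponential growth is easy to quantify. Assuming, for illustration, 10 sample values per parameter and a hypothetical cost of 1 ms of CPU time per point, the following sketch tabulates the grid size and the total cost versus the number of parameters.

```python
def grid_points(n_per_axis: int, n_params: int) -> int:
    """Number of nodes of a uniform grid in an n_params-dimensional parameter space."""
    return n_per_axis ** n_params

SECONDS_PER_POINT = 1e-3    # assumed cost of one numerical solution (illustrative)

for d in (1, 2, 3, 5, 10):
    N = grid_points(10, d)
    hours = N * SECONDS_PER_POINT / 3600.0
    print(f"{d:2d} parameters: {N:>14,} points, about {hours:.3g} CPU hours")
```

With these assumptions, ten parameters already require $10^{10}$ points, i.e. thousands of CPU hours, before any of the results is even plotted.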

Thus, despite the current proliferation of numerical methods in physics, analytical results have an everlasting value, and we should try to get them whenever we can. For our current problem of finding the electric field generated by a fixed set of electric charges, large help comes from the Gauss law.



12 Actually, the term "curse of dimensionality" was coined in the 1950s by R. Bellman in the context of the 
optimal control theory, and only later spread to other sciences that heavily rely on numerical calculations. 
Let us consider a single point charge $q$ inside a smooth, closed surface $S$ (Fig. 3), and calculate the product $E_n d^2r$, where $d^2r$ is an infinitesimal element of the surface (which may be well approximated with a plane of that area), and $E_n$ is the component of the electric field at that point, normal to that plane.




Fig. 1.3. Deriving the Gauss law: a point charge q is (a) inside volume V and (b) outside of that volume. 



This component may be calculated as $E\cos\theta$, where $\theta$ is the angle between vector $\mathbf{E}$ and the unit vector $\mathbf{n}$ normal to the surface. (Equivalently, $E_n$ may be presented as the scalar product $\mathbf{E}\cdot\mathbf{n}$.) Now let us notice that the product $\cos\theta\,d^2r$ is nothing more than the area $d^2r'$ of the projection of $d^2r$ onto the plane perpendicular to vector $\mathbf{r}$ connecting charge $q$ with this point of the surface (Fig. 3), because the angle between the planes $d^2r'$ and $d^2r$ is also equal to $\theta$. Using the Coulomb law for $\mathbf{E}$, we get

$$ E_n d^2r = E\cos\theta\,d^2r = \frac{1}{4\pi\varepsilon_0}\frac{q}{r^2}\,d^2r'. \tag{1.13} $$

But the ratio $d^2r'/r^2$ is nothing more than the elementary solid angle $d\Omega$ under which the areas $d^2r'$ and $d^2r$ are seen from the charge point, so that $E_n d^2r$ may be presented as just a product of $d\Omega$ by a constant $(q/4\pi\varepsilon_0)$. Summing these products over the whole surface, we get

$$ \oint_S E_n\,d^2r = \frac{q}{4\pi\varepsilon_0}\oint_S d\Omega = \frac{q}{\varepsilon_0}, \tag{1.14} $$

since the full solid angle equals $4\pi$. (The integral in the left-hand part of this relation is called the flux of electric field through surface $S$.)

Equation (14) expresses the Gauss law for one point charge. However, it is only valid if the charge is located inside the volume limited by the surface. In order to find the flux created by a charge outside of the volume, we still can use Eq. (13), but to proceed we have to be careful with the signs of the elementary contributions $E_n d^2r$. Let us use the common convention to direct the unit vector $\mathbf{n}$ out of the closed volume we are considering (the so-called outer normal), so that the elementary product $E_n d^2r = (\mathbf{E}\cdot\mathbf{n})\,d^2r$, and hence the elementary solid angle $d\Omega = \cos\theta\,d^2r/r^2$, is positive if vector $\mathbf{E}$ is pointing out of the volume (as in the example shown in Fig. 3a and the upper-right area in Fig. 3b), and negative in the opposite case (for example, in the lower-left area in Fig. 3b). As the latter figure shows, if the charge is located outside of the volume, for each positive contribution $d\Omega$ there is always an equal and opposite contribution to the integral. As a result, at the integration over the solid angle the positive and negative contributions cancel exactly, so that



$$ \oint_S E_n\,d^2r = 0. \tag{1.15} $$



The real power of the Gauss law is revealed by its generalization to the case of many charges within volume $V$. Since the calculation of flux is a linear operation, the linear superposition principle (3) means that the flux created by several charges is equal to the (algebraic) sum of the individual fluxes from each charge, for which either Eq. (14) or Eq. (15) is valid, depending on the charge's position (in or out of the volume). As a result, for the total flux we get:



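$$ \oint_S E_n\,d^2r = \frac{Q_V}{\varepsilon_0} \equiv \frac{1}{\varepsilon_0}\int_V \rho(\mathbf{r})\,d^3r, \tag{1.16} $$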



where $Q_V$ is the net charge inside volume $V$. This is the full version of the Gauss law.
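The derivation of Eqs. (14)-(16) never used the shape of the surface, so a non-spherical surface should give the same flux. The sketch below checks this for the surface of a cube, computing the flux of a single point charge's field by the midpoint rule on each face. It uses arbitrary units with $\varepsilon_0 = 1$; the charge positions and the function name are illustrative.

```python
import math

def flux_through_cube(charge_pos, q=1.0, eps0=1.0, half=1.0, n=80):
    """Flux of a point charge's field through the surface of the cube [-half, half]^3.

    Each face is cut into n x n squares, and E_n (taken along the outer normal)
    is evaluated at the square centers (midpoint rule).
    """
    k = q / (4.0 * math.pi * eps0)
    h = 2.0 * half / n
    mids = [-half + (i + 0.5) * h for i in range(n)]
    flux = 0.0
    for axis in range(3):
        u_axis, v_axis = [ax for ax in range(3) if ax != axis]
        for sign in (-1.0, 1.0):              # the two faces normal to this axis
            for u in mids:
                for v in mids:
                    p = [0.0, 0.0, 0.0]
                    p[axis], p[u_axis], p[v_axis] = sign * half, u, v
                    d = [p[i] - charge_pos[i] for i in range(3)]
                    dist3 = math.sqrt(d[0]**2 + d[1]**2 + d[2]**2) ** 3
                    flux += k * sign * d[axis] / dist3 * h * h   # E_n d^2r
    return flux

print(flux_through_cube((0.2, -0.1, 0.3)))   # charge inside: close to q/eps0 = 1, Eq. (14)
print(flux_through_cube((3.0, 0.0, 0.0)))    # charge outside: close to 0, Eq. (15)
```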

In order to appreciate the problem-solving power of the law, let us return to the problem presented in Fig. 2, i.e. a spherical charge distribution. Due to its symmetry, which has already been discussed above, if we apply Eq. (16) to a sphere of radius $r$, the electric field should be perpendicular to the sphere at each of its points (i.e., $E_n = E$), and its magnitude should be the same at all points: $E_n = E = E(r)$. As a result, the flux calculation is elementary:



$$ \oint_S E_n\,d^2r = 4\pi r^2 E(r). \tag{1.17} $$



Now, applying the Gauss law (16), we get: 

$$ 4\pi r^2 E(r) = \frac{1}{\varepsilon_0}\int_{r'<r} \rho(r')\,d^3r' = \frac{4\pi}{\varepsilon_0}\int_0^r r'^2 \rho(r')\,dr', \tag{1.18} $$

so that, finally, 

$$ E(r) = \frac{1}{\varepsilon_0 r^2}\int_0^r r'^2 \rho(r')\,dr' = \frac{1}{4\pi\varepsilon_0}\frac{Q(r)}{r^2}, \tag{1.19} $$

where Q(r) is the full charge inside the sphere of radius r: 

$$ Q(r) = 4\pi\int_0^r \rho(r')\,r'^2\,dr'. \tag{1.20} $$

In particular, this formula shows that the field outside of a sphere of a finite radius $R$ is exactly the same as if all its charge $Q = Q(R)$ were concentrated in the sphere's center. (Note that this important result is valid for any spherically-symmetric charge distribution, but only for such distributions.) For the field inside the sphere, finding the electric field still requires an explicit integration (20), but this 1D integral is much simpler than the 2D integral (11), and in some important cases may be readily worked out analytically. For example, if charge $Q$ is uniformly distributed inside a sphere of radius $R$,

$$ \rho(r') = \rho = \frac{Q}{V} = \frac{Q}{(4\pi/3)R^3}, \tag{1.21} $$
the integration is elementary:



$$ E(r) = \frac{\rho}{\varepsilon_0 r^2}\int_0^r r'^2\,dr' = \frac{\rho r}{3\varepsilon_0} = \frac{1}{4\pi\varepsilon_0}\frac{Qr}{R^3}. \tag{1.22} $$



We see that in this case the field grows linearly from the center to the sphere's surface, and only at $r > R$ starts to decrease, in agreement with Eq. (19) with constant $Q(r) = Q$. Another important observation is that the results for $r < R$ and $r > R$ give the same value, $Q/4\pi\varepsilon_0 R^2$, at the charged sphere's surface, $r = R$, so that the electric field is continuous.

In order to underline the importance of the last fact, let us consider one more elementary but very important example of the Gauss law's application. Let a thin plane sheet (Fig. 4) be charged uniformly, with an areal density $\sigma$ = const (see Footnote 9 above).



Fig. 1.4. Electric field of a charged plane.



In this case, it is fruitful to use the Gauss volume in the form of a planar "pillbox" of thickness $2z$ (where $z$ is the Cartesian coordinate perpendicular to the charged plane) and a certain area $A$ - see Fig. 4. Due to the symmetry of the problem, it is evident that the electric field should be: (i) directed along axis $z$, (ii) constant on each of the upper and bottom sides of the pillbox, (iii) equal and opposite on these sides, and (iv) parallel to the side surfaces of the box. As a result, the full electric field flux through the pillbox surface is just $2AE(z)$, so that the Gauss law (16) yields

$$ 2AE(z) = \frac{Q_A}{\varepsilon_0} = \frac{\sigma A}{\varepsilon_0}, \tag{1.23} $$

and we get a very simple but important formula 

$$ E(z) = \frac{\sigma}{2\varepsilon_0} = {\rm const}. \tag{1.24} $$

Notice that, somewhat counter-intuitively, the field magnitude does not depend on the distance from the charged plane. From the point of view of the Coulomb law (5), this result may be explained as follows: the farther the observation point from the plane, the weaker the effect of each elementary charge, $dQ = \sigma\,d^2r$, but the more such elementary charges contribute to the vertical component of vector $\mathbf{E}$.
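This argument can be made quantitative by summing the contributions of thin concentric rings of a uniformly charged disk of radius $a$, with the observation point on the disk's axis at height $z$. For $a \gg z$ the disk approximates the infinite plane, and the result should approach Eq. (24) regardless of $z$. In the sketch below (the function name and parameters are illustrative), the field is expressed in units of $\sigma/2\varepsilon_0$. For a finite disk, the exact on-axis value in these units is $1 - z/(z^2 + a^2)^{1/2}$, which serves as a check.

```python
import math

def disk_field_ratio(z, a, n=20000):
    """On-axis E_z of a uniformly charged disk of radius a, in units of sigma/(2 eps0).

    A ring of radius s and width ds carries dQ = sigma * 2 pi s ds; its on-axis field
    is dQ z / [4 pi eps0 (s^2 + z^2)^(3/2)], i.e. z s ds / (s^2 + z^2)^(3/2) in these
    units.  Rings are spaced geometrically to resolve both s << z and s >> z.
    """
    s_min = 1e-6 * z                      # the omitted core s < s_min adds only ~(s_min/z)**2 / 2
    step = (a / s_min) ** (1.0 / n)
    total, s = 0.0, s_min
    for _ in range(n):
        s_next = s * step
        s_mid = math.sqrt(s * s_next)     # geometric midpoint of the ring
        total += z * s_mid * (s_next - s) / (s_mid**2 + z**2) ** 1.5
        s = s_next
    return total

for z in (0.1, 1.0, 10.0, 100.0):
    print(z, disk_field_ratio(z, a=1e4))  # ~1 - z/a: nearly z-independent for z << a
```

Moving the observation point away from the disk weakens each ring's contribution but brings more distant rings into play, so the sum stays nearly constant until $z$ becomes comparable with $a$.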

Note also that though the magnitude $E = |\mathbf{E}|$ of the electric field is constant, its vertical component $E_z$ changes sign at $z = 0$ (Fig. 4), experiencing a discontinuity (jump) equal to $\Delta E_z = \sigma/\varepsilon_0$. This jump disappears if the surface is not charged ($\sigma = 0$). This statement remains true in the more general case of a finite volume (but not surface!) charge density $\rho$. Returning for a minute to our charged sphere problem, very close to its surface the sphere may be considered planar, so that the electric field should indeed be continuous, as it is.

Admittedly, the integral form (16) of the Gauss law is immediately useful only for highly symmetric geometries, such as in the two problems discussed above. However, it may be recast into an alternative, differential form whose field of useful applications is much wider. This form may be obtained from Eq. (16) using the divergence theorem, which, according to vector algebra, is valid for any space-differentiable vector, in particular $\mathbf{E}$, and for any volume $V$ limited by a closed surface $S$: 13



$$ \oint_S E_n\,d^2r = \int_V (\nabla\cdot\mathbf{E})\,d^3r, \tag{1.25} $$



where $\nabla$ is the del (or "nabla") operator of spatial differentiation. 14 Combining Eq. (25) with the Gauss law (16), we get



$$ \int_V \left(\nabla\cdot\mathbf{E} - \frac{\rho}{\varepsilon_0}\right) d^3r = 0. \tag{1.26} $$



For a given distribution of electric charge (and hence of the electric field), this equation should be valid 
for any choice of volume V. This can hold only if the function under the integral vanishes at each point, 
i.e. if 15 



$$ \nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}. \tag{1.27} $$



Note that, in sharp contrast with the integral form (16), Eq. (27) is local: it relates the divergence of the electric field to the charge density at the same point. This equation, being the differential form of the Gauss law, is frequently called (the free-space version of) one of the Maxwell equations. Another, homogeneous Maxwell equation's "embryo" may be obtained by noticing that the curl of a point charge's field, and hence that of the field of any system of charges, equals zero: 16



$$ \nabla\times\mathbf{E} = 0. \tag{1.28} $$



(We will arrive at two other Maxwell equations, for the magnetic field, in Chapter 5, and then generalize 
all the equations to their full, time-dependent form by the end of Chapter 6. However, Eq. (27) would 
stay the same.) 
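Equations (27) and (28) can be checked numerically for the field of a single point charge. Away from the charge, where $\rho = 0$, both the divergence and the curl of the Coulomb field should vanish. The sketch below takes the derivatives by central finite differences; the units ($\varepsilon_0 = 1$), the observation point, and the function names are illustrative assumptions.

```python
import math

def point_charge_field(x, y, z, q=1.0, eps0=1.0):
    """Coulomb field (6) of a charge q at the origin; eps0 = 1 is an arbitrary unit choice."""
    r3 = (x * x + y * y + z * z) ** 1.5
    k = q / (4.0 * math.pi * eps0)
    return (k * x / r3, k * y / r3, k * z / r3)

def div_and_curl(F, p, h=1e-5):
    """Divergence and curl of a vector field F(x, y, z) at point p by central differences."""
    def d(i, j):
        # partial derivative of component i with respect to coordinate j
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        return (F(*pp)[i] - F(*pm)[i]) / (2.0 * h)
    div = d(0, 0) + d(1, 1) + d(2, 2)
    curl = (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))
    return div, curl

div, curl = div_and_curl(point_charge_field, (0.3, -0.4, 1.2))
print(div, curl)   # all close to zero away from the charge, cf. Eqs. (27)-(28)
```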

Just to get a better gut feeling for Eq. (27), let us apply it to the same example of a uniformly charged sphere (Fig. 2). Vector algebra tells us that the divergence of a spherically symmetric vector function $\mathbf{E}(\mathbf{r}) = E(r)\mathbf{n}_r$ may be simply expressed in spherical coordinates: 17



13 See, e.g., MA Eq. (12.2). Note that the scalar product under the integral in Eq. (25) is nothing more than the divergence of vector $\mathbf{E}$ - see, e.g., MA Eq. (8.4).

14 See, e.g., MA Secs. 8-10.

15 In the Gaussian units, just as in the initial Eq. (5), $\varepsilon_0$ has to be replaced with $1/4\pi$, so that the Maxwell equation (27) looks like $\nabla\cdot\mathbf{E} = 4\pi\rho$, while Eq. (28) stays the same.

16 This follows, for example, from the direct application of MA Eq. (10.11) to the spherically-symmetric vector function $\mathbf{f} = \mathbf{E}(\mathbf{r}) = E(r)\mathbf{n}_r$ (the field of a point charge placed at the origin), giving $f_\theta = f_\varphi = 0$ and $\partial f_r/\partial\theta = \partial f_r/\partial\varphi = 0$.

17 See, e.g., MA Eq. (10.10) for this particular case (when $\partial/\partial\theta = \partial/\partial\varphi = 0$).
$$ \nabla\cdot\mathbf{E} = \frac{1}{r^2}\frac{d}{dr}\left(r^2 E\right). \tag{1.29} $$

As a result, Eq. (27) yields a linear, ordinary differential equation for the function E(r): 



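$$ \frac{1}{r^2}\frac{d}{dr}\left(r^2 E\right) = \begin{cases} \rho/\varepsilon_0, & \text{for } r < R, \\ 0, & \text{for } r > R, \end{cases} \tag{1.30a} $$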

that may be readily integrated on each of the segments: 

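$$ E(r) = \frac{1}{\varepsilon_0 r^2}\begin{cases} \rho\int r^2\,dr = \rho r^3/3 + C_1, & \text{for } r < R, \\ C_2, & \text{for } r > R. \end{cases} \tag{1.30b} $$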

In order to determine the integration constant $C_1$, we can use the boundary condition $E(0) = 0$. (It follows from the problem's spherical symmetry: in the center of the sphere, the electric field has to vanish, because otherwise, where would it be directed?) Constant $C_2$ may be found from the continuity condition $E(R - 0) = E(R + 0)$, which has already been discussed above. As a result, we arrive at our previous results (19) and (22).

We can see that in this particular, highly symmetric case, using the differential form of the Gauss law is more complex than using its integral form. (For our second example, shown in Fig. 4, it would be even less natural.) However, Eq. (27) and its generalizations are more convenient for asymmetric charge distributions, and invaluable in the cases where the charge distribution $\rho(\mathbf{r})$ is not known a priori and has to be found in a self-consistent way. (We will start discussing such cases in the next chapter.)



1.3. Scalar potential and electric field energy 

Further help in solving electrostatics (and more complex) problems may be obtained from the notion of the electrostatic potential, which is just the electrostatic potential energy $U$ of a probe particle, normalized by its charge:

$$\phi \equiv \frac{U}{q}. \qquad (1.31)$$

As we know from classical mechanics, 18 the notion of $U$ (and hence $\phi$) makes sense only for potential forces, for example those depending only on the particle's position. Equations (6) and (8) show that, in static situations, the electric field clearly falls into this category. For such a field, the potential energy may be defined as a scalar function $U(\mathbf{r})$ that allows the force to be calculated as its gradient (with the opposite sign):




$$\mathbf{F} = -\nabla U. \qquad (1.32)$$



Dividing both sides of this equation by the charge of the probe particle, and using Eqs. (5) and (31), we 
get 19 



18 See, e.g., CM Sec. 1.4. 

19 Eq. (28) could also be derived from this relation, because according to vector algebra, any gradient field has vanishing curl - see, e.g., MA Eq. (11.1).
$$\mathbf{E} = -\nabla\phi. \qquad (1.33)$$

In order to calculate the scalar potential, let us start from the simplest case of a single point 
charge q placed at the origin. For it, the Coulomb law (5) takes a simple form 



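$$\mathbf{E} = \frac{1}{4\pi\varepsilon_0}q\frac{\mathbf{r}}{r^3} = -\frac{1}{4\pi\varepsilon_0}q\left(\frac{-\mathbf{r}}{r^3}\right). \qquad (1.34)$$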



It is straightforward to check that the last fraction on the right-hand side of this equation is equal to $\nabla(1/r)$. 20 Hence, according to the definition (33), for this particular case

$$\phi = \frac{1}{4\pi\varepsilon_0}\frac{q}{r}. \qquad (1.35)$$




(In the Gaussian units, this result is spectacularly simple: $\phi = q/r$.) Note that we could add an arbitrary constant to this potential (and indeed to any other distribution of $\phi$ discussed below) without changing the force, but it is convenient to define the potential energy to approach zero at infinity.

Before going any further, let us demonstrate, on a very simple example, how useful the notions of $U$ and $\phi$ are. Let two similar charges $q$ be launched from afar, with an initial velocity $v_0 \ll c$ each, straight toward each other (i.e. with zero impact parameter) - see Fig. 5. Since, according to the Coulomb law, the charges repel each other with increasing force, they will stop at some minimum distance $r_{\min}$ from each other, and then fly back.

Fig. 1.5. Simple problem of electric particle motion: two particles of mass $m$ and charge $q$ approach each other; $r_{\min} = ?$



We could of course find $r_{\min}$ directly from the Coulomb law. However, for that we would need to write the 2nd Newton law for each particle (actually, due to the problem symmetry, they would be similar), then integrate them over time once to find the particle velocity $v$ as a function of distance, and then recover $r_{\min}$ from the requirement $v = 0$. The notion of potential allows this problem to be solved in one line. Indeed, in the field of potential forces the system's total energy $E = T + U = T + q\phi$ is conserved. In our non-relativistic case, the kinetic energy $T$ is just $mv^2/2$. Hence, equating the total energy of the two particles at the points $r = \infty$ and $r = r_{\min}$, and using Eq. (35) for $\phi$, we get

$$2\,\frac{mv_0^2}{2} + 0 = 0 + \frac{1}{4\pi\varepsilon_0}\frac{q^2}{r_{\min}}, \qquad (1.36)$$

immediately giving us the final answer: $r_{\min} = q^2/4\pi\varepsilon_0 m v_0^2$.
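To see the "one line" shortcut against the longer route it replaces, one can integrate the equation of motion numerically and compare. The sketch below is an illustration under stated assumptions: it works with the separation $r$ and the reduced mass $m/2$ instead of two separate Newton equations, treats $k = 1/4\pi\varepsilon_0$ as a parameter (default 1, i.e. arbitrary units), replaces "afar" with a finite starting separation `r_start`, and uses velocity-Verlet time stepping; the function names are hypothetical.

```python
import math


def r_min_numeric(m, q, v0, k=1.0, r_start=100.0, dt=1e-3):
    """Closest approach of two identical charges launched head-on.

    Integrates mu * d2r/dt2 = k*q**2/r**2 for the separation r, with reduced
    mass mu = m/2 and k = 1/(4*pi*eps0). The starting speed is chosen so that
    the total energy equals m*v0**2, i.e. each particle would move with speed
    v0 at infinite separation, as on the left-hand side of Eq. (1.36).
    """
    mu = m / 2
    energy = m * v0**2
    v = -math.sqrt(2 * (energy - k * q**2 / r_start) / mu)  # dr/dt < 0: approaching
    r = r_start
    acc = k * q**2 / (mu * r**2)
    r_min = r
    while v < 0:                                  # stop once the particles turn back
        r += v * dt + 0.5 * acc * dt**2           # velocity-Verlet position update
        new_acc = k * q**2 / (mu * r**2)
        v += 0.5 * (acc + new_acc) * dt           # velocity update
        acc = new_acc
        r_min = min(r_min, r)
    return r_min


def r_min_energy(m, q, v0, k=1.0):
    """Energy-conservation result: r_min = k*q**2/(m*v0**2)."""
    return k * q**2 / (m * v0**2)


print(r_min_numeric(1.0, 1.0, 1.0), r_min_energy(1.0, 1.0, 1.0))
```

The loop needs tens of thousands of steps to reproduce what energy conservation gives in closed form, which is the point made in the text.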

Now let us calculate $\phi$ for an arbitrary configuration of charges. For a single charge in an arbitrary position (say, $\mathbf{r}_k$), $r$ in Eq. (35) should evidently be replaced with $|\mathbf{r} - \mathbf{r}_k|$. Now, the linear



20 This may be done either by Cartesian components or using the well-known expression $\nabla f = (df/dr)\mathbf{n}_r$, valid for any spherically-symmetric scalar function $f(r)$ - see, e.g., MA Eq. (10.8) for the particular case $\partial/\partial\theta = \partial/\partial\varphi = 0$.
superposition principle (3) allows for an easy generalization of this formula to the case of an arbitrary set of discrete charges,

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\sum_{\mathbf{r}_k \neq \mathbf{r}}\frac{q_k}{|\mathbf{r} - \mathbf{r}_k|}. \qquad (1.37)$$



Finally, using the same arguments as in Sec. 1, we can use this result to argue that in the case of an arbitrary continuous charge distribution

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int\frac{\rho(\mathbf{r}')\,d^3r'}{|\mathbf{r} - \mathbf{r}'|}. \qquad (1.38)$$



Again, the notion of Dirac's delta-function allows the last equation to be used for discrete charges as well, so that Eq. (38) may be considered as the general expression for the electrostatic potential.
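The recipe of computing the scalar $\phi$ first and then differentiating it via Eq. (1.33) is easy to check against the direct vector superposition of Coulomb fields. This is a minimal Python sketch assuming point charges given as `(q, (x, y, z))` tuples in SI units; the helper names are illustrative, the gradient is taken by central finite differences with an assumed step `h`, and `math.dist` requires Python 3.8 or later.

```python
import math

K = 8.9875517923e9  # Coulomb constant 1/(4*pi*eps0), N*m^2/C^2


def potential(point, charges):
    """Eq. (1.37): phi(r) = K * sum_k q_k/|r - r_k| for point charges (q_k, r_k)."""
    return sum(K * q / math.dist(point, pos) for q, pos in charges)


def field_from_potential(point, charges, h=1e-6):
    """E = -grad(phi), Eq. (1.33), by central finite differences."""
    e = []
    for axis in range(3):
        plus, minus = list(point), list(point)
        plus[axis] += h
        minus[axis] -= h
        e.append(-(potential(plus, charges) - potential(minus, charges)) / (2 * h))
    return e


def field_direct(point, charges):
    """Vector sum of Coulomb fields: E = K * sum_k q_k (r - r_k)/|r - r_k|^3."""
    e = [0.0, 0.0, 0.0]
    for q, pos in charges:
        dist = math.dist(point, pos)
        for axis in range(3):
            e[axis] += K * q * (point[axis] - pos[axis]) / dist**3
    return e


charges = [(1e-9, (0.0, 0.0, 0.0)), (-2e-9, (0.1, 0.0, 0.0))]   # C, m
point = (0.05, 0.05, 0.02)
print(field_from_potential(point, charges))
print(field_direct(point, charges))
```

Only one scalar sum has to be evaluated per point in the first approach, versus three component sums in the second, which mirrors the argument in the next paragraph.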

For most practical calculations, using this expression and then applying Eq. (33) to the result is preferable to using Eq. (9), because $\phi$ is a scalar, while $\mathbf{E}$ is a 3D vector - mathematically equivalent to 3 scalars. Still, this approach may lead to technical problems similar to those discussed in Sec. 2. For example, applying it to the spherically-symmetric distribution of charge (Fig. 2), we get the integral



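$$\phi = 2\pi\int_0^\pi\sin\theta\,d\theta\int_0^R r'^2\,dr'\,\frac{\rho}{4\pi\varepsilon_0\left(r^2 + r'^2 - 2rr'\cos\theta\right)^{1/2}}, \qquad (1.39)$$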



which is not much simpler than Eq. (11). 

The situation may be much improved by re-casting Eq. (38) into a differential form. For that, it is sufficient to plug the definition of $\phi$, Eq. (33), into Eq. (27):



$$\nabla\cdot(-\nabla\phi) = \frac{\rho}{\varepsilon_0}. \qquad (1.40)$$



The left-hand side of this equation is nothing more than the Laplace operator of $\phi$ (with the minus sign), so that we get the famous Poisson equation 21 for the electrostatic potential:

$$\nabla^2\phi = -\frac{\rho}{\varepsilon_0}. \qquad (1.41)$$

(In the Gaussian units, the Poisson equation looks like $\nabla^2\phi = -4\pi\rho$.) This differential equation is so convenient for applications that even its particular case for $\rho = 0$,

$$\nabla^2\phi = 0, \qquad (1.42)$$

has earned a special name - the Laplace equation. 22

To get a feeling for the Poisson equation as a problem-solving tool, let us return to the spherically-symmetric charge distribution (Fig. 2) with a constant charge density $\rho$. Using the



21 Named after S. D. Poisson (1781-1840), also famous for the Poisson distribution - one of the central results of 
the probability theory - see, e.g., SM Sec. 5.2. 

22 After mathematician (and astronomer) P. S. de Laplace (1749-1827) who, together with A. Clairault, is credited with the development of the very concept of potential.
symmetry, we can present the potential as $\phi(\mathbf{r}) = \phi(r)$, and hence use the following simple expression for its Laplace operator: 23

$$\nabla^2\phi = \frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right), \qquad (1.43)$$


so that for the points inside the charged sphere (r < R) the Poisson equation yields 



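$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = -\frac{\rho}{\varepsilon_0}, \quad \text{i.e.} \quad \frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = -\frac{\rho}{\varepsilon_0}r^2. \qquad (1.44)$$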



Integrating the last form of the equation over $r$ once, with the natural boundary condition $d\phi/dr|_{r=0} = 0$ (because of the condition $E(0) = 0$, which has been discussed above), we get



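$$\frac{d\phi}{dr}(r) = -\frac{1}{r^2}\frac{\rho}{\varepsilon_0}\int_0^r r'^2\,dr' = -\frac{\rho r}{3\varepsilon_0} \equiv -\frac{1}{4\pi\varepsilon_0}\frac{Qr}{R^3}. \qquad (1.45)$$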



Since this derivative is nothing more than $-E(r)$, in this formula we can readily recognize our previous result (22). Now we can carry out the second integration to calculate the potential itself:

$$\phi(r) = -\frac{Q}{4\pi\varepsilon_0 R^3}\int_0^r r'\,dr' + c_1 = -\frac{Qr^2}{8\pi\varepsilon_0 R^3} + c_1, \quad \text{for } r \le R. \qquad (1.46)$$



Before making any judgment on the integration constant $c_1$, let us solve the Poisson equation (in this case, just the Laplace equation) for the range outside the sphere ($r > R$):

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = 0. \qquad (1.47)$$

Its first integral,

$$\frac{d\phi}{dr} = \frac{c_2}{r^2}, \qquad (1.48)$$



also gives the electric field (with the minus sign). Now using Eq. (1.45) and requiring the field to be 
continuous at r = R, we get 



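$$\frac{c_2}{R^2} = -\frac{1}{4\pi\varepsilon_0}\frac{Q}{R^2}, \quad \text{i.e.} \quad \frac{d\phi}{dr}(r) = -\frac{1}{4\pi\varepsilon_0}\frac{Q}{r^2}, \qquad (1.49)$$

in evident agreement with Eq. (19). Integrating this result again,

$$\phi(r) = -\frac{Q}{4\pi\varepsilon_0}\int\frac{dr}{r^2} = \frac{1}{4\pi\varepsilon_0}\frac{Q}{r} + c_3, \quad \text{for } r \ge R, \qquad (1.50)$$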



we can select $c_3 = 0$, so that $\phi(\infty) = 0$, in accordance with the usual (though not compulsory) convention. Now we can finally determine the constant $c_1$ in Eq. (46) by requiring that this equation and Eq. (50) give the same value of $\phi$ at the boundary $r = R$. (According to Eq. (33), if the potential had a jump, the electric field at that point would be infinite.) The final answer may be presented as



23 See, e.g., MA Eq. (10.8) for $\partial/\partial\theta = \partial/\partial\varphi = 0$.
$$\phi(r) = \frac{1}{4\pi\varepsilon_0}\frac{Q}{R}\left(\frac{R^2 - r^2}{2R^2} + 1\right), \quad \text{for } r \le R. \qquad (1.51)$$



We see that using the Poisson equation to find the electrostatic potential distribution for highly 
symmetric problems may be more cumbersome than directly finding the electric field - say, from the 
Gauss law. However, we will repeatedly see below that if the electric charge distribution is not fixed in 
advance, using Eq. (41) may be the only practicable way to proceed. 
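When the charge distribution is less convenient than a uniform sphere, the Poisson equation is usually solved numerically. As a minimal sketch (the substitution $u = r\phi$, the finite-difference grid, and the Thomas tridiagonal solver are standard choices assumed here, not taken from the text), note that the radial operator of Eq. (1.43) equals $(1/r)\,d^2(r\phi)/dr^2$, so Eq. (1.41) becomes $u'' = -r\rho(r)/\varepsilon_0$ with $u(0) = 0$. The outer boundary value is assumed to follow Eq. (1.50) with $c_3 = 0$. Using units with $\varepsilon_0 = R = \rho = 1$, the numerical $\phi$ can then be compared with Eqs. (1.50) and (1.51).

```python
import math


def solve_radial_poisson(rho, eps0, r_max, n, u_outer):
    """Finite-difference solution of the spherically symmetric Poisson equation.

    With u(r) = r*phi(r) the equation reads u'' = -r*rho(r)/eps0, discretized as
    u[i-1] - 2*u[i] + u[i+1] = -h**2 * r[i]*rho(r[i])/eps0 on a uniform grid.
    Boundary conditions: u(0) = 0 (finite phi at the origin), u(r_max) = u_outer.
    The tridiagonal system is solved with the Thomas algorithm.
    """
    h = r_max / n
    r = [i * r_max / n for i in range(n + 1)]
    m = n - 1                                     # interior unknowns u[1..n-1]
    sub, diag, sup = [1.0] * m, [-2.0] * m, [1.0] * m
    rhs = [-h * h * r[i] * rho(r[i]) / eps0 for i in range(1, n)]
    rhs[-1] -= u_outer                            # known boundary value at r_max
    for i in range(1, m):                         # forward elimination
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):                # back substitution
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return r, [0.0] + u + [u_outer]


R, rho0, eps0 = 1.0, 1.0, 1.0
Q = 4 / 3 * math.pi * R**3 * rho0
u_outer = Q / (4 * math.pi * eps0)   # r*phi outside the sphere, Eq. (1.50) with c3 = 0
r, u = solve_radial_poisson(lambda x: rho0 if x <= R else 0.0, eps0, 2.0, 4000, u_outer)


def phi_exact(x):
    """Eq. (1.51) inside the sphere, Eq. (1.50) with c3 = 0 outside."""
    if x <= R:
        return Q / (4 * math.pi * eps0 * R) * ((R**2 - x**2) / (2 * R**2) + 1)
    return Q / (4 * math.pi * eps0 * x)


for i in (1000, 2000, 3000):         # r = 0.5, 1.0, 1.5
    print(r[i], u[i] / r[i], phi_exact(r[i]))
```

The same solver works for any radial profile `rho(r)`, which is where the differential approach starts to pay off.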

Returning now to the general theory of electrostatic phenomena, let us calculate the potential energy $U$ of an arbitrary system of electric charges $q_k$. Despite the apparently straightforward relation (31) between $U$ and $\phi$, the calculation is a little bit more complex than one might think. Indeed, let us rewrite Eqs. (32) and (33) for a single charge in the integral form:

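$$U(\mathbf{r}) = -\int_{\mathbf{r}_0}^{\mathbf{r}}\mathbf{F}(\mathbf{r}')\cdot d\mathbf{r}', \quad \text{i.e.} \quad \phi(\mathbf{r}) = -\int_{\mathbf{r}_0}^{\mathbf{r}}\mathbf{E}(\mathbf{r}')\cdot d\mathbf{r}', \qquad (1.52)$$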

where $\mathbf{r}_0$ is some reference point. These integrals reflect the fact that the potential energy is just the work necessary to move the charge from point $\mathbf{r}_0$ to point $\mathbf{r}$, and clearly depend on whether the charge motion affects the force $\mathbf{F}$ (and hence the electric field $\mathbf{E}$) or not. If it does not, i.e. if the field is produced by some external charges (such fields $\mathbf{E}_{\text{ext}}$ are also called external), everything is simple indeed: using the linearity of relations (31) and (32), for the total potential energy we may write

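$$U_{\text{ext}} = \sum_k q_k\phi_{\text{ext}}(\mathbf{r}_k), \quad \text{where} \quad \phi_{\text{ext}}(\mathbf{r}) = -\int_{\mathbf{r}_0}^{\mathbf{r}}\mathbf{E}_{\text{ext}}(\mathbf{r}')\cdot d\mathbf{r}'. \qquad (1.53)$$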

Repeating the argumentation that has led us to Eq. (9), we see that for a continuously distributed charge, 
this sum turns into an integral: 



$$U_{\text{ext}} = \int\rho(\mathbf{r})\,\phi_{\text{ext}}(\mathbf{r})\,d^3r. \qquad (1.54)$$



However, if the electric field is created by the charges whose energy we are calculating, the situation is somewhat different. To calculate $U$ for this case, let us use the fact that it does not depend on the way the charge configuration has been created, and consider the following process. First, let us move one charged particle (say, $q_1$) from infinity to an arbitrary point of space ($\mathbf{r}_1$) in the absence of other charges. During the motion the particle does not experience any force (again, the charge does not interact with itself!), so that its potential energy is the same as at infinity (with the standard choice of the arbitrary constant, zero): $U_1 = 0$. Now let us fix the position of that charge, and move another charge ($q_2$) from infinity to point $\mathbf{r}_2$ (with velocity $v \ll c$, in order to avoid any magnetic field effects, to be discussed in Chapter 5). This particle, during its motion, does experience the Coulomb force exerted by the fixed $q_1$, so that according to Eq. (31), its contribution to the final potential energy is

$$U_2 = q_2\phi_1(\mathbf{r}_2). \qquad (1.55)$$

Since the first particle was not moving during this process, the total potential energy $U$ of the system is equal to just $U_2$. This is exactly the fact that we used when writing the right-hand side of Eq. (36). (Prescribing a similar energy to charge $q_1$ as well would constitute an error - a very popular one, and hence having a special name, double-counting.)
Now, fixing the first two charges at points $\mathbf{r}_1$ and $\mathbf{r}_2$, respectively, and bringing in the third charge from infinity, we increment the potential energy by

$$U_3 = q_3\left[\phi_1(\mathbf{r}_3) + \phi_2(\mathbf{r}_3)\right]. \qquad (1.56)$$

I believe that at this stage it is already clear how to generalize this result to the contribution from an arbitrary ($k$-th) charge being moved in (Fig. 6):

$$U_k = q_k\left[\phi_1(\mathbf{r}_k) + \phi_2(\mathbf{r}_k) + \phi_3(\mathbf{r}_k) + \ldots + \phi_{k-1}(\mathbf{r}_k)\right] = q_k\sum_{k'<k}\phi_{k'}(\mathbf{r}_k). \qquad (1.57)$$

(Notice the condition $k' < k$, which suppresses erroneous double-counting.)



Fig. 1.6. Deriving the potential energy of a system of electric charges: with the charges $q_{k'}$ ($k' < k$) fixed at points $\mathbf{r}_{k'}$, charge $q_k$ is brought in from infinity.



Now, summing up all the increments, for the total electrostatic energy of the system we get:

$$U = \sum_k U_k = \sum_{\substack{k,k' \\ (k'<k)}} q_k\phi_{k'}(\mathbf{r}_k). \qquad (1.58)$$

This is our final result in its generic form; it is so important that it is worth rewriting in two other forms. First, for its generalization to the continuous charge distribution, we may use Eq. (35) to present Eq. (58) in a more symmetric form:

$$U = \frac{1}{4\pi\varepsilon_0}\sum_{\substack{k,k' \\ (k'<k)}}\frac{q_kq_{k'}}{|\mathbf{r}_k - \mathbf{r}_{k'}|}. \qquad (1.59)$$

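The expression under the sum is evidently symmetric with respect to the index swap, so that it may be rewritten in a fully symmetric form,

$$U = \frac{1}{4\pi\varepsilon_0}\frac{1}{2}\sum_{\substack{k,k' \\ (k'\neq k)}}\frac{q_kq_{k'}}{|\mathbf{r}_k - \mathbf{r}_{k'}|}, \qquad (1.60)$$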

which is now easily generalized to the continuous case:

$$U = \frac{1}{4\pi\varepsilon_0}\frac{1}{2}\int d^3r\int d^3r'\,\frac{\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}. \qquad (1.61)$$



(As before, in this case the restriction expressed in the discrete charge case as $k \neq k'$ is not important, because if the charge density is a continuous function, integral (61) does not diverge at point $\mathbf{r} = \mathbf{r}'$.)
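The equivalence of the build-up sum (1.59), restricted to $k' < k$, and the symmetric form (1.60) with its factor 1/2 is easy to confirm for any finite set of point charges. The following minimal Python sketch assumes SI units and charges given as `(q, (x, y, z))` tuples; the function names are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import itertools
import math

K = 8.9875517923e9  # Coulomb constant 1/(4*pi*eps0), N*m^2/C^2


def energy_buildup(charges):
    """Eq. (1.59): bring charges in one by one, summing only over k' < k."""
    u = 0.0
    for k, (qk, rk) in enumerate(charges):
        for qk2, rk2 in charges[:k]:            # previously placed charges only
            u += K * qk * qk2 / math.dist(rk, rk2)
    return u


def energy_symmetric(charges):
    """Eq. (1.60): half the sum over all ordered pairs with k != k'."""
    total = sum(K * qk * qk2 / math.dist(rk, rk2)
                for (qk, rk), (qk2, rk2) in itertools.permutations(charges, 2))
    return total / 2


charges = [(1e-9, (0.0, 0.0, 0.0)), (2e-9, (0.1, 0.0, 0.0)), (-1e-9, (0.0, 0.2, 0.0))]
print(energy_buildup(charges), energy_symmetric(charges))
```

`itertools.permutations(charges, 2)` produces every ordered pair of distinct list entries, so each physical pair appears twice, which is exactly what the factor 1/2 compensates.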
To present this result in one more form, let us notice that according to Eq. (38), the integral over $\mathbf{r}'$ in Eq. (61), divided by $4\pi\varepsilon_0$, is just the full electrostatic potential at point $\mathbf{r}$, and hence

$$U = \frac{1}{2}\int\rho(\mathbf{r})\phi(\mathbf{r})\,d^3r. \qquad (1.62)$$

For the discrete charge case, this result becomes

$$U = \frac{1}{2}\sum_k q_k\phi(\mathbf{r}_k), \qquad (1.63)$$



but now it is important to remember that the "full" potential's value $\phi(\mathbf{r}_k)$ should exclude the (infinite) contribution of charge $k$ itself. Comparing the last two formulas with Eqs. (53) and (54), we see that the electrostatic energy of charge interaction, as expressed via the charge-potential product, is half that of the charge energy in a fixed ("external") field. This is evidently the result of the self-consistent build-up of the electric field as the charge system is being formed. 24
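Both caveats of Eq. (1.63), the factor 1/2 and the exclusion of each charge's own contribution, can be seen in a few lines of code. In this minimal sketch (SI units, charges as `(q, (x, y, z))` tuples, illustrative function names), the "full" potential at each charge omits that charge itself; the naive sum $\sum_k q_k\phi(\mathbf{r}_k)$, which treats this potential as if it were external, counts every pair twice and therefore comes out exactly twice as large.

```python
import math

K = 8.9875517923e9  # 1/(4*pi*eps0), SI units


def potential_at_charge(k, charges):
    """Potential at r_k from all charges except charge k itself (Eq. (1.37))."""
    _, rk = charges[k]
    return sum(K * q / math.dist(rk, r) for j, (q, r) in enumerate(charges) if j != k)


def interaction_energy(charges):
    """Eq. (1.63): U = (1/2) * sum_k q_k * phi(r_k), self-contribution excluded."""
    return 0.5 * sum(q * potential_at_charge(k, charges) for k, (q, _) in enumerate(charges))


def naive_energy(charges):
    """sum_k q_k * phi(r_k) without the 1/2: counts every pair twice."""
    return sum(q * potential_at_charge(k, charges) for k, (q, _) in enumerate(charges))


charges = [(1e-9, (0.0, 0.0, 0.0)), (1e-9, (0.1, 0.0, 0.0)), (-3e-9, (0.05, 0.1, 0.0))]
print(interaction_energy(charges), naive_energy(charges))
```

Dropping the `if j != k` filter would make `math.dist` return zero for the self-term and raise a division error, the numerical counterpart of the infinite self-contribution mentioned in the text.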

Now comes an important conceptual question: can we locate this interaction energy in space? 
Expressions (60)-(63) seem to imply that contributions to U come only from the regions where electric 
charges are located. However, one of the beautiful features of physics is that sometimes completely 
different interpretations of the same mathematical result are possible. In order to get an alternative view 
of our current result, let us write Eq. (62) for a volume $V$ so large that the electric field on the limiting
surface A is negligible, and plug into it the charge density expressed from the Poisson equation (41): 



$$U = -\frac{\varepsilon_0}{2}\int_V\phi\,\nabla^2\phi\,d^3r. \qquad (1.64)$$



This expression may be integrated by parts as 25

$$U = -\frac{\varepsilon_0}{2}\left[\oint_A\phi(\nabla\phi)_n\,d^2r - \int_V(\nabla\phi)^2\,d^3r\right]. \qquad (1.65)$$



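According to our condition of negligible field $\mathbf{E} = -\nabla\phi$ on the surface, the first integral vanishes, and we get a very important formula

$$U = \frac{\varepsilon_0}{2}\int(\nabla\phi)^2\,d^3r = \frac{\varepsilon_0}{2}\int E^2\,d^3r. \qquad (1.66)$$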



This result certainly invites an interpretation very different from that of Eq. (62): it is natural to represent it in the following form:

$$U = \int u(\mathbf{r})\,d^3r, \quad \text{with} \quad u(\mathbf{r}) = \frac{\varepsilon_0}{2}E^2(\mathbf{r}), \qquad (1.67)$$



24 The nature of this additional factor 1/2 is exactly the same as in the well-known formula $U = (1/2)\kappa x^2$ for the potential energy of an elastic spring providing a returning force $F = -\kappa x$ proportional to the deviation $x$ from equilibrium.

25 This transformation follows from the divergence theorem MA (12.2) applied to the vector function $\mathbf{f} = \phi\nabla\phi$, taking into account the 3D differentiation rule MA Eq. (11.4a): $\nabla\cdot(\phi\nabla\phi) = (\nabla\phi)\cdot(\nabla\phi) + \phi\nabla\cdot(\nabla\phi) = (\nabla\phi)^2 + \phi\nabla^2\phi$.
and interpret $u(\mathbf{r})$ as the spatial density of the electric field energy, 26 which is continuously distributed over all the space where the field exists - rather than just its part where the charges are located.

Let us see how these two alternative pictures work for our testbed problem, a uniformly charged sphere. If we start from Eq. (62), we may limit the integration to the sphere volume ($0 < r < R$), where $\rho \neq 0$. Using Eq. (51) and the spherical symmetry of the problem ($d^3r = 4\pi r^2dr$), we get



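$$U = \frac{1}{2}4\pi\int_0^R\rho\phi r^2\,dr = \frac{1}{2}4\pi\rho\frac{1}{4\pi\varepsilon_0}\frac{Q}{R}\int_0^R\left(\frac{R^2 - r^2}{2R^2} + 1\right)r^2\,dr = \frac{6}{5}\frac{1}{4\pi\varepsilon_0}\frac{Q^2}{2R}. \qquad (1.68)$$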



On the other hand, if we use Eq. (67), we need to integrate the energy density everywhere, i.e. both inside and outside the sphere:



$$U = \frac{\varepsilon_0}{2}4\pi\left(\int_0^R E^2r^2\,dr + \int_R^\infty E^2r^2\,dr\right). \qquad (1.69)$$



Using Eqs. (19) and (22) for, respectively, the external and internal regions, we get 



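$$U = \frac{\varepsilon_0}{2}4\pi\left[\int_0^R\left(\frac{Qr}{4\pi\varepsilon_0R^3}\right)^2r^2\,dr + \int_R^\infty\left(\frac{Q}{4\pi\varepsilon_0r^2}\right)^2r^2\,dr\right] = \left(\frac{1}{5} + 1\right)\frac{1}{4\pi\varepsilon_0}\frac{Q^2}{2R}. \qquad (1.70)$$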



This is (fortunately :-) the same answer as given by Eq. (68), but to some extent it is more informative 
because it shows how exactly the electric field energy is distributed between the interior and exterior of 
the charged sphere. 27 
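The two energy pictures can also be compared numerically. The sketch below is an illustration, not part of the text: it evaluates Eq. (1.62) over the sphere volume and Eq. (1.67) separately inside and outside with a midpoint rule, using for the exterior the assumed substitution $r = R/s$, which maps $R \le r < \infty$ onto $0 < s \le 1$. With $\varepsilon_0 = Q = R = 1$ both totals should approach $(3/5)/4\pi \approx 0.0477$, and the interior should carry one sixth of the field energy, as the split $(1/5 + 1)$ in Eq. (1.70) implies.

```python
import math


def sphere_energies(Q=1.0, R=1.0, eps0=1.0, n=20000):
    """Energy of a uniformly charged sphere computed two ways (midpoint rule).

    Returns (u62, inside, outside):
      u62     - Eq. (1.62): (1/2) * integral of rho*phi over the sphere only;
      inside  - Eq. (1.67): integral of (eps0/2)*E**2 over r < R;
      outside - Eq. (1.67): the same integral over r > R.
    """
    rho = Q / (4 / 3 * math.pi * R**3)
    k = 1 / (4 * math.pi * eps0)
    h = R / n
    u62 = inside = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        phi = k * Q / R * ((R**2 - r**2) / (2 * R**2) + 1)   # Eq. (1.51)
        e_in = k * Q * r / R**3                                # -dphi/dr, Eq. (1.45)
        shell = 4 * math.pi * r**2 * h                         # d^3r = 4*pi*r^2*dr
        u62 += 0.5 * rho * phi * shell
        inside += eps0 / 2 * e_in**2 * shell
    outside = 0.0
    hs = 1.0 / n
    for i in range(n):                                         # r = R/s, dr = (R/s**2) ds
        s = (i + 0.5) * hs
        r = R / s
        e_out = k * Q / r**2                                   # -dphi/dr, Eq. (1.49)
        outside += eps0 / 2 * e_out**2 * 4 * math.pi * r**2 * (R / s**2) * hs
    return u62, inside, outside


u62, inside, outside = sphere_energies()
print(u62, inside + outside, 0.6 / (4 * math.pi))
print(inside / (inside + outside))   # interior share of the field energy
```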

We see that, as we could expect, within the realm of electrostatics, Eqs. (62) and (67) are 
equivalent. However, when we examine electrodynamics in Chapter 6 and beyond, we will see that the latter
equation is more general, and that it is more adequate to associate energy with the field itself rather than 
its sources - in our current case, electric charges. 



1.4. Exercise problems 

1.1. Calculate the electric field created by a thin, long, straight filament, electrically charged with 
a constant linear density $\lambda$, using two approaches:

(i) directly from the Coulomb law, and 

(ii) using the Gauss law. 



1.2. Can one create the electrostatic fields presented below by sets of their components in Cartesian coordinates {x, y, z}, in a finite region of space?

(i) {yz, xz, xy} 



26 In the Gaussian units, the standard replacement $\varepsilon_0 \to 1/4\pi$ turns the last of Eqs. (67) into $u(\mathbf{r}) = E^2/8\pi$.

27 Note that $U \to \infty$ at $R \to 0$. Such a divergence appears whenever Eq. (67) is applied to a point charge. Since it does not affect the force acting on the charge, the divergence does not create any technical difficulty for the analysis of charge statics or nonrelativistic dynamics, but it points to a conceptual problem of classical electrodynamics as a whole. This issue will be discussed at the very end of the course (Sec. 10.6).
(ii) {xy, xy, yz}

1.3. Calculate:

(i) the distribution of $\phi$ and $\mathbf{E}$ in space, and

(ii) the electrostatic energy per unit area

of two thin, parallel planes with equal and opposite charges of constant areal density $\sigma$, separated by distance $d$ - see Fig. on the right.

1.4. The system analyzed in Problem 3 (two thin, parallel, oppositely charged planes) is placed into an external, uniform, normal electric field $E_{\text{ext}} = \sigma/\varepsilon_0$ - see Fig. on the right. Find the forces (per unit area) acting on each
plane, by two methods: 

(i) directly from the electric field distribution, and 

(ii) from the potential energy (67) of the system. 

1.5. Explore the relation between the Laplace equation (42) and the condition of minimum of the
electrostatic field energy (67). 



Chapter 2. Charges and Conductors



In this chapter I will start addressing very common situations in which the electric charge distribution in space is not known a priori, but rather should be calculated in a self-consistent way together with the electric field it creates. The simplest situations of this kind involve conductors, and lead to the so-called boundary problems in which partial differential equations are solved with appropriate boundary conditions. Such problems are also broadly used in other parts of electrodynamics (and indeed in other fields of physics as well), so that, following tradition, I will use this chapter's material as a playground for a discussion of various methods of boundary problem solution, and the special functions most frequently encountered along the way.



2.1. Electric field screening

The basic principles of electrostatics outlined in Chapter 1 present the conceptually full solution of the problem of finding the electric field (and hence Coulomb forces) induced by a charge distribution, for example, charge density $\rho(\mathbf{r})$. However, in most practical situations this function is not known but should be found self-consistently with the field. The conceptually simplest case of this type arises when certain point charges $q_k$ are placed near the surface of a good conductor, e.g., a metal: the electric field of these charges induces additional charges at the conductor's surface, which also contribute to the field. Another important type of problem involves no space-positioned charges at all; here only the total charges of the involved conductors are fixed, but their spatial distribution inside each conductor has to be found. The full solution of such problems, of course, should satisfy Eq. (1.5) for the total field and total set of charges.

To approach these problems, I need to discuss, if only very briefly, 1 the relevant physics of conductors. In the simplest macroscopic model, conductors are treated as materials having internal charged particles (e.g., electrons in metals) that are free to move under the effect of force - in particular, the force $\mathbf{F} = q\mathbf{E}$ exerted by the electric field $\mathbf{E}$. In electrostatics (which specifically excludes the case of dc current, to be discussed in Chapter 4 below), there should be no such motion, so that everywhere inside the conductor the electric field should vanish:

$$\mathbf{E} = 0. \qquad (2.1a)$$

This is the electric field screening 2 effect. According to Eq. (1.33), this condition may be rewritten in another, frequently more convenient form:

$$\phi = \text{const}; \qquad (2.1b)$$

note, however, that if a problem includes several unconnected conductors, the constant in Eq. (1b) may be different for each of them.

1 More detailed discussions may be found, e.g., in Sec. 13.5 of J. Hook and H. Hall, Solid State Physics, 2nd ed., Wiley, 1991, or the section on electric field screening in Chapter 17 of N. Ashcroft and N. Mermin, Solid State Physics, Brooks Cole, 1976.

2 This term, used for electric field, should not be confused with shielding - the word used for the description of magnetic field reduction by magnetic materials - see Chapter 5 below.



© 2013 K. Likharev 



Open online access under cc bv-nc-sa license 



Essential Graduate Physics 



EM: Classical Electrodynamics 



Now let us examine what we can say about the electric field outside a conductor, within the same macroscopic model. At close proximity, any smooth surface (in our case that of a conductor) looks planar. Let us integrate Eq. (1.28) over a narrow ($d \ll l$) rectangular loop $C$ encircling a part of such a plane conductor's surface (see the dashed line in Fig. 1), and apply to the electric field the well-known vector algebra equality, the Stokes theorem: 3

$$\int_S(\nabla\times\mathbf{E})_n\,d^2r = \oint_C\mathbf{E}\cdot d\mathbf{r}, \qquad (2.2)$$

where $S$ is the surface bounded by contour $C$, which in our case is dominated by two straight lines of length $l$. This means that if $l$ is much smaller than the characteristic scale of field change, the right-hand side of Eq. (2) equals $[(E_\tau)_{\text{in}} - (E_\tau)_{\text{out}}]\,l$, where $E_\tau$ is the field's component parallel to the surface. On the other hand, according to Eq. (1.28), the left-hand side of Eq. (2) equals zero. Hence, $E_\tau$ should be continuous at the surface, and in order to satisfy Eq. (1a) inside the conductor, immediately outside it $E_\tau = 0$ as well.

Fig. 2.1. Electric field near conductor's surface (conductor and free space regions): $E_\tau = 0$, $E_n = \sigma/\varepsilon_0$.



Hence, the field just outside the conductor has to be normal to its surface. In order to find this normal field, let us apply the Gauss law (1.16) to a plane pillbox of area $A$, similar to the one discussed in Sec. 1.2 - see Fig. 1.4. Due to Eq. (1), the total electric flux through the pillbox walls is now $(E_n)_{\text{out}}A$, so that for this surface field we get



$$E_n = \frac{\sigma}{\varepsilon_0}, \qquad (2.3)$$

where $\sigma$ is the areal density of the conductor's surface charge. So, the normal component of the field is related to the surface charge density by the universal relation (3).

For the electrostatic potential the macroscopic model provides an even simpler result. Indeed, applying the latter of integrals (1.52) to a short path $d$ across the surface, normal to it, we see that since $E_n$ is finite, the potential change $\Delta\phi$ vanishes as $d \to 0$. Hence Eq. (1b) is also valid for the potential's value immediately outside the conductor's surface.

Before starting to use the macroscopic model for the solution of particular problems of electrostatics,
let us briefly discuss its limitations. Since the argumentation leading to Eq. (3) is valid for any thickness 
d of the Gauss pillbox, within the macroscopic model, the surface charge is located within an infinitely 



3 See, e.g., MA Eq. (12.1). 
thin surface layer. This is of course physically impossible: for one, it would require an infinite volume density $\rho$ of charge. In reality the charged layer (and hence the region of the electric field's crossover from the finite value (3) to zero) has a nonvanishing thickness $\lambda$. At least three effects contribute to $\lambda$:

(i) Atomic structure of matter. Within each atom, the electric field does exist and is highly non-uniform. Thus Eq. (1) is valid only for the spatial average of the field in a conductor, and cannot be taken seriously on the atomic scale $a_0 \sim 10^{-10}$ m. 4

(ii) Thermal excitation. In the conductor's bulk, the numbers of protons of atomic nuclei ($n$) and electrons ($n_e$) per unit volume are balanced, so that the net charge density, $\rho = e(n - n_e)$, vanishes. 5 However, if an external electric field penetrates a conductor, electrons can shift in or out of its affected part, depending on the field's addition to their potential energy, $\Delta U = q_e\phi = -e\phi$. (For notational simplicity, the arbitrary constant in $\phi$ is chosen here to give $\phi = 0$ inside the conductor.) In classical statistics, this change is described by the Boltzmann distribution: 6

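$$n_e(\mathbf{r}) = n\exp\left\{-\frac{\Delta U(\mathbf{r})}{k_BT}\right\} \equiv n\exp\left\{\frac{e\phi(\mathbf{r})}{k_BT}\right\}, \qquad (2.4)$$

where $k_B \approx 1.38\times10^{-23}$ J/K is the Boltzmann constant, and $T$ is temperature in SI units (kelvins). As a result, the net charge density is

$$\rho(\mathbf{r}) = en\left[1 - \exp\left\{\frac{e\phi(\mathbf{r})}{k_BT}\right\}\right]. \qquad (2.5)$$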



If the field did not move the atomic nuclei at all, we could plug the last formula directly into the Poisson equation (1.41). Actually, the penetrating electric field shifts the average charge of the nuclei as well. As will be discussed in the next chapter, this results in the reduction of the electric field by a medium-specific dimensionless factor $\varepsilon_r$ (typically not too different from 1), called the dielectric constant. As a result, the Poisson equation takes the form



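$$\frac{d^2\phi}{dz^2} = -\frac{\rho}{\varepsilon_r\varepsilon_0} = \frac{en}{\varepsilon_r\varepsilon_0}\left[\exp\left\{\frac{e\phi}{k_BT}\right\} - 1\right], \qquad (2.6)$$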



where we have taken advantage of the 1D geometry of the system to simplify the Laplace operator, with axis $z$ normal to the surface. Even with this simplification, Eq. (6) is a nonlinear differential equation allowing an analytical but rather bulky solution. Since our current goal is just to estimate the field penetration depth $\lambda$, let us simplify the equation further by considering the low-field limit: $e|\phi| \sim e|E|\lambda \ll k_BT$. In this limit we can expand the exponent into the Taylor series, and limit ourselves to the two leading terms (of which the first one cancels with the unity). As a result, Eq. (6) becomes linear,

$$\frac{d^2\phi}{dz^2} = \frac{e^2n}{\varepsilon_r\varepsilon_0k_BT}\phi \equiv \frac{1}{\lambda_D^2}\phi, \qquad (2.7)$$



4 This scale of course originates from the quantum-mechanical effects of electron motion, characterized by the Bohr radius $r_B \approx 0.5\times10^{-10}$ m - see, e.g., QM Eq. (1.13).

5 Here $e$ denotes the positive fundamental charge, $e \approx 1.6\times10^{-19}$ C (see appendix CA for a more exact value), so that the charge $q_e$ of an electron equals $(-e)$.

6 See, e.g., SM Sec. 3.1. 
where the constant $\lambda_D$ is called the Debye screening length:

$$\lambda_D^2 = \frac{\varepsilon_r\varepsilon_0k_BT}{e^2n}. \qquad (2.8)$$

Equation (7) is easy to solve: it describes an exponential decrease of the electric potential, with the characteristic length $\lambda_D$: $\phi \propto \exp\{-z/\lambda_D\}$. Plugging in the fundamental constants $\varepsilon_0$, $e$, and $k_B$, we get the following estimate: $\lambda_D[\text{m}] \approx 70\,(\varepsilon_rT[\text{K}]/n[\text{m}^{-3}])^{1/2}$. According to this formula, in semiconductors at room temperature, the Debye length may be rather substantial. For example, in silicon ($\varepsilon_r \approx 12$) doped to the charge carrier concentration $n = 3\times10^{24}$ m$^{-3}$ (the value typical for modern integrated circuits), 7 $\lambda_D \approx 2$ nm, still well above the atomic size scale $a_0$. However, for typical good metals ($n \sim 10^{28}$ m$^{-3}$, $\varepsilon_r \sim 10$) the same formula gives an estimate $\lambda_D \sim 4\times10^{-11}$ m, less than $a_0$. In this case our calculation should not be taken too seriously, because it is based on the assumption of a continuous charge distribution on the screening length scale.

(iii) Quantum statistics. Actually, the last estimate is not valid for good metals (and very highly doped semiconductors) for one more reason: their free electrons obey quantum (Fermi-Dirac) statistics rather than the Boltzmann distribution (4). 8 As a result, at all realistic temperatures they form a degenerate quantum gas, occupying all available energy states below a certain level $\varepsilon_F \gg k_BT$ called the Fermi energy. In these conditions, the screening of a relatively low electric field 9 may be described by replacing Eq. (5) with

$$\rho = e(n - n_e) = -e\,g(\varepsilon_F)(-\Delta U) = -e^2g(\varepsilon_F)\,\phi, \qquad (2.9)$$

where $g(\varepsilon)$ is the density of quantum states (per unit volume) at the electron's energy $\varepsilon$. At the Fermi surface, this density is of the order of $n/\varepsilon_F$. 10 As a result, instead of Eq. (7) we get a similar differential equation, but with a different characteristic scale, defined by the relation

$$\lambda_{TF}^2 = \frac{\varepsilon_r\varepsilon_0}{e^2g(\varepsilon_F)} \sim \frac{\varepsilon_r\varepsilon_0\varepsilon_F}{e^2n}, \qquad (2.10)$$

and called the Thomas-Fermi screening length. Since for good metals the Fermi energy is of the order of a few electron-volts (while the product $k_BT$, which it replaces in comparison with Eq. (8), is close to just 26 meV at $T = 300$ K), Eq. (10) typically gives $\lambda_{TF}$ close to a few $a_0$, and makes the Thomas-Fermi screening theory valid at least semi-quantitatively.
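The numerical estimates quoted in items (ii) and (iii) are straightforward to reproduce. This minimal Python sketch evaluates Eq. (2.8), the order-of-magnitude form of Eq. (2.10) with $g(\varepsilon_F) \sim n/\varepsilon_F$, and the linearization of Eq. (2.5) that leads to Eq. (2.7). The material parameters are those quoted in the text, except the Fermi energy of 5 eV, which is an assumed representative value of "a few electron-volts"; the constants are CODATA SI values and the function names are illustrative.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
KB = 1.380649e-23         # Boltzmann constant, J/K
E = 1.602176634e-19       # fundamental charge, C


def debye_length(eps_r, n, T):
    """Eq. (2.8): lambda_D = sqrt(eps_r*eps0*kB*T/(e**2*n)); n in m^-3, T in K."""
    return math.sqrt(eps_r * EPS0 * KB * T / (E**2 * n))


def thomas_fermi_length(eps_r, n, fermi_ev):
    """Order-of-magnitude Eq. (2.10) with g(eps_F) ~ n/eps_F; fermi_ev in eV."""
    return math.sqrt(eps_r * EPS0 * fermi_ev * E / (E**2 * n))


def charge_density(phi, n, T, linear=False):
    """Eq. (2.5), or its low-field expansion -e*n*(e*phi/(kB*T)) behind Eq. (2.7)."""
    x = E * phi / (KB * T)
    return -E * n * x if linear else E * n * (1 - math.exp(x))


lam_si = debye_length(12, 3e24, 300)          # doped silicon
lam_metal = debye_length(10, 1e28, 300)       # "good metal", Boltzmann estimate
lam_tf = thomas_fermi_length(10, 1e28, 5.0)   # assumed eps_F = 5 eV
kT_ev = KB * 300 / E
print(f"lambda_D(Si)    = {lam_si:.2e} m")
print(f"lambda_D(metal) = {lam_metal:.2e} m")
print(f"k_B T at 300 K  = {kT_ev * 1e3:.1f} meV")
print(f"lambda_TF       = {lam_tf:.2e} m")
```

With these inputs the printed lengths come out near 2.4 nm, 3.8×10⁻¹¹ m and 5×10⁻¹⁰ m, and $k_BT$ near 26 meV, in line with the estimates above. Since the same $\varepsilon_r$ and $n$ enter both formulas, the ratio $\lambda_{TF}/\lambda_D$ is simply $(\varepsilon_F/k_BT)^{1/2}$.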

To summarize, the electric field penetration into good conductors is limited to a depth $\lambda$ ranging
from fractions of a nanometer to a few nanometers, so that for problems with the characteristic size 



7 There is a good reason for making an estimate of $\lambda_D$ for this case: the electric field created by the gate electrode of a field-effect transistor, penetrating into doped silicon by a depth $\sim\lambda_D$, controls the current in this most important electronic device - on whose back the whole current information revolution rides. Because of that, $\lambda_D$ establishes the possible scale of semiconductor circuit shrinking, which is the basis of the well-known Moore's law. (Practically, the scale is determined by integrated circuit patterning techniques, and Eq. (8) may be used to find the proper charge carrier density $n$ and hence the level of silicon doping.)

8 See, e.g., SM Sec. 2.8. 

9 Mercifully, in good metals this equation is valid up to very high fields, $\sim\varepsilon_F/e\lambda_{TF} \sim 10^9$ V/m. This value is higher than the electric breakdown threshold for vacuum (or air-filled) gaps.

10 See, e.g., SM Sec. 3.3. 
much larger than that scale, the macroscopic boundary conditions (1) give very good accuracy, and we will use them in the rest of this chapter. However, the reader should remember that in some situations involving semiconductors, as well as in nanoscale experiments with metals, the electric field penetration effect should be taken into account.



2.2. Capacitance 

Let us start with systems consisting of charged conductors alone. Our goal here is calculating the distributions of the electric field $\mathbf{E}$ and potential $\phi$ in space, and the distribution of the surface charge density $\sigma$ over the conductor surfaces. However, before doing that for particular situations, let us see whether there are any integral measures of these distributions that should be our primary focus.

The simplest case is of course a single conductor in the otherwise free space. According to Eq. 
(1), all its volume should have a constant electrostatic potential <fi, evidently providing one convenient 
global measure of the situation. Another integral measure is evidently provided by the total charge 



$$Q = \int_V \rho\, d^3r = \oint_S \sigma\, d^2r, \qquad (2.11)$$



where the latter integral is extended over the whole surface $S$ of the conductor. In the general case, what can we tell about the relation between $Q$ and $\phi$? At $Q = 0$, there is no electric field in the system, and it is natural (though not necessary) to select the arbitrary constant in the electrostatic potential so that $\phi = 0$. Then, if the conductor is charged with a finite $Q$, according to the Coulomb law, the electric field at any point of space is proportional to $Q$. Hence the electrostatic potential everywhere, including its value $\phi$ on the conductor, is also proportional to $Q$:

$$\phi = pQ. \qquad (2.12)$$

The proportionality coefficient $p$, which depends on the conductor's size and shape but not on $Q$, is called the reciprocal capacitance (or, not too often, "electrical elastance"). Usually, Eq. (12) is rewritten in a different form,

$$Q = C\phi, \qquad \text{with } C \equiv \frac{1}{p}, \qquad (2.13)$$

where $C$ is called self-capacitance. (Frequently, $C$ is called just capacitance, but we will soon see that for more complex situations the latter term may be too ambiguous.)

Before going to the calculation of $C$, let us have a look at the electrostatic energy of a single conductor. Of the several equations for it discussed in Chapter 1, Eq. (1.63) is the most convenient here, because all elementary charges $q_k$ are now parts of the conductor's surface charge, and hence sit at the same potential $\phi$. As a result, the equation becomes very simple:

$$U = \frac{1}{2}\sum_k q_k\phi(\mathbf{r}_k) = \frac{\phi}{2}\sum_k q_k = \frac{1}{2}Q\phi. \qquad (2.14)$$

Moreover, using the linear relation (13), the same result may be rewritten in two more forms:

$$U = \frac{Q^2}{2C} = \frac{C}{2}\phi^2. \qquad (2.15)$$
We will discuss several ways to calculate $C$ in the next sections, and right now will have a quick look at just the simplest example for which we have calculated everything necessary in the previous chapter: a conducting sphere of radius $R$. Indeed, we already know the electric field distribution: according to Eq. (1), $\mathbf{E} = 0$ inside the sphere, while Eq. (1.19), with $Q(r) = Q$, describes the field distribution outside it. Moreover, since the latter formula is exactly the same as for a point charge placed at the sphere's center, the potential distribution in space can be obtained from Eq. (1.35) by replacing $q$ with the sphere's full charge $Q$. Hence, on the surface of the sphere (and, according to Eq. (2), throughout its interior),

$$\phi = \frac{1}{4\pi\varepsilon_0}\frac{Q}{R}. \qquad (2.16)$$

Comparing this result with the definition (13), for the self-capacitance we obtain 11

$$C = 4\pi\varepsilon_0 R = 2\pi\varepsilon_0 D, \qquad D = 2R. \qquad (2.17)$$

This formula, which should be well familiar to the reader, is convenient to get some feeling of how large the SI unit of capacitance (1 farad, abbreviated as F) is: the self-capacitance of Earth ($R_E \approx 6.37\times10^6$ m) is below 1 mF! Another important note is that while Eq. (17) is not exactly valid for a conductor of arbitrary shape, it implies an important estimate

$$C \sim 2\pi\varepsilon_0 a, \qquad (2.18)$$

where $a$ is the scale of the linear size of any conductor. 12
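A quick numerical illustration of Eq. (17), as a minimal Python sketch: the Earth's radius is the approximate value quoted above, and the last lines evaluate the picofarad-to-centimeter conversion discussed in footnote 11.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def sphere_self_capacitance(radius_m: float) -> float:
    """Eq. (17): C = 4*pi*eps0*R for an isolated conducting sphere (SI units)."""
    return 4 * math.pi * EPS0 * radius_m


c_earth = sphere_self_capacitance(6.37e6)   # approximate Earth radius, m
print(f"C_Earth ~ {c_earth * 1e3:.2f} mF")

# In Gaussian units capacitance is a length: C[cm] = C[F] / (4*pi*eps0 * 0.01 m/cm)
pf_in_cm = 1e-12 / (4 * math.pi * EPS0 * 1e-2)
print(f"1 pF ~ {pf_in_cm:.4f} cm")
```

The Earth's self-capacitance comes out at about 0.71 mF, and 1 pF at about 0.8988 cm.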

Now proceeding to a system of two conductors, we immediately see why we should be careful with the capacitance definition: one constant $C$ is insufficient to describe such a system. Indeed, here we have two, generally different conductor potentials, $\phi_1$ and $\phi_2$, which may depend on both conductor charges, $Q_1$ and $Q_2$. Using the same arguments as for the one-conductor case, we may conclude that the dependence is always linear:

$$\begin{aligned} \phi_1 &= p_{11}Q_1 + p_{12}Q_2, \\ \phi_2 &= p_{21}Q_1 + p_{22}Q_2, \end{aligned} \qquad (2.19)$$

but it still has to be described not with one but with four coefficients $p_{jj'}$ ($j, j'$ = 1, 2), forming the so-called reciprocal capacitance matrix

$$\begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}. \qquad (2.20)$$



Plugging relation (19) into Eq. (1.63), we see that the full electrostatic energy of the system may be expressed by a quadratic form:

$$U = \frac{1}{2}\phi_1 Q_1 + \frac{1}{2}\phi_2 Q_2 = \frac{p_{11}}{2}Q_1^2 + \frac{p_{12}+p_{21}}{2}Q_1 Q_2 + \frac{p_{22}}{2}Q_2^2. \qquad (2.21)$$

11 In the Gaussian units, using the standard replacement $4\pi\varepsilon_0 \to 1$, this relation takes a remarkably simple form, $C = R$, good to remember. Generally, in the Gaussian units (but not in the SI system!) the capacitance has the dimensionality of length, i.e. is measured in centimeters. Note also that a convenient fractional SI unit, 1 picofarad ($10^{-12}$ F), is very close to the Gaussian unit: 1 pF = $(1\times10^{-12})/(4\pi\varepsilon_0\times10^{-2})$ cm $\approx$ 0.8988 cm.

12 These arguments are somewhat insufficient to say which size should be used for $a$ in the case of narrow, extended conductors, e.g., a thin, long wire of length $L$ and diameter $D \ll L$. Very soon we will see that in such cases the electrostatic energy, and hence $C$, should mostly depend on the larger size of the conductor.

It is evident that the middle term on the right-hand side of this equation describes the electrostatic coupling of the conductors. (Without it, the energy would be just a sum of two independent electrostatic energies of conductors 1 and 2.) This is why systems with $|p_{12}|, |p_{21}| \ll p_{11}, p_{22}$ are called weakly coupled, and may be analyzed using approximate methods - see, e.g., Fig. 3 and its discussion below.

Before proceeding further, let us use the Lagrangian formalism of analytical mechanics 13 to argue that the off-diagonal elements of matrix $p_{jj'}$ are always equal:

$$p_{12} = p_{21}. \qquad (2.22)$$

Indeed, the charges may be taken for generalized coordinates $q_j$ ($j$ = 1, 2) of the system; then the corresponding generalized forces may be found as

$$F_j = -\frac{\partial U}{\partial q_j} = -\frac{\partial U}{\partial Q_j}. \qquad (2.23)$$

Applying this equation to Eq. (21), we see that, for example,

$$-F_1 = \frac{\partial U}{\partial Q_1} = p_{11}Q_1 + \frac{p_{12}+p_{21}}{2}Q_2. \qquad (2.24)$$



Now we may argue that the dynamics of charge $Q_j$ should only depend on the electrostatic potential this charge "sees". This means, in particular, that $\phi_1$ should be a unique function of $F_1$. Comparing Eq. (24) with the first of Eqs. (19), we see that for this to be true, Eq. (22) should indeed be valid.
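The logic of this argument can be illustrated with a short Python sketch: the energy (21) only "sees" the symmetric combination $p_{12} + p_{21}$, so for a hypothetical matrix with $p_{12} \neq p_{21}$ the derivative $\partial U/\partial Q_1$ cannot reproduce $\phi_1$ given by Eq. (19). The matrix entries and charges below are arbitrary numbers in arbitrary units, chosen only for illustration.

```python
def energy(p, q1: float, q2: float) -> float:
    """Eq. (21): U = p11*Q1^2/2 + (p12 + p21)*Q1*Q2/2 + p22*Q2^2/2."""
    (p11, p12), (p21, p22) = p
    return 0.5 * p11 * q1**2 + 0.5 * (p12 + p21) * q1 * q2 + 0.5 * p22 * q2**2


def du_dq1(p, q1: float, q2: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of dU/dQ1 (exact for a quadratic U, up to rounding)."""
    return (energy(p, q1 + h, q2) - energy(p, q1 - h, q2)) / (2 * h)


def phi1(p, q1: float, q2: float) -> float:
    """First of Eqs. (19): phi1 = p11*Q1 + p12*Q2."""
    (p11, p12), _ = p
    return p11 * q1 + p12 * q2


q1, q2 = 1.0, 2.0
symmetric = ((2.0, 0.5), (0.5, 3.0))    # p12 = p21
asymmetric = ((2.0, 0.9), (0.1, 3.0))   # hypothetical p12 != p21, with the same p12 + p21
for label, p in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    print(f"{label}: dU/dQ1 = {du_dq1(p, q1, q2):.6f}, phi1 = {phi1(p, q1, q2):.6f}")
```

Both matrices give the same energy, but only for the symmetric one does $\partial U/\partial Q_1$ coincide with $\phi_1$.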

Equations (19) and (21) show that for the general case of arbitrary charges $Q_1$ and $Q_2$, the system properties cannot be reduced to just one coefficient ("capacitance"). Let us consider three particular cases when such a reduction is possible.

(i) The system as a whole is electrically neutral: $Q_1 = -Q_2 = Q$. In this case the most important function of $Q$ is the difference of conductor potentials, called voltage: 14

$$V \equiv \phi_1 - \phi_2. \qquad (2.25)$$

For that function, the subtraction of the two Eqs. (19) gives

$$V = \frac{Q}{C_m}, \qquad \text{with } C_m \equiv \frac{1}{(p_{11}+p_{22}) - (p_{12}+p_{21})}, \qquad (2.26)$$



where the coefficient $C_m$ is called the mutual capacitance between the conductors - or, again, just "capacitance". The same coefficient describes the electrostatic energy of the system. Indeed, plugging Eq. (25) into Eq. (21), we see that both forms of Eq. (15) are reproduced if $\phi$ is replaced with $V$, $Q_1$ with $Q$, and $C$ with $C_m$:

$$U = \frac{Q^2}{2C_m} = \frac{C_m}{2}V^2. \qquad (2.27)$$

13 See, e.g., CM Chapter 2.

14 A word of caution: in condensed matter physics, voltage is usually defined differently, as the difference of electrochemical rather than electrostatic potentials - see, e.g., SM Sec. 6.4. These two definitions coincide if the conductors have equal workfunctions (for example, if they are made of the same material), and in this course their difference will be ignored.

The best known system for which the mutual capacitance $C_m$ may be readily calculated is the plane capacitor, a system of two conductors separated by a narrow, plane gap (Fig. 2). Indeed, one may argue that since the surface charges which contribute to the opposite charges $\pm Q$ of the conductors in this system attract each other, in the limit $d \ll a$ they sit entirely on the sides of the narrow gap.




Let us apply the Gauss law to a pillbox volume (shown by the dashed line in Fig. 2) whose area is a small part of the gap area (but nevertheless much larger than $d^2$), with one of the plane lids inside a conductor, and the other one inside the gap. The result immediately shows that the electric field within the gap is $E = \sigma/\varepsilon_0$, i.e. is independent of the pillbox thickness. Integrating this field across the thickness $d$ of the gap, we get $V = Ed = \sigma d/\varepsilon_0$, so that $\sigma = \varepsilon_0 V/d$. But this voltage should not depend on the selection of the point on the gap area. As a result, $\sigma$ should also be constant over all the gap area $A$, and hence $Q = \sigma A = \varepsilon_0 AV/d$. Thus we may write $V = Q/C_m$, with

$$C_m = \frac{\varepsilon_0 A}{d}. \qquad (2.28)$$



Let me offer a few comments on this well-known formula. First, it is valid even if the gap is not quite planar, for example if it gently curves on a scale much larger than $d$. Second, Eq. (28) is only valid if $A \sim a^2$ is much larger than $d^2$, because its derivation ignores the electric field deviations from uniformity 15 at distances $\sim d$ near the gap edges. Finally, the same condition ($A \gg d^2$) ensures that $C_m$ is much larger than the self-capacitance of each of the conductors - see Eq. (18). The opportunities given by this fact for electronic engineering and experimental physics practice are rather astonishing. For example, a very realistic 3-nm layer of high-quality aluminum oxide (which may provide a nearly perfect electric insulation between two thin conducting films) with an area of 0.1 m² (a typical area of silicon wafers used in the semiconductor industry) provides $C_m \sim$ 1 mF, 16 larger than the self-capacitance of the whole planet Earth!
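The estimate above can be reproduced with a few lines of Python. Following footnote 16, this sketch multiplies Eq. (28) by a relative dielectric constant, taken (as an assumption) to be exactly 10 for aluminum oxide.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def planar_capacitance(area_m2: float, gap_m: float, eps_r: float = 1.0) -> float:
    """Eq. (28), C_m = eps0*A/d, times an optional dielectric factor eps_r (see footnote 16)."""
    return eps_r * EPS0 * area_m2 / gap_m


c_film = planar_capacitance(0.1, 3e-9, eps_r=10.0)   # 0.1 m^2 of a 3-nm oxide layer
c_earth = 4 * math.pi * EPS0 * 6.37e6                # Eq. (17) for the Earth
print(f"C_m ~ {c_film * 1e3:.1f} mF, C_Earth ~ {c_earth * 1e3:.2f} mF")
```

The result, about 3 mF, agrees in order of magnitude with the estimate in the text and exceeds the Earth's self-capacitance about four times.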

In the case shown in Fig. 2, the electrostatic coupling of the two conductors is evidently strong. 
As an opposite example of a weakly coupled system, let us consider two conducting spheres of the same 
radius R, separated by a much larger distance d (Fig. 3). 



15 Frequently referred to as "fringe" fields, resulting in an additional "stray" capacitance $C_m' \sim \varepsilon_0 a$.

16 Just as in Sec. 1, in order for the estimate to be realistic, I took into account the additional factor $\varepsilon_r$ (for aluminum oxide, close to 10), which should be included in the numerator of Eq. (28) to make it applicable to dielectrics - see Chapter 3 below.
Fig. 2.3. A system of two well-separated, similar conducting spheres (of radius $R$, at distance $d \gg R$).



In this case the diagonal components of matrix $p_{jj'}$ may be approximately found from Eq. (16), i.e. by neglecting the coupling altogether:

$$p_{11} = p_{22} \approx \frac{1}{4\pi\varepsilon_0 R}. \qquad (2.29)$$

Now, if we had just one sphere (say, number 1), the electric potential at distance $d$ from its center would be given by Eq. (16): $\phi = Q_1/4\pi\varepsilon_0 d$. If we now move into this point a small ($R \ll d$) sphere without its own charge, we may expect that its potential should not be too far from this result, so that $\phi_2 \approx Q_1/4\pi\varepsilon_0 d$. Comparing this expression with Eq. (19) (taken for $Q_2 = 0$), we get

$$p_{21} = p_{12} \approx \frac{1}{4\pi\varepsilon_0 d} \ll p_{11}, p_{22}. \qquad (2.30)$$

From here and Eq. (26), the mutual capacitance

$$C_m \approx \frac{1}{p_{11} + p_{22}} \approx 2\pi\varepsilon_0 R. \qquad (2.31)$$

We see that (somewhat counter-intuitively), in this case $C_m$ does not depend substantially on the distance between the spheres, i.e. does not describe their electrostatic coupling. The off-diagonal coefficients of the reciprocal capacitance matrix (20) play this role much better - see Eq. (30).
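To see this weak dependence explicitly, the following Python sketch plugs the approximate matrix elements (29) and (30) into Eq. (26). The sphere radius is an arbitrary assumed value, and the approximation itself is only meaningful for $d \gg R$.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def two_sphere_matrix(radius: float, distance: float):
    """Approximate reciprocal capacitance matrix from Eqs. (29)-(30); meaningful only for distance >> radius."""
    p_self = 1 / (4 * math.pi * EPS0 * radius)
    p_mutual = 1 / (4 * math.pi * EPS0 * distance)
    return ((p_self, p_mutual), (p_mutual, p_self))


def mutual_capacitance(p) -> float:
    """Eq. (26): C_m = 1 / [(p11 + p22) - (p12 + p21)]."""
    (p11, p12), (p21, p22) = p
    return 1 / ((p11 + p22) - (p12 + p21))


R = 1e-2  # sphere radius, m (arbitrary)
for ratio in (10, 100, 1000):
    p = two_sphere_matrix(R, ratio * R)
    c_norm = mutual_capacitance(p) / (2 * math.pi * EPS0 * R)   # C_m in units of Eq. (31)
    print(f"d/R = {ratio:4d}: C_m/(2 pi eps0 R) = {c_norm:.4f}, p12/p11 = {p[0][1] / p[0][0]:.4f}")
```

While the coupling measure $p_{12}/p_{11} = R/d$ changes by two orders of magnitude, $C_m$ changes by only about 10%, approaching the value (31).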

(ii) Now let us consider the case when only one conductor of the two is charged, for example $Q_1 = Q$, while $Q_2 = 0$. Then Eqs. (19) yield

$$\phi_1 = p_{11}Q_1. \qquad (2.32)$$

Now, if we follow Eq. (13) and define $C_j \equiv 1/p_{jj}$ as the partial capacitance of conductor number $j$, we see that it differs from the mutual capacitance $C_m$ - cf. Eq. (26). For example, in the case shown in Fig. 3, $C_1 = C_2 \approx 4\pi\varepsilon_0 R \approx 2C_m$.

(iii) Finally, let us consider a popular case when one of the conductors is charged by a certain charge (say, $Q_1 = Q$), but the potential of the other one is sustained constant, say $\phi_2 = 0$. 17 (This condition is especially easy to implement if the second conductor is much larger than the first one. Indeed, as the estimate (18) shows, in this case it would take a much larger charge $Q_2$ to make potential $\phi_2$ comparable with $\phi_1$.) In this case the second of Eqs. (19) yields $Q_2 = -(p_{21}/p_{22})Q_1$. Plugging this relation into the first of those equations, we get

$$\phi_1 = \left(p_{11} - \frac{p_{12}p_{21}}{p_{22}}\right)Q_1. \qquad (2.33)$$

17 In electrical engineering, such a constant-potential conductor is called the ground. This term stems from the fact that in many cases the Earth's surface may be considered a good electric ground, because its potential is unaffected by that of laboratory-scale static charges.

Thus, if we treat the reciprocal of the expression in parentheses,

$$C_{ef} \equiv \left(p_{11} - \frac{p_{12}p_{21}}{p_{22}}\right)^{-1}, \qquad (2.34)$$



as the effective capacitance of the first conductor, it is generally different both from $C_m$ and (unless the conductors are far apart and their electrostatic coupling is negligible) from $C_1 = 1/p_{11}$.
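The difference between the three "capacitances" of the same two-conductor system is easy to see numerically. This Python sketch uses units in which $4\pi\varepsilon_0 R = 1$ and, as a rough illustrative assumption, the far-field matrix (29)-(30) for two equal spheres at $d = 4R$, where that approximation is already crude.

```python
def capacitances(p):
    """Return (C_m, C_1, C_ef) for a 2x2 reciprocal capacitance matrix p, per Eqs. (26), (32), and (34)."""
    (p11, p12), (p21, p22) = p
    c_m = 1 / ((p11 + p22) - (p12 + p21))   # neutral pair, Q1 = -Q2
    c_1 = 1 / p11                           # second conductor uncharged, Q2 = 0
    c_ef = 1 / (p11 - p12 * p21 / p22)      # second conductor grounded, phi2 = 0
    return c_m, c_1, c_ef


# Two equal spheres at d = 4R, in units where 4*pi*eps0*R = 1
p = ((1.0, 0.25), (0.25, 1.0))
c_m, c_1, c_ef = capacitances(p)
print(f"C_m = {c_m:.4f}, C_1 = {c_1:.4f}, C_ef = {c_ef:.4f}")
```

Grounding the second conductor raises the effective capacitance of the first one above $C_1$, while the mutual capacitance of the neutral pair is the smallest of the three.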

To summarize this section, the potential (and hence the actual capacitance) of a conductor in a 
two-conductor system may be very much dependent on what exactly is being done with the second 
conductor when the first one is charged. This is also true for multi-conductor systems (for whose 
description, Eqs. (19) and (21) may be readily generalized); moreover, in that case even the mutual 
capacitance between two selected conductors may depend on the electrostatic conditions of other components of the system.



2.3. The simplest boundary problems 

In the general case when the electric field distribution in the free space between the conductors cannot be readily found from the Gauss law or by any other special method, the best approach is to try to solve the differential Laplace equation (1.42), with boundary conditions (1b):

$$\nabla^2\phi = 0, \qquad \phi\big|_{S_k} = \phi_k, \qquad (2.35)$$



where $S_k$ is the surface of the $k$-th conductor of the system. After such a boundary problem has been solved, i.e. the spatial distribution $\phi(\mathbf{r})$ has been found at all points outside the conductors, it is straightforward to use Eq. (3) to find the surface charge density, and finally the total charge

$$Q_k = \oint_{S_k}\sigma\, d^2r \qquad (2.36)$$

of each conductor, and hence any component of the reciprocal capacitance matrix $p_{jj'}$. As an illustration, let us implement this program for three very simple problems.

(i) Plane capacitor (Fig. 2). In this case, the easiest way to solve the Laplace equation is to use 
linear (Cartesian) coordinates with one coordinate axis, say z, normal to the conductor surfaces (Fig. 4). 




Fig. 2.4. Plane capacitor's geometry used for the 
solution of the boundary problem (35). 
In these coordinates, the Laplace operator is just the sum of three second derivatives. 18 It is evident that due to the problem's translational symmetry in the {x, y} plane, deep inside the gap (i.e. at lateral distances from the edges much larger than $d$) the electrostatic potential may only depend on the coordinate perpendicular to the gap surfaces: $\phi(\mathbf{r}) = \phi(z)$. For such a function, the derivatives over $x$ and $y$ vanish, and the boundary problem (35) is reduced to a very simple ordinary differential equation

$$\frac{d^2\phi}{dz^2} = 0, \qquad (2.37)$$

with boundary conditions

$$\phi(0) = 0, \qquad \phi(d) = V. \qquad (2.38)$$

(For the sake of notation simplicity, I have used the discretion of adding a constant to the potential to make one of the potentials vanish, and also definition (25) of voltage $V$.) The general solution of Eq. (37) is a linear function, $\phi(z) = c_1 z + c_2$, whose constant coefficients $c_{1,2}$ may be found, in an elementary way, from the boundary conditions (38). The final solution is

$$\phi = V\frac{z}{d}. \qquad (2.39)$$

From here the only nonvanishing component of the electric field is

$$E_z = -\frac{d\phi}{dz} = -\frac{V}{d}, \qquad (2.40)$$

and the surface charge of the capacitor plates

$$\sigma = \varepsilon_0 E_n = \mp\varepsilon_0 E_z = \pm\varepsilon_0\frac{V}{d}, \qquad (2.41)$$

where the upper and lower signs correspond to the upper and lower plate, respectively. Since $\sigma$ does not depend on coordinates $x$ and $y$, we can get the full charges $Q_1 = -Q_2 = Q$ of the surfaces by its multiplication by the gap area $A$, giving us again the already known result (28) for the mutual capacitance $C_m = Q/V$. I believe that this calculation, though very easy, may serve as a good introduction to the boundary problem solution philosophy.
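The same boundary problem may also be solved without any analytic insight, by the simplest numerical relaxation method. This Python sketch performs a Jacobi iteration on a uniform grid (the grid size and the number of sweeps are arbitrary choices) and converges to the linear solution (39).

```python
def relax_plane_capacitor(n_points: int = 21, v: float = 1.0, sweeps: int = 5000) -> list[float]:
    """Jacobi relaxation for d^2 phi/dz^2 = 0 with phi(0) = 0, phi(d) = V, Eqs. (37)-(38)."""
    phi = [0.0] * n_points
    phi[-1] = v                      # potential of the upper plate
    for _ in range(sweeps):
        # discrete Laplace equation: each interior value becomes the mean of its two neighbors
        interior = [0.5 * (phi[i - 1] + phi[i + 1]) for i in range(1, n_points - 1)]
        phi = [phi[0]] + interior + [phi[-1]]
    return phi


phi = relax_plane_capacitor()
n = len(phi)
exact = [i / (n - 1) for i in range(n)]      # Eq. (39) with V = 1, z/d = i/(n - 1)
print(f"max deviation from Eq. (39): {max(abs(a - b) for a, b in zip(phi, exact)):.1e}")
```

Relaxation is far slower than the one-line analytic solution here, but it works equally well for geometries without any convenient symmetry.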

(ii) Coaxial-cable capacitor. Coaxial cable is a system of two round cylindrical, coaxial 
conductors, with the cross-section shown in Fig. 5. 




Fig. 2.5. Cross-section of a coaxial capacitor. 



18 See, e.g., MA Eq. (9.1).
Evidently, in this case the cylindrical coordinates $\{\rho, \varphi, z\}$, with axis $z$ along the common axis of the cylinders, are most appropriate. Due to the axial symmetry of the problem, in these coordinates $\mathbf{E}(\mathbf{r}) = \mathbf{n}_\rho E(\rho)$, $\phi(\mathbf{r}) = \phi(\rho)$, so that in the general expression for the Laplace operator 19 we can take $\partial/\partial\varphi = \partial/\partial z = 0$. As a result, only the first (radial) term of the operator survives, and the boundary problem (35) takes the form

$$\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{d\phi}{d\rho}\right) = 0, \qquad \phi(a) = V, \quad \phi(b) = 0. \qquad (2.42)$$



The sequential integration of this ordinary differential equation is elementary (and similar to that of the Poisson equation in spherical coordinates, performed in Sec. 1.3), giving

$$\frac{d\phi}{d\rho} = \frac{c_1}{\rho}, \qquad \phi(\rho) = c_1\int_a^{\rho}\frac{d\rho'}{\rho'} + c_2 = c_1\ln\frac{\rho}{a} + c_2. \qquad (2.43)$$

The constants may be found using the boundary conditions (42):

$$V = c_2, \qquad 0 = c_1\ln\frac{b}{a} + c_2, \qquad (2.44)$$

giving $c_1 = -V/\ln(b/a)$, so that solution (43) takes the following form:

$$\phi(\rho) = V\left[1 - \frac{\ln(\rho/a)}{\ln(b/a)}\right]. \qquad (2.45)$$



Next, for our axial symmetry the general expression for the gradient 20 is reduced to the radial derivative, so that

$$E(\rho) = -\frac{d\phi}{d\rho} = \frac{V}{\rho\ln(b/a)}. \qquad (2.46)$$

This expression, plugged into Eq. (2), allows us to find the density of the conductors' surface charge. For example, for the inner electrode

$$\sigma_a = \varepsilon_0 E(a) = \frac{\varepsilon_0 V}{a\ln(b/a)}, \qquad (2.47)$$

so that its full charge (per unit length of the system) is

$$\frac{Q}{L} = 2\pi a\sigma_a = \frac{2\pi\varepsilon_0 V}{\ln(b/a)}. \qquad (2.48)$$



(It is straightforward to check that the charge of the outer electrode is equal and opposite.) Hence, by the definition of the mutual capacitance, its value per unit length is

$$\frac{C_m}{L} \equiv \frac{Q}{LV} = \frac{2\pi\varepsilon_0}{\ln(b/a)}. \qquad (2.49)$$

19 See, e.g., MA Eq. (10.3).

20 See, e.g., MA Eq. (10.2).
This expression shows that the total capacitance $C$ is proportional to the system's length $L$ (if $L \gg a, b$), while being only logarithmically dependent on the dimensions of its cross-section. Since the logarithm of a very large argument is an extremely slow function (sometimes called "quasi-constant"), if the external conductor is made large ($b \gg a$), the capacitance tends to zero, but only very slowly, because the logarithm in the denominator of Eq. (49) diverges. Such a logarithmic divergence may be cut off by any minuscule additional effect, for example by the finite length $L$ of the system. This allows one to get a crude but very useful estimate of the self-capacitance of a single wire:

$$C \approx \frac{2\pi\varepsilon_0 L}{\ln(L/a)}, \qquad \text{for } L \gg a. \qquad (2.50)$$



On the other hand, if the gap between the conductors is narrow, $b = a + d$ with $d \ll a$, then $\ln(b/a) = \ln(1 + d/a)$ may be approximated as $d/a$, and Eq. (49) is reduced to $C_m \approx 2\pi\varepsilon_0 aL/d$, i.e. to Eq. (28) for the plane capacitor, with $A = 2\pi aL$.
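A Python sketch of Eqs. (49) and (50). It checks the narrow-gap limit just discussed and shows how slowly the capacitance per unit length decreases with $b/a$. All dimensions are arbitrary assumed values.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def coax_capacitance_per_length(a: float, b: float) -> float:
    """Eq. (49): C_m/L = 2*pi*eps0 / ln(b/a)."""
    return 2 * math.pi * EPS0 / math.log(b / a)


def wire_self_capacitance(length: float, radius: float) -> float:
    """Crude estimate (50): C ~ 2*pi*eps0*L / ln(L/a), for L >> a."""
    return 2 * math.pi * EPS0 * length / math.log(length / radius)


a, d = 1e-3, 1e-5                                  # inner radius and narrow gap, m
plane_per_length = EPS0 * 2 * math.pi * a / d      # Eq. (28) with A = 2*pi*a*L, divided by L
ratio = coax_capacitance_per_length(a, a + d) / plane_per_length
print(f"narrow gap: coax / plane = {ratio:.4f}")

for b_over_a in (10, 100, 1000):
    c = coax_capacitance_per_length(a, b_over_a * a)
    print(f"b/a = {b_over_a:4d}: C_m/L = {c * 1e12:.1f} pF/m")

print(f"1-m wire, 1-mm radius: C ~ {wire_self_capacitance(1.0, 1e-3) * 1e12:.0f} pF")
```

A hundredfold increase of $b/a$ (from 10 to 1000) reduces $C_m/L$ by only a factor of 3, which is the "quasi-constant" behavior of the logarithm.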

(iii) Spherical capacitor. This is a system of two conductors, with the same central cross-section as the coaxial cable (Fig. 5), but now with spherical rather than axial symmetry. This symmetry implies that we are better off using spherical coordinates, so that potential $\phi$ depends only on one of them, the distance $r$ from the common center of the conductors: $\phi(\mathbf{r}) = \phi(r)$. As we already know from Sec. 1.3, in this case the general expression for the Laplace operator is reduced to its first (radial) term, so that the Laplace equation takes a simple form - see Eq. (1.47). Moreover, we have already found the general solution to this equation - see Eq. (1.50):



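$$\phi(r) = \frac{c_1}{r} + c_2. \qquad (2.51)$$

Now acting exactly as above, i.e. determining constant $c_1$ from the boundary conditions $\phi(a) = V$, $\phi(b) = 0$, we get

$$V = c_1\left(\frac{1}{a} - \frac{1}{b}\right), \qquad \text{so that} \quad \phi(r) = \frac{V}{(1/a - 1/b)}\frac{1}{r} + c_2. \qquad (2.52)$$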



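Next, we can use the spherical symmetry to find the electric field, $\mathbf{E}(\mathbf{r}) = \mathbf{n}_r E(r)$, with

$$E(r) = -\frac{d\phi}{dr} = \frac{V}{(1/a - 1/b)}\frac{1}{r^2}, \qquad (2.53)$$

and hence its values on the conductors' surfaces, and then the surface charge density $\sigma$ from Eq. (2). For example, for the inner conductor's surface,

$$\sigma_a = \varepsilon_0 E(a) = \varepsilon_0\frac{V}{(1/a - 1/b)}\frac{1}{a^2}, \qquad (2.54)$$

so that, finally, for the full charge of that conductor we get

$$Q = 4\pi a^2\sigma_a = 4\pi\varepsilon_0\frac{V}{(1/a - 1/b)}. \qquad (2.55)$$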



(Again, the charge of the outer conductor is equal and opposite.) Now we can use the definition of the mutual capacitance to get the final result

$$C_m \equiv \frac{Q}{V} = \frac{4\pi\varepsilon_0}{1/a - 1/b} = 4\pi\varepsilon_0\frac{ab}{b - a}. \qquad (2.56)$$

For $b \gg a$, this result coincides with Eq. (17) for the self-capacitance of the inner conductor. On the other hand, if the gap between the two conductors is narrow, $d \equiv b - a \ll a$, then

$$C_m = 4\pi\varepsilon_0\frac{a(a + d)}{d} \approx 4\pi\varepsilon_0\frac{a^2}{d}, \qquad (2.57)$$

i.e. the capacitance approaches that of the planar capacitor of area $A = 4\pi a^2$ - as it should.
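Both limits of Eq. (56) are easy to confirm numerically. In this Python sketch the radii and the gap are arbitrary assumed values.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def spherical_capacitance(a: float, b: float) -> float:
    """Eq. (56): C_m = 4*pi*eps0*a*b / (b - a)."""
    return 4 * math.pi * EPS0 * a * b / (b - a)


a = 0.1   # inner radius, m (arbitrary)
far = spherical_capacitance(a, 1e4 * a) / (4 * math.pi * EPS0 * a)        # vs. Eq. (17)
d = 1e-4  # narrow gap, m
near = spherical_capacitance(a, a + d) / (EPS0 * 4 * math.pi * a**2 / d)  # vs. Eq. (28)
print(f"b >> a: {far:.5f}   narrow gap: {near:.5f}")
```

Both ratios are close to 1; in the narrow-gap case the residual deviation equals $(a + d)/a$, in agreement with the exact form of Eq. (57).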

All this seems (and is) very straightforward, but let us contemplate what was the reason for such easy successes. We have managed to find coordinate transformations, for example $\{x, y, z\} \to \{r, \theta, \varphi\}$ in the spherical case, such that both the Laplace equation and the boundary conditions involve only one of the new coordinates (in this case, $r$). The necessary condition for the former fact is that the new coordinates (in this case, spherical ones) are orthogonal. This means that the three vector components of the differential $d\mathbf{r}$, due to small variations of the new coordinates (say, $dr$, $d\theta$, and $d\varphi$), are mutually perpendicular. If this were not so, the Laplace operator would not fall into a simple sum of three independent parts, and could not be reduced, at the proper symmetry of the problem, to just one of these components, making it readily integrable.



2.4. Orthogonal coordinates 

This methodology may be further extended to other systems of orthogonal coordinates. As an example, let us have a look at the following problem: finding the self-capacitance of a thin, round conducting disk (and, as the solution's by-products, the distributions of the electric field and surface charge) - see Fig. 6. The cylindrical or spherical coordinates would not give much help here, because though they have the appropriate axial symmetry about axis $z$, they would make the boundary condition on the disk too complex (involving two coordinates, either $\rho$ and $z$, or $r$ and $\theta$).




Fig. 2.6. The thin conducting disk problem. (The cross- 
section of the system by the vertical plane y = 0.) 



The relief comes from noting that the disk, i.e. the area $z = 0$, $r < R$, may be thought of as the limiting case of an axially-symmetric ellipsoid - the result of rotation of the usual ellipse about one of its axes - in our case, the vertical axis $z$. 21 Analytically, such an ellipsoid may be described by the following equation:

$$\frac{x^2 + y^2}{a^2} + \frac{z^2}{b^2} = 1, \qquad (2.58)$$

21 Alternative names for such an ellipsoid are the "ellipsoid of rotation" and "spheroid".



where $a$ and $b$ are the so-called principal semi-axes, whose ratio determines the ellipse's eccentricity (the degree of squeezing). For our problem, we will only need oblate ellipsoids with $a > b$; according to Eq. (58), they may be presented as surfaces of constant $\alpha$ in the system of degenerate ellipsoidal coordinates $\{\alpha, \beta, \varphi\}$, which are related to the Cartesian coordinates as follows:

$$\begin{aligned} x &= R\cosh\alpha\,\sin\beta\,\cos\varphi, \\ y &= R\cosh\alpha\,\sin\beta\,\sin\varphi, \\ z &= R\sinh\alpha\,\cos\beta. \end{aligned} \qquad (2.59)$$

Such ellipsoidal coordinates are an evident generalization of the spherical coordinates, which correspond to the limit $\alpha \gg 1$ (i.e. $r \gg R$). In the opposite limit, the surface of constant $\alpha = 0$ describes our thin disk of radius $R$. It is almost evident (and easy to prove) that coordinates (59) are also orthogonal, so that the Laplace operator may be expressed as a sum of three independent terms:



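$$\nabla^2 = \frac{1}{R^2(\cosh^2\alpha - \sin^2\beta)}\left[\frac{1}{\cosh\alpha}\frac{\partial}{\partial\alpha}\left(\cosh\alpha\frac{\partial}{\partial\alpha}\right) + \frac{1}{\sin\beta}\frac{\partial}{\partial\beta}\left(\sin\beta\frac{\partial}{\partial\beta}\right) + \left(\frac{1}{\sin^2\beta} - \frac{1}{\cosh^2\alpha}\right)\frac{\partial^2}{\partial\varphi^2}\right]. \qquad (2.60)$$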
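The orthogonality of coordinates (59), stated above to be easy to prove, can also be checked numerically. This Python sketch computes the three tangent vectors $\partial\mathbf{r}/\partial\alpha$, $\partial\mathbf{r}/\partial\beta$, $\partial\mathbf{r}/\partial\varphi$ by finite differences at an arbitrarily chosen point (with $R = 1$), and also compares $|\partial\mathbf{r}/\partial\alpha|^2$ with the factor $R^2(\cosh^2\alpha - \sin^2\beta)$ in front of the square brackets of Eq. (60).

```python
import math


def cartesian(alpha: float, beta: float, phi: float, R: float = 1.0) -> tuple[float, float, float]:
    """Degenerate ellipsoidal coordinates -> Cartesian ones, Eq. (59)."""
    return (R * math.cosh(alpha) * math.sin(beta) * math.cos(phi),
            R * math.cosh(alpha) * math.sin(beta) * math.sin(phi),
            R * math.sinh(alpha) * math.cos(beta))


def tangent(point: tuple[float, float, float], index: int, h: float = 1e-6) -> list[float]:
    """Central-difference derivative of r with respect to coordinate number `index`."""
    plus, minus = list(point), list(point)
    plus[index] += h
    minus[index] -= h
    r_plus, r_minus = cartesian(*plus), cartesian(*minus)
    return [(p - m) / (2 * h) for p, m in zip(r_plus, r_minus)]


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


point = (0.7, 1.1, 0.4)                 # arbitrary (alpha, beta, phi)
t = [tangent(point, k) for k in range(3)]
print("pairwise dot products:", [f"{dot(t[i], t[j]):.1e}" for i, j in ((0, 1), (0, 2), (1, 2))])
alpha, beta, _ = point
print(f"|dr/dalpha|^2 = {dot(t[0], t[0]):.8f}, cosh^2 - sin^2 = {math.cosh(alpha)**2 - math.sin(beta)**2:.8f}")
```

All pairwise dot products vanish to within rounding errors, confirming that the coordinates are orthogonal.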



Though expression (60) may look a bit intimidating, let us notice that in our current problem, the boundary conditions depend only on coordinate $\alpha$: 22

$$\phi\big|_{\alpha=0} = V, \qquad \phi\big|_{\alpha\to\infty} = 0. \qquad (2.61)$$

Hence there is every reason to believe that the electrostatic potential in all space is a function of $\alpha$ alone. (In other words, all ellipsoids $\alpha$ = const are equipotential surfaces.) Indeed, acting on such a function $\phi(\alpha)$ by the Laplace operator (60), we see that the two last terms in the square brackets vanish, and the Laplace equation (35) is reduced to a simple ordinary differential equation

$$\frac{d}{d\alpha}\left(\cosh\alpha\frac{d\phi}{d\alpha}\right) = 0. \qquad (2.62)$$

Integrating it twice, just as we did in the previous problems, we get

$$\phi(\alpha) = c_1\int\frac{d\alpha}{\cosh\alpha} + c_2. \qquad (2.63)$$



This integral may be readily taken, for example, using the substitution $\xi = \sinh\alpha$ (with $d\xi = \cosh\alpha\, d\alpha$, $\cosh^2\alpha = 1 + \sinh^2\alpha = 1 + \xi^2$):

$$\phi(\alpha) = c_1\int_0^{\sinh\alpha}\frac{d\xi}{1 + \xi^2} + c_2 = c_1\arctan(\sinh\alpha) + c_2. \qquad (2.64)$$

22 I have called the disk's potential $V$, to distinguish it from the potential $\phi$ at an arbitrary point of space.
The integration constants $c_{1,2}$ are again simply found from the boundary conditions, in this case Eqs. (61), and we arrive at the final expression for the electrostatic potential:

$$\phi(\alpha) = V\left[1 - \frac{2}{\pi}\arctan(\sinh\alpha)\right]. \qquad (2.65)$$

This solution satisfies both the Laplace equation and the boundary conditions. Mathematicians tell us that the solution of any boundary problem of the type (35) is unique, so we do not need to look any further.
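As a quick sanity check of Eq. (65), this Python sketch verifies numerically that it obeys the boundary conditions (61), and that the combination $\cosh\alpha\, d\phi/d\alpha$, whose derivative Eq. (62) requires to vanish, is indeed constant (equal to $-2V/\pi$). The derivative is taken by central finite differences, with $V = 1$.

```python
import math


def disk_potential(alpha: float, V: float = 1.0) -> float:
    """Eq. (65): phi(alpha) = V [1 - (2/pi) arctan(sinh alpha)]."""
    return V * (1 - (2 / math.pi) * math.atan(math.sinh(alpha)))


def cosh_times_dphi(alpha: float, h: float = 1e-5) -> float:
    """cosh(alpha) * dphi/dalpha by central differences; Eq. (62) says this must be constant."""
    dphi = (disk_potential(alpha + h) - disk_potential(alpha - h)) / (2 * h)
    return math.cosh(alpha) * dphi


print(disk_potential(0.0), disk_potential(20.0))                 # ~V and ~0, cf. Eqs. (61)
print([round(cosh_times_dphi(a), 6) for a in (0.1, 1.0, 3.0)])   # all ~ -2V/pi
```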

Now we may use Eq. (2) to find the surface density of electric charge, but in the case of a thin disk, it is more natural to add up such densities on its top and bottom surfaces at the same distance $r = (x^2 + y^2)^{1/2}$ from the disk center (which are evidently equal, due to the problem's symmetry about plane $z = 0$): $\sigma = 2\varepsilon_0 E_n|_{z=+0}$. According to Eq. (65), the electric field on the surface is

$$E_n\big|_{z=+0} = -\frac{\partial\phi}{\partial z}\bigg|_{z=+0} = -\frac{\partial\phi(\alpha)}{\partial(R\sinh\alpha\cos\beta)}\bigg|_{\alpha=+0} = \frac{2}{\pi}\frac{V}{R\cos\beta} = \frac{2}{\pi}\frac{V}{(R^2 - r^2)^{1/2}}, \qquad (2.66)$$

and we see that the charge is distributed along the disk very nonuniformly:

$$\sigma = \frac{4}{\pi}\frac{\varepsilon_0 V}{(R^2 - r^2)^{1/2}}, \qquad (2.67)$$

with a singularity at the disk edge. Below we will see that such singularities are very typical for sharp edges of conductors. 23 Fortunately, in our current case the divergence is integrable, giving a finite disk charge:

$$Q = \oint_{\text{surface}}\sigma\, d^2r = \int_0^R\sigma(r)\,2\pi r\,dr = 8\varepsilon_0 V\int_0^R\frac{r\,dr}{(R^2 - r^2)^{1/2}} = 8\varepsilon_0 RV. \qquad (2.68)$$

Thus, for the disk's self-capacitance we get a very simple result,

$$C = 8\varepsilon_0 R \equiv \frac{2}{\pi}\,4\pi\varepsilon_0 R, \qquad (2.69)$$

a factor of $2/\pi \approx 0.64$ smaller than that of a conducting sphere of the same radius, but still complying with the general estimate (18).
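The charge integral (68) can be checked numerically despite the edge singularity of Eq. (67). This Python sketch (in units with $\varepsilon_0 = R = V = 1$) uses the substitution $r = R\sin t$, which makes the integrand smooth, together with a simple midpoint rule, and then compares the resulting capacitance with that of a sphere of the same radius.

```python
import math


def disk_charge(R: float = 1.0, V: float = 1.0, eps0: float = 1.0, n: int = 10_000) -> float:
    """Midpoint-rule integral of Eq. (67) over the disk, Q = int_0^R sigma(r) 2 pi r dr."""
    dt = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt                 # r = R sin t, so dr = R cos t dt
        r = R * math.sin(t)
        root = R * math.cos(t)             # (R^2 - r^2)^(1/2)
        sigma = (4 / math.pi) * eps0 * V / root
        total += sigma * 2 * math.pi * r * root * dt
    return total


Q = disk_charge()
print(f"Q = {Q:.6f} (Eq. (68) gives 8), C_disk / C_sphere = {Q / (4 * math.pi):.4f}")
```

The substitution cancels the $(R^2 - r^2)^{-1/2}$ factor exactly, which is why the singularity causes no numerical trouble.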

Can we always find a "good" system of orthogonal coordinates? Unfortunately, the answer is no, even for highly symmetric geometries. This is why the practical value of this approach is limited, and other methods of solving boundary problems are clearly needed. Before moving to them, however, let us note that in the case of 2D problems (i.e. cylindrical geometries), the orthogonal coordinate method gets help from the following conformal mapping approach.

Let us consider the pair of Cartesian coordinates $\{x, y\}$ of the cross-section plane as a complex variable $\mathfrak{z} = x + iy$, 24 where $i$ is the imaginary unit ($i^2 = -1$), and let $\mathfrak{w}(\mathfrak{z}) = u + iv$ be an analytic complex function of $\mathfrak{z}$. 25 For our current purposes, the most important property of an analytic function is that its real and imaginary parts obey the following Cauchy-Riemann relations: 26

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \qquad (2.70)$$

For example, for the function

$$\mathfrak{w} = \mathfrak{z}^2 = (x + iy)^2 = (x^2 - y^2) + 2ixy, \qquad (2.71)$$

whose real and imaginary parts are

$$u = \text{Re}\,\mathfrak{w} = x^2 - y^2, \qquad v = \text{Im}\,\mathfrak{w} = 2xy, \qquad (2.72)$$

we immediately see that $\partial u/\partial x = 2x = \partial v/\partial y$, and $\partial v/\partial x = 2y = -\partial u/\partial y$, in accordance with Eq. (70).

23 If you seriously worry about the formal infinity of charge density at $r \to R$, please remember that this mathematical artifact disappears for any nonvanishing disk thickness.

24 The complex variable $\mathfrak{z}$ should not be confused with the (real) 3rd spatial coordinate $z$! We are considering 2D problems now, with the potential independent of $z$.

Let us differentiate the first of Eqs. (70) over $x$ again, then change the order of differentiation, and after that use the latter of those equations:

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial}{\partial x}\frac{\partial u}{\partial x} = \frac{\partial}{\partial x}\frac{\partial v}{\partial y} = \frac{\partial}{\partial y}\frac{\partial v}{\partial x} = -\frac{\partial}{\partial y}\frac{\partial u}{\partial y} = -\frac{\partial^2 u}{\partial y^2}, \qquad (2.73)$$

and similarly for $v$. This means that the sum of the second-order partial derivatives of each of the real functions $u(x, y)$ and $v(x, y)$ is zero, i.e. that both functions obey the 2D Laplace equation. This mathematical fact opens a nice way of solving problems of electrostatics for (relatively simple) 2D geometries. Imagine that for a particular boundary problem we have found a function $\mathfrak{w}(\mathfrak{z})$ for which either $u(x, y)$ or $v(x, y)$ is constant on all electrode surfaces. Then all lines of constant $u$ (or $v$) represent equipotential surfaces, i.e. the problem of the potential distribution has been essentially solved.
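These properties are easy to probe numerically with Python's built-in complex arithmetic. The sketch below evaluates, by finite differences at an arbitrary point, the residuals of both Cauchy-Riemann relations (70) and of the 2D Laplace equation for $u$ and $v$. Two analytic functions ($\mathfrak{z}^2$ and $\exp\mathfrak{z}$) give vanishing residuals. For comparison, the non-analytic function $\mathfrak{z}^*$ violates the first relation, although, as it happens, its parts $u = x$ and $v = -y$ are still harmonic.

```python
import cmath


def partial(f, x: float, y: float, wrt: str, h: float = 1e-5) -> float:
    """Central finite difference of a real function f(x, y) over x or y."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def laplacian(f, x: float, y: float, h: float = 1e-4) -> float:
    """Five-point finite-difference estimate of d2f/dx2 + d2f/dy2."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h**2


def residuals(w, x: float, y: float) -> tuple[float, float, float, float]:
    """Residuals of both relations (70), and of the Laplace equation for u = Re w and v = Im w."""
    def u(x_, y_):
        return w(complex(x_, y_)).real

    def v(x_, y_):
        return w(complex(x_, y_)).imag

    cr1 = partial(u, x, y, "x") - partial(v, x, y, "y")
    cr2 = partial(v, x, y, "x") + partial(u, x, y, "y")
    return cr1, cr2, laplacian(u, x, y), laplacian(v, x, y)


for name, w in (("z**2", lambda z: z**2), ("exp(z)", cmath.exp), ("conj(z)", lambda z: z.conjugate())):
    print(name, [f"{r:+.1e}" for r in residuals(w, 0.3, -0.7)])
```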

As a simple example, consider a practically important problem: the quadrupole electrostatic 
lens- a system of four cylindrical 27 electrodes with hyperbolic cross-sections, whose boundaries obey the 
following relations: 

2 2 _ | + a 2 , for the left and right electrodes, ^ 
[ - a 2 , for the top and bottom electrodes, 

voltage-biased as shown in Fig. 7a. Comparing these relations with Eqs. (72), we see that each electrode 

2 2 

surface corresponds to a constant value of u = ±a . Moreover, potentials of both surfaces with u = +a 
are equal to +V/2, while those with u = -a are equal to -VI2. Hence we may conjecture that the 
electrostatic potential at each point is a function of u alone; moreover, a simple linear function, 

<f> = c x u +c 2 = Cj(x 2 -y 2 ) + c 2 , (2.75) 



25 The analytic (or "holomorphic") function may be defined as the one that may be expanded into the complex 
Taylor series, i.e. is infinitely differentiable in the given point. (Almost all "regular" functions, such as ■z 1 '", 
exp ^, In ■z, etc. and their combinations are analytic at all i, maybe besides certain special points.) If the reader 
needs to brush up his or her background on this subject, I can recommend a popular (and very inexpensive :-) 
textbook by M. Spiegel et al, Complex Variables, 2 nd ed., McGraw-Hill, 2009. 

26 These relations may be, in particular, to prove the famous Cauchy integral formula - see, e.g., MA Eq. (15.1). 

27 Let me remind the reader that in mathematics, term cylindrical describes a surface formed by translation, along 
a straight line, of an arbitrary curve, and hence more general than the usual circular cylinder. (In this terminology, 
for example, a prism is also a particular form of cylinder, formed by translating a polygon.) 
is a valid (and hence the unique) solution of our boundary problem. Indeed, it does satisfy the Laplace equation, while its constants $c_{1,2}$ may be selected to satisfy all the boundary conditions shown in Fig. 7a:



$$\phi = \frac{V}{2}\,\frac{x^2 - y^2}{a^2}, \tag{2.76}$$

so that the boundary problem has been solved.




Fig. 2.7. (a) Quadrupole electrostatic lens geometry and (b) its analysis using conformal mapping. 



According to Eq. (76), all equipotential surfaces are hyperbolic cylinders, similar to those of the 
electrode surfaces. What remains is to find the electric field at an arbitrary point inside the system: 

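$$E_x = -\frac{\partial\phi}{\partial x} = -V\frac{x}{a^2}, \qquad E_y = -\frac{\partial\phi}{\partial y} = V\frac{y}{a^2}. \tag{2.77}$$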

These formulas show that if charged particles (e.g., electrons in an electron optics system) are launched to fly ballistically through the lens, along axis $z$, they experience a force pushing them toward the symmetry axis and proportional to the particle's deviation from the axis (and thus equivalent in action to an optical lens with positive refraction power) in one direction, and a force pushing them out (negative refraction power) in the perpendicular direction. One can show that letting charged particles fly through several such lenses in series, with alternating voltage polarities, enables beam focusing.$^{28}$
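A minimal numerical sketch (Python, standard library only) can confirm the claims just made: Eq. (76) takes the value $+V/2$ on the right electrode and is harmonic, and the field (77) points back toward the axis along $x$ and away from it along $y$ when $V > 0$ (for a positive charge). The function names and the unit values of $V$ and $a$ are illustrative assumptions, not part of the notes.

```python
import math

V, a = 1.0, 1.0  # illustrative voltage and electrode scale (assumed values)

def quad_potential(x, y):
    """Eq. (2.76): phi = (V/2) (x^2 - y^2) / a^2."""
    return 0.5 * V * (x * x - y * y) / a ** 2

def quad_field(x, y):
    """Eq. (2.77): (E_x, E_y) = (-V x / a^2, +V y / a^2)."""
    return -V * x / a ** 2, V * y / a ** 2

def laplacian_fd(f, x, y, h=1e-3):
    """Five-point finite-difference estimate of the 2D Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h ** 2

# A point on the right electrode x^2 - y^2 = +a^2, parametrized as (a cosh s, a sinh s)
s = 0.7
print(quad_potential(a * math.cosh(s), a * math.sinh(s)))  # close to +V/2
print(laplacian_fd(quad_potential, 0.3, -0.2))             # close to 0: harmonic

# Field at a point displaced from the axis in both x and y
Ex, Ey = quad_field(0.1, 0.1)
print(Ex < 0, Ey > 0)  # V > 0: a positive charge is pushed toward the axis along x, away along y
```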

Hence, we have reduced the 2D Laplace boundary problem to that of finding the proper analytic function $w(z)$. This task may also be understood as that of finding a conformal map, i.e. a correspondence between the components of any point pair, $\{x, y\}$ and $\{u, v\}$, residing, respectively, on the initial Cartesian plane $z$ and the plane $w$ of the new variables. For example, Eq. (74) maps the real electrode configuration onto the plane capacitor with infinite area (Fig. 7b), and the simplicity of Eq. (75) is due to the fact that for the latter system the equipotential surfaces are just parallel planes.

For more complex geometries, the suitable analytic function $w(z)$ may be hard to find. However, for conductors with piecewise-linear cross-section boundaries, substantial help may be obtained from the following Schwarz-Christoffel integral



28 See, e.g., the textbook by P. Grivet, Electron Optics, 2nd ed., Pergamon, 1972, or the review collection A. Septier (ed.), Focusing Charged Particles, vol. I, Academic Press, 1967, in particular the review by K.-J. Hanszen and R. Lauer, pp. 251-307.
$$w(z) = \text{const}\times\int\frac{dz}{(z - x_1)^{k_1}(z - x_2)^{k_2}\cdots(z - x_{N-1})^{k_{N-1}}}, \tag{2.78}$$

that provides the conformal mapping between the interior of an arbitrary $N$-sided polygon on plane $w = u + iv$ and the upper half ($y > 0$) of plane $z = x + iy$. Here $x_j$ ($j = 1, 2, \ldots, N - 1$) are the points of axis $y = 0$ (i.e., of the boundary of the mapped region on plane $z$) to which the corresponding polygon vertices are mapped, while $k_j$ are the exterior angles at the polygon vertices, measured in units of $\pi$, with $-1 < k_j < +1$; see Fig. 8.$^{29}$ Of the points $x_j$, two may be selected arbitrarily (because their effects may be compensated by the multiplicative constant in Eq. (78) and by the constant of integration), while all the others have to be adjusted to provide the correct mapping.




In the general case, the complex integral (78) may be hard to tackle. However, in some important cases, in particular those with right angles ($k_j = +1/2$) and/or with some points at infinity, the integrals may be readily worked out, giving explicit analytical expressions for the mapping functions $w(z)$. For example, let us consider a semi-infinite strip, defined by the restrictions $-1 < u < +1$ and $0 < v$, on plane $w$; see Fig. 9a.




Fig. 2.9. Semi-infinite 
strip mapped onto the 
upper half-plane. 



29 The fact that integral (78) includes only $(N - 1)$ rather than $N$ poles stems from the fact that the polygon's geometry is completely determined by the $(N - 1)$ positions $w_j$ of its vertices and the $(N - 1)$ angles $\pi k_j$. In particular, since the algebraic sum of all external angles of a polygon equals $2\pi$, the last angle parameter $k_N$ is uniquely determined by the set of the previous ones.
The strip may be considered as a polygon, with one vertex at the infinitely distant vertical point $w_3 = 0 + i\infty$. Let us map it onto the upper half of plane $z$, with vertex $w_1 = -1 + i0$ mapped onto point $x_1 = -1$, and vertex $w_2 = +1 + i0$ mapped onto point $x_2 = +1$. Since both external angles in this case are equal to $+\pi/2$, and hence $k_1 = k_2 = +1/2$, Eq. (78) is reduced to

$$w(z) = \text{const}\times\int\frac{dz}{(z + 1)^{1/2}(z - 1)^{1/2}} = \text{const}\times\int\frac{dz}{(z^2 - 1)^{1/2}} = \text{const}'\times\int\frac{dz}{(1 - z^2)^{1/2}}. \tag{2.79}$$

This complex integral may be taken, just as for real $z$, by the substitution $z = \sin\xi$, giving



$$w(z) = \text{const}'\times\int d\xi = c_1\arcsin z + c_2. \tag{2.80}$$



Determining the constants from the required mapping, i.e. from the equations $w(-1 + i0) = -1 + i0$ and $w(+1 + i0) = +1 + i0$ (see Fig. 9), we finally get

$$w(z) = \frac{2}{\pi}\arcsin z, \quad\text{i.e.}\quad z = \sin\frac{\pi w}{2}. \tag{2.81a}$$

Using the well-known expression for the sine of a complex argument,$^{30}$ we may rewrite this elegant result in either of the two following forms for the real and imaginary components of $z$ and $w$:

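$$u = \frac{2}{\pi}\arcsin\frac{2x}{[(x + 1)^2 + y^2]^{1/2} + [(x - 1)^2 + y^2]^{1/2}}, \qquad v = \frac{2}{\pi}\,\mathrm{arccosh}\,\frac{[(x + 1)^2 + y^2]^{1/2} + [(x - 1)^2 + y^2]^{1/2}}{2},$$

$$x = \sin\frac{\pi u}{2}\cosh\frac{\pi v}{2}, \qquad y = \cos\frac{\pi u}{2}\sinh\frac{\pi v}{2}. \tag{2.81b}$$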

It is amazing how perfectly the last formula manages to keep $y = 0$ at different borders of our $w$ region (Fig. 9): at its side borders ($u = \pm 1$, $0 < v < \infty$), this is performed by the first multiplier, while at the bottom border ($-1 < u < +1$, $v = 0$), the equality is ensured by the second multiplier.
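The mapping (81) and its inverse can be spot-checked numerically. The sketch below (Python, standard library; the function names are hypothetical) maps a point of the strip to the $z$ plane with Eq. (81a), recovers $u$ and $v$ from the first line of Eq. (81b), and confirms that the strip's borders land on the real axis $y = 0$. The clamping of the arcsin and arccosh arguments is an added guard against round-off, not part of the notes.

```python
import cmath
import math

def z_of_w(w):
    """Eq. (2.81a): z = sin(pi w / 2), mapping the strip -1 < u < 1, v > 0 onto y > 0."""
    return cmath.sin(math.pi * w / 2)

def w_of_z(x, y):
    """First line of Eq. (2.81b): (u, v) as real functions of (x, y), for y >= 0."""
    r1 = math.hypot(x + 1, y)  # distance from z to the point -1
    r2 = math.hypot(x - 1, y)  # distance from z to the point +1
    s = max(-1.0, min(1.0, 2 * x / (r1 + r2)))  # clamp round-off outside [-1, 1]
    ch = max(1.0, (r1 + r2) / 2)                # r1 + r2 >= 2 by the triangle inequality
    return (2 / math.pi) * math.asin(s), (2 / math.pi) * math.acosh(ch)

z = z_of_w(0.3 + 0.7j)
u, v = w_of_z(z.real, z.imag)
print(u, v)  # recovers approximately (0.3, 0.7)

# The side border u = 1 and the bottom border v = 0 both map onto y = 0 (up to round-off)
print(z_of_w(1 + 0.5j).imag, z_of_w(0.4).imag)
```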

This mapping may be used to solve several electrostatics problems with the geometry shown in 
Fig. 9; probably the most surprising of them is the following one. A straight gap of width 2t is cut in a 
thin conducting plane, and voltage V is applied between the resulting half-planes - see the bold lines in 
Fig. 10. Selecting a Cartesian coordinate system with axis z along the cut, axis y perpendicular to the 
plane, and the origin in the middle of the cut, we can write the boundary conditions of this Laplace 
problem as 

$$\phi = \begin{cases} +V/2, & \text{at } x > t,\ y = 0,\\ -V/2, & \text{at } x < -t,\ y = 0.\end{cases} \tag{2.82}$$

(Due to the problem's symmetry, we may expect that in the middle of the gap, i.e. at $-t < x < +t$ and $y = 0$, the electric field is parallel to the plane and hence $\partial\phi/\partial y = 0$.) The comparison of Figs. 9 and 10 shows that if we normalize our coordinates to $t$, Eq. (81) provides the conformal mapping of our system on plane $z$ onto a plane capacitor on plane $w$, with voltage $V$ between the two planes $u = \pm 1$. Since we



30 See, e.g., MA Eq. (3.4).
already know that in that case $\phi = (V/2)u$, we may immediately use the first of Eqs. (81b) to write the final solution of the problem (in the dimensional coordinates):$^{31}$




$$\phi = \frac{V}{\pi}\arcsin\frac{2x}{[(x + t)^2 + y^2]^{1/2} + [(x - t)^2 + y^2]^{1/2}}. \tag{2.83}$$



Thin lines in Fig. 10 show the corresponding equipotential surfaces;$^{32}$ it is evident that the electric field concentrates at the gap edges, just as it did at the edge of the thin disk (Fig. 6). Let me leave the remaining calculation of the surface charge distribution and the mutual capacitance between the half-planes (per unit length) for the reader's exercise.
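As a quick consistency check of Eq. (83), here is a minimal sketch (Python; the function names and the choice $V = t = 1$ are illustrative) that evaluates the potential on both half-planes and in the gap, and estimates its Laplacian off the plane. Inside the gap the formula reduces to $(V/\pi)\arcsin(x/t)$, so that $|E_x| = (V/\pi)/(t^2 - x^2)^{1/2}$ grows without bound as $x \to \pm t$, in line with the field concentration at the gap edges.

```python
import math

def slit_potential(x, y, V=1.0, t=1.0):
    """Eq. (2.83): half-planes at -V/2 (x < -t) and +V/2 (x > t), gap |x| < t."""
    r1 = math.hypot(x + t, y)
    r2 = math.hypot(x - t, y)
    s = max(-1.0, min(1.0, 2 * x / (r1 + r2)))  # clamp round-off outside [-1, 1]
    return (V / math.pi) * math.asin(s)

def laplacian_fd(f, x, y, h=1e-3):
    """Five-point finite-difference estimate of the 2D Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h ** 2

def gap_field(x, h=1e-7):
    """Numerical E_x = -d(phi)/dx on the plane y = 0, inside the gap."""
    return -(slit_potential(x + h, 0.0) - slit_potential(x - h, 0.0)) / (2 * h)

print(slit_potential(2.0, 0.0), slit_potential(-3.0, 0.0))  # about +0.5 and -0.5
print(slit_potential(0.5, 0.0))                             # (1/pi) arcsin(1/2) = 1/6
print(laplacian_fd(slit_potential, 0.3, 0.8))               # close to 0 off the plane
for x in (0.0, 0.9, 0.99):
    print(x, gap_field(x))  # |E_x| increases toward the gap edge
```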

2.5. Variable separation 

The general approach of the methods discussed in the last two sections was to satisfy the Laplace equation by a function of a single variable that also satisfies the boundary conditions. Unfortunately, in many cases this cannot be done (at least, using practically simple functions). In such cases, a very powerful method called variable separation may work, frequently producing "semi-analytical" results in the form of an infinite series of either elementary or well-studied special functions. The main idea of the method is to present the solution of the general boundary problem (35) as the sum of partial solutions,

$$\phi = \sum_k c_k\phi_k, \tag{2.84}$$

where each function $\phi_k$ satisfies the Laplace equation, and then select the set of coefficients $c_k$ to satisfy the boundary conditions. More specifically, in the variable separation method the partial solutions $\phi_k$ are looked for in the form of a product of functions, each depending on just one spatial coordinate.



31 This result could also be obtained using the so-called elliptical (not ellipsoidal!) coordinates. 

32 Another graphical representation of the electric field distribution, by field lines, is much less convenient. As a reminder, the field lines are defined as lines to which the (in our current case, electrostatic) field vectors are tangential at each point. By this definition, the field lines are always normal to the equipotential surfaces, so that it is always straightforward to sketch them from the equipotential surface pattern, such as the one shown in Fig. 10.
(i) Cartesian coordinates. Let us discuss this approach on the classical example of a rectangular box with conducting walls (Fig. 11), with the same potential (which I will take to be zero) at all the walls, but a different potential $V$ fixed at the top lid. Moreover, in order to demonstrate the power of the variable separation method, let us carry out all the calculations for a more general case when the top lid potential is an arbitrary 2D function $V(x, y)$.$^{33}$



[Figure: rectangular box with axes $x$, $y$, $z$; $\phi = V(x, y)$ on the top lid, $\phi = 0$ on the other walls]


Fig. 2.11. Standard playground for the variable 
separation method discussion: a rectangular box 
with five conducting, grounded walls and a fixed 
potential distribution V(x, y) on the top lid. 



For this geometry, it is natural to use Cartesian coordinates $\{x, y, z\}$ and hence present each of the partial solutions in Eq. (84) as a product



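$$\phi_k = X(x)\,Y(y)\,Z(z). \tag{2.85}$$

Plugging it into the Laplace equation expressed in Cartesian coordinates,

$$\frac{\partial^2\phi_k}{\partial x^2} + \frac{\partial^2\phi_k}{\partial y^2} + \frac{\partial^2\phi_k}{\partial z^2} = 0, \tag{2.86}$$

and dividing the result by the product $XYZ$, we get

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} = 0. \tag{2.87}$$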



Here comes the punch line of the variable separation method: since the first term of this sum may depend only on $x$, the second one only on $y$, etc., Eq. (87) may be satisfied everywhere in the volume only if each of these terms equals a constant. In a minute we will see that for our current problem (Fig. 11), these constant $x$- and $y$-terms have to be negative; hence let us denote these variable separation constants as $(-\alpha^2)$ and $(-\beta^2)$, respectively. Now Eq. (87) shows that the constant $z$-term has to be positive; if we denote it as $\gamma^2$, we get the following relation:



$$\alpha^2 + \beta^2 = \gamma^2. \tag{2.88}$$



Now the variables are separated in the sense that for functions X(x), Y(y), and Z(z) we have got 
separate ordinary differential equations, 



33 Such distributions may be implemented in practice using so-called mosaic electrodes consisting of many 
electrically-insulated and individually-biased panels. 



Chapter 2 



Page 22 of 60 



Essential Graduate Physics 



EM: Classical Electrodynamics 



$$\frac{d^2X}{dx^2} + \alpha^2X = 0, \qquad \frac{d^2Y}{dy^2} + \beta^2Y = 0, \qquad \frac{d^2Z}{dz^2} - \gamma^2Z = 0, \tag{2.89}$$

which are related only by Eq. (88) for their parameters. Let us start from the equation for function $X(x)$. Its general solution is the sum of functions $\sin\alpha x$ and $\cos\alpha x$, multiplied by arbitrary coefficients. Let us select these coefficients to satisfy our boundary conditions. First, since $\phi_k \propto X$ should vanish at the back vertical wall of the box (i.e., with the choice of coordinate origin shown in Fig. 11, at $x = 0$ for any $y$ and $z$), the coefficient at $\cos\alpha x$ should be zero. The remaining coefficient (at $\sin\alpha x$) may be included into the general factor $c_k$ in Eq. (84), so that we may take $X$ in the form

$$X = \sin\alpha x. \tag{2.90}$$

This solution satisfies the boundary condition at the opposite wall ($x = a$) only if its argument $\alpha a$ is a multiple of $\pi$, i.e. if $\alpha$ is equal to any of the following numbers (commonly called eigenvalues):

$$\alpha_n = \frac{\pi}{a}n, \qquad n = 1, 2, \ldots \tag{2.91}$$

(Terms with negative values of $n$ would not be linearly independent from those with positive $n$, and may be dropped from the sum (84). The value $n = 0$ is formally possible, but would give $X = 0$, i.e. $\phi_k = 0$, at any $x$, i.e. no contribution to sum (84), so it may be dropped as well.) Now we see that we indeed had to take $\alpha$ real (i.e. $\alpha^2$ positive); otherwise, instead of the oscillating function (90) we would have a sum of two exponential functions, which cannot equal zero at two different points of axis $x$.

Since the equation for function $Y(y)$ is similar to that for $X(x)$, and the boundary conditions on the walls perpendicular to axis $y$ ($y = 0$ and $y = b$) are similar to those for the $x$-walls, absolutely similar reasoning gives

$$Y = \sin\beta y, \qquad \beta_m = \frac{\pi}{b}m, \qquad m = 1, 2, \ldots, \tag{2.92}$$

where the choice of integer $m$ is independent of that of integer $n$. Now we see that according to Eq. (88), the separation constant $\gamma$ depends on two indices, $n$ and $m$, so that the relation may be rewritten as

$$\gamma_{nm} = \pi\left[\left(\frac{n}{a}\right)^2 + \left(\frac{m}{b}\right)^2\right]^{1/2}. \tag{2.93}$$



The corresponding solution of the differential equation for $Z$ may be presented as a sum of two exponents $\exp\{\pm\gamma_{nm}z\}$, or alternatively as a linear combination of two hyperbolic functions, $\sinh\gamma_{nm}z$ and $\cosh\gamma_{nm}z$, with arbitrary coefficients. At our choice of coordinate origin, the latter option is preferable, because $\cosh\gamma_{nm}z$ cannot satisfy the zero boundary condition at the bottom lid of the box ($z = 0$). Hence we may take $Z$ in the form

$$Z = \sinh\gamma_{nm}z, \tag{2.94}$$

which automatically satisfies that condition.
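Before the partial solutions are summed, one can check numerically that a single product of Eqs. (90), (92) and (94) is harmonic only when the separation constants obey Eq. (88), and that it vanishes on five of the six walls. The sketch below (Python; the box sides $a = 1$, $b = 2$, the mode indices, and all function names are illustrative assumptions) uses a seven-point finite-difference Laplacian.

```python
import math

a, b = 1.0, 2.0  # illustrative box sides (assumed)

def mode(x, y, z, n, m, gamma=None):
    """sin(alpha_n x) sin(beta_m y) sinh(gamma z), Eqs. (2.90)-(2.94).
    By default gamma follows Eq. (2.88); passing another value breaks the Laplace equation."""
    alpha = math.pi * n / a              # Eq. (2.91)
    beta = math.pi * m / b               # Eq. (2.92)
    if gamma is None:
        gamma = math.hypot(alpha, beta)  # gamma^2 = alpha^2 + beta^2
    return math.sin(alpha * x) * math.sin(beta * y) * math.sinh(gamma * z)

def laplacian3d(f, x, y, z, h=1e-3):
    """Seven-point finite-difference estimate of the 3D Laplacian of f at (x, y, z)."""
    return (f(x + h, y, z) + f(x - h, y, z) + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h) - 6 * f(x, y, z)) / h ** 2

def good(x, y, z):
    return mode(x, y, z, 2, 3)

def bad(x, y, z):
    return mode(x, y, z, 2, 3, gamma=2 * math.pi)  # gamma = alpha only: violates Eq. (2.88)

print(laplacian3d(good, 0.3, 0.7, 0.4))  # small: finite-difference error only
print(laplacian3d(bad, 0.3, 0.7, 0.4))   # equals -beta^2 * bad(...), clearly nonzero
# Values on the walls x = 0, x = a, y = 0, y = b and the bottom lid z = 0
print([good(0.0, 0.7, 0.4), good(a, 0.7, 0.4), good(0.3, 0.0, 0.4), good(0.3, b, 0.4), good(0.3, 0.7, 0.0)])
```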

Now it is the right time to combine Eqs. (84) and (85) for our case in a more explicit form, replacing the symbol $k$ with the set of two integer indices $n$ and $m$:






$$\phi(x, y, z) = \sum_{n,m=1}^{\infty}c_{nm}\sin\frac{\pi nx}{a}\sin\frac{\pi my}{b}\sinh\gamma_{nm}z, \tag{2.95}$$



where $\gamma_{nm}$ is given by Eq. (93). This solution satisfies our boundary conditions on all walls of the box, besides the top lid, for arbitrary coefficients $c_{nm}$. The only job left for us is to choose these coefficients from the top-lid requirement:

$$\phi(x, y, c) = V(x, y) = \sum_{n,m=1}^{\infty}c_{nm}\sin\frac{\pi nx}{a}\sin\frac{\pi my}{b}\sinh\gamma_{nm}c. \tag{2.96}$$

It may seem like bad luck to have just one equation for the infinite set of coefficients $c_{nm}$. However, decisive help comes from the fact that the functions of $x$ and $y$ that participate in Eq. (96) form full, orthogonal sets of 1D functions. The last term means that the integrals of the products of the functions with different integer indices over the region of interest equal zero. Indeed, direct integration gives

$$\int_0^a\sin\frac{\pi nx}{a}\sin\frac{\pi n'x}{a}\,dx = \begin{cases} a/2, & \text{for } n = n',\\ 0, & \text{for } n \neq n',\end{cases} \tag{2.97}$$

and similarly for $y$ (with the evident replacements $a \to b$, $n \to m$). Hence, the fruitful way to proceed is to multiply both sides of Eq. (96) by the product of the basis functions, with arbitrary indices $n'$ and $m'$, and integrate the result over $x$ and $y$:

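$$\int_0^a dx\int_0^b dy\,V(x, y)\sin\frac{\pi n'x}{a}\sin\frac{\pi m'y}{b} = \sum_{n,m=1}^{\infty}c_{nm}\sinh\gamma_{nm}c\int_0^a\sin\frac{\pi nx}{a}\sin\frac{\pi n'x}{a}\,dx\int_0^b\sin\frac{\pi my}{b}\sin\frac{\pi m'y}{b}\,dy. \tag{2.98}$$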

Due to Eq. (97), all terms in the right-hand side of the last equation, besides those with $n = n'$ and $m = m'$, vanish, and (replacing $n'$ with $n$, and $m'$ with $m$) we finally get



$$c_{nm} = \frac{4}{ab\sinh\gamma_{nm}c}\int_0^a dx\int_0^b dy\,V(x, y)\sin\frac{\pi nx}{a}\sin\frac{\pi my}{b}. \tag{2.99}$$

Relations (93), (95) and (99) present the complete solution of the posed boundary problem; we can see both good and bad news here. The first bit of bad news is that in the general case we still need to work out (formally, an infinite number of) integrals (99). In some cases, it is possible to do this analytically. For example, in our initial problem of constant potential on the top lid, $V(x, y) = \text{const} \equiv V_0$, both 1D integrations are elementary; for example

$$\int_0^a\sin\frac{\pi nx}{a}\,dx = \frac{2a}{\pi n}\times\begin{cases} 1, & \text{for } n \text{ odd},\\ 0, & \text{for } n \text{ even},\end{cases} \tag{2.100}$$

and similarly for the integral over y, so that 

$$c_{nm} = \frac{16V_0}{\pi^2nm\sinh\gamma_{nm}c}\times\begin{cases} 1, & \text{if both } n \text{ and } m \text{ are odd},\\ 0, & \text{otherwise}.\end{cases} \tag{2.101}$$



The second bit of bad news is that even on such a happy occasion, we still have to sum up the infinite series (95), so that our result may only be called analytical with some reservations, because in most cases we need a computer to get the final numbers or plots.
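For a lid profile without a closed-form integral, the coefficients (99) can be approximated numerically. The minimal sketch below (Python; the function names and the midpoint-rule quadrature are illustrative choices, not the method used in the notes) computes $c_{nm}$ on an $N \times N$ grid and, for a constant lid potential $V_0 = 1$, compares the result with the closed form (101).

```python
import math

def gamma_nm(n, m, a, b):
    """Eq. (2.93)."""
    return math.pi * math.hypot(n / a, m / b)

def coeff_numeric(V, n, m, a=1.0, b=1.0, c=1.0, N=100):
    """Eq. (2.99), with the double integral approximated by the midpoint rule on an N x N grid."""
    hx, hy = a / N, b / N
    total = 0.0
    for i in range(N):
        x = (i + 0.5) * hx
        sx = math.sin(math.pi * n * x / a)
        for j in range(N):
            y = (j + 0.5) * hy
            total += V(x, y) * sx * math.sin(math.pi * m * y / b)
    return 4.0 * total * hx * hy / (a * b * math.sinh(gamma_nm(n, m, a, b) * c))

def coeff_constant_lid(V0, n, m, a=1.0, b=1.0, c=1.0):
    """Eq. (2.101): closed form for V(x, y) = V0; zero unless both n and m are odd."""
    if n % 2 == 0 or m % 2 == 0:
        return 0.0
    return 16.0 * V0 / (math.pi ** 2 * n * m * math.sinh(gamma_nm(n, m, a, b) * c))

def uniform_lid(x, y):
    return 1.0  # V(x, y) = V0 = 1 (illustrative)

for n, m in [(1, 1), (1, 3), (2, 1)]:
    print(n, m, coeff_numeric(uniform_lid, n, m), coeff_constant_lid(1.0, n, m))
```

Any other lid profile can be passed in place of `uniform_lid`; the grid size `N` controls the quadrature error.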



Variable 
separation 
in Cartesian 
coordinates 
(example) 
Now the first good news. Computers are very efficient for both operations (95) and (99), i.e. summation and integration. (As was discussed in Sec. 1.2, random errors are averaged out in these operations.) As an example, Fig. 12 shows the plots of the electrostatic potential in a cubic box ($a = b = c$) with an equipotential top lid ($V = V_0 = \text{const}$), obtained by numerical summation of series (95) using the analytical expression (101). The remarkable feature of this calculation is the very fast convergence of the series; for the middle cross-section of the cubic box ($z/c = 0.5$), already the first term (with $n = m = 1$) gives an accuracy of about 6%, while the sum of the four leading terms (with $n, m = 1, 3$) reduces the error to just 0.2%. (For a longer box, $c > a, b$, the convergence is even faster; see the discussion below.) Only close to the corners between the top lid and the side walls, where the potential changes very rapidly, are several more terms necessary to get a reasonable accuracy.
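The convergence described above is easy to reproduce. The following is a minimal sketch, assuming $V_0 = 1$ and a unit cube (function names are hypothetical): it sums Eq. (95) with the coefficients (101), writing $\sinh\gamma z/\sinh\gamma c$ in an overflow-free form. At the center of a cube the exact answer is known independently: by superposition, six such solutions (one per biased face) add up to $V_0$ everywhere, and by symmetry each contributes equally at the center, so $\phi = V_0/6$ there.

```python
import math

def box_potential(x, y, z, a=1.0, b=1.0, c=1.0, V0=1.0, n_max=21):
    """Partial sum of Eq. (2.95) with coefficients (2.101): constant potential V0 on the lid z = c.
    Only odd n, m <= n_max contribute."""
    phi = 0.0
    for n in range(1, n_max + 1, 2):
        for m in range(1, n_max + 1, 2):
            g = math.pi * math.hypot(n / a, m / b)  # gamma_nm, Eq. (2.93)
            # sinh(g z) / sinh(g c), rearranged to avoid overflow when g c is large
            ratio = math.exp(g * (z - c)) * (1 - math.exp(-2 * g * z)) / (1 - math.exp(-2 * g * c))
            phi += (16 * V0 / (math.pi ** 2 * n * m)
                    * math.sin(math.pi * n * x / a) * math.sin(math.pi * m * y / b) * ratio)
    return phi

for n_max in (1, 3, 21):  # 1 term, the 4 leading terms, then many terms
    value = box_potential(0.5, 0.5, 0.5, n_max=n_max)
    print(n_max, value, value / (1 / 6) - 1)  # relative deviation from the exact V0/6
```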




[Fig. 2.12 panels: left, plotted versus $x/a$; right, plotted versus $z/c$]



Fig. 2.12. Distribution of the electrostatic potential within a cubic box ($a = b = c$) with constant voltage $V_0$ on the top lid (Fig. 11), calculated numerically from Eqs. (93), (95) and (101). The dashed line on the left panel shows the contribution of the main term (with $n = m = 1$) to the full result.

The second good news is that our "semi-analytical" result allows its ultimate limits to be explored analytically. For example, Eq. (93) shows that for a very flat box ($c \ll a, b$), $\gamma_{nm}z \le \gamma_{nm}c \ll 1$, at least for the lowest terms of series (95), with $n \ll a/c$ and $m \ll b/c$. In these terms, the sinh functions in Eqs. (95) and (99) may be well approximated with their arguments, and their ratio by $z/c$. This means that if we limit the summation to these terms, Eq. (95) gives a very simple result

$$\phi(x, y, z) \approx \frac{z}{c}V(x, y), \tag{2.102}$$

which means that each segment of the flat box behaves just as a plane capacitor. Only near the vertical walls (or near possible locations where $V(x, y)$ changes sharply) are the higher terms in the series (95) important, producing deviations from Eq. (102). In the opposite limit ($a, b \ll c$), Eq. (93) shows that, in contrast, $\gamma_{nm}c \gg 1$ for all $n$ and $m$. Moreover, the ratio $\sinh\gamma_{nm}z/\sinh\gamma_{nm}c$ drops sharply if either $n$ or $m$ is increased, provided that $z$ is not too close to $c$. Hence in this case a very good approximation may be obtained by keeping just the leading term, with $n = m = 1$, in Eq. (95), so that the problem of summation disappears. (We saw above that this approximation works reasonably well even for a cubic box.) In particular, for the constant potential of the upper lid, we can use Eq. (101) and the exponential asymptotics of both sinh functions to get a very simple formula:



$$\phi \approx \frac{16V_0}{\pi^2}\sin\frac{\pi x}{a}\sin\frac{\pi y}{b}\exp\left\{-\pi\frac{(a^2 + b^2)^{1/2}}{ab}(c - z)\right\}. \tag{2.103}$$
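Both limiting formulas can be compared with the full series. In the sketch below (Python; the box proportions, evaluation points and function names are illustrative assumptions), a flat box with $c = 0.05a$ should give $\phi \approx (z/c)V_0$ at its center, per Eq. (102), while a box with $c = 3a$ should be well described by the single exponential term (103) at mid-height. The flat-box sum needs many more terms, because a term is strongly damped only once its $\gamma_{nm}c$ becomes large.

```python
import math

def box_potential(x, y, z, a, b, c, V0=1.0, n_max=21):
    """Partial sum of Eq. (2.95) with coefficients (2.101), odd n, m <= n_max."""
    phi = 0.0
    for n in range(1, n_max + 1, 2):
        for m in range(1, n_max + 1, 2):
            g = math.pi * math.hypot(n / a, m / b)  # Eq. (2.93)
            # sinh(g z) / sinh(g c) in an overflow-free form
            ratio = math.exp(g * (z - c)) * (1 - math.exp(-2 * g * z)) / (1 - math.exp(-2 * g * c))
            phi += (16 * V0 / (math.pi ** 2 * n * m)
                    * math.sin(math.pi * n * x / a) * math.sin(math.pi * m * y / b) * ratio)
    return phi

def long_box_estimate(x, y, z, a, b, c, V0=1.0):
    """Eq. (2.103): the leading term with both sinh functions replaced by exponents."""
    return (16 * V0 / math.pi ** 2 * math.sin(math.pi * x / a) * math.sin(math.pi * y / b)
            * math.exp(-math.pi * math.hypot(a, b) / (a * b) * (c - z)))

# Flat box (c << a, b): Eq. (2.102) predicts phi = (z/c) V0 = 0.5 V0 at mid-height.
flat = box_potential(0.5, 0.5, 0.025, a=1.0, b=1.0, c=0.05, n_max=501)
print(flat)

# Long box (a, b << c): full series vs. the one-term formula (2.103) at mid-height.
full = box_potential(0.5, 0.5, 1.5, a=1.0, b=1.0, c=3.0)
estimate = long_box_estimate(0.5, 0.5, 1.5, a=1.0, b=1.0, c=3.0)
print(full, estimate, estimate / full - 1)
```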



The same variable separation method may be used to solve more general problems as well. For example, if all walls of the box shown in Fig. 11 have arbitrary potential distributions, one can use the linear superposition principle to present the electrostatic potential distribution inside the box as the sum of six partial solutions of the type of Eq. (95), each with one wall biased by the corresponding voltage, and all the others grounded ($\phi = 0$).

To summarize, the results given by the variable separation method are closer to what we could 
call a genuinely analytical solution than to purely numerical solutions - see Sec. 6 below. Now, let us 
explore the issues that arise when this method is applied in other orthogonal coordinate systems. 

(ii) Polar coordinates. If a system of conductors is cylindrical, the potential distribution is independent of the coordinate $z$ along the cylinder axis: $\partial\phi/\partial z = 0$, and the Laplace equation becomes two-dimensional. If the conductor's cross-section is rectangular, the variable separation method works best in Cartesian coordinates $\{x, y\}$, and is just a particular case of the 3D solution discussed above. However, if the cross-section is circular, much more compact results may be obtained by using polar coordinates $\{\rho, \varphi\}$. As we already know from the last section, these 2D coordinates are orthogonal, so that the two-dimensional Laplace operator is a simple sum.$^{34}$ Requiring, just as we have done above, each component of sum (84) to satisfy the Laplace equation, we get



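$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\phi_k}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\phi_k}{\partial\varphi^2} = 0. \tag{2.104}$$

In full analogy with Eq. (85), let us present each particular solution as a product: $\phi_k = \mathcal{R}(\rho)\mathcal{F}(\varphi)$. Plugging this expression into Eq. (104) and then dividing all its parts by $\mathcal{R}\mathcal{F}/\rho^2$, we get

$$\frac{\rho}{\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} = 0. \tag{2.105}$$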



Following the same reasoning as for the Cartesian coordinates, we get two separated ordinary differential equations

$$\rho\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) = \nu^2\mathcal{R}, \tag{2.106}$$



34 See, e.g., MA Eq. (10.3) with d/dz = 0. 
$$\frac{d^2\mathcal{F}}{d\varphi^2} + \nu^2\mathcal{F} = 0, \tag{2.107}$$

where $\nu^2$ is the variable separation constant.

Let us start their analysis from Eq. (106), plugging into it a probe solution $\mathcal{R} = c\rho^\alpha$, where $c$ and $\alpha$ are some constants. Elementary differentiation shows that if $\alpha \neq 0$, the equation is indeed satisfied for any $c$, with just one requirement on the constant $\alpha$, namely $\alpha^2 = \nu^2$. This means that the following linear superposition

$$\mathcal{R} = a_\nu\rho^{+\nu} + b_\nu\rho^{-\nu}, \quad \text{for } \nu \neq 0, \tag{2.108}$$

with constant coefficients $a_\nu$ and $b_\nu$, is also a solution to Eq. (106). Moreover, the general theory of linear ordinary differential equations tells us that the solution of a second-order equation like Eq. (106) may only depend on just two constant factors that scale two linearly independent functions. Hence, for all values $\nu \neq 0$, Eq. (108) presents the general solution of that equation. The case $\nu = 0$, in which the functions $\rho^{+\nu}$ and $\rho^{-\nu}$ are just constants and hence are not linearly independent, is special, but in this case the integration of Eq. (106) is straightforward,$^{35}$ giving

$$\mathcal{R} = a_0 + b_0\ln\rho, \quad \text{for } \nu = 0. \tag{2.109}$$

In order to specify the separation constant, we should use Eq. (107), whose general solution is 

$$\mathcal{F} = \begin{cases} c_\nu\cos\nu\varphi + s_\nu\sin\nu\varphi, & \text{for } \nu \neq 0,\\ c_0 + s_0\varphi, & \text{for } \nu = 0.\end{cases} \tag{2.110}$$

There are two possible cases here. In many boundary problems solvable in cylindrical coordinates, the free space region, in which the Laplace equation is valid, extends continuously around the origin point $\rho = 0$. In this region, the potential has to be continuous and uniquely defined, so that $\mathcal{F}$ has to be a $2\pi$-periodic function of angle $\varphi$. For that, one needs $\nu(\varphi + 2\pi)$ to be equal to $\nu\varphi + 2\pi n$, with $n$ an integer, immediately giving us a discrete spectrum of possible values of the variable separation constant:

$$\nu = n = 0, \pm 1, \pm 2, \ldots \tag{2.111}$$

In this case both functions $\mathcal{R}$ and $\mathcal{F}$ may be labeled with the integer index $n$. Taking into account that the terms with negative values of $n$ may be summed up with those with positive $n$, and that $s_0$ should equal zero (otherwise the $2\pi$-periodicity of function $\mathcal{F}$ would be violated), we see that the general solution to the 2D Laplace equation may be presented as



Variable 
separation 
in polar 
coordinates 



$$\phi(\rho, \varphi) = a_0 + b_0\ln\rho + \sum_{n=1}^{\infty}\left(a_n\rho^n + \frac{b_n}{\rho^n}\right)(c_n\cos n\varphi + s_n\sin n\varphi). \tag{2.112}$$
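Each term of Eq. (112) can be checked to satisfy the 2D Laplace equation, and the same check shows why the probe exponent must obey $\alpha^2 = \nu^2$. A minimal sketch (Python; the helper names and the test point are illustrative) evaluates the terms in Cartesian coordinates and applies a five-point finite-difference Laplacian; a mismatched power $\rho^2\cos 3\varphi$ is included as a counterexample.

```python
import math

def laplacian_fd(f, x, y, h=1e-3):
    """Five-point finite-difference estimate of the 2D Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h ** 2

def power_term(k, n):
    """rho^k cos(n varphi): harmonic only when k = +n or k = -n."""
    def f(x, y):
        return math.hypot(x, y) ** k * math.cos(n * math.atan2(y, x))
    return f

def log_term(x, y):
    """ln(rho): the nu = 0 radial solution, Eq. (2.109)."""
    return math.log(math.hypot(x, y))

x0, y0 = 0.8, 0.6  # test point with rho = 1, away from the origin and the branch cut of atan2
for name, f in [("rho^3 cos 3phi", power_term(3, 3)), ("rho^-2 cos 2phi", power_term(-2, 2)),
                ("ln rho", log_term), ("rho^2 cos 3phi (mismatched)", power_term(2, 3))]:
    print(name, laplacian_fd(f, x0, y0))
```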



Let us see how all this machinery works on the classical problem of a round cylindrical conductor placed into an electric field that is uniform and perpendicular to the cylinder's axis at large distances; see Fig. 13a.$^{36}$ First of all, let us explore the effect of the system's symmetries on the coefficients in



35 Actually, we have already done it in Sec. 3 - see Eq. (43). 

36 This problem does belong to our current topic of electrostatic fields between conductors, because the uniform 
electric field may be created by a large plane capacitor. 
Eq. (112). Selecting the coordinate system as shown in Fig. 13a, and taking the cylinder's potential to be zero, we immediately have $a_0 = 0$. Moreover, due to the mirror symmetry about the plane $[x, z]$, the solution has to be an even function of angle $\varphi$, and hence all coefficients $s_n$ should also equal zero. Also, at large distances ($\rho \gg R$) from the cylinder axis, its effect on the electric field should vanish, and the potential should approach that of the uniform field $\mathbf{E} = E_0\mathbf{n}_x$:



<j> — > -E Q x = -E 0 p cos (p, for p — y co 



(2.113) 



This is only possible if in Eq. (112), $b_0 = 0$, and also all coefficients $a_n$ with $n \neq 1$ vanish, while the product $a_1c_1$ should be equal to $(-E_0)$. Thus the solution is reduced to the following form



$$\phi(\rho, \varphi) = -E_0\rho\cos\varphi + \sum_{n=1}^{\infty}\frac{B_n}{\rho^n}\cos n\varphi, \tag{2.114}$$



in which the coefficients $B_n = b_nc_n$ should be found from the boundary condition on the cylinder's surface, i.e. at $\rho = R$:

$$\phi(R, \varphi) = 0. \tag{2.115}$$






Fig. 2.13. Conducting cylinder inserted into an initially uniform electric field perpendicular to its axis: (a) the problem's geometry, and (b) the equipotential surfaces given by Eq. (117).



This requirement yields the following equation, 



$$\left(-E_0R + \frac{B_1}{R}\right)\cos\varphi + \sum_{n=2}^{\infty}\frac{B_n}{R^n}\cos n\varphi = 0, \tag{2.116}$$



that should be satisfied for all $\varphi$. But since the functions $\cos n\varphi$ are orthogonal, this equality is only possible if all $B_n$ for $n \ge 2$ are equal to zero, while $B_1 = E_0R^2$. Hence our final answer (which is of course only valid outside of the cylinder, i.e. for $\rho \ge R$) is



$$\phi(\rho, \varphi) = -E_0\left(\rho - \frac{R^2}{\rho}\right)\cos\varphi \equiv -E_0\left(1 - \frac{R^2}{x^2 + y^2}\right)x. \tag{2.117}$$
This result (Fig. 13b) shows a smooth transition between the uniform field (113) far from the cylinder and the equipotential surface of the cylinder (with $\phi = 0$). Such smoothening is very typical for Laplace equation solutions. Indeed, as we know from Chapter 1, these solutions correspond to the lowest potential energy (1.67), and hence the lowest values of the potential gradient modulus, possible at the given boundary conditions.

To complete the problem, let us calculate the distribution of the surface charge density over the 
cylinder's cross-section, using Eq. (3): 



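$$\sigma = \varepsilon_0E_n\big|_{\text{surface}} = -\varepsilon_0\frac{\partial\phi}{\partial\rho}\bigg|_{\rho = R} = \varepsilon_0E_0\cos\varphi\,\frac{d}{d\rho}\left(\rho - \frac{R^2}{\rho}\right)\bigg|_{\rho = R} = 2\varepsilon_0E_0\cos\varphi. \tag{2.118}$$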



This very simple formula shows that at the field direction shown in Fig. 13a ($E_0 > 0$), the surface charge is positive on the right side of the cylinder and negative on its left side, thus creating a field directed from right to left, which compensates the external field inside the conductor, where the net field is zero. Note also that the net electric charge of the cylinder is zero, in correspondence with the problem's symmetry. Another useful by-product of calculation (118) is that the surface electric field equals $2E_0\cos\varphi$, and hence its largest magnitude is twice the field far from the cylinder. Such electric field concentration is very typical for all convex conducting surfaces.
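A short numerical sketch (Python; the unit values of $E_0$ and $R$, the choice $\varepsilon_0 = 1$, and the function names are illustrative assumptions) confirms that Eq. (117) vanishes on the cylinder surface, approaches the uniform-field potential far away, and yields the surface charge (118) with zero net value. The radial derivative is taken by a one-sided difference just outside the surface, where Eq. (117) is valid.

```python
import math

E0, R = 1.0, 1.0  # illustrative field and radius; eps0 = 1, so sigma is in units of eps0 * E0

def phi_cyl(x, y):
    """Eq. (2.117), valid for x^2 + y^2 >= R^2."""
    return -E0 * (1 - R ** 2 / (x * x + y * y)) * x

def sigma(angle, h=1e-6):
    """sigma / eps0 = -d(phi)/d(rho) at rho = R, by a forward difference outside the cylinder."""
    c, s = math.cos(angle), math.sin(angle)
    return -(phi_cyl((R + h) * c, (R + h) * s) - phi_cyl(R * c, R * s)) / h

angles = [2 * math.pi * k / 64 for k in range(64)]
print(max(abs(phi_cyl(R * math.cos(t), R * math.sin(t))) for t in angles))  # ~0 on the surface
print(max(abs(sigma(t) - 2 * E0 * math.cos(t)) for t in angles))            # small: matches Eq. (2.118)
print(sum(sigma(t) for t in angles) / len(angles))                          # ~0: no net charge
print(phi_cyl(100.0, 30.0) / (-E0 * 100.0))                                 # close to 1 far away
```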

The last observation gets additional confirmation for the second possible topology, when Eq. (110) is used to describe problems with no angular periodicity. A typical example is a cylindrical conductor with a cross-section that features an angle limited by straight lines (Fig. 14). Indeed, we may argue that at $\rho < R$ (where $R$ is the scale of radial extension of the straight sides of the corner), the Laplace equation may be satisfied by a sum of partial solutions $\mathcal{R}(\rho)\mathcal{F}(\varphi)$ if the angular components of the products satisfy the boundary conditions on the corner sides. Taking (just for simplicity of notation) the conductor's potential to be zero, and one of the corner's sides as axis $x$ ($\varphi = 0$), these boundary conditions are

$$\mathcal{F}(0) = \mathcal{F}(\beta) = 0, \tag{2.119}$$

where the angle $\beta$ may be anywhere between 0 and $2\pi$ (Fig. 14).



Fig. 2.14. Cylindrical conductor cross-sections with (a) a corner and (b) an edge.



Comparing this condition with Eq. (110), we see that it requires $c_\nu$ to vanish, and $\nu$ to take one of the values of the following discrete spectrum:

$$\nu_m = \frac{\pi}{\beta}m, \tag{2.120}$$
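with positive integer $m$. Hence the full solution of the Laplace equation (with the terms $b_\nu\rho^{-\nu}$, which would diverge at $\rho \to 0$, dropped) takes the form

$$\phi = \sum_{m=1}^{\infty}a_m\rho^{\pi m/\beta}\sin\frac{\pi m\varphi}{\beta}, \tag{2.121}$$

where the constants $s_\nu$ have been incorporated into $a_m$.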

The set of constants $a_m$ cannot be simply determined, because it depends on the exact shape of the conductor outside the corner and on the externally applied electric field. However, whatever the set is, in the limit $\rho \to 0$, solution (121) is almost$^{37}$ always dominated by the term with the lowest $\nu$ (corresponding to $m = 1$),



$$\phi \to a_1\rho^{\pi/\beta}\sin\frac{\pi\varphi}{\beta}, \tag{2.122}$$



because the higher terms go to zero faster. This potential distribution corresponds to the surface charge 
density 



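$$\sigma = \varepsilon_0E_n\big|_{\text{surface}} = -\frac{\varepsilon_0}{\rho}\frac{\partial\phi}{\partial\varphi}\bigg|_{\varphi = 0} = -\varepsilon_0\frac{\pi}{\beta}a_1\rho^{(\pi/\beta) - 1}. \tag{2.123}$$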



(It is similar on the opposite face of the angle.) 



Equation (123) shows that if we are dealing with a concave corner ($\beta < \pi$, see Fig. 14a), the charge density (and the surface electric field) tends to zero. On the other hand, at a convex edge ($\beta > \pi$, see Fig. 14b), both charge and field concentrate, formally diverging at $\rho \to 0$. (So, do not sit on a roof's ridge during a strong thunderstorm; hide in a ditch!) We already saw qualitatively similar effects in our analyses of the thin round disk and the split plane in the previous section.
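The sign of the exponent $(\pi/\beta) - 1$ in Eq. (123) decides between the two regimes. A minimal sketch (Python; the function names, the sample angles, and the normalization $a_1 = 1$ are illustrative) tabulates the exponent for a few corner angles and checks that the leading term (122) vanishes on both sides of the corner.

```python
import math

def charge_exponent(beta):
    """Power of rho in sigma ~ rho^((pi/beta) - 1), Eq. (2.123); beta is the free-space angle."""
    return math.pi / beta - 1

def leading_term(rho, angle, beta, a1=1.0):
    """Eq. (2.122): a1 rho^(pi/beta) sin(pi angle / beta); a1 = 1 is an arbitrary normalization."""
    return a1 * rho ** (math.pi / beta) * math.sin(math.pi * angle / beta)

for label, beta in [("concave right-angle corner", math.pi / 2), ("flat surface", math.pi),
                    ("convex right-angle edge", 3 * math.pi / 2), ("thin plate edge", 2 * math.pi)]:
    print(f"{label}: beta/pi = {beta / math.pi:.2f}, exponent = {charge_exponent(beta):+.3f}")
```

A positive exponent means $\sigma \to 0$ at the vertex; a negative one means a divergence.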

(iii) Cylindrical coordinates. Now, let us discuss whether it is possible to generalize our approach to problems whose geometry is still axially symmetric, but with a substantial dependence of the potential on the axial coordinate ($\partial\phi/\partial z \neq 0$). The classical example of such a problem is shown in Fig. 15. Here the side wall and the bottom lid of a round cylinder are kept at a fixed potential (say, $\phi = 0$), but the potential $V$ fixed at the top lid is different. This problem is qualitatively similar to the rectangular box problem solved above (Fig. 11), and we will also try to solve it for the case of an arbitrary voltage distribution over the top lid: $V = V(\rho, \varphi)$.



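[Figure: round cylinder of radius $R$ along axis $z$; $\phi = V(\rho, \varphi)$ on the top lid, $\phi = 0$ on the side wall and the bottom lid]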



Fig. 2.15. Round cylinder with conducting walls. 



37 Exceptions are possible only for highly symmetric configurations, when the external fields are crafted to make $a_1 = 0$. In this case the solution is led by the first nonvanishing term of the series (121).
Following the main idea of the variable separation method, let us require that each partial function $\phi_k$ in Eq. (84) satisfies the Laplace equation, now in full cylindrical coordinates $\{\rho, \varphi, z\}$:$^{38}$



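$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\phi_k}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\phi_k}{\partial\varphi^2} + \frac{\partial^2\phi_k}{\partial z^2} = 0. \tag{2.124}$$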



Plugging $\phi_k$ in the form $\mathcal{R}(\rho)\mathcal{F}(\varphi)Z(z)$ into Eq. (124) and dividing both parts by the product $\mathcal{R}\mathcal{F}Z$, we get



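$$\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\rho^2\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} = 0. \tag{2.125}$$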



Since the first two terms of Eq. (125) can only depend on the polar variables $\rho$ and $\varphi$, while the third term depends only on $z$, at least that term should be a constant. Denoting it (just like in the rectangular box problem) by $\gamma^2$, we get, instead of Eq. (125), a set of two equations:

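$$\frac{d^2Z}{dz^2} = \gamma^2Z, \tag{2.126}$$

$$\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \gamma^2 + \frac{1}{\rho^2\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} = 0. \tag{2.127}$$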



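Now, multiplying all the terms of Eq. (127) by $\rho^2$, we see that the last term, $(d^2\mathcal{F}/d\varphi^2)/\mathcal{F}$, may depend only on $\varphi$, and thus should be constant. Denoting that constant as $-\nu^2$ (as in Sec. (ii) above), we separate Eq. (127) into an angular equation,

$$\frac{d^2\mathcal{F}}{d\varphi^2} + \nu^2\mathcal{F} = 0, \qquad (2.128)$$

and a radial equation:

$$\frac{d^2\mathcal{R}}{d\rho^2} + \frac{1}{\rho}\frac{d\mathcal{R}}{d\rho} + \left(\gamma^2 - \frac{\nu^2}{\rho^2}\right)\mathcal{R} = 0. \qquad (2.129)$$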


We see that the ordinary differential equations for the functions $Z(z)$ and $\mathcal{F}(\varphi)$ (and hence their solutions) are identical to those discussed earlier in this section. However, Eq. (129) for the radial function $\mathcal{R}(\rho)$ (called the Bessel equation) is more complex than in the 2D case, and depends on two independent constant parameters, $\gamma$ and $\nu$. The latter challenge may be readily overcome if we notice that any change of $\gamma$ may be reduced to a re-scaling of the radial coordinate $\rho$. Indeed, introducing the dimensionless variable $\xi \equiv \gamma\rho$, 39 Eq. (129) may be reduced to an equation with just one parameter, $\nu$:

$$\frac{d^2\mathcal{R}}{d\xi^2} + \frac{1}{\xi}\frac{d\mathcal{R}}{d\xi} + \left(1 - \frac{\nu^2}{\xi^2}\right)\mathcal{R} = 0. \qquad (2.130)$$

38 See, e.g., MA Eq. (10.3).

39 Please note that this normalization is specific for each value of the variable separation parameter $\gamma$. Also, note that the normalization is meaningless for $\gamma = 0$, i.e. for the case $Z(z) = \text{const}$. However, if we need partial solutions for this value of $\gamma$, we can use Eqs. (108)-(109).
Moreover, we already know that for angle-periodic problems the spectrum of eigenvalues of Eq. (128) is discrete: $\nu = n$, with integer $n$.

Unfortunately, even in this case, Eq. (130) cannot be satisfied by a single "elementary" function; it is the canonical form of the equation defining the Bessel function of the first kind, of order $\nu$, commonly denoted as $J_\nu(\xi)$. Let me briefly review the properties of the Bessel functions most relevant for the boundary problems of physics, and for some other problems discussed in these lecture notes. 40

First of all, the Bessel function of a negative integer order is very simply related to that of the positive order:

$$J_{-n}(\xi) = (-1)^n J_n(\xi), \qquad (2.131)$$



enabling us to limit our discussion to the functions with $n \geq 0$. Figure 16 shows four such functions, with the lowest values of $n$.



[Figure: plots of $J_n(\xi)$ for $n = 0, 1, 2, 3$ versus $\xi$.]

Fig. 2.16. Several first-kind Bessel functions $J_n(\xi)$ of integer order. Dashed lines show the envelope of the asymptotes (135).



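As the argument $\xi$ is increased, each function is initially close to a power law: $J_0(\xi) \approx 1$, $J_1(\xi) \approx \xi/2$, $J_2(\xi) \approx \xi^2/8$, etc. This behavior follows from the Taylor series

$$J_n(\xi) = \left(\frac{\xi}{2}\right)^n \sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,(n+k)!}\left(\frac{\xi}{2}\right)^{2k}, \qquad (2.132)$$

which is formally valid for any $\xi$, and may even serve as an alternative definition of the function $J_n(\xi)$. However, this series converges fast only at relatively small arguments, $\xi < n$, where its leading term is

$$J_n(\xi) \to \frac{1}{n!}\left(\frac{\xi}{2}\right)^n. \qquad (2.133)$$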



40 For a more complete discussion of these functions, see the literature listed in MA Sec. 16, for example, Chapter 
6 (written by P. Davis) in the collection compiled and edited by Abramowitz and Stegun. 
At $\xi \approx n + 0.81\,n^{1/3}$, the Bessel function reaches its maximum, 41

$$\max_\xi\left[J_n(\xi)\right] \approx \frac{0.675}{n^{1/3}}, \qquad (2.134)$$



and then starts to oscillate with a period that gradually approaches $2\pi$, a phase shift that increases by $\pi/2$ with each unit increment of $n$, and an amplitude that decreases as $\xi^{-1/2}$. These features are described by the following asymptotic formula:

$$J_n(\xi) \approx \left(\frac{2}{\pi\xi}\right)^{1/2}\cos\left(\xi - \frac{\pi}{4} - \frac{n\pi}{2}\right), \quad \text{for } \xi/n \to \infty, \qquad (2.135)$$

which starts to give reasonable results very soon after the function's peak; see Fig. 16. 42
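The following minimal Python sketch (standard library only) sums the Taylor series (132) and compares it with the small-argument limit (133), with the peak estimates of Eq. (134), taken here as $\xi \approx n + 0.81\,n^{1/3}$ and $\max J_n \approx 0.675\,n^{-1/3}$, and with the asymptotic formula (135). The function names and the truncation `kmax` are illustrative choices. The plain series should be trusted only for moderate arguments (here, $\xi \lesssim 20$), because its alternating terms cancel strongly at large $\xi$.

```python
import math

def bessel_j(n, xi, kmax=80):
    """J_n(xi) for integer n >= 0, summed from the Taylor series (2.132)."""
    term = (xi / 2.0) ** n / math.factorial(n)   # k = 0 term
    total = term
    q = -(xi / 2.0) ** 2
    for k in range(1, kmax):
        term *= q / (k * (n + k))                # ratio of consecutive series terms
        total += term
    return total

def bessel_j_asymptotic(n, xi):
    """Leading large-argument asymptote, Eq. (2.135)."""
    return math.sqrt(2.0 / (math.pi * xi)) * math.cos(xi - math.pi / 4 - n * math.pi / 2)

def first_peak(n, step=1e-3):
    """Crude grid search for the first maximum of J_n, scanning upward from xi = n."""
    xi = float(n)
    best_xi, best_val = xi, bessel_j(n, xi)
    while xi < n + 3.0 * n ** (1.0 / 3.0):
        xi += step
        val = bessel_j(n, xi)
        if val > best_val:
            best_xi, best_val = xi, val
    return best_xi, best_val

peak_xi, peak_val = first_peak(10)
print(f"J_10 peak: xi = {peak_xi:.3f}, value = {peak_val:.4f}")
print(f"Eq. (2.134): xi = {10 + 0.81 * 10 ** (1 / 3):.3f}, value = {0.675 / 10 ** (1 / 3):.4f}")
print(f"J_0(20): series {bessel_j(0, 20.0):.5f}, asymptote {bessel_j_asymptotic(0, 20.0):.5f}")
```

For $n = 10$ the grid search lands within a few hundredths of the estimate $n + 0.81\,n^{1/3}$, and at $\xi = 20$ the asymptote (135) already differs from the series by well under 1%.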

Now we are ready to return to our case study (Fig. 15). Let us select the functions $Z(z)$ to satisfy the bottom-lid boundary condition $Z(0) = 0$, i.e. proportional to $\sinh\gamma z$; cf. Eq. (95). Then

$$\phi = \sum_{n=0}^{\infty}\sum_{\gamma} J_n(\gamma\rho)\left(c_{n\gamma}\cos n\varphi + s_{n\gamma}\sin n\varphi\right)\sinh\gamma z. \qquad (2.136)$$

Next, we need to satisfy the zero boundary condition at the cylinder's side wall ($\rho = R$). This may be ensured by taking

$$J_n(\gamma R) = 0. \qquad (2.137)$$



Since each function $J_n(\xi)$ has an infinite number of positive zeros (see Fig. 16), which may be numbered by an integer index $m = 1, 2, \ldots$, Eq. (137) may be satisfied with an infinite number of discrete values of the separation parameter $\gamma$:

$$\gamma_{nm} = \frac{\xi_{nm}}{R}, \qquad (2.138)$$

where $\xi_{nm}$ is the $m$-th zero of the function $J_n(\xi)$; see the top numbers in the cells of Table 1. (Very soon we will see what we need the bottom numbers for.)



Hence, Eq. (136) may be presented in a more explicit form:

$$\phi(\rho,\varphi,z) = \sum_{n=0}^{\infty}\sum_{m=1}^{\infty} J_n\left(\xi_{nm}\frac{\rho}{R}\right)\left(c_{nm}\cos n\varphi + s_{nm}\sin n\varphi\right)\sinh\left(\xi_{nm}\frac{z}{R}\right). \qquad (2.139)$$



41 These two formulas for the Bessel function peak are strictly valid for $n \gg 1$, but may be used for reasonable estimates starting already from $n = 1$; for example, $\max_\xi[J_1(\xi)]$ is close to 0.58 and is reached at $\xi \approx 1.84$, not too far from the values (0.675 and 1.81) given by the asymptotic formulas.

42 Eq. (135) and Fig. 16 clearly show the close analogy between the Bessel functions and the usual trigonometric functions, sine and cosine. In order to emphasize this similarity, and to help the reader develop more gut feeling for the Bessel functions, let me mention one fact from elasticity theory: while sine functions describe, in particular, possible modes of standing waves on a guitar string, the functions $J_n(\xi)$ describe, in particular, possible standing waves on an elastic round membrane, with $J_0(\xi)$ describing the lowest (fundamental) mode.
Here the coefficients $c_{nm}$ and $s_{nm}$ have to be selected to satisfy the only remaining boundary condition, the one on the top lid:

$$V(\rho,\varphi) = \phi(\rho,\varphi,L) = \sum_{n=0}^{\infty}\sum_{m=1}^{\infty} J_n\left(\xi_{nm}\frac{\rho}{R}\right)\left(c_{nm}\cos n\varphi + s_{nm}\sin n\varphi\right)\sinh\left(\xi_{nm}\frac{L}{R}\right). \qquad (2.140)$$



To use it, let us multiply both parts of Eq. (140) by $J_{n'}(\xi_{n'm'}\rho/R)\cos n'\varphi$, integrate the result over the lid area, and use the following property of the Bessel functions:

$$\int_0^1 J_n(\xi_{nm}s)\,J_n(\xi_{nm'}s)\,s\,ds = \frac{1}{2}\left[J_{n+1}(\xi_{nm})\right]^2\delta_{mm'}, \qquad (2.141)$$

where $\delta_{mm'}$ is the Kronecker symbol. 43
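As a hedged numerical illustration of Eq. (141), the sketch below evaluates its left-hand part with Simpson's rule for $n = 0$, using the first two zeros $\xi_{01} \approx 2.40482$ and $\xi_{02} \approx 5.52008$ listed in Table 2.1, and $J_n$ summed from the series (132). The helper names are illustrative; the tiny residual of the off-diagonal integral reflects only the 5-digit rounding of the tabulated zeros.

```python
import math

def bessel_j(n, xi, kmax=60):
    """J_n(xi) for integer n >= 0 from the Taylor series (2.132)."""
    term = (xi / 2.0) ** n / math.factorial(n)
    total = term
    q = -(xi / 2.0) ** 2
    for k in range(1, kmax):
        term *= q / (k * (n + k))
        total += term
    return total

def simpson(f, a, b, intervals=1000):
    """Composite Simpson rule; `intervals` must be even."""
    h = (b - a) / intervals
    s = f(a) + f(b)
    for i in range(1, intervals):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

xi01, xi02 = 2.40482, 5.52008   # first two zeros of J_0, from Table 2.1

def overlap(xa, xb, n=0):
    """Left-hand part of Eq. (2.141): integral of J_n(xa*s) J_n(xb*s) s ds over [0, 1]."""
    return simpson(lambda s: bessel_j(n, xa * s) * bessel_j(n, xb * s) * s, 0.0, 1.0)

off_diag = overlap(xi01, xi02)               # m != m': should vanish
diag = overlap(xi01, xi01)                   # m == m'
expected = 0.5 * bessel_j(1, xi01) ** 2      # right-hand part of Eq. (2.141) for n = 0
print(off_diag, diag, expected)
```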



Table 2.1. Approximate values of a few first zeros $\xi_{nm}$ of a few lowest-order Bessel functions $J_n(\xi)$ (the top number in each cell), and the values of $dJ_n/d\xi$ at those points (the bottom number in the cell).

          m = 1      m = 2      m = 3      m = 4      m = 5      m = 6
n = 0     2.40482    5.52008    8.65372    11.79153   14.93091   18.07106
          -0.51914   +0.34026   -0.27145   +0.23245   -0.20654   +0.18773
n = 1     3.83171    7.01559    10.17347   13.32369   16.47063   19.61586
          -0.40276   +0.30012   -0.24970   +0.21836   -0.19647   +0.18006
n = 2     5.13562    8.41724    11.61984   14.79595   17.95982   21.11700
          -0.33967   +0.27138   -0.23244   +0.20654   -0.18773   +0.17326
n = 3     6.38016    9.76102    13.01520   16.22347   19.40942   22.58273
          -0.29827   +0.24942   -0.21828   +0.19644   -0.18005   +0.16718
n = 4     7.58834    11.06471   14.37254   17.61597   20.82693   24.01902
          -0.26836   +0.23188   -0.20636   +0.18766   -0.17323   +0.16168
n = 5     8.77148    12.33860   15.70017   18.98013   22.21780   25.43034
          -0.24543   +0.21743   -0.19615   +0.17993   -0.16712   +0.15669

Relation (141) expresses a very specific ("2D") orthogonality of Bessel functions with different indices $m$; please do not confuse them with the function's order $n$! 44 Since it relates two Bessel functions of the same order $n$, it is natural to ask why its right-hand part contains the function of a different order, $n + 1$. Some clue may come from one more very important property of the Bessel functions, the so-called recurrence relations: 45



43 Let me hope the reader knows what it is; if not - see MA Eq. (13.1). 

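44 The Bessel functions of the same argument but of different integer orders are also orthogonal, but in a different way, provided that $n - n'$ is even: $\int_0^\infty J_n(\xi)J_{n'}(\xi)\,\frac{d\xi}{\xi} = \frac{\delta_{nn'}}{n + n'}$ (for $n + n' > 0$).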

45 These relations provide, in particular, a convenient way for fast numerical computation of all $J_n(\xi)$ after $J_0(\xi)$ has been computed. (The latter is usually done with an algorithm using Eq. (132) for smaller $\xi$ and an extension of Eq. (135) for larger $\xi$.) Note that most mathematical software packages, including all those listed in MA Sec. 16(iv), include ready subroutines for the calculation of the functions $J_n(\xi)$ and other special functions used in this lecture series. In this sense, the line separating these "special functions" from "elementary functions" is rather blurry.
$$J_{n-1}(\xi) + J_{n+1}(\xi) = \frac{2n}{\xi}J_n(\xi), \qquad (2.142a)$$

$$J_{n-1}(\xi) - J_{n+1}(\xi) = 2\frac{dJ_n(\xi)}{d\xi}, \qquad (2.142b)$$

which, in particular, yield the following relation (convenient for working out some Bessel function integrals):

$$\frac{d}{d\xi}\left[\xi^n J_n(\xi)\right] = \xi^n J_{n-1}(\xi). \qquad (2.143)$$

For our current purposes, let us apply the recurrence relations at the special points $\xi = \xi_{nm}$. At these points, $J_n$ vanishes, and the system of two equations (142) may be readily solved to get, in particular,

$$J_{n+1}(\xi_{nm}) = -\frac{dJ_n}{d\xi}(\xi_{nm}), \qquad (2.144)$$

so that the square bracket in the right-hand part of Eq. (141) is just $(dJ_n/d\xi)^2$ at $\xi = \xi_{nm}$. Thus the values of the Bessel function derivatives at the zero points (given by the lower numbers in the cells of Table 1) are as important for boundary problem solutions as the zeros themselves.
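A minimal sketch, relying only on the Taylor series (132), of how the entries of Table 2.1 may be regenerated: the zeros $\xi_{nm}$ are bracketed on a coarse grid and then refined by bisection, and the derivative at each zero is taken from Eq. (144) as $-J_{n+1}(\xi_{nm})$. The last lines spot-check the recurrence relations (142) at an arbitrary point, using a central difference for the derivative. The grid step, bisection depth, and function names are illustrative choices.

```python
import math

def bessel_j(n, xi, kmax=80):
    """J_n(xi) for integer n >= 0 from the Taylor series (2.132)."""
    term = (xi / 2.0) ** n / math.factorial(n)
    total = term
    q = -(xi / 2.0) ** 2
    for k in range(1, kmax):
        term *= q / (k * (n + k))
        total += term
    return total

def bessel_zeros(n, count, step=0.05):
    """First `count` positive zeros of J_n: grid bracketing followed by bisection."""
    zeros, a = [], 0.1            # start slightly above the trivial zero xi = 0 (for n > 0)
    fa = bessel_j(n, a)
    while len(zeros) < count:
        b = a + step
        fb = bessel_j(n, b)
        if fa * fb < 0:           # sign change: a zero lies in [a, b]
            lo, hi = a, b
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if bessel_j(n, lo) * bessel_j(n, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        a, fa = b, fb
    return zeros

for n in range(3):
    # (zero, dJ_n/dxi at the zero) pairs, the latter from Eq. (2.144)
    row = [(round(x, 5), round(-bessel_j(n + 1, x), 5)) for x in bessel_zeros(n, 3)]
    print(f"n = {n}:", row)

# Spot check of the recurrence relations (2.142a,b) at xi = 3.7, n = 2
xi, n, h = 3.7, 2, 1e-5
lhs_a = bessel_j(n - 1, xi) + bessel_j(n + 1, xi)
rhs_a = 2 * n / xi * bessel_j(n, xi)
lhs_b = bessel_j(n - 1, xi) - bessel_j(n + 1, xi)
rhs_b = 2 * (bessel_j(n, xi + h) - bessel_j(n, xi - h)) / (2 * h)
print(lhs_a - rhs_a, lhs_b - rhs_b)
```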

Since the angular functions $\cos n\varphi$ are also orthogonal, both to each other,

$$\int_0^{2\pi}\cos(n\varphi)\cos(n'\varphi)\,d\varphi = \pi\delta_{nn'} \quad (\text{for } n + n' > 0;\ \text{for } n = n' = 0 \text{ the integral equals } 2\pi), \qquad (2.145)$$

and to all functions $\sin n'\varphi$, the integration over the lid area kills all terms of both series in the right-hand part of Eq. (140), besides just one term proportional to $c_{n'm'}$, and hence gives an explicit expression for that coefficient. The counterpart coefficients $s_{n'm'}$ may be found by repeating the same procedure with the replacement of $\cos n'\varphi$ by $\sin n'\varphi$. This evaluation (left for the reader's exercise) completes the solution of our problem for an arbitrary lid potential $V(\rho,\varphi)$.
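For the special case of a uniform lid potential, $V(\rho,\varphi) = V = \text{const}$, only the $n = 0$ terms survive. Combining Eqs. (141) and (143), with $\int_0^1 J_0(\xi s)\,s\,ds = J_1(\xi)/\xi$, gives $c_{0m}\sinh(\xi_{0m}L/R) = 2V/[\xi_{0m}J_1(\xi_{0m})]$ (a short derivation included here only to illustrate the procedure). The sketch below uses the $n = 0$ row of Table 2.1 (with the accurate fourth zero, $\xi_{04} \approx 11.79153$) and $J_1(\xi_{0m}) = -dJ_0/d\xi$ from Eq. (144) to evaluate the potential on the cylinder's axis via Eq. (139). The aspect ratio $L = 2R$, the six-term truncation, and the function name are illustrative assumptions; the truncation limits the accuracy very close to the top lid.

```python
import math

# (xi_0m, dJ_0/dxi at xi_0m) for m = 1..6: the n = 0 row of Table 2.1
TABLE_N0 = [(2.40482, -0.51914), (5.52008, 0.34026), (8.65372, -0.27145),
            (11.79153, 0.23245), (14.93091, -0.20654), (18.07106, 0.18773)]

def axis_potential(z, L, R, V=1.0):
    """phi(rho = 0, z) for a top lid at constant V: the n = 0 terms of Eq. (2.139)."""
    total = 0.0
    for xi, dj0 in TABLE_N0:
        j1 = -dj0                                  # Eq. (2.144) with n = 0
        amplitude = 2.0 * V / (xi * j1)            # c_0m * sinh(xi_0m * L / R)
        total += amplitude * math.sinh(xi * z / R) / math.sinh(xi * L / R)  # J_0(0) = 1
    return total

R, L = 1.0, 2.0
for z in (0.5, 1.0, 1.5):
    print(f"z = {z}: phi/V = {axis_potential(z, L, R):.4f}")
```

At the cylinder's center ($z = L/2 = R$), this gives $\phi \approx 0.14\,V$, and the potential grows monotonically toward the lid, as it should.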

Still, before leaving the Bessel functions (for a while :-), we need to address two important issues. First, we have seen that in our cylinder problem (Fig. 15), the set of functions $J_n(\xi_{nm}\rho/R)$ with different indices $m$ (which characterize the degree of the Bessel function's stretch along the axis $\rho$) plays a role similar to that of the functions $\sin(\pi nx/a)$ in the rectangular box problem shown in Fig. 11. In this context, what is the analog of the functions $\cos(\pi nx/a)$, which may be important for some boundary problems? In more formal language, are there any functions of the same argument $\xi = \xi_{nm}\rho/R$ that would be linearly independent of the Bessel functions of the first kind, while satisfying the same differential equation (130)?

The answer is yes. For the definition of such functions, we first need to generalize our prior formulas for $J_n(\xi)$, and in particular Eq. (132), to the case of an arbitrary order $\nu$. The generalization may be performed in the following way:

$$J_\nu(\xi) = \left(\frac{\xi}{2}\right)^\nu \sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(\nu+k+1)}\left(\frac{\xi}{2}\right)^{2k}, \qquad (2.146)$$
where $\Gamma(s)$ is the so-called gamma function that may be defined, for almost any real $s$, as 46

$$\Gamma(s) = \int_0^\infty \xi^{s-1}e^{-\xi}\,d\xi. \qquad (2.147)$$

(This integral converges for $s > 0$; for other values of $s$, the function is defined by analytic continuation.) The simplest and most important property of the gamma function is that for integer values of its argument it gives the factorial of a number smaller by one:

$$\Gamma(n+1) = n! \equiv 1\cdot 2\cdot\ldots\cdot n, \qquad (2.148)$$

so it is essentially a generalization of the notion of factorial to all real numbers.



The Bessel functions defined by Eq. (146) satisfy (after the replacements $n \to \nu$ and $(n+k)! \to \Gamma(\nu+k+1)$) virtually all the relations we have discussed above, including the Bessel equation (130), the asymptotic formula (135), the orthogonality condition (141), and the recurrence relations (142). Moreover, it may be shown that for $\nu \neq n$, the functions $J_\nu(\xi)$ and $J_{-\nu}(\xi)$ are linearly independent, and hence their linear combination may be used to present a general solution of the Bessel equation. Unfortunately, as Eq. (131) shows, for $\nu = n$ this is not true, and a solution independent of $J_n(\xi)$ has to be formed in a different way.
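A hedged sketch of the generalized series (146), using Python's `math.gamma` for $\Gamma(s)$ of Eq. (147). Setting $1/\Gamma$ to zero at its poles (the usual convention, and an assumption of this sketch) lets the same code reproduce Eq. (131) for $\nu = -n$, while for $\nu = \pm 1/2$ it reproduces the standard closed forms $J_{1/2}(\xi) = (2/\pi\xi)^{1/2}\sin\xi$ and $J_{-1/2}(\xi) = (2/\pi\xi)^{1/2}\cos\xi$, which are clearly linearly independent.

```python
import math

def reciprocal_gamma(s):
    """1/Gamma(s), taken as zero at the poles s = 0, -1, -2, ... (cf. footnote 46)."""
    if s <= 0 and s == math.floor(s):
        return 0.0
    return 1.0 / math.gamma(s)

def bessel_j(nu, xi, kmax=80):
    """J_nu(xi) for real order nu and xi > 0, from the series (2.146)."""
    half = xi / 2.0
    total = 0.0
    for k in range(kmax):
        total += (-1) ** k * reciprocal_gamma(nu + k + 1) / math.factorial(k) * half ** (2 * k + nu)
    return total

xi = 1.0
print(bessel_j(0.5, xi), math.sqrt(2 / (math.pi * xi)) * math.sin(xi))
print(bessel_j(-0.5, xi), math.sqrt(2 / (math.pi * xi)) * math.cos(xi))
print(bessel_j(-2, 1.3), bessel_j(2, 1.3))   # Eq. (2.131): equal for even n
print(math.gamma(6), math.factorial(5))      # Eq. (2.148) with n = 5
```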



The most common way of overcoming this difficulty is first to define, for all $\nu \neq n$, the function

$$Y_\nu(\xi) \equiv \frac{J_\nu(\xi)\cos\nu\pi - J_{-\nu}(\xi)}{\sin\nu\pi}, \qquad (2.149)$$

called the Bessel function of the second kind, or more often the Weber function, 47 and then to follow the limit $\nu \to n$. In this limit, both the numerator and the denominator of the right-hand part of Eq. (149) tend to zero, but their ratio tends to a finite value called $Y_n(\xi)$. It may be shown that these functions are still solutions of the Bessel equation and are linearly independent of $J_n(\xi)$, though they are related in the same way as those functions when the sign of $n$ changes:

$$Y_{-n}(\xi) = (-1)^n Y_n(\xi). \qquad (2.150)$$



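Figure 17 shows a few Weber functions of the lowest integer orders. The plots show that their asymptotic behavior is very similar to that of $J_n(\xi)$,

$$Y_n(\xi) \approx \left(\frac{2}{\pi\xi}\right)^{1/2}\sin\left(\xi - \frac{\pi}{4} - \frac{n\pi}{2}\right), \quad \text{for } \xi \to \infty, \qquad (2.151)$$

but with the phase shift necessary to make these Bessel functions orthogonal to those of the first kind; cf. Eq. (135). However, for small values of the argument $\xi$, the Bessel functions of the second kind behave completely differently from those of the first kind:

$$Y_n(\xi) \to \begin{cases} \dfrac{2}{\pi}\left[\ln\left(\dfrac{\xi}{2}\right) + \gamma\right], & \text{for } n = 0,\\[2mm] -\dfrac{(n-1)!}{\pi}\left(\dfrac{2}{\xi}\right)^n, & \text{for } n \neq 0, \end{cases} \qquad (2.152)$$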



46 See, e.g., MA Eq. (6.7a). I used the word "almost" because the gamma function tends to infinity at all non-positive integer values of its argument ($s = 0, -1, -2, \ldots$).

47 They are also sometimes called the Neumann functions, and denoted as $N_\nu(\xi)$.
where

$$\gamma \equiv \lim_{n\to\infty}\left(1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} - \ln n\right) \approx 0.577216\ldots \qquad (2.153)$$

is the so-called Euler constant. Relations (152) and Fig. 17 show that the functions $Y_n(\xi)$ diverge at $\xi \to 0$, and hence cannot describe the behavior of any physical variable, in particular the electrostatic potential, in a region that includes the axis $\xi = 0$.
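The limit (153) converges slowly: the difference between the expression under the limit and $\gamma$ behaves approximately as $1/2n$ (a standard result not derived in these notes). The hedged sketch below illustrates this using the standard library only; the reference value of $\gamma$ is hard-coded purely for comparison.

```python
import math

EULER_GAMMA = 0.5772156649015329   # reference value, for comparison only

def gamma_partial(n):
    """1 + 1/2 + ... + 1/n - ln n: the expression under the limit in Eq. (2.153)."""
    return math.fsum(1.0 / k for k in range(1, n + 1)) - math.log(n)

for n in (10, 1000, 10**6):
    est = gamma_partial(n)
    print(n, est, est - EULER_GAMMA, 1 / (2 * n))   # the error tracks 1/(2n)
```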
Fig. 2.17. A few Bessel functions of the second kind (a.k.a. the Neumann functions, a.k.a. the Weber functions).



One may wonder: if this is true, when do we need these functions in physics? This does not happen too often, but it still does. Figure 18 shows an example of a boundary problem of electrostatics that requires both functions, $J_n(\xi)$ and $Y_n(\xi)$.

[Figure, panels (a) and (b): two round coaxial conducting cylinders kept at $\phi = 0$.]

Fig. 2.18. Simple boundary problem that cannot be solved using just one kind of Bessel functions.



Two round, coaxial conducting cylinders are kept at the same (say, zero) potential, but at least one of the two horizontal lids has a different potential. The problem is almost completely similar to the one discussed above (Fig. 15), but now we need to find the potential distribution in the free space between the cylinders, $R_1 < \rho < R_2$. If we use the same variable separation as in the simpler counterpart problem, we need the radial functions $\mathcal{R}(\rho)$ to satisfy two zero boundary conditions: at $\rho = R_1$ and at $\rho = R_2$. With the Bessel functions of just the first kind, $J_n(\gamma\rho)$, this is impossible to do, because the two boundaries would impose two independent (and generally incompatible) conditions, $J_n(\gamma R_1) = 0$ and $J_n(\gamma R_2) = 0$, on one "compression parameter" $\gamma$. The existence of the Bessel functions of the second kind immediately saves the day, because if a solution is presented as a linear combination, 48

$$c_J J_n(\gamma\rho) + c_Y Y_n(\gamma\rho), \qquad (2.154)$$

the two zero boundary conditions give two equations for $\gamma$ and the ratio $c \equiv c_Y/c_J$. (Due to the oscillating character of both Bessel functions, these conditions would typically be satisfied by an infinite set of discrete pairs $\{\gamma, c\}$.) Note, however, that generally none of these pairs would correspond to zeros of either $J_n$ or $Y_n$, so that having an analog of Table 1 for the latter function would not help much. Hence, even the simple problems of this kind (like the one shown in Fig. 18) typically require numerical solutions of algebraic (transcendental) equations.
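As a hedged illustration of such a transcendental equation, take the axially symmetric case $n = 0$. The two boundary conditions, $c_J J_0(\gamma R_{1,2}) + c_Y Y_0(\gamma R_{1,2}) = 0$, have a nontrivial solution only if $J_0(\gamma R_1)Y_0(\gamma R_2) - J_0(\gamma R_2)Y_0(\gamma R_1) = 0$. The sketch below finds the lowest root for $R_2 = 2R_1$ by bracketing and bisection. For $Y_0$, it uses the standard power series $Y_0(\xi) = (2/\pi)\{[\ln(\xi/2) + \gamma]J_0(\xi) + \sum_{k\ge1}(-1)^{k+1}H_k(\xi/2)^{2k}/(k!)^2\}$, with $H_k \equiv 1 + 1/2 + \ldots + 1/k$; this formula is not given in these notes and is quoted from standard references (e.g., Abramowitz and Stegun). Function names and numerical parameters are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Eq. (2.153)

def j0(x, kmax=60):
    """J_0(x) from the Taylor series (2.132) with n = 0."""
    term, total, q = 1.0, 1.0, -(x / 2.0) ** 2
    for k in range(1, kmax):
        term *= q / (k * k)
        total += term
    return total

def y0(x, kmax=60):
    """Y_0(x) for x > 0, from its standard power series (see the lead-in)."""
    q = (x / 2.0) ** 2
    term, harmonic, s = 1.0, 0.0, 0.0
    for k in range(1, kmax):
        term *= q / (k * k)           # (x/2)^(2k) / (k!)^2
        harmonic += 1.0 / k           # H_k
        s += (-1) ** (k + 1) * harmonic * term
    return (2.0 / math.pi) * ((math.log(x / 2.0) + EULER_GAMMA) * j0(x) + s)

def cross(g, r1, r2):
    """Determinant of the two boundary conditions for (c_J, c_Y), n = 0."""
    return j0(g * r1) * y0(g * r2) - j0(g * r2) * y0(g * r1)

def lowest_root(r1, r2, step=0.01, g_max=10.0):
    """Smallest gamma > 0 with cross(gamma) = 0: grid bracketing, then bisection."""
    g, f = step, cross(step, r1, r2)
    while g < g_max:
        g_next = g + step
        f_next = cross(g_next, r1, r2)
        if f * f_next < 0:
            lo, hi = g, g_next
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if cross(lo, r1, r2) * cross(mid, r1, r2) <= 0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        g, f = g_next, f_next
    raise ValueError("no root found below g_max")

gamma1 = lowest_root(1.0, 2.0)
print(gamma1, math.pi / (2.0 - 1.0))   # compare with the naive estimate pi/(R2 - R1)
```

The root, $\gamma_1 \approx 3.12/R_1$, is close to, but not exactly equal to, the estimate $\pi/(R_2 - R_1)$, and it coincides with no zero of either $J_0$ or $Y_0$.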

One more issue we need to address, before moving on to the spherical coordinates, is the so-called modified Bessel functions: $I_\nu(\xi)$ of the first kind, and $K_\nu(\xi)$ of the second kind. They are two linearly-independent solutions of the modified Bessel equation,

$$\frac{d^2\mathcal{R}}{d\xi^2} + \frac{1}{\xi}\frac{d\mathcal{R}}{d\xi} - \left(1 + \frac{\nu^2}{\xi^2}\right)\mathcal{R} = 0, \qquad (2.155)$$

which differs from Eq. (130) "only" by the sign of one of its terms. Figure 19 shows a simple problem that leads to this equation: a round conducting cylinder is sliced, perpendicular to its axis, into rings of equal height $h$, which are kept at equal but sign-alternating potentials.

[Figure: cylinder sliced into rings of height $h$, kept at potentials $\phi = +V/2$, $-V/2$, $+V/2$, ...]

Fig. 2.19. Typical boundary problem whose solution may be conveniently described in terms of the modified Bessel functions.



If the gaps between the sections are narrow, $t \ll h$, we may use the variable separation method for the solution of this problem, but now we evidently need periodic (rather than exponential) solutions



48 A pair of linearly independent functions, used for the presentation of the general solution of the Bessel equation, may also be chosen in a different way, using the so-called Hankel functions

$$H_n^{(1,2)}(\xi) \equiv J_n(\xi) \pm iY_n(\xi).$$

For representing the general solution of Eq. (130), this alternative is completely similar to using the pair of complex functions $\exp\{\pm i\alpha x\} = \cos\alpha x \pm i\sin\alpha x$ instead of the pair of real functions $\{\cos\alpha x, \sin\alpha x\}$ for representing the general solution of Eq. (89) for $X(x)$.
along the axis $z$, i.e. linear combinations of $\sin kz$ and $\cos kz$ with various real values of the constant $k$. Separating the variables, we arrive at a differential equation similar to Eq. (129), but with the negative sign before the separation constant:

$$\frac{d^2\mathcal{R}}{d\rho^2} + \frac{1}{\rho}\frac{d\mathcal{R}}{d\rho} - \left(k^2 + \frac{\nu^2}{\rho^2}\right)\mathcal{R} = 0. \qquad (2.156)$$

The radial coordinate normalization, $\xi = k\rho$, immediately leads us to Eq. (155), and hence (for $\nu = n$) to the modified Bessel functions $I_n(\xi)$ and $K_n(\xi)$.

Figure 20 shows the behavior of a few such functions, of the lowest orders. One can see that at $\xi \to 0$ their behavior is virtually similar to that of the "usual" Bessel functions (cf. Eqs. (132) and (152)), with $K_n(\xi)$ additionally multiplied (due to purely historical reasons) by $-\pi/2$:

$$I_n(\xi) \to \frac{1}{n!}\left(\frac{\xi}{2}\right)^n, \qquad K_n(\xi) \to \begin{cases} -\left[\ln\left(\dfrac{\xi}{2}\right) + \gamma\right], & \text{for } n = 0,\\[2mm] \dfrac{(n-1)!}{2}\left(\dfrac{2}{\xi}\right)^n, & \text{for } n \neq 0. \end{cases} \qquad (2.157)$$

However, the asymptotic behavior of the modified functions is very different, with $I_n(\xi)$ exponentially growing and $K_n(\xi)$ exponentially dropping at $\xi \to \infty$:

$$I_n(\xi) \to \left(\frac{1}{2\pi\xi}\right)^{1/2}e^{\xi}, \qquad K_n(\xi) \to \left(\frac{\pi}{2\xi}\right)^{1/2}e^{-\xi}. \qquad (2.158)$$
Fig. 2.20. Modified Bessel functions of the first kind (left panel) and the second kind (right panel).



To complete our brief survey of the Bessel functions, let me note that all the functions we have discussed so far may be considered as particular cases of Bessel functions of a complex argument, say $J_n(\zeta)$ and $Y_n(\zeta)$, or, alternatively, $H_n^{(1,2)}(\zeta) = J_n(\zeta) \pm iY_n(\zeta)$. 49 The "usual" Bessel functions $J_n(\xi)$ and



49 These complex functions still obey the general relations (143) and (146), with $\xi$ replaced by $\zeta$.



Chapter 2 



Page 39 of 60 



Essential Graduate Physics 



EM: Classical Electrodynamics 



$Y_n(\xi)$ may be considered as the values of these generalized functions on the real axis ($\zeta = \xi$), while the modified functions are their particular cases at $\zeta = i\xi$:

$$I_\nu(\xi) = i^{-\nu}J_\nu(i\xi), \qquad K_\nu(\xi) = \frac{\pi}{2}i^{\nu+1}H_\nu^{(1)}(i\xi). \qquad (2.159)$$



Moreover, this generalization of the Bessel functions to the whole complex plane $\zeta$ enables the use of their values along other directions on that plane, for example at the angles $\pi/4 \pm \pi/2$. As a result, one arrives at the so-called Kelvin functions:

$$\text{ber}_\nu(\xi) + i\,\text{bei}_\nu(\xi) = J_\nu\left(\xi e^{3\pi i/4}\right), \qquad \text{ker}_\nu(\xi) + i\,\text{kei}_\nu(\xi) = \frac{\pi i}{2}H_\nu^{(1)}\left(\xi e^{3\pi i/4}\right), \qquad (2.160)$$



which are also useful for some important problems of mathematical physics and engineering. 
Unfortunately, we do not have time to discuss these problems in this course. 50 

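(iv) Spherical coordinates are very important in physics, because of the (approximate) spherical symmetry of many objects, from electrons, nuclei, and atoms to planets and stars. Let us again require each component $\phi_k$ of Eq. (84) to satisfy the Laplace equation. Using the well-known expression for the Laplace operator in spherical coordinates, 51 we get

$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\phi_k}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\phi_k}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\phi_k}{\partial\varphi^2} = 0. \qquad (2.161)$$

Let us look for a solution of this equation in the following variable-separated form:

$$\phi_k = \frac{\mathcal{R}(r)}{r}\,\mathcal{P}(\cos\theta)\,\mathcal{F}(\varphi). \qquad (2.162)$$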



Separating the equations one by one, just as has been done in cylindrical coordinates, we get the following equations for the functions participating in this solution:

$$\frac{d^2\mathcal{R}}{dr^2} - \frac{l(l+1)}{r^2}\mathcal{R} = 0, \qquad (2.163)$$

$$\frac{d}{d\xi}\left[(1-\xi^2)\frac{d\mathcal{P}}{d\xi}\right] + \left[l(l+1) - \frac{\nu^2}{1-\xi^2}\right]\mathcal{P} = 0, \qquad (2.164)$$

$$\frac{d^2\mathcal{F}}{d\varphi^2} + \nu^2\mathcal{F} = 0, \qquad (2.165)$$

where $\xi \equiv \cos\theta$ is a new variable used instead of $\theta$ (so that $-1 \le \xi \le +1$), and $\nu^2$ and $l(l+1)$ are the separation constants. (The reason for the selection of the latter one in this form will be clear in a minute.) One can see that, in contrast with the cylindrical coordinates, the equation for the radial functions is quite simple.



50 Later in the course we will also run into the so-called spherical Bessel functions $j_n(\xi)$ and $y_n(\xi)$, which may be expressed via the Bessel functions of a semi-integer order. Surprisingly enough, the spherical Bessel functions turn out to be much simpler than $J_n(\xi)$ and $Y_n(\xi)$.

51 See, e.g., MA Eq. (10.9).
Indeed, let us look for its solution in the form $c\,r^a$, just as we have done with Eq. (106). Plugging this solution into Eq. (163), we immediately get the following condition on the parameter $a$:

$$a(a-1) = l(l+1). \qquad (2.166)$$

This quadratic equation has two roots, $a = l + 1$ and $a = -l$, so that the general solution of Eq. (163) is

$$\mathcal{R} = a_l r^{l+1} + \frac{b_l}{r^l}. \qquad (2.167)$$

Equation (165) is also very simple, and to some extent similar to Eq. (108) for the cylindrical coordinates. However, Eq. (164) for the function $\mathcal{P}(\xi)$, where $\xi$ is the cosine of the polar angle $\theta$, is the so-called Legendre differential equation, whose solution cannot be expressed via what is usually called "elementary functions", though, again, there is no generally accepted line between them and "special functions".

Let us start with axially-symmetric problems, for which $\partial\phi/\partial\varphi = 0$. This means $\mathcal{F}(\varphi) = \text{const}$, and thus $\nu = 0$, so that Eq. (164) is reduced to the so-called Legendre's ordinary differential equation:

$$\frac{d}{d\xi}\left[(1-\xi^2)\frac{d\mathcal{P}}{d\xi}\right] + l(l+1)\mathcal{P} = 0. \qquad (2.168)$$

One can readily check that the solutions of this equation for integer values of $l$ are just specific (Legendre) polynomials that may be defined, for example, by the following Rodrigues' formula:

$$\mathcal{P}_l(\xi) = \frac{1}{2^l\,l!}\frac{d^l}{d\xi^l}\left(\xi^2 - 1\right)^l, \qquad l = 0, 1, 2, \ldots \qquad (2.169)$$



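As follows from this formula, the first few Legendre polynomials are pretty simple:

$$\mathcal{P}_0(\xi) = 1, \quad \mathcal{P}_1(\xi) = \xi, \quad \mathcal{P}_2(\xi) = \frac{1}{2}\left(3\xi^2 - 1\right), \quad \mathcal{P}_3(\xi) = \frac{1}{2}\left(5\xi^3 - 3\xi\right), \quad \mathcal{P}_4(\xi) = \frac{1}{8}\left(35\xi^4 - 30\xi^2 + 3\right), \ \ldots \qquad (2.170)$$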

though such explicit expressions become more and more bulky as $l$ is increased. As Fig. 21 shows, all these functions, which are defined on the segment $[-1, +1]$, start at the same point, $\mathcal{P}_l(+1) = +1$, and end up either at the same point or at the opposite point: $\mathcal{P}_l(-1) = (-1)^l$. On the way between these two end points, the $l$-th polynomial crosses the horizontal axis $l$ times. It is straightforward to use Eq. (169) for proving that these polynomials form a full, orthogonal set of functions, with the following normalization rule:

$$\int_{-1}^{+1}\mathcal{P}_l(\xi)\,\mathcal{P}_{l'}(\xi)\,d\xi = \frac{2}{2l+1}\delta_{ll'}, \qquad (2.171)$$
so that any function $f(\xi)$, defined on the segment $[-1, +1]$, may be presented as a unique series over these polynomials. 52
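Rodrigues' formula (169) is easy to apply mechanically. The sketch below, which uses exact rational arithmetic via Python's `fractions` module (function names are illustrative), expands $(\xi^2 - 1)^l$ binomially, differentiates it $l$ times, reproduces the polynomials (170), and checks the end-point values and the normalization rule (171) by exact integration of monomials: $\int_{-1}^{+1}\xi^k\,d\xi = 2/(k+1)$ for even $k$, and zero for odd $k$.

```python
from fractions import Fraction
from math import comb, factorial

def legendre_coeffs(l):
    """Coefficients [c_0, c_1, ..., c_l] of P_l(xi) from Rodrigues' formula (2.169)."""
    poly = [Fraction(0)] * (2 * l + 1)
    for j in range(l + 1):                      # (xi^2 - 1)^l = sum_j C(l, j) (-1)^(l-j) xi^(2j)
        poly[2 * j] = Fraction(comb(l, j) * (-1) ** (l - j))
    for _ in range(l):                          # l-fold differentiation
        poly = [k * poly[k] for k in range(1, len(poly))]
    return [c / (2 ** l * factorial(l)) for c in poly]

def evaluate(coeffs, x):
    """Value of the polynomial with the given coefficients at the point x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def integrate_product(c1, c2):
    """Exact integral over [-1, +1] of the product of two polynomials."""
    prod = [Fraction(0)] * (len(c1) + len(c2) - 1)
    for i, a in enumerate(c1):
        for j, b in enumerate(c2):
            prod[i + j] += a * b
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(prod) if k % 2 == 0)

print(legendre_coeffs(4))   # coefficients 3/8, 0, -15/4, 0, 35/8, cf. Eq. (2.170)
print(integrate_product(legendre_coeffs(3), legendre_coeffs(3)))   # 2/7, cf. Eq. (2.171)
```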



[Figure: $\mathcal{P}_l(\xi)$ for $l = 0, 1, 2, 3, 4$ on the segment $-1 \le \xi \le +1$.]

Fig. 2.21. A few lowest Legendre polynomials.



Thus, taking into account the additional division by $r$ in Eq. (162), the general solution of any axially-symmetric Laplace problem may be presented as

$$\phi = \sum_{l=0}^{\infty}\left(a_l r^l + \frac{b_l}{r^{l+1}}\right)\mathcal{P}_l(\cos\theta). \qquad (2.172)$$

Please note a strong similarity between this solution and Eq. (112) for the 2D Laplace problem in polar 
coordinates. However, besides the difference in angular functions, there is also a difference (by one) in 
the power of the second radial function, and this difference immediately shows up in core problems. 

Indeed, let us solve a problem similar to that shown in Fig. 13: find the electric field around a conducting sphere of radius $R$, placed into an initially uniform external field $\mathbf{E}_0$ (whose direction we will take for the axis $z$); see Fig. 22a. If we select the sphere's potential to be zero, then $a_0 = b_0 = 0$. Now, just as has been argued for the cylindrical case, at $r \gg R$ the potential should approach that of the uniform field:

$$\phi \to -E_0 z = -E_0 r\cos\theta, \qquad (2.173)$$

and this again means that in Eq. (172), only one of the coefficients $a_l$ survives: $a_l = -E_0\delta_{l,1}$. Now, from the boundary condition on the surface, $\phi(R,\theta) = 0$, we get:

$$0 = \left(-E_0 R + \frac{b_1}{R^2}\right)\cos\theta + \sum_{l\neq 1}\frac{b_l}{R^{l+1}}\mathcal{P}_l(\cos\theta). \qquad (2.174)$$



This expression may be viewed as the expansion of the function $f(\xi) = 0$ into a series of the orthogonal functions $\mathcal{P}_l(\xi)$. Since such expansions are unique, and Eq. (174) is satisfied if

$$b_l = E_0 R^3\delta_{l,1}, \qquad (2.175)$$

this is indeed the only possibility to satisfy the boundary condition, so that, finally,

$$\phi = -E_0\left(r - \frac{R^3}{r^2}\right)\cos\theta. \qquad (2.176)$$



52 As a result, there is no practical sense, at least for the purposes of this course, in pursuing (more complex) solutions of Eq. (168) for non-integer values of $l$.








Fig. 2.22. Conducting sphere in a uniform electric field: (a) the problem's geometry, and (b) the equipotential surface pattern given by Eq. (176). The pattern is qualitatively similar to, but quantitatively different from, that for the conducting cylinder in a perpendicular field; cf. Fig. 13.



This distribution, shown in Fig. 22b, is very similar to Eq. (117) for the cylindrical case, but with a different power of the radius in the second term. This leads to a quantitatively different distribution of the surface electric field:

$$E_n = -\left.\frac{\partial\phi}{\partial r}\right|_{r=R} = 3E_0\cos\theta, \qquad (2.177)$$

so that its maximal value is a factor of 3 (rather than 2) larger than the external field.
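A quick numerical check of Eq. (177), written as a hedged sketch: it differentiates the potential (176), $\phi = -E_0(r - R^3/r^2)\cos\theta$, by central differences at the sphere's surface and compares the result with the analogous cylinder solution, which we assume here to have the form $\phi = -E_0(\rho - R^2/\rho)\cos\varphi$ for Eq. (117). The units $E_0 = R = 1$ are an arbitrary choice.

```python
import math

E0, R = 1.0, 1.0

def phi_sphere(r, theta):
    """Eq. (2.176): conducting sphere in a uniform field E0 along z."""
    return -E0 * (r - R**3 / r**2) * math.cos(theta)

def phi_cylinder(rho, angle):
    """Assumed form of Eq. (2.117): conducting cylinder in a perpendicular field."""
    return -E0 * (rho - R**2 / rho) * math.cos(angle)

def normal_field(phi, angle, h=1e-5):
    """E_n = -d(phi)/dr at r = R, estimated by a central difference."""
    return -(phi(R + h, angle) - phi(R - h, angle)) / (2 * h)

angle = 0.3
print(normal_field(phi_sphere, angle) / (E0 * math.cos(angle)))    # close to 3
print(normal_field(phi_cylinder, angle) / (E0 * math.cos(angle)))  # close to 2
```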

Now let us discuss the Laplace equation solution in the general case (no axial symmetry), but only for the most important systems, in which the free space surrounds the origin from all sides. In this case the solutions of Eq. (165) have to be $2\pi$-periodic, and hence $\nu = n = 0, \pm 1, \pm 2, \ldots$ Mathematics says that the Legendre equation (164) with integer $\nu = n$ and a fixed integer $l$ has solutions finite on the whole segment $[-1, +1]$ only for a limited range of $n$: 53

$$-l \le n \le +l. \qquad (2.178)$$

These solutions are called the associated Legendre functions. For $n \ge 0$, they may be defined via the Legendre polynomials using the following formula:

$$\mathcal{P}_l^n(\xi) = (-1)^n\left(1 - \xi^2\right)^{n/2}\frac{d^n}{d\xi^n}\mathcal{P}_l(\xi). \qquad (2.179)$$



53 In quantum mechanics, the letter n is typically reserved for the "principal quantum number", while the azimuthal functions are numbered by the index m. However, I will keep using n as their index, because for this course's purposes, this seems more logical in view of the similarity of the spherical and cylindrical functions.






On the segment $\xi \in [-1, +1]$, each set of the associated Legendre functions with a fixed index $n$ and $l \ge n$ forms a full, orthogonal set, with the normalization relation

$$\int_{-1}^{+1}\mathcal{P}_l^n(\xi)\,\mathcal{P}_{l'}^n(\xi)\,d\xi = \frac{2}{2l+1}\frac{(l+n)!}{(l-n)!}\delta_{ll'}, \qquad (2.180)$$

which is evidently a generalization of Eq. (171).

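Since these relations may seem a bit intimidating, let me write down explicit expressions for a few $\mathcal{P}_l^n(\cos\theta)$ with the lowest values of $l$ and $n \ge 0$:

$$l = 0: \quad \mathcal{P}_0^0(\cos\theta) = 1; \qquad (2.181)$$

$$l = 1: \quad \mathcal{P}_1^0(\cos\theta) = \cos\theta, \quad \mathcal{P}_1^1(\cos\theta) = -\sin\theta; \qquad (2.182)$$

$$l = 2: \quad \mathcal{P}_2^0(\cos\theta) = \frac{1}{2}\left(3\cos^2\theta - 1\right), \quad \mathcal{P}_2^1(\cos\theta) = -3\sin\theta\cos\theta, \quad \mathcal{P}_2^2(\cos\theta) = 3\sin^2\theta. \qquad (2.183)$$

The reader should agree there is not much intimidation in these functions, which are the most important ones for applications.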
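To make Eqs. (179)-(183) concrete, the following hedged sketch builds $\mathcal{P}_l^n$ from Eq. (179), with $\mathcal{P}_l$ taken from Rodrigues' formula (169), compares it with the explicit expressions (182)-(183) at an arbitrary angle, and verifies the normalization (180) exactly. The exact check relies on the fact that the squared prefactor, $(1-\xi^2)^n$, is a polynomial, so that the integrals in Eq. (180) reduce to integrals of monomials. Function names are illustrative.

```python
import math
from fractions import Fraction

def legendre_coeffs(l):
    """P_l(xi) coefficients from Rodrigues' formula (2.169)."""
    poly = [Fraction(0)] * (2 * l + 1)
    for j in range(l + 1):
        poly[2 * j] = Fraction(math.comb(l, j) * (-1) ** (l - j))
    for _ in range(l):
        poly = [k * poly[k] for k in range(1, len(poly))]
    return [c / (2 ** l * math.factorial(l)) for c in poly]

def derivative(coeffs, times):
    """Coefficients of the `times`-th derivative of a polynomial."""
    for _ in range(times):
        coeffs = [k * coeffs[k] for k in range(1, len(coeffs))]
    return coeffs

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def assoc_legendre(l, n, theta):
    """P_l^n(cos theta) from Eq. (2.179); (1 - xi^2)^(n/2) = sin^n(theta) for 0 <= theta <= pi."""
    d = derivative(legendre_coeffs(l), n)
    x = math.cos(theta)
    return (-1) ** n * math.sin(theta) ** n * sum(float(c) * x ** k for k, c in enumerate(d))

def norm_integral(l1, l2, n):
    """Exact left-hand part of Eq. (2.180); the sign factors (-1)^n cancel in the product."""
    weight = [Fraction(0)] * (2 * n + 1)          # (1 - xi^2)^n
    for j in range(n + 1):
        weight[2 * j] = Fraction(math.comb(n, j) * (-1) ** j)
    integrand = poly_mul(weight, poly_mul(derivative(legendre_coeffs(l1), n),
                                          derivative(legendre_coeffs(l2), n)))
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(integrand) if k % 2 == 0)

theta = 0.7
print(assoc_legendre(2, 1, theta), -3 * math.sin(theta) * math.cos(theta))
print(norm_integral(2, 2, 1))   # 12/5 = (2/5) * 3!/1!
```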

Now the general solution (162) of the Laplace equation in the spherical coordinates may be spelled out as

$$\phi = \sum_{l=0}^{\infty}\left(a_l r^l + \frac{b_l}{r^{l+1}}\right)\sum_{n=0}^{l}\mathcal{P}_l^n(\cos\theta)\,\mathcal{F}_n(\varphi), \qquad \mathcal{F}_n(\varphi) = c_n\cos n\varphi + s_n\sin n\varphi. \qquad (2.184)$$



Since the difference between the angles $\theta$ and $\varphi$ is somewhat artificial, physicists prefer to think not about the functions $\mathcal{P}$ and $\mathcal{F}$ separately, but directly about their products that participate in this solution. Figure 23 shows a few such angular functions 54 by plotting their modulus along the radius, and using two colors to show the function's sign. While the lowest function ($l = 0$, $n = 0$) is just a constant, the two "dipole" functions ($l = 1$) differ from each other by their spatial orientation. Functions with higher $l$ (say, $l = 2$) differ more substantially, with the following general trend: for each value of $l$, the function with $n = 0$ is axially symmetric 55 and has $l$ zeros on its way from $\theta = 0$ to $\theta = \pi$, while the functions with $n = l$ do not have zeros inside that interval, but oscillate most strongly as functions of $\varphi$.



54 In quantum mechanics, it is more convenient to use a slightly different set of basis functions, namely the complex functions called spherical harmonics,

$$Y_l^n(\theta,\varphi) = \left[\frac{2l+1}{4\pi}\frac{(l-n)!}{(l+n)!}\right]^{1/2}\mathcal{P}_l^n(\cos\theta)\,e^{in\varphi},$$

which are defined for both positive and negative $n$ (within the limits $-l \le n \le +l$), because they form a full set of orthonormal eigenfunctions of the angular momentum operators $\hat{L}^2$ and $\hat{L}_z$; see, e.g., QM Secs. 3.6 and 5.6.
55 According to Eq. (179), these functions involve only the Legendre polynomials $\mathcal{P}_l = \mathcal{P}_l^0$.
[Figure: rows of angular-function plots for $l = 0$, $l = 1$, and $l = 2$.]

Fig. 2.23. Several products $\mathcal{P}_l^n(\cos\theta)\mathcal{F}_n(\varphi)$ with the lowest values of non-negative $l$ and $n$. Color shows the function's sign, while the distance from the origin shows its magnitude. (Adapted from Web site http://people.csail.mit.edu/sparis/sh/.)



As an exception, in order to save time, I will skip an example of the application of the associated Legendre functions, because several such examples are given in the quantum mechanics part of this series. (Note that in that field, the index n is traditionally called m, the magnetic quantum number.)



2.6. Charge images 

So far, we have discussed various methods of solution of the Laplace boundary problem (35). Let us now move on to the discussion of its generalization, the Poisson equation (1.41), which we need when, besides the conductors, we also have "free" charges with a known spatial distribution $\rho(\mathbf{r})$. (This will also allow us, better equipped, to revisit the Laplace problem in the next section.)

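Let us start with a somewhat limited, but sometimes very useful, charge image method. Consider a very simple problem: a single point charge near a conducting half-space; see Fig. 24. It is straightforward to prove that its solution above the conductor's surface ($z > 0$) may be presented as:

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\frac{q}{|\mathbf{r}-\mathbf{r}'|} + \frac{1}{4\pi\varepsilon_0}\frac{(-q)}{|\mathbf{r}-\mathbf{r}''|} \equiv \frac{q}{4\pi\varepsilon_0}\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|} - \frac{1}{|\mathbf{r}-\mathbf{r}''|}\right), \qquad (2.185)$$

or in a more explicit (coordinate) form:

$$\phi(\mathbf{r}) = \frac{q}{4\pi\varepsilon_0}\left\{\frac{1}{\left[\rho^2 + (z-d)^2\right]^{1/2}} - \frac{1}{\left[\rho^2 + (z+d)^2\right]^{1/2}}\right\}, \qquad (2.186)$$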
where $\rho$ is the distance of the observation point from the vertical line on which the charge is located. Indeed, this solution evidently satisfies both the boundary condition of zero potential at the surface of the conductor ($z = 0$) and the Poisson equation (1.41) with the single $\delta$-functional source at the point $\mathbf{r}' = \{0, 0, d\}$ in its right-hand part, because its other singularity, at the point $\mathbf{r}'' = \{0, 0, -d\}$, is outside the region of validity of this solution ($z > 0$).




Fig. 2.24. The simplest problem readily solvable by the method of images. Here and in the balance of this section, point colors denote charges of the original (red) and opposite (blue) sign.



Physically, the solution may be interpreted as the sum of the fields of the actual charge (+q) at
point r', and an equal but opposite charge (-q) at the "mirror image" point r'' (Fig. 24). This is the basic
idea of the charge image method. Before moving to more complex problems, let us discuss the situation
shown in Fig. 24 in a bit more detail. First, we can use Eqs. (3) and (186) to calculate the surface
charge density:



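$$\sigma = -\varepsilon_0\frac{\partial\phi}{\partial z}\bigg|_{z=+0} = -\frac{q}{4\pi}\frac{\partial}{\partial z}\left\{\frac{1}{[\rho^2+(z-d)^2]^{1/2}} - \frac{1}{[\rho^2+(z+d)^2]^{1/2}}\right\}_{z=+0} = -\frac{q}{4\pi}\frac{2d}{(\rho^2+d^2)^{3/2}}. \qquad (2.187)$$

The total surface charge is

$$Q = \int_A \sigma\, d^2r = 2\pi\int_0^\infty \sigma(\rho)\,\rho\, d\rho = -\frac{qd}{2}\int_0^\infty \frac{2\rho\, d\rho}{(\rho^2+d^2)^{3/2}}. \qquad (2.188)$$

This integral may be easily taken using the substitution ξ = ρ²/d² (giving dξ = 2ρdρ/d²):

$$Q = -\frac{q}{2}\int_0^\infty \frac{d\xi}{(\xi+1)^{3/2}} = -q. \qquad (2.189)$$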

This result is very natural, because the conductor "wants" to bring as much surface charge from its
interior to the surface as necessary to fully compensate the initial charge (+q), and thus to kill the
electric field at large distances as efficiently as possible, reducing the total electrostatic energy
(1.67) to the lowest possible value.
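The integral (188) can also be checked numerically. The sketch below is one possible way, in arbitrary units: it maps the infinite range onto a finite one with ρ = d tan t (an assumed change of variables, chosen for convenience) and applies the midpoint rule; the result should be close to -q for any d.

```python
import math

def sigma(rho, q=1.0, d=1.0):
    """Surface charge density of Eq. (2.187): -q d / (2 pi (rho^2 + d^2)^(3/2))."""
    return -q * d / (2.0 * math.pi * (rho ** 2 + d ** 2) ** 1.5)

def total_surface_charge(q=1.0, d=1.0, n=2000):
    """Q = 2 pi * integral_0^inf sigma(rho) rho d(rho), Eq. (2.188).

    The range [0, inf) is mapped onto [0, pi/2) by rho = d*tan(t),
    and the integral is evaluated with the composite midpoint rule.
    """
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        rho = d * math.tan(t)
        drho_dt = d / math.cos(t) ** 2
        total += 2.0 * math.pi * sigma(rho, q, d) * rho * drho_dt * h
    return total

print(total_surface_charge(q=1.0, d=1.0))   # close to -1, i.e. Q = -q
print(total_surface_charge(q=2.5, d=0.3))   # close to -2.5, independent of d
```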

For a deeper understanding of this polarization charge of the surface, let us take our calculations
to the extreme: to q equal to one elementary charge e, and place a particle with this charge (for
example, a proton) at a macroscopic distance, say 1 m, from the conductor's surface. Then, according to
Eq. (189), the total polarization charge of the surface equals that of an electron, and according to Eq.
(187), its spatial extent is of the order of d² = 1 m². This means that if we consider a much smaller part
of the surface, ΔA ≪ d², its polarization charge magnitude ΔQ = σΔA is much less than that of one electron!
For example, Eq. (187) shows that the polarization charge of quite a macroscopic area ΔA = 1 cm² right
under the initial charge (ρ = 0) is eΔA/2πd² ≈ 1.6×10⁻⁵ e. Can this be true, or is our theory somehow
limited to charges much larger than e?
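The estimate just quoted is simple arithmetic with Eq. (187) at ρ = 0; a tiny sketch, in units of e and with the 1 cm² area and 1 m distance from the text, reproduces it. The function name is hypothetical, introduced only for this illustration.

```python
import math

def induced_charge_under_point(area, d, q=1.0):
    """|Delta Q| = |sigma(0)| * Delta A = q * Delta A / (2 pi d^2), from Eq. (2.187) at rho = 0."""
    return q * area / (2.0 * math.pi * d ** 2)

dq = induced_charge_under_point(area=1e-4, d=1.0)   # 1 cm^2 area, 1 m distance, q = e
print(f"{dq:.2e} e")                                # 1.59e-05 e
```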

Surprisingly enough, the answer to this question has become clear (at least to some physicists :-)
only as late as the mid-1980s, when several experiments demonstrated, and theorists accepted, some
rather grudgingly, that the usual polarization charge formulas are valid for elementary charges q as well,
i.e., that the polarization charge ΔQ of a macroscopic surface area can indeed be less than e. The
underlying reason for this paradox is the nature of the polarization charge of the conductor surface: as
should be clear from our discussion in Sec. 1, it is due not to new charged particles brought into the
conductor (such charge would be in fact quantized in units of e), but to a small shift of the free
charges of a conductor by a very small distance from the equilibrium positions they had in the
absence of the external field induced by charge q. This shift is not quantized, at least on the scale
relevant for our issue, and neither is ΔQ. This understanding has opened a way toward the invention and
experimental demonstration of several new devices, including single-electron transistors,⁵⁶ which may
be used, in particular, to measure polarization charges as small as ~10⁻⁶ e.

To complete the discussion of our initial problem (Fig. 24), let us find the potential energy U of
the charge-to-surface interaction. For that, we may use the value of the electrostatic potential (185) at
the point of the charge itself (r = r'), of course ignoring the infinite potential created by the charge itself, so
that the remaining potential is that of the image charge:

$$\phi_{\rm image}(\mathbf{r}') = -\frac{1}{4\pi\varepsilon_0}\frac{q}{2d}. \qquad (2.190)$$

Looking at the definition of the electrostatic potential, given by Eq. (1.31), it may be tempting to
immediately write U = qφ_image = -(1/4πε₀)(q²/2d) [WRONG!], but this would not be correct. The reason
is that potential φ_image is not independent of q, but is actually induced by this charge. This is why the
correct approach is to use Eq. (1.63), with just one term:



$$U = \frac{1}{2}\,q\,\phi_{\rm image}(\mathbf{r}') = -\frac{1}{4\pi\varepsilon_0}\frac{q^2}{4d}, \qquad (2.191)$$



which is half the wrong result cited above, in magnitude. In order to double-check this result, and
also to get a better feeling for the factor 1/2 that distinguishes it from the wrong guess, we can recalculate
energy U as the integral of the force exerted on the charge by the conductor (i.e., in our formalism, by
the image charge):

$$U = -\int_\infty^d F(z)\,dz = \frac{1}{4\pi\varepsilon_0}\int_\infty^d \frac{q^2}{(2z)^2}\,dz = -\frac{1}{4\pi\varepsilon_0}\frac{q^2}{4d}. \qquad (2.192)$$



56 Actually, this term (for which the author of these notes should be blamed :-) is misleading: operation of the
single-electron transistor is based on the interplay of discrete charges (multiples of e) transferred between
conductors, and sub-single-electron polarization charges - see, e.g., K. K. Likharev, Proc. IEEE 87, 606 (1999).
This calculation clearly accounts for the gradual build-up of force F as the real charge is brought from
afar (where we have opted for U = 0) toward the surface.
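The factor 1/2 may also be seen numerically. The sketch below (SI units; the charge value and the 1 nm distance are illustrative assumptions) integrates the image force as in Eq. (192) and compares the result with Eq. (191) and with the naive guess qφ_image. With the substitution z = d/s the integrand happens to become constant, so the midpoint rule is exact here up to rounding.

```python
import math

EPS0 = 8.8541878128e-12            # vacuum permittivity, F/m
K = 1.0 / (4.0 * math.pi * EPS0)   # Coulomb constant

def image_force(z, q):
    """Force on q at height z: attraction to the image -q at distance 2z (negative = toward the plane)."""
    return -K * q ** 2 / (2.0 * z) ** 2

def energy_from_force(d, q, n=1000):
    """U(d) = -int_inf^d F dz = int_d^inf F dz, Eq. (2.192), by the midpoint rule.

    The substitution z = d/s maps [d, inf) onto (0, 1], with dz = (d/s^2) ds.
    """
    h = 1.0 / n
    return sum(image_force(d / s, q) * (d / s ** 2) * h
               for s in ((k + 0.5) * h for k in range(n)))

q, d = 1.602176634e-19, 1e-9        # elementary charge, 1 nm from the surface (illustrative)
U_integrated = energy_from_force(d, q)
U_eq191 = -K * q ** 2 / (4.0 * d)   # (1/2) q phi_image
U_naive = -K * q ** 2 / (2.0 * d)   # q phi_image, the "wrong" guess
print(U_integrated / U_eq191, U_naive / U_integrated)   # 1 and 2, up to rounding
```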

This result, and the fact that it may be used for elementary particles with |q| = e (in particular,
electrons), has several important applications. For example, let us plot energy U for an electron near a
metallic surface as a function of d. For that, we may use Eq. (192) until our macroscopic approximation
(2) becomes invalid, and U transitions to some negative constant value (-ψ) inside the conductor - see
Fig. 25a.




Fig. 2.25. (a) Origin of 
the workfunction and (b) 
the field emission of 
electrons (schematically). 



The positive constant ψ is called the workfunction, because it describes how much work should be
done on an electron to remove it from the conductor. As was discussed in Sec. 1, in good metals the
electric field screening happens at interatomic distances a₀ ≈ 10⁻¹⁰ m. Plugging d = a₀ and q = -e into Eq.
(191), we get |U| ≈ 6×10⁻¹⁹ J ≈ 3.6 eV. This crude estimate is in a surprisingly good agreement with the
experimental values of the workfunction, ranging between 4 and 5 eV for most metals.⁵⁷
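The workfunction estimate above amounts to evaluating |U| from Eq. (191) at d = a₀. A minimal sketch, taking a₀ = 10⁻¹⁰ m exactly (an assumption, since only the order of magnitude is known) and CODATA values for e and ε₀, gives |U| ≈ 5.8×10⁻¹⁹ J, i.e. about 3.6 eV.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m

def image_energy(d, q=E_CHARGE):
    """|U| = q^2 / (16 pi eps0 d), Eq. (2.191)."""
    return q ** 2 / (16.0 * math.pi * EPS0 * d)

a0 = 1e-10                   # assumed screening length, m
psi_J = image_energy(a0)
psi_eV = psi_J / E_CHARGE
print(f"{psi_J:.2e} J = {psi_eV:.2f} eV")
```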

Next, let us consider the effect of an additional external electric field E₀, applied perpendicular to
a metallic surface, on this potential profile. Assuming the field to be uniform, we can add the corresponding
potential energy, -eE₀d at distance d from the surface, to that created by the image charge. (As we know from Eq. (1.53),
since field E₀ is independent of the electron position, its recalculation to the potential energy does not
require the coefficient 1/2.) As a result, the potential energy of an electron near the surface becomes

$$U(d) = -\frac{1}{4\pi\varepsilon_0}\frac{e^2}{4d} - eE_0 d, \quad \text{for } d > a_0, \qquad (2.193)$$

with a similar crossover to U = -ψ inside the conductor - see Fig. 25b. One can see that at the
appropriate sign and sufficient magnitude of the applied field, it lowers the potential barrier that
prevents electrons from leaving the conductor. At E₀ ~ ψ/ea₀ this suppression becomes so strong that
electrons just below the Fermi surface start quantum-mechanical tunneling through the remaining thin
barrier. This is the field emission effect, which is used in vacuum electronics to provide efficient
cathodes that do not require heating to high temperatures.⁵⁸
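To see how the applied field lowers the barrier, one can locate the maximum of U(d) given by Eq. (193) in the region d > a₀. The sketch below is illustrative only: it assumes a₀ = 10⁻¹⁰ m and a few sample fields E₀, finds the maximum by a brute-force search on a logarithmic grid, and cross-checks it against the stationary point dU/dd = 0, i.e. d* = [e/(16πε₀E₀)]^(1/2). Energies are in eV, counted from the value U = 0 far from the surface at E₀ = 0.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m

def energy_eV(d, E0):
    """U(d)/e from Eq. (2.193), valid for d > a0; E0 in V/m, d in m."""
    return -E_CHARGE / (16.0 * math.pi * EPS0 * d) - E0 * d

def barrier_top_eV(E0, a0=1e-10, d_max=1e-6, n=20000):
    """Maximum of U(d) on a log-spaced grid of d in [a0, d_max]."""
    ratio = (d_max / a0) ** (1.0 / (n - 1))
    return max(energy_eV(a0 * ratio ** k, E0) for k in range(n))

def barrier_top_stationary(E0):
    """The same maximum from dU/dd = 0, i.e. d* = sqrt(e / (16 pi eps0 E0))."""
    d_star = math.sqrt(E_CHARGE / (16.0 * math.pi * EPS0 * E0))
    return energy_eV(d_star, E0)

for E0 in (1e8, 1e9, 1e10):   # sample fields, V/m
    print(f"E0 = {E0:.0e} V/m: barrier top {barrier_top_eV(E0):.3f} eV "
          f"(stationary point: {barrier_top_stationary(E0):.3f} eV)")
```

The barrier top drops by several eV as E₀ approaches ~10¹⁰ V/m, which is of the same order as the workfunction.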

Returning to the basic electrostatics, let us consider some other geometries where the method of 
images may be effectively applied. First, let us consider a right corner (Fig. 26a). Reflecting the initial 



57 For more discussion of workfunction, and its effect on electron kinetics, see, e.g., SM Sec. 6.4. 

58 The practical development of such "cold" cathodes is strongly affected by the fact that any nanoscale surface
irregularity (a protrusion, an atomic cluster, or even a single "adatom" stuck to the surface) may cause a strong
increase of the local field well above the average applied field E₀ (see, for example, our discussion in Sec. 4
above), making the emission reproducibility an issue.
charge in the vertical plane, we get the image charge shown in the top left corner of the panel, which makes
the boundary condition φ = const satisfied on the vertical surface of the corner. However, in order for the
same to be true on the horizontal surface, we have to reflect both the initial charge and the image charge
in the horizontal plane, flipping their signs. The final configuration of 4 charges, shown in Fig. 26a,
satisfies all the boundary conditions. The resulting potential distribution may be readily written as an
evident generalization of Eq. (185). From there, the electric field and electric charge distributions, and
the potential energy and forces acting on the charge, may be calculated exactly as above.



Fig. 2.26. Charge images for (a, b) internal corners with angles π/2 and π/4, (c) a plane capacitor, and (d) a
rectangular box, and (e) equipotential surfaces for the last system.

Next, consider a corner with angle π/4 (Fig. 26b). Here we need to repeat the reflection operation
not 2 but 4 times before we arrive at the final pattern of 8 positive and negative charges. (Any attempt to
continue this process would lead to an overlap with the already existing charges.) This reasoning can be
readily extended to any 2D corner with angle β = π/n, with any integer n, which requires 2n charges
(including the initial one) to satisfy all the boundary conditions.
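The reflection procedure for a corner with angle β = π/n is easy to automate. In this sketch (an illustration, not the author's code) the faces lie along polar angles 0 and π/n, the real charge sits at polar coordinates (r₀, α) with 0 < α < π/n, and the 2n charges are the orbit of the real charge under the reflections: charges +q at angles 2πk/n + α and -q at angles 2πk/n - α. The check confirms that the potential vanishes on both faces, both in and out of the plane of the charges; r₀, α, and the sample points are arbitrary choices.

```python
import math

def corner_images(n, r0, alpha):
    """Signed charges (s, x, y), s = +1 or -1 in units of q, for a charge at polar coordinates
    (r0, alpha) inside a grounded corner bounded by the half-planes at polar angles 0 and pi/n."""
    charges = []
    for k in range(n):
        base = 2.0 * math.pi * k / n
        charges.append((+1, r0 * math.cos(base + alpha), r0 * math.sin(base + alpha)))
        charges.append((-1, r0 * math.cos(base - alpha), r0 * math.sin(base - alpha)))
    return charges   # the k = 0, sign +1 entry is the real charge

def potential(charges, x, y, z=0.0):
    """Sum of 3D point-charge potentials, in units of q/(4 pi eps0)."""
    return sum(s / math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + z ** 2) for s, cx, cy in charges)

def max_face_potential(n, r0=1.0, alpha=0.3):
    """Largest |phi| at sample points on both faces, in and out of the charges' plane."""
    charges = corner_images(n, r0, alpha)
    beta = math.pi / n
    worst = 0.0
    for t in (0.2, 0.9, 2.5):
        for z in (0.0, 0.7):
            worst = max(worst,
                        abs(potential(charges, t, 0.0, z)),
                        abs(potential(charges, t * math.cos(beta), t * math.sin(beta), z)))
    return worst

print(len(corner_images(2, 1.0, 0.3)), len(corner_images(4, 1.0, 0.3)))   # 4 8
print(max_face_potential(2), max_face_potential(4))                       # zero up to rounding
```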

Some configurations require an infinite number of images that are, however, tractable. The most 
important of them is a system of two parallel conducting surfaces, i.e. a plane capacitor of infinite area 
(Fig. 26c). Here the repeated reflection leads to an infinite system of charges ±q at points 
$$z_j = \pm d + 2Dj, \qquad (2.194)$$



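where 0 < d < D is the position of the initial charge and j is an arbitrary integer. However, the resulting
infinite series for the potential created at the location of the real charge q by its images,

$$\phi_{\rm image} = \frac{1}{4\pi\varepsilon_0}\left\{-\frac{q}{2d} + \sum_{j=1}^{\infty}\left[\frac{2q}{2Dj} - \frac{q}{2Dj-2d} - \frac{q}{2Dj+2d}\right]\right\} \equiv -\frac{q}{4\pi\varepsilon_0 D}\left[\frac{D}{2d} + \left(\frac{d}{D}\right)^2\sum_{j=1}^{\infty}\frac{1}{j\,[j^2-(d/D)^2]}\right], \qquad (2.195)$$

converges (in its last form) very fast. For example, at d = D/2 the exact value, φ_image = -2 ln 2 (q/4πε₀D),
differs by less than 5% from the approximation using just the first term of the sum.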
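The fast convergence of the last form of Eq. (195) is easy to confirm. The sketch below evaluates the bracketed expression in units of q/4πε₀D (the cutoff j_max is an arbitrary choice) and compares it at d = D/2 with -2 ln 2 and with the one-term approximation.

```python
import math

def image_potential(d_over_D, jmax=10 ** 5):
    """phi_image at the real charge, last form of Eq. (2.195), in units of q/(4 pi eps0 D)."""
    x = d_over_D
    s = sum(1.0 / (j * (j * j - x * x)) for j in range(1, jmax + 1))
    return -(1.0 / (2.0 * x) + x * x * s)

def first_term_only(d_over_D):
    """The same expression with the sum truncated after its j = 1 term."""
    x = d_over_D
    return -(1.0 / (2.0 * x) + x * x / (1.0 - x * x))

exact = image_potential(0.5)
print(exact, -2.0 * math.log(2.0))               # both about -1.3863
print(abs(first_term_only(0.5) / exact - 1.0))   # about 0.038, i.e. below 5%
```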



The same method may be applied to 2D (cylindrical) and 3D rectangular boxes, which require,
respectively, 2D or 3D infinite lattices of images; for example, in a 3D box with sides a, b, and c,
charges ±q are located at points (Fig. 26d)

$$\pm\mathbf{r}' + 2ja\,\mathbf{n}_x + 2kb\,\mathbf{n}_y + 2lc\,\mathbf{n}_z, \qquad (2.196)$$

where r' is the location of the initial (real) charge, and j, k, and l are arbitrary integers. Figure 26e shows
the results of summation of the potentials of such a charge set, including the real one, in a 2D box (within
the plane of the real charge). One can see that the equipotential surfaces, concentric near the charge,
naturally lean along the conducting walls of the box, which should be equipotential.

Even more surprisingly, the image charge method works very efficiently not only for
rectilinear geometries, but also for spherical ones. Indeed, let us consider a point charge q at some
distance d from the center of a conducting, grounded sphere of radius R (Fig. 27a), and try to satisfy the
boundary condition φ = 0 for the electrostatic potential on the sphere's surface using an imaginary charge q'
located at some point beyond the surface, i.e. inside the sphere.




From the problem's symmetry, it is clear that the point should be on the line passing through the real
charge and the sphere's center, at some distance d' from the center. Then the total potential created by
the two charges at an arbitrary point with r > R (Fig. 27a) is

$$\phi(r,\theta) = \frac{1}{4\pi\varepsilon_0}\left[\frac{q}{(r^2+d^2-2rd\cos\theta)^{1/2}} + \frac{q'}{(r^2+d'^2-2rd'\cos\theta)^{1/2}}\right]. \qquad (2.197)$$



It is easy to see that we can make the two fractions equal and opposite at all points on the sphere's
surface (i.e. for any θ at r = R), if we take⁵⁹

$$d' = \frac{R^2}{d}, \qquad q' = -\frac{R}{d}\,q. \qquad (2.198)$$



Since the solution of any Poisson boundary problem is unique, Eqs. (197) and (198) give us such a
solution for this problem. Figure 27b shows a typical equipotential pattern calculated using Eqs. (197)
and (198). It is surprising how such simple formulas may describe such a nontrivial field distribution.
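A quick numerical check of Eqs. (197) and (198): in this sketch (units with 1/4πε₀ = 1; R, d, and the sample angles are arbitrary) the potential on the sphere r = R vanishes for every θ, while far from the sphere the product r·φ approaches q + q', the total charge "seen" from afar.

```python
import math

def sphere_image(q, d, R):
    """Image charge q' and its distance d' from the center, Eq. (2.198)."""
    return -q * R / d, R ** 2 / d

def potential(r, theta, q, d, R):
    """Eq. (2.197), in units of 1/(4 pi eps0), valid for r > R."""
    q_img, d_img = sphere_image(q, d, R)
    return (q / math.sqrt(r * r + d * d - 2.0 * r * d * math.cos(theta))
            + q_img / math.sqrt(r * r + d_img * d_img - 2.0 * r * d_img * math.cos(theta)))

R, d, q = 1.0, 2.5, 1.0
surface_values = [potential(R, th, q, d, R) for th in (0.0, 0.5, 1.3, 2.0, math.pi)]
print(max(abs(v) for v in surface_values))    # zero up to rounding, for every theta
far = 1e6 * potential(1e6, 0.3, q, d, R)
print(far, q + sphere_image(q, d, R)[0])      # both about 0.6
```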

Now let us calculate the total charge Q induced by charge q on the conducting sphere's surface. We
could do this, as we have done for the conducting plane, by brute-force integration of the surface
charge density σ = -ε₀∂φ/∂r|_{r=R}. It is more elegant, however, to use the following Gauss law argument.
Expression (197) is valid (at r > R) regardless of whether we are dealing with our real problem (charge q
and the conducting sphere) or with the equivalent charge configuration (point charges q and q', with no
sphere at all). Therefore, according to Eq. (1.16), the Gaussian integral over a surface with radius r = R + 0,
and hence the total charge inside it, should also be the same in both cases. This immediately gives



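$$Q = q' = -\frac{R}{d}\,q. \qquad (2.199)$$

A similar argument may be used to find the charge-to-sphere interaction force:

$$F = qE_{\rm image}(d) = \frac{qq'}{4\pi\varepsilon_0(d-d')^2} = -\frac{q^2}{4\pi\varepsilon_0}\frac{R}{d\,(d-R^2/d)^2} \equiv -\frac{q^2}{4\pi\varepsilon_0}\frac{Rd}{(d^2-R^2)^2}. \qquad (2.200)$$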



(Note that this expression is legitimate only at d > R.) At large distances, d/R ≫ 1, this attractive force
decreases as 1/d³. This unusual dependence arises because, as Eq. (198) specifies, the induced charge of
the sphere, responsible for the force, is not constant but decreases as Q ∝ 1/d.

All the previous formulas referred to a sphere that is grounded to keep its potential equal to zero.
But what if we keep the sphere galvanically insulated, so that its net charge is fixed, e.g., equals zero?
Instead of solving the problem from scratch, let us use (again!) the linear superposition principle. For
that, we may add to the previous problem an additional charge, equal to (-Q), to the sphere, and argue
that this addition gives an additional potential that does not depend on the potential induced by charge q.
For the interaction force, such an addition yields



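$$F = \frac{qq'}{4\pi\varepsilon_0(d-d')^2} - \frac{qQ}{4\pi\varepsilon_0 d^2} = -\frac{q^2}{4\pi\varepsilon_0}\left[\frac{Rd}{(d^2-R^2)^2} - \frac{R}{d^3}\right]. \qquad (2.201)$$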



At large distances, the two terms proportional to 1/d³ cancel each other, giving F ∝ 1/d⁵. Such a rapid
force decay is due to the fact that the field of the uncharged sphere is equivalent to that of two (equal
and opposite) induced charges +Q and -Q, and the distance between them (d' = R²/d) tends to zero at d
→ ∞. The potential energy of such an interaction behaves as U ∝ 1/d⁴ at d → ∞; in the next chapter we will
see that this is the general law of the induced dipole interaction.

59 In geometry, such points, with dd' = R², are referred to as the result of mutual inversion in a sphere of radius R.
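The different asymptotic laws of Eqs. (200) and (201) can be seen by multiplying the forces by d³ and d⁵, respectively. A minimal sketch, in units with 1/4πε₀ = 1 and q = R = 1 (arbitrary choices): the first product approaches -q²R, and the second approaches -2q²R³, the value that follows from expanding Eq. (201) to the leading order in R²/d².

```python
def force_grounded(d, R, q=1.0):
    """Eq. (2.200), in units of 1/(4 pi eps0); negative means attraction."""
    return -q * q * R * d / (d * d - R * R) ** 2

def force_neutral(d, R, q=1.0):
    """Eq. (2.201): insulated sphere with zero net charge."""
    return -q * q * (R * d / (d * d - R * R) ** 2 - R / d ** 3)

R = 1.0
for d in (10.0, 20.0, 40.0):
    print(d, force_grounded(d, R) * d ** 3, force_neutral(d, R) * d ** 5)
# As d grows, the first product tends to -R q^2 and the second to -2 R^3 q^2.
```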

2.7. Green's functions

I have spent so much time/space discussing the potential distributions created by a single point
charge in various conductor geometries because, for any geometry, the generalization of these results to
an arbitrary distribution ρ(r) of free charges is straightforward. Namely, if a single charge q, located at
point r', creates electrostatic potential

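$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\,q\,G(\mathbf{r},\mathbf{r}'), \qquad (2.202)$$

then, due to the linear superposition principle, an arbitrary charge distribution creates potential

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int \rho(\mathbf{r}')\,G(\mathbf{r},\mathbf{r}')\,d^3r'. \qquad (2.203)$$

Kernel G(r, r') is called the (spatial) Green's function, a notion very popular in all fields of
physics.⁶⁰ Evidently, as Eq. (1.35) shows, in the unlimited free space

$$G(\mathbf{r},\mathbf{r}') = \frac{1}{|\mathbf{r}-\mathbf{r}'|}, \qquad (2.204)$$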

i.e. the Green's function depends only on one scalar argument, the distance between the field
observation point r and the field-source (charge) point r'. However, as soon as there are conductors
around, the situation changes. In this course we will only deal with Green's functions that are defined in
the space between conductors, and that vanish as soon as the radius-vector r points to the surface of any
conductor:⁶¹

$$G(\mathbf{r},\mathbf{r}')\big|_{\mathbf{r}\in A} = 0. \qquad (2.205)$$

With this definition, it is straightforward to deduce the Green's functions for the solutions of the
last section's problems in which conductors were grounded (φ = 0). For example, for the half-space z > 0
limited by a conducting plane (Fig. 24), Eq. (185) yields

$$G = \frac{1}{|\mathbf{r}-\mathbf{r}'|} - \frac{1}{|\mathbf{r}-\mathbf{r}''|}, \quad \text{with } \boldsymbol{\rho}'' = \boldsymbol{\rho}' \text{ and } z'' = -z'. \qquad (2.206)$$

We see that in the presence of conductors (and, as we will see later, any other polarizable media), the
Green's function may depend not only on the difference r - r', but on each of these two arguments in a
specific way.
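These properties of the Green's function (206) can be checked directly. The sketch below (the sample points are arbitrary) confirms the symmetry G(r, r') = G(r', r), the boundary condition (205) on the plane z = 0, and the fact that G is not a function of r - r' alone.

```python
import math

def green_half_space(r, r_src):
    """Eq. (2.206): Dirichlet Green's function for z > 0 above a grounded plane z = 0."""
    x_s, y_s, z_s = r_src
    return 1.0 / math.dist(r, r_src) - 1.0 / math.dist(r, (x_s, y_s, -z_s))

a, b = (0.3, -1.2, 0.8), (1.1, 0.4, 2.0)
print(green_half_space(a, b), green_half_space(b, a))   # equal: G(r, r') = G(r', r)
print(green_half_space((2.0, 5.0, 0.0), b))             # zero on the conductor surface
# Unlike 1/|r - r'|, G changes under a common vertical shift of both points:
g_low = green_half_space((0.0, 0.0, 1.0), (0.0, 0.0, 2.0))    # 1 - 1/3
g_high = green_half_space((0.0, 0.0, 2.0), (0.0, 0.0, 3.0))   # 1 - 1/5
print(g_low, g_high)
```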

So far, this looked just like re-naming our old results. The really non-trivial result of the Green's 
function application to electrostatics is that, somewhat counter-intuitively, the knowledge of the 



60 See, e.g., CM Sec. 4.1, QM Secs. 2.2, 1.2 and 7.4, and SM Sec. 5.5.

61 G so defined is sometimes called the Dirichlet function. 
Green's function for a system with grounded conductors (Fig. 28a) allows one to calculate the field
created by voltage-biased conductors (Fig. 28b) with the same geometry.





Fig. 2.28. Green's function method allows the solution of a simpler boundary problem (a) to be used to find 
the solution of a more complex problem (b), for the same conductor geometry. 



In order to show that, let us use the so-called Green's theorem of vector calculus.⁶² The
theorem states that for any two scalar, differentiable functions f(r) and g(r), and any volume V,

$$\int_V \left(f\nabla^2 g - g\nabla^2 f\right) d^3r = \oint_S \left(f\nabla g - g\nabla f\right)_n d^2r, \qquad (2.207)$$



where S is the surface limiting the volume. Applying the theorem to the electrostatic potential φ(r) and
the Green's function G (also considered as a function of r), let us use the Poisson equation (1.41) to
replace ∇²φ with (-ρ/ε₀), and notice that G, considered as a function of r, obeys the Poisson equation
with the δ-functional source:

$$\nabla^2 G(\mathbf{r},\mathbf{r}') = -4\pi\,\delta(\mathbf{r}-\mathbf{r}'). \qquad (2.208)$$



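(Indeed, according to its definition (202), this function may be formally considered as the field of a
point charge q = 4πε₀.) Now swapping the notation of radius-vectors, r ↔ r', and using the Green's
function symmetry, G(r, r') = G(r', r),⁶³ we get

$$\int_V \left[\frac{\rho(\mathbf{r}')}{\varepsilon_0}\,G(\mathbf{r},\mathbf{r}') - 4\pi\,\phi(\mathbf{r}')\,\delta(\mathbf{r}-\mathbf{r}')\right] d^3r' = \oint_A \left[\phi(\mathbf{r}')\,\frac{\partial G(\mathbf{r},\mathbf{r}')}{\partial n'} - G(\mathbf{r},\mathbf{r}')\,\frac{\partial \phi(\mathbf{r}')}{\partial n'}\right] d^2r'. \qquad (2.209)$$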



Let us apply this relation to volume V of free space between the conductors, and the boundary A
slightly outside of their surface. In this case, by its definition, the Green's function G(r, r') vanishes at
the conductor surface (r' ∈ A) - see Eq. (205). Now changing the sign of dn' (so that it would be the
outer normal for conductors, rather than for the free-space volume V), dividing all terms by 4π, and partitioning
the total surface A into the parts A_k (numbered by index k) corresponding to different conductors (possibly
kept at different potentials φ_k), we finally arrive at the famous result:⁶⁴



62 See, e.g., MA Eq. (12.3). Actually, this theorem is a ready corollary of the divergence theorem, MA Eq. (12.2). 

63 This symmetry, virtually evident from Eq. (204), may be formally proved by applying Eq. (207) to functions
f(r) = G(r, r') and g(r) = G(r, r''). With this substitution, the left-hand side becomes equal to -4π[G(r'', r') - G(r',
r'')], while the right-hand side is zero, due to Eq. (205).

64 In some textbooks, the sign before the surface integral is negative, because their authors use the outer normal of
the free-space region V rather than that of the region occupied by conductors, as I do.
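$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int_V \rho(\mathbf{r}')\,G(\mathbf{r},\mathbf{r}')\,d^3r' + \frac{1}{4\pi}\sum_k \phi_k \oint_{A_k} \frac{\partial G(\mathbf{r},\mathbf{r}')}{\partial n'}\,d^2r'. \qquad (2.210)$$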



While the first term on the right-hand side of this relation is a direct and evident expression of the
superposition principle, given by Eq. (203), the second term is highly non-trivial: it describes the effect
of conductors with nonvanishing potentials φ_k (Fig. 28b) using the Green's function calculated for the
similar system with grounded conductors, i.e. with all φ_k = 0 (Fig. 28a). Let me emphasize that since our
volume V excludes conductors, the first term on the right-hand side of Eq. (210) includes only the "free-standing"
charges of the system (in Fig. 28, marked q₁, q₂, etc.), but not the surface charges of the
conductors, which are taken into account, implicitly, by the second term.

In order to illustrate what a powerful tool Eq. (210) is, let us use it to calculate the electrostatic
field in two systems. In the first of them, a circular disk, separated with a very thin cut from a
conducting plane, is biased with potential φ = V, while the rest of the plane is grounded, φ = 0 - see Fig.
29. If the width of the gap between the disk and the rest of the plane is negligible, we may apply Eq. (210)
with ρ(r') = 0 and the Green's function for the uncut plane - see Eq. (206).⁶⁵ In cylindrical
coordinates, this function may be rewritten as



$$G(\mathbf{r},\mathbf{r}') = \frac{1}{[\rho^2+\rho'^2-2\rho\rho'\cos(\varphi-\varphi')+(z-z')^2]^{1/2}} - \frac{1}{[\rho^2+\rho'^2-2\rho\rho'\cos(\varphi-\varphi')+(z+z')^2]^{1/2}}. \qquad (2.211)$$



(The sum of the first three terms under the square roots of Eq. (211) is just the squared distance between
the horizontal projections ρ and ρ' of vectors r and r' (or r''), respectively, while the last terms are
the squares of their vertical spacings.)



Fig. 2.29. Voltage-biased conducting circle inside a grounded (φ = 0) conducting plane.



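Now we can readily calculate the necessary derivative:

$$\frac{\partial G}{\partial n'}\bigg|_{A} = \frac{\partial G}{\partial z'}\bigg|_{z'=+0} = \frac{2z}{[\rho^2+\rho'^2-2\rho\rho'\cos(\varphi-\varphi')+z^2]^{3/2}}. \qquad (2.212)$$

Due to the axial symmetry of the system, we can take the azimuthal angle of the observation point equal to zero. With this, Eqs. (210) and (212) yield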



65 Indeed, if all parts of the cut plane are grounded, a narrow cut does not change the field distribution, and hence 
the Green's function, significantly. 
$$\phi(\mathbf{r}) = \frac{V}{4\pi}\int_{\rm disk}\frac{\partial G(\mathbf{r},\mathbf{r}')}{\partial n'}\,d^2r' = \frac{Vz}{2\pi}\int_0^{2\pi}d\varphi'\int_0^R \frac{\rho'\,d\rho'}{(\rho^2+\rho'^2-2\rho\rho'\cos\varphi'+z^2)^{3/2}}. \qquad (2.213)$$

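This integral is not too pleasing, but may be readily worked out for points on the symmetry axis (ρ = 0):

$$\phi = Vz\int_0^R \frac{\rho'\,d\rho'}{(\rho'^2+z^2)^{3/2}} = \frac{Vz}{2}\int_0^{R^2}\frac{d\xi}{(\xi+z^2)^{3/2}} = V\left[1-\frac{z}{(R^2+z^2)^{1/2}}\right]. \qquad (2.214)$$

This expression shows that if z → 0, the potential tends to V (as it should), while at z ≫ R,

$$\phi \to V\frac{R^2}{2z^2}. \qquad (2.215)$$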



This asymptotic behavior is typical for electric dipoles - see the next chapter. 
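Eq. (213) is also straightforward to integrate numerically, including off-axis points where no simple closed form is available. The sketch below uses a plain 2D midpoint rule (the grid size n is an arbitrary choice; R = V = 1) and compares on-axis values with Eq. (214) and, at large z, with the asymptote (215).

```python
import math

def disk_potential(rho, z, R=1.0, V=1.0, n=200):
    """phi(rho, z) from Eq. (2.213) by a 2D midpoint rule over the disk (rho', phi'),
    with the observation point at azimuth 0 (allowed by the axial symmetry)."""
    h_r, h_p = R / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        rp = (i + 0.5) * h_r
        for k in range(n):
            cos_p = math.cos((k + 0.5) * h_p)
            total += rp / (rho ** 2 + rp ** 2 - 2.0 * rho * rp * cos_p + z ** 2) ** 1.5
    return V * z / (2.0 * math.pi) * total * h_r * h_p

def axis_exact(z, R=1.0, V=1.0):
    """Eq. (2.214)."""
    return V * (1.0 - z / math.sqrt(R ** 2 + z ** 2))

for z in (0.5, 2.0, 20.0):
    print(z, disk_potential(0.0, z), axis_exact(z), 1.0 / (2.0 * z ** 2))   # last: Eq. (2.215)
print(disk_potential(0.5, 0.5))   # an off-axis value, not covered by Eq. (2.214)
```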

Now, let us use the same Eq. (210) to solve the (in:-)famous problem of the cut sphere (Fig. 30).
Again, if the gap between the two conducting semi-spheres is very thin (t ≪ R), we may use the
Green's function for the grounded (and uncut) sphere. For a particular case r' = dn_z, this function is
given by Eqs. (197)-(198); generalizing the former relation for an arbitrary direction of vector r', we get

$$G(\mathbf{r},\mathbf{r}') = \frac{1}{(r^2+r'^2-2rr'\cos\gamma)^{1/2}} - \frac{R/r'}{[r^2+(R^2/r')^2-2r(R^2/r')\cos\gamma]^{1/2}}, \quad \text{for } r, r' > R, \qquad (2.216)$$

where γ is the angle between vectors r and r', and hence r'' (Fig. 30).




Fig. 2.30. A system of two, oppositely biased semi- 
spheres. 



Now, finding the Green's function's derivative,

$$\frac{\partial G}{\partial n'}\bigg|_{r'=R+0} = \frac{\partial G}{\partial r'}\bigg|_{r'=R+0} = \frac{r^2-R^2}{R\,(r^2+R^2-2Rr\cos\gamma)^{3/2}}, \qquad (2.217)$$

and plugging it into Eq. (210), we see that the integration is easy only for the field on the symmetry axis
(r = zn_z, γ = θ'), giving

$$\phi = V\left[1-\frac{z^2-R^2}{z\,(z^2+R^2)^{1/2}}\right]. \qquad (2.218)$$
For z → R, φ → V (just checking :-), while for z ≫ R,

$$\phi \to V\frac{3R^2}{2z^2}, \qquad (2.219)$$

so this is also an electric dipole field - see the next chapter.
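Similarly, Eqs. (210) and (217) may be integrated numerically on the axis. This sketch assumes, consistently with Eq. (218), that the upper semi-sphere is kept at +V and the lower one at -V, uses u = cos θ' as the integration variable, and compares the result with Eq. (218) and its asymptote (219); R = V = 1 and the grid size are arbitrary choices.

```python
import math

def hemispheres_axis_potential(z, R=1.0, V=1.0, n=20000):
    """phi on the symmetry axis (z > R) from Eqs. (2.210) and (2.217): upper half at +V, lower at -V.

    Integration variable u = cos(theta'); the midpoint rule with an even n keeps
    the sign jump at u = 0 on a cell boundary.
    """
    a, b = z * z + R * R, 2.0 * R * z
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        u = -1.0 + (k + 0.5) * h
        total += math.copysign(1.0, u) / (a - b * u) ** 1.5
    return V * R * (z * z - R * R) / 2.0 * total * h

def axis_closed_form(z, R=1.0, V=1.0):
    """Eq. (2.218)."""
    return V * (1.0 - (z * z - R * R) / (z * math.sqrt(z * z + R * R)))

for z in (2.0, 50.0):
    print(z, hemispheres_axis_potential(z), axis_closed_form(z), 1.5 / z ** 2)   # last: Eq. (2.219)
```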



2.8. Numerical methods 

Despite the richness of analytical methods, for many boundary problems (especially in
geometries without a high degree of symmetry), numerical methods are the only way to the solution.
Despite the current abundance of software codes and packages offering their automatic numerical
solution,⁶⁶ it is important for an educated physicist to understand "what is under their hood", at least
because most universal programs exhibit mediocre performance in comparison with custom codes
written for particular problems, and sometimes do not converge at all, especially for fast-changing (say,
exponential) functions.

The simplest of the numerical methods of solution of partial differential equations is the finite-difference
method,⁶⁷ in which the sought function of N scalar arguments f(r₁, r₂, ..., r_N) is represented by
its values in discrete points of a rectangular grid (also called a mesh) of the corresponding dimensionality
(Fig. 31).



Fig. 2.31. General idea of the finite-difference method in (a) one, (b) two, and (c) three dimensions.



Each partial second derivative of the function is approximated by the formula that readily
follows from the linear approximations for the function f and then its partial derivatives - see Fig. 31a:

$$\frac{\partial^2 f}{\partial r_j^2} \equiv \frac{\partial}{\partial r_j}\left(\frac{\partial f}{\partial r_j}\right) \approx \frac{1}{h}\left[\left(\frac{\partial f}{\partial r_j}\right)_{r_j+h/2} - \left(\frac{\partial f}{\partial r_j}\right)_{r_j-h/2}\right] \approx \frac{1}{h}\left[\frac{f_\to - f}{h} - \frac{f - f_\leftarrow}{h}\right] \equiv \frac{f_\to + f_\leftarrow - 2f}{h^2}, \qquad (2.220)$$

where f_→ = f(r_j + h) and f_← = f(r_j - h). (The error of this approximation is of the order of
h²∂⁴f/∂r_j⁴.) As a result, a 2D Laplace operator may be presented as



66 See, for example, MA Secs. 16 (iii) and (iv).

67 For more details see, e.g., R. J. Leveque, Finite Difference Methods for Ordinary and Partial Differential 
Equations, SIAM, 2007. 
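$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \approx \frac{f_\to + f_\leftarrow - 2f}{h^2} + \frac{f_\uparrow + f_\downarrow - 2f}{h^2} = \frac{f_\to + f_\leftarrow + f_\uparrow + f_\downarrow - 4f}{h^2}, \qquad (2.221)$$

while the 3D operator as

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \approx \frac{f_\to + f_\leftarrow + f_\uparrow + f_\downarrow + f_\otimes + f_\odot - 6f}{h^2}. \qquad (2.222)$$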



(The notation used in these formulas should be clear from Figs. 31b and 31c, respectively.) 
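The accuracy of Eq. (221) is easy to probe. The sketch below applies the 5-point formula to the test function f = sin x sin 2y (an arbitrary choice, for which ∇²f = -5f) at an arbitrary point, and shows that halving h reduces the error roughly fourfold, i.e. that the error scales as h².

```python
import math

def f(x, y):
    """Test function with a known Laplacian: -5 f."""
    return math.sin(x) * math.sin(2.0 * y)

def five_point_laplacian(func, x, y, h):
    """Eq. (2.221): (f_right + f_left + f_up + f_down - 4 f) / h^2."""
    return (func(x + h, y) + func(x - h, y) + func(x, y + h) + func(x, y - h)
            - 4.0 * func(x, y)) / h ** 2

x0, y0 = 0.7, 0.4
exact = -5.0 * f(x0, y0)
errors = [abs(five_point_laplacian(f, x0, y0, h) - exact) for h in (0.1, 0.05, 0.025)]
ratios = [errors[i] / errors[i + 1] for i in range(2)]
print(errors, ratios)   # each halving of h cuts the error by about 4
```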

Let us apply this scheme to find the electrostatic potential distribution inside a cylindrical box
with conducting walls and square cross-section, using an extremely coarse mesh with step h = a/2 (Fig.
32). In this case our function, the electrostatic potential, equals zero on the side walls and the bottom,
and equals V₀ at the top lid, so that, according to Eq. (221), the Laplace equation may be
approximated as

$$\frac{0 + 0 + V_0 + 0 - 4\phi}{(a/2)^2} = 0. \qquad (2.223)$$



The resulting value for the potential in the center of the box is φ = V₀/4. Surprisingly, this is the exact
value! This may be proved by solving this problem by the variable separation method, just as has
been done for the similar 3D problem in Sec. 4 above. The result is

$$\phi(x,y) = \sum_{n=1}^{\infty} c_n \sin\frac{\pi n x}{a}\,\sinh\frac{\pi n y}{a}, \qquad c_n = \frac{4V_0}{\pi n \sinh(\pi n)}\times\begin{cases}1, & \text{if } n \text{ is odd},\\ 0, & \text{otherwise},\end{cases} \qquad (2.224)$$

so that at the central point (x = y = a/2),

$$\phi = \frac{4V_0}{\pi}\sum_{j=0}^{\infty}\frac{\sin[\pi(2j+1)/2]\,\sinh[\pi(2j+1)/2]}{(2j+1)\sinh[\pi(2j+1)]} = \frac{2V_0}{\pi}\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)\cosh[\pi(2j+1)/2]}. \qquad (2.225)$$

The last series equals exactly π/8, so that φ = V₀/4.
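The claim about the last series of Eq. (225) can be confirmed in a few lines: its terms decay roughly as exp(-πj), so about ten terms already reach double precision (the number of terms kept below is an arbitrary choice).

```python
import math

def central_series(terms=12):
    """Partial sum of sum_j (-1)^j / ((2j+1) cosh(pi (2j+1)/2)), the last series in Eq. (2.225)."""
    return sum((-1) ** j / ((2 * j + 1) * math.cosh(math.pi * (2 * j + 1) / 2.0))
               for j in range(terms))

s = central_series()
print(s, math.pi / 8)        # agree to machine precision
print(2.0 / math.pi * s)     # phi / V0 at the center of the box, about 0.25
```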



Fig. 2.32. Numerical solution of the internal 2D boundary
problem for a conducting, cylindrical box with square cross-
section, using a very coarse mesh (with h = a/2).
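The finite-difference scheme is just as easy to run on finer meshes. The sketch below is one possible implementation, not the author's: it solves Eq. (221) in a square box of unit side (V₀ = 1 on the top lid) by successive over-relaxation (SOR), a swapped-in iterative technique, with the standard optimal relaxation factor for this model problem. It then compares one off-center node with the series (224), truncated at n = 199 (an arbitrary cutoff).

```python
import math

def solve_square_box(N, V0=1.0, tol=1e-12, max_iter=100000):
    """5-point scheme, Eq. (2.221), for a square box of unit side: phi = V0 on the top lid
    (y = 1), phi = 0 on the other walls; mesh step h = 1/N, phi[i][j] ~ phi(i*h, j*h).
    Solved by successive over-relaxation (SOR)."""
    phi = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        phi[i][N] = V0                                # top lid; corner nodes never enter the stencil
    omega = 2.0 / (1.0 + math.sin(math.pi / N))       # optimal SOR factor for this model problem
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(1, N):
            for j in range(1, N):
                gauss_seidel = 0.25 * (phi[i + 1][j] + phi[i - 1][j] + phi[i][j + 1] + phi[i][j - 1])
                change = omega * (gauss_seidel - phi[i][j])
                phi[i][j] += change
                max_change = max(max_change, abs(change))
        if max_change < tol:
            return phi
    raise RuntimeError("SOR did not converge")

def series_potential(x, y, V0=1.0, n_max=199):
    """Eq. (2.224) with a = 1, truncated after n = n_max (odd n only)."""
    return sum(4.0 * V0 / (math.pi * n * math.sinh(math.pi * n))
               * math.sin(math.pi * n * x) * math.sinh(math.pi * n * y)
               for n in range(1, n_max + 1, 2))

reference = series_potential(0.5, 0.75)
for N in (2, 4, 8, 16):
    phi = solve_square_box(N)
    center = phi[N // 2][N // 2]
    off_center_error = abs(phi[N // 2][3 * N // 4] - reference) if N % 4 == 0 else None
    print(N, center, off_center_error)
```

On these meshes the center value stays at V₀/4, while at the off-center node (a/2, 3a/4) the discrepancy with the series shrinks as the mesh is refined.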



For a similar 3D problem (a cubic box) we can use Eq. (222) to get 



$$\frac{0 + 0 + V_0 + 0 + 0 + 0 - 6\phi}{(a/2)^2} = 0, \qquad (2.226)$$
so that φ = V₀/6. Unbelievably enough, this result is also exact! (This follows from our variable
separation result expressed by Eqs. (95) and (99).)
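The same approach works in 3D with Eq. (222). This sketch (plain Gauss-Seidel iteration; a cube of unit side with V₀ = 1 on the face z = 1; the mesh sizes are chosen arbitrarily) reproduces φ = V₀/6 at the center for h = a/2, and also on finer meshes.

```python
def solve_cube(N, V0=1.0, tol=1e-12, max_iter=100000):
    """Gauss-Seidel solution of the 7-point scheme, Eq. (2.222), in a unit cube with
    phi = V0 on the face z = 1 and phi = 0 on the other faces; mesh step h = 1/N."""
    phi = [[[0.0] * (N + 1) for _ in range(N + 1)] for _ in range(N + 1)]
    for i in range(N + 1):
        for j in range(N + 1):
            phi[i][j][N] = V0          # the biased face; its edges never enter the stencil
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(1, N):
            for j in range(1, N):
                for k in range(1, N):
                    new = (phi[i + 1][j][k] + phi[i - 1][j][k]
                           + phi[i][j + 1][k] + phi[i][j - 1][k]
                           + phi[i][j][k + 1] + phi[i][j][k - 1]) / 6.0
                    max_change = max(max_change, abs(new - phi[i][j][k]))
                    phi[i][j][k] = new
        if max_change < tol:
            return phi
    raise RuntimeError("Gauss-Seidel iteration did not converge")

for N in (2, 4, 8):
    phi = solve_cube(N)
    print(N, phi[N // 2][N // 2][N // 2])   # about 1/6 for every N
```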

Though such exact results should be considered a happy coincidence rather than the norm,
they still show that numerical methods, with a relatively crude mesh, may be more computationally
efficient than "analytical" approaches, like the variable separation method with its infinite-series
results that, in most cases, require computers anyway for the result comprehension and analysis.

A more powerful (but also much more complex for implementation) approach is the finite-element
method, in which the discrete point mesh, typically with triangular cells, is (automatically)
generated in accordance with the system geometry. Such mesh generators provide higher point
concentration near sharp convex parts of conductor surfaces, where the field concentrates and hence the
potential changes faster, and thus ensure a better accuracy-to-performance trade-off than finite-difference
methods on a uniform grid. The price to pay for this improvement is the algorithm complexity,
which makes manual adjustments much harder. Unfortunately, I do not have time for going into the details
of that method, and have to refer the reader to the special literature on this subject.⁶⁸

2.9. Exercise problems 

2.1. Calculate the force (per unit area) exerted on a conducting surface by an external electric
field. Compare the result with the definition of the electric field given by Eq. (1.6) of the lecture notes,
and comment.

2.2. Following the discussion of two weakly coupled spheres in Sec. 2, find an approximate
expression for the mutual capacitance (per unit length) between two very thin, parallel wires, both with
a round cross-section, but each with its own diameter. Compare the result with that for two small
spheres, and interpret the difference.

2.3. Using the results for a single thin round disk, obtained in Sec. 4, consider a system of two
such disks at a small distance d ≪ R from each other - see Fig. on the right. In particular, calculate:

(i) the reciprocal capacitance matrix of the system, 

(ii) the mutual capacitance between the disks, 

(iii) the partial capacitance, and 

(iv) the effective capacitance of one disk, 

(all in the first non-vanishing approximation in d/R ≪ 1). Compare the results (ii)-(iv) and interpret
their similarities and differences.

2.4. Calculate the mutual capacitance (per unit length) between two cylindrical conductors,
forming a system with the cross-section shown in Fig. on the right, in the limit t ≪ w ≪ R.



68 See, e.g., C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method, 
Dover, 2009, or T. J. R. Hughes, The Finite Element Method, Dover, 2000. 
Hint: You may like to use the degenerate elliptical (not
"ellipsoidal"!) coordinates defined by the following equation:

$$x + iy = c\cosh(\alpha + i\beta), \qquad (*)$$

with the appropriate choice of constant c. In these orthogonal 2D
coordinates, the Laplace operator is very simple:

$$\nabla^2 = \frac{1}{c^2(\cosh^2\alpha - \cos^2\beta)}\left(\frac{\partial^2}{\partial\alpha^2} + \frac{\partial^2}{\partial\beta^2}\right).$$

(This is not quite surprising, because Eq. (*) may also be
considered as a conformal map z = c cosh w, where z = x + iy and
w = α + iβ.)




2.5. Formulate 2D electrostatic problems that can be solved using each of the following analytic
functions of the complex variable z = x + iy:

(i) w = ln z,

(ii) w = z^(1/2),

and solve these problems.



2.6. Complete the cylinder problem started in Sec. 5 (see Fig. on the
right) for the cases when the voltage on the top lid is fixed as follows:

(i) V = V₀J₁(ξ₁₁ρ/R) sin φ, where ξ₁₁ ≈ 3.832 is the first root of
function J₁(ξ), and

(ii) V = V₀ = const.

For both cases, calculate the electric field in the centers of the lower
and upper lids. (For assignment (ii), an answer including series and/or
integrals is satisfactory.)

[Figure: a cylinder of radius R and length L, with φ = V(ρ, φ) on the top lid and φ = 0 on the other walls.]

2.7. Each electrode of a large plane capacitor is cut into long strips of equal width l, with very narrow gaps between them. These strips are kept at alternating potentials as shown in Fig. on the right. Use the variable separation method to calculate the electrostatic potential distribution. Explore the limit l ≪ d. 

[Figure: electrode strips of width l kept at alternating potentials +V/2 and −V/2.]



Chapter 2 



Page 59 of 60 



Essential Graduate Physics 



EM: Classical Electrodynamics 



2.8. Solve the problem shown in Fig. 19 of the lecture notes (reproduced on the right); in particular, find the distribution of the electrostatic potential along the cylinder's axis. 

[Figure: cylinder sections kept at potentials φ = +V/2, −V/2, and +V/2.]



2.9. Use the variable separation method to find the potential distribution inside and outside of a thin spherical shell of radius R, with fixed potential φ(R, θ, φ) = V_0 sin θ cos φ. 



2.10. A thin spherical shell carries charge with areal density σ = σ_0 cos θ. Calculate the spatial 
distribution of the electrostatic potential and field. 



2.11. Use the image charge method to calculate the surface charges induced in the plates of a 
very broad plane capacitor of thickness D by a point charge q separated from one of the electrodes by 
distance d - see Fig. 26c. 



2.12 . Use the image charge method to calculate the energy of electrostatic interaction between a 
point charge placed in the center of a spherical cavity that was cut inside a grounded conductor, and the 
conductor. Looking at the result, could it be obtained in a simpler way (or ways)? 



2.13 . Find the 2D Green's function in: 

(i) the unlimited free space, and 

(ii) the free space above a conducting plane. 

Use the latter result to calculate the distribution of the electric potential created by a cylindrical system 
of conductors, with the cross-section shown in Fig. below. (The insulating gaps between the conducting 
fragments are very narrow.) 



[Figure: cross-section of the conductor system, with dimension labels −w/2 and +w/2.]




2.14 . Solve the same 2D boundary problem that was discussed in Sec. 6 (Fig. 32) using: 

(i) the finite difference method, with a finer square mesh, h = a/3, and 

(ii) the variable separation method. 
Compare the results (at the mesh points only) and comment. 
Chapter 3. Polarization of Dielectrics 

In the last chapter, we discussed the electric polarization of conductors. In contrast to those 
materials, in dielectrics the charge motion is limited to the interior of an atom or a molecule, so that the 
electric polarization of these materials by an external field takes a different form. This issue is the main 
subject of this chapter. In preparation for the analysis of dielectrics, we have to start with a more general 
discussion of the electric field of a spatially-restricted system of charges. 



3.1. Electric dipole 

Let us consider a localized system of charges, of a linear size scale a, and calculate a simple but 
approximate expression for the electrostatic field induced by the system at a distant point r. For that, let 
us select a reference frame with the origin either somewhere inside the system, or at a distance of the 
order of a from it (Fig. 1). 







Fig. 3.1. Deriving the approximate expression (5) 
for the electrostatic field of a localized system of 
charges at a distant point (r ≫ r' ~ a). 



Then the positions of all charges of the system satisfy the following condition: 

$$r' \ll r. \qquad (3.1)$$

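Using this condition, we can expand the general expression (1.38) for the electrostatic potential φ(r) of 
the system into the Taylor series in the small parameter r' = {r'_1, r'_2, r'_3}. For any spatial function of the 
type f(r − r'), the expansion may be presented as¹ 

$$f(\mathbf r-\mathbf r') = f(\mathbf r) - \sum_{j=1}^{3} r'_j\,\frac{\partial f}{\partial r_j}(\mathbf r) + \frac{1}{2!}\sum_{j,j'=1}^{3} r'_j r'_{j'}\,\frac{\partial^2 f}{\partial r_j\,\partial r_{j'}}(\mathbf r) - \dots \qquad (3.2)$$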

The two leading terms of this expansion, sufficient for our current purposes, may be rewritten in the 
vector form:² 

$$f(\mathbf r-\mathbf r') \approx f(\mathbf r) - \mathbf r'\cdot\nabla f(\mathbf r) + \dots \qquad (3.3)$$

Let us apply this approximate formula to the free-space Green's function (2.204), which weighs the 
charge density contributions in Eq. (1.38). The gradient of such a spherically-symmetric function f(r) = 
1/r is just n_r df/dr, so that 



1 See, e.g., MA Eq. (2.11b). 

2 The third term (responsible for quadrupole effects), as well as all the following, multipole terms would require a 
tensor (rather than vector) representation. 
$$\frac{1}{|\mathbf r-\mathbf r'|} \approx \frac{1}{r} - \mathbf r'\cdot\mathbf n_r\,\frac{d}{dr}\left(\frac{1}{r}\right) = \frac{1}{r} + \frac{\mathbf r'\cdot\mathbf r}{r^3}. \qquad (3.4)$$



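Plugging this dipole expansion into Eq. (1.38), we get 

$$\phi(\mathbf r) \approx \frac{1}{4\pi\varepsilon_0}\left[\frac{1}{r}\int\rho(\mathbf r')\,d^3r' + \frac{\mathbf r}{r^3}\cdot\int\rho(\mathbf r')\,\mathbf r'\,d^3r'\right] = \frac{1}{4\pi\varepsilon_0}\left(\frac{Q}{r} + \frac{\mathbf p\cdot\mathbf r}{r^3}\right), \qquad (3.5)$$

where Q is the net charge of the system, while the vector 

$$\mathbf p \equiv \int\rho(\mathbf r')\,\mathbf r'\,d^3r', \qquad (3.6)$$

with magnitude p of the order of Qa, is called its (electric) dipole moment.³ 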
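The accuracy of the two-term expansion (5) is easy to probe numerically. The sketch below is illustrative only: it assumes units with 1/(4πε₀) = 1 and a hypothetical system of size a ~ 1 (a ±1 pair plus an extra charge 0.2, so that Q ≠ 0). It compares the exact Coulomb potential with the monopole-plus-dipole approximation along a fixed direction. The relative error should fall off roughly as (a/r)², which is the quadrupole term the expansion drops.

```python
import math

# Assumption: units with 1/(4*pi*eps0) = 1; each charge is a pair (q, (x, y, z)).

def dipole_moment(charges):
    """Discrete form of Eq. (3.6): p = sum over charges of q * r."""
    return tuple(sum(q * r[k] for q, r in charges) for k in range(3))

def exact_potential(charges, point):
    """Exact Coulomb potential of the point charges at `point`."""
    return sum(q / math.dist(point, r) for q, r in charges)

def two_term_potential(charges, point):
    """Monopole and dipole terms of Eq. (3.5): Q/r + p.r/r^3."""
    Q = sum(q for q, _ in charges)
    p = dipole_moment(charges)
    r = math.hypot(*point)
    return Q / r + sum(pk * xk for pk, xk in zip(p, point)) / r**3

def relative_error(charges, distance, direction=(0.3, 0.4, 0.866)):
    """Relative error of Eq. (3.5) at `distance` along a fixed (almost unit) direction."""
    point = tuple(distance * d for d in direction)
    exact = exact_potential(charges, point)
    return abs(two_term_potential(charges, point) - exact) / abs(exact)

# Hypothetical system of size a ~ 1: a +/-1 pair plus an extra charge 0.2 (so Q != 0).
charges = [(+1.0, (0.0, 0.0, 0.5)), (-1.0, (0.0, 0.0, -0.5)), (0.2, (0.3, -0.2, 0.1))]
for distance in (10.0, 100.0, 1000.0):
    print(f"r/a = {distance:6.0f}: relative error {relative_error(charges, distance):.1e}")
```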



If Q ≠ 0, the second term in the right-hand part of Eq. (5) is just a small correction to the first 
one, and in many cases may be ignored. (Remember, Eq. (5) is only valid in the limit r/a → ∞.) 
However, the net charge of many systems is exactly zero. The most important example is a neutral atom 
or a neutral molecule, in which the negative charge of electrons exactly compensates the positive charge 
of protons in nuclei. For such neutral systems, the second (dipole-moment) term, φ_d, in Eq. (5) is the 
leading one. Due to its importance, let us rewrite this expression in two other, equivalent forms: 



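$$\phi_d(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\frac{\mathbf p\cdot\mathbf r}{r^3} = \frac{1}{4\pi\varepsilon_0}\frac{p\cos\theta}{r^2} = \frac{1}{4\pi\varepsilon_0}\frac{pz}{\left(x^2+y^2+z^2\right)^{3/2}}, \qquad (3.7)$$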



that are more convenient for some applications. Here θ is the angle between vectors p and r, and in the 
last (Cartesian) form, axis z is directed along vector p. Figure 2a shows equipotential surfaces of 
the dipole field (or rather their cross-sections by a plane in which vector p lies). 








Fig. 3.2. Dipole field: (a) equipotential surfaces and (b) electric field lines, for a vertical vector p. 



3 Accordingly, a localized system of charges with Q = 0, but p ≠ 0, is called an (electric) dipole. 
The simplest example of the dipole (that gave such systems their name) is a system of two equal 
but opposite point charges, +q and −q, with radius-vectors, respectively, r_+ and r_−: 

$$\rho(\mathbf r) = (+q)\,\delta(\mathbf r-\mathbf r_+) + (-q)\,\delta(\mathbf r-\mathbf r_-). \qquad (3.8)$$

For this system, Eq. (6) yields 

$$\mathbf p = (+q)\,\mathbf r_+ + (-q)\,\mathbf r_- = q\left(\mathbf r_+-\mathbf r_-\right) = q\mathbf a, \qquad (3.9)$$

where a is the vector connecting points r_− and r_+. Note that in this case (and for all systems with Q = 0), 
the dipole moment does not depend on the choice of the reference frame origin. 
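The origin independence is easy to check. Shifting the origin by a vector R₀ changes Eq. (6) into p − QR₀, so only a neutral system has a frame-independent dipole moment. The short sketch below uses arbitrary charge values and an arbitrary shifted origin, chosen only for illustration, and compares a neutral pair with a single charge.

```python
def dipole_moment(charges, origin=(0.0, 0.0, 0.0)):
    """Discrete Eq. (3.6) with radius-vectors measured from `origin`: p = sum q (r - origin)."""
    return tuple(sum(q * (r[k] - origin[k]) for q, r in charges) for k in range(3))

shifted = (5.0, -3.0, 1.0)                                  # an arbitrary new origin
pair = [(+2.0, (1.0, 0.0, 0.0)), (-2.0, (0.0, 0.0, 0.0))]   # Q = 0, p = q a, Eq. (3.9)
ion = [(+2.0, (1.0, 0.0, 0.0))]                             # Q = 2

print(dipole_moment(pair), dipole_moment(pair, shifted))    # (2.0, 0.0, 0.0) twice
print(dipole_moment(ion), dipole_moment(ion, shifted))      # (2.0, 0.0, 0.0) vs (-8.0, 6.0, -2.0)
```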

A less trivial example is a conducting sphere of radius R in a uniform external electric field E_0. 
As a reminder, we have solved this problem in Sec. 2.5(iv) and obtained Eq. (2.176) as a result. The first 
term in the parentheses of that relation describes the external field (2.173), so that the field of the sphere 
itself (meaning the field of its surface charge induced by E_0) is given by the second term: 

$$\phi_s = E_0\,\frac{R^3}{r^2}\cos\theta. \qquad (3.10)$$



Comparing this expression with the second form of Eq. (7), we see that the sphere has an induced dipole 
moment 

$$\mathbf p = 4\pi\varepsilon_0 R^3\,\mathbf E_0. \qquad (3.11)$$



This is an interesting example of a purely dipole field: at all points outside the sphere (r > R), the field 
has no higher multipole components.⁴ 

Returning to the general properties of the dipole field, let us calculate its characteristics. First of 
all, we may use Eq. (7) to calculate the electric field of a dipole: 

$$\mathbf E_d = -\nabla\phi_d = -\frac{1}{4\pi\varepsilon_0}\nabla\left(\frac{\mathbf p\cdot\mathbf r}{r^3}\right) = -\frac{1}{4\pi\varepsilon_0}\nabla\left(\frac{p\cos\theta}{r^2}\right). \qquad (3.12)$$

The differentiation is easiest in spherical coordinates, using the well-known expression for the 
gradient of a scalar function in these coordinates⁵ and taking axis z parallel to vector p. From the last 
form of Eq. (12) we immediately get 

$$\mathbf E_d = \frac{p}{4\pi\varepsilon_0 r^3}\left(2\,\mathbf n_r\cos\theta + \mathbf n_\theta\sin\theta\right) = \frac{1}{4\pi\varepsilon_0}\,\frac{3\mathbf r(\mathbf r\cdot\mathbf p) - \mathbf p\,r^2}{r^5}. \qquad (3.13)$$



Figure 2b shows the electric field lines given by Eqs. (13). 
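As a sanity check of Eq. (13), the Cartesian form of the field can be compared with a finite-difference gradient of the potential (7). The sketch below assumes units with 1/(4πε₀) = 1. The dipole and field-point values are arbitrary. On the dipole axis it should also reproduce E = 2p/r³.

```python
import math

# Assumption: units with 1/(4*pi*eps0) = 1.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dipole_potential(p, r):
    """First form of Eq. (3.7): phi_d = p.r / r^3."""
    return dot(p, r) / math.sqrt(dot(r, r)) ** 3

def dipole_field(p, r):
    """Cartesian form of Eq. (3.13): E_d = [3 r (r.p) - p r^2] / r^5."""
    r2, rp = dot(r, r), dot(r, p)
    return tuple((3 * x * rp - pk * r2) / r2**2.5 for x, pk in zip(r, p))

def numerical_field(p, r, h=1e-5):
    """E = -grad(phi_d), evaluated by central finite differences."""
    field = []
    for k in range(3):
        plus, minus = list(r), list(r)
        plus[k] += h
        minus[k] -= h
        field.append(-(dipole_potential(p, plus) - dipole_potential(p, minus)) / (2 * h))
    return tuple(field)

p, r = (0.3, -0.2, 1.0), (1.2, 0.7, -0.9)
print(dipole_field(p, r))
print(numerical_field(p, r))
```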

Next, let us calculate the potential energy of interaction between a fixed dipole and an external 
electric field, using Eq. (1.54). Assuming that the external field does not change much at distances of the 
order of a (Fig. 1), we may expand the external potential φ_ext(r) into the Taylor series, just as Eq. (3) 
prescribes, and keep only its two leading terms: 



4 Other examples of dipole fields are given by two more systems discussed in Chapter 2 - see Eqs. (2.215) and 
(2.219). Those systems, however, do have higher-order multipole moments, so that for them, Eq. (7) gives only 
the long-distance approximation. 

5 See, e.g., MA Eq. (10.8) with ∂/∂φ = 0. 
$$U = \int\rho(\mathbf r)\,\phi_{\mathrm{ext}}(\mathbf r)\,d^3r \approx \int\rho(\mathbf r)\left[\phi_{\mathrm{ext}}(0) + \mathbf r\cdot\nabla\phi_{\mathrm{ext}}(0)\right]d^3r = Q\,\phi_{\mathrm{ext}}(0) - \mathbf p\cdot\mathbf E_{\mathrm{ext}}(0). \qquad (3.14)$$

The first term is the potential energy the system would have if it were a point charge. If the net charge Q 
is zero, that term disappears, and the leading contribution is due to the dipole moment: 

$$U = -\mathbf p\cdot\mathbf E_{\mathrm{ext}}. \qquad (3.15)$$

Note, however, that Eq. (15) is only valid for a fixed dipole, with p independent of E_ext. In the opposite 
limit, when the dipole is induced by the field, i.e. p ∝ E_ext (see Eq. (11) as an example), we can repeat 
the discussion that accompanied Fig. 1.6 to show that Eq. (15) acquires an additional factor ½. 

In particular, combining Eqs. (13) and (15), we may get the following important formula for the 
interaction of two independent dipoles: 

$$U = \frac{1}{4\pi\varepsilon_0}\,\frac{(\mathbf p_1\cdot\mathbf p_2)\,r^2 - 3(\mathbf r\cdot\mathbf p_1)(\mathbf r\cdot\mathbf p_2)}{r^5} = \frac{1}{4\pi\varepsilon_0}\,\frac{p_{1x}p_{2x} + p_{1y}p_{2y} - 2p_{1z}p_{2z}}{r^3}, \qquad (3.16)$$

where r is the vector connecting the dipoles, and axis z is directed along this vector. If each moment is 
due to the polarization of the dipole by the electric field of its counterpart, p_{1,2} ∝ E_{2,1} ∝ 1/r³, then this 
expression (which in this case acquires the additional factor ½) shows that the potential energy is always negative and 
proportional to 1/r⁶. Such a potential describes, in particular, the long-range, attractive part (the so-called 
London dispersion force) of the interaction between electrically neutral atoms and molecules.⁶ 
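Both forms of Eq. (16) follow from Eq. (15) applied to dipole 2 sitting in the field (13) of dipole 1. The sketch below checks that relation numerically. It assumes units with 1/(4πε₀) = 1, takes r as pointing from dipole 1 to dipole 2, and uses arbitrary example moments. It also shows the sign of the energy for two familiar arrangements: head-to-tail (attractive) and side by side with parallel moments (repulsive).

```python
# Assumption: units with 1/(4*pi*eps0) = 1; r points from dipole 1 to dipole 2.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dipole_field(p, r):
    """Eq. (3.13): field of dipole p at displacement r from it."""
    r2, rp = dot(r, r), dot(r, p)
    return tuple((3 * x * rp - pk * r2) / r2**2.5 for x, pk in zip(r, p))

def interaction_energy(p1, p2, r):
    """First form of Eq. (3.16)."""
    r2 = dot(r, r)
    return (dot(p1, p2) * r2 - 3 * dot(r, p1) * dot(r, p2)) / r2**2.5

def interaction_energy_axis(p1, p2, d):
    """Second form of Eq. (3.16), with axis z along r and |r| = d."""
    return (p1[0] * p2[0] + p1[1] * p2[1] - 2 * p1[2] * p2[2]) / d**3

p1, p2 = (0.3, -1.0, 0.5), (1.2, 0.4, -0.7)
r = (0.5, 1.5, -2.0)
# Eq. (3.15) applied to dipole 2 in the field of dipole 1 gives the same number:
print(interaction_energy(p1, p2, r), -dot(p2, dipole_field(p1, r)))
print(interaction_energy_axis((0, 0, 1), (0, 0, 1), 2.0))  # head-to-tail: -0.25
print(interaction_energy_axis((1, 0, 0), (1, 0, 0), 2.0))  # side by side: 0.125
```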

According to Eq. (15), in order to reach the minimum of U, the electric field "tries" to align the 
dipole direction along its own. The quantitative expression of this effect is the torque τ exerted by the 
field. The simplest way to calculate it is to sum up all the elementary torques dτ = r × dF_ext = 
r × E_ext(r)ρ(r)d³r exerted on all elementary charges of the system: 

$$\boldsymbol\tau = \int\mathbf r\times\mathbf E_{\mathrm{ext}}(\mathbf r)\,\rho(\mathbf r)\,d^3r \approx \mathbf p\times\mathbf E_{\mathrm{ext}}(0), \qquad (3.17)$$

where at the last transition we have again neglected the spatial dependence of the external field. 

The spatial dependence of E_ext cannot, however, be ignored in the calculation of the total force 
exerted by the field on the dipole (with Q = 0). Indeed, Eq. (15) shows that if the field is constant, the 
dipole energy is the same at all spatial points, and hence the net force is zero. However, if the field has a 
finite gradient, a total force does appear: 

$$\mathbf F = -\nabla U = \nabla\left(\mathbf p\cdot\mathbf E_{\mathrm{ext}}\right), \qquad (3.18)$$

where the derivative has to be taken at the dipole's position (in our notation, at r = 0). If the dipole that 
is being moved in a field retains its magnitude and orientation, then the last formula is equivalent to⁷ 

$$\mathbf F = (\mathbf p\cdot\nabla)\,\mathbf E_{\mathrm{ext}}. \qquad (3.19)$$

Alternatively, the last expression may be obtained similarly to Eq. (14): 



6 See, e.g., SM Sec. 3.5. 

7 The equivalence may be proved, for example, by using MA Eq. (11.6) with f = p = const and g = E_ext, taking 
into account that according to the general Eq. (1.28), ∇ × E_ext = 0. 
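$$\mathbf F = \int\rho(\mathbf r)\,\mathbf E_{\mathrm{ext}}(\mathbf r)\,d^3r \approx \int\rho(\mathbf r)\left[\mathbf E_{\mathrm{ext}}(0) + (\mathbf r\cdot\nabla)\mathbf E_{\mathrm{ext}}\right]d^3r = Q\,\mathbf E_{\mathrm{ext}}(0) + (\mathbf p\cdot\nabla)\mathbf E_{\mathrm{ext}}. \qquad (3.20)$$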
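Eqs. (17) and (19) can be checked against the exact force and torque on a two-charge dipole (8)-(9) placed in a nonuniform field. In the sketch below, the external field is assumed to be that of a hypothetical point charge q₀ = 5 at the origin, and units with 1/(4πε₀) = 1 are used. The dipole is far from the source, so the gradient term is small but finite. The gradient in Eq. (19) is taken by central differences. Both relative discrepancies should be of order (a/r)², i.e. tiny.

```python
import math

# Assumption: units with 1/(4*pi*eps0) = 1; the external field is a hypothetical
# point charge q0 = 5 at the origin, and the dipole sits far from it.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def external_field(r, q0=5.0):
    """Coulomb field of the source charge q0 at the origin."""
    d3 = math.hypot(*r) ** 3
    return tuple(q0 * x / d3 for x in r)

def exact_force_and_torque(q, a, center):
    """Sum q*E over +q at center + a/2 and -q at center - a/2 (torque about center)."""
    force, torque = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
    for sign in (+1, -1):
        arm = tuple(sign * x / 2 for x in a)
        position = tuple(c + x for c, x in zip(center, arm))
        f = tuple(sign * q * e for e in external_field(position))
        force = [F + x for F, x in zip(force, f)]
        torque = [T + x for T, x in zip(torque, cross(arm, f))]
    return force, torque

def dipole_force(p, center, h=1e-6):
    """Eq. (3.19), F = (p . grad) E_ext, using central differences."""
    force = [0.0, 0.0, 0.0]
    for j in range(3):
        plus, minus = list(center), list(center)
        plus[j] += h
        minus[j] -= h
        e_plus, e_minus = external_field(plus), external_field(minus)
        force = [F + p[j] * (ep - em) / (2 * h) for F, ep, em in zip(force, e_plus, e_minus)]
    return force

q, a, center = 1.0, (1e-3, -5e-4, 2e-3), (3.0, 1.0, 2.0)
p = tuple(q * x for x in a)                          # Eq. (3.9)
F_exact, tau_exact = exact_force_and_torque(q, a, center)
F_dipole = dipole_force(p, center)
tau_dipole = cross(p, external_field(center))        # Eq. (3.17)
print(F_exact, F_dipole)
print(tau_exact, tau_dipole)
```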

Finally, let me add a note on the so-called coarse-grain model of the dipole. The dipole 
approximation explored above is asymptotically correct at large distances, r ≫ a. However, for some 
applications (including the forthcoming discussion in Sec. 5 of molecular field effects) it is important to 
have an expression that would be approximately valid everywhere in space, though maybe without exact 
details at r ~ a, and also give the correct result for the space average of the electric field, 

$$\langle\mathbf E\rangle \equiv \frac{1}{V}\int_V\mathbf E\,d^3r, \qquad (3.21)$$

where V is a regularly-shaped volume much larger than a³, for example a sphere of radius R ≫ a, with 
the dipole at its center. For the field E_d given by Eq. (13), such an average is zero. Indeed, let us consider 
Cartesian components of that vector in the coordinate system with axis z directed along vector p. Due to 
the axial symmetry of the field, the averages of components E_x and E_y evidently vanish. Let us use Eq. 
(13) to spell out the "vertical" component of the field (parallel to the dipole moment vector): 

$$E_z = \mathbf E_d\cdot\mathbf n_z = \frac{1}{4\pi\varepsilon_0 r^3}\left(2\,\mathbf n_r\cdot\mathbf n_z\,p\cos\theta + \mathbf n_\theta\cdot\mathbf n_z\,p\sin\theta\right) = \frac{p}{4\pi\varepsilon_0 r^3}\left(2\cos^2\theta - \sin^2\theta\right). \qquad (3.22)$$

Integrating this expression over the whole solid angle Ω = 4π, at fixed r, using the convenient variable 
substitution cos θ ≡ ξ, we get 

$$\oint E_z\,d\Omega = 2\pi\int_0^\pi E_z\sin\theta\,d\theta = \frac{p}{2\varepsilon_0 r^3}\int_0^\pi\left(2\cos^2\theta - \sin^2\theta\right)\sin\theta\,d\theta = \frac{p}{2\varepsilon_0 r^3}\int_{-1}^{+1}\left(3\xi^2 - 1\right)d\xi = 0. \qquad (3.23)$$

On the other hand, the exact electric field of an arbitrary charge distribution satisfies the 
following condition, 



$$\int_V\mathbf E(\mathbf r)\,d^3r = -\frac{\mathbf p}{3\varepsilon_0}, \qquad (3.24)$$



where the integration is over any sphere containing all the charges. A proof of this formula for the 
general case requires a somewhat cumbersome, though straightforward integration, 8 but in Sec. 4 we 
will see that it is correct at least for one (and very important) particular case. The origin of the difference 
between Eqs. (23) and (24) is illustrated in Fig. 3 on the example of a dipole created by two equal but 
opposite charges - see Eqs. (8)-(9). The zero average of the dipole field (13) does not take into account 
the contribution of the field in the region between the charges (where Eq. (13) is not valid), which is 
directed mostly against the dipole vector (9). 

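Thus, in order to be used as a reasonable coarse-grain model, Eq. (13) should be modified as 
follows: 

$$\mathbf E_d = \frac{1}{4\pi\varepsilon_0}\left[\frac{3\mathbf r(\mathbf r\cdot\mathbf p) - \mathbf p\,r^2}{r^5} - \frac{4\pi}{3}\,\mathbf p\,\delta(\mathbf r)\right]. \qquad (3.25)$$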



8 See, e.g., the end of Sec. 4.1 of J. Jackson, Classical Electrodynamics, 3rd ed., Wiley, 1999. 
Evidently, such a modification does not change the field at large distances r ≫ a, i.e. in the region where 
expansion (3) is valid. 
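Eq. (24) can be tested numerically without integrating the singular field directly. For E = −∇φ, the gradient theorem gives ∫_V E d³r = −∮_S φ n d²r. The surface integrand is smooth as long as all charges stay well inside the sphere. The sketch below assumes units with 1/(4πε₀) = 1, so −p/(3ε₀) becomes −(4π/3)p, and uses an arbitrary neutral pair inside a unit sphere. It evaluates the surface integral with a midpoint rule. The nonzero result, in contrast with the zero angular average (23) of the far field, is what the δ-term of Eq. (25) accounts for.

```python
import math

def volume_integral_of_field(charges, radius=1.0, n_theta=200, n_phi=200):
    """Integral of E over a sphere centered at the origin, computed as -(surface integral of phi * n).

    Units: 1/(4*pi*eps0) = 1; all charges must lie strictly inside the sphere.
    """
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = [0.0, 0.0, 0.0]
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        s, c = math.sin(theta), math.cos(theta)
        weight = radius**2 * s * d_theta * d_phi           # area element d^2r
        for j in range(n_phi):
            azimuth = (j + 0.5) * d_phi
            n = (s * math.cos(azimuth), s * math.sin(azimuth), c)  # outward unit normal
            point = tuple(radius * x for x in n)
            phi = sum(q / math.dist(point, r) for q, r in charges)
            for k in range(3):
                total[k] -= phi * n[k] * weight
    return total

charges = [(+1.0, (0.0, 0.0, 0.3)), (-1.0, (0.1, 0.0, -0.2))]
p = tuple(sum(q * r[k] for q, r in charges) for k in range(3))
expected = tuple(-4 * math.pi / 3 * pk for pk in p)       # Eq. (3.24) in these units
print(volume_integral_of_field(charges), expected)
```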







3.2. Dipole media 

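Let us generalize equation (7) to the case of several (possibly, many) dipoles p_j located at 
arbitrary points r_j. Using the linear superposition principle, we get 

$$\phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\sum_j\mathbf p_j\cdot\frac{\mathbf r-\mathbf r_j}{|\mathbf r-\mathbf r_j|^3}. \qquad (3.26)$$

If our system (medium) contains many similar dipoles, distributed in space with density n(r), we may 
use the same standard argumentation that has led us from Eq. (1.5) to Eq. (1.8), to rewrite the last sum as 
an integral 

$$\phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\int\mathbf P(\mathbf r')\cdot\frac{\mathbf r-\mathbf r'}{|\mathbf r-\mathbf r'|^3}\,d^3r', \qquad (3.27)$$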
where the vector P(r) ≡ n(r)p, called the electric polarization, has the physical meaning of the net dipole 
moment per unit volume. Note again that since Eq. (26) does not describe the field at distances 
comparable to the dipole size, Eq. (27), and all the following formulas of this section, 
describe the so-called macroscopic electric field, i.e. the dipole field averaged over the microscopic 
(dipole-size) distances. 

Now comes a very impressive mathematical trick. Just as has been done in the previous section 
(with the appropriate sign change), Eq. (27) may be rewritten in the equivalent form 

$$\phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\int\mathbf P(\mathbf r')\cdot\nabla'\frac{1}{|\mathbf r-\mathbf r'|}\,d^3r', \qquad (3.28)$$

where ∇' means the del operator (in this particular case, the gradient) acting in the "source space" of 
vectors r'. The right-hand part of Eq. (28), applied to any volume V limited by surface S, may be 
integrated by parts in the following way:⁹ 



9 To prove this (almost evident) formula strictly, it is sufficient to apply the divergence theorem given by MA Eq. 
(12.2) to the vector function f = P(r')/|r − r'| in the "source space" of radius-vectors r'. 
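$$\phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\oint_S\frac{P_n(\mathbf r')}{|\mathbf r-\mathbf r'|}\,d^2r' - \frac{1}{4\pi\varepsilon_0}\int_V\frac{\nabla'\cdot\mathbf P(\mathbf r')}{|\mathbf r-\mathbf r'|}\,d^3r'. \qquad (3.29)$$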



If the surface does not carry an infinitely dense (δ-functional) sheet of additional dipoles, or it is 
just very far, the first term in the right-hand part is negligible. Now comparing the second term with the 
basic equation (1.38) for the electric potential, we see that this term may be interpreted as the field of 
certain effective electric charges with density 

$$\rho_{\mathrm{ef}} = -\nabla\cdot\mathbf P. \qquad (3.30)$$



Figure 4 illustrates the physics of this relation for a cartoon model of a multi-dipole system: a 
layer of uniformly-distributed two-point-charge units oriented perpendicular to the layer surface. (In this 
case ∇·P = dP/dx.) One can see that ρ_ef, defined by Eq. (30), may be interpreted as the density of 
uncompensated surface charges of polarized elementary dipoles. 








Fig. 3.4. Spatial distributions of the 
polarization and effective charges in a layer of 
similar elementary dipoles (schematically). 
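The cartoon of Fig. 4 can be mimicked in one dimension. The sketch below uses a hypothetical smooth profile P(x), equal to P₀ inside a layer |x| < 1 with edges smeared over a width 0.05. It computes ρ_ef = −dP/dx by finite differences and integrates it over each half of the layer. The effective charge comes out as sheets of areal density −P₀ and +P₀ at the two faces, with zero total, as the uncompensated-dipole-charge picture suggests.

```python
import math

def polarization(x, p0=1.0, half_width=1.0, edge=0.05):
    """Hypothetical smooth profile P(x): about p0 inside |x| < half_width, zero outside."""
    return 0.5 * p0 * (math.tanh((x + half_width) / edge) - math.tanh((x - half_width) / edge))

def effective_charge_density(x, h=1e-5):
    """Eq. (3.30) in 1D: rho_ef = -dP/dx, by a central difference."""
    return -(polarization(x + h) - polarization(x - h)) / (2 * h)

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    step = (b - a) / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step

sigma_left = integrate(effective_charge_density, -3.0, 0.0)    # about -p0
sigma_right = integrate(effective_charge_density, 0.0, 3.0)    # about +p0
print(sigma_left, sigma_right, sigma_left + sigma_right)
```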



Next, from Sec. 1.2, we already know that Eq. (1.38) is equivalent to the inhomogeneous 
Maxwell equation (1.27) for the electric field. This is why Eq. (30) implies that if, besides the 
compensated charges of the dipoles, the system also has certain stand-alone charges (not a part of the 
dipoles!) distributed in space with density ρ(r), the average electric field obeys, instead of Eq. (1.27), 
the following generalized equation: 

$$\nabla\cdot\mathbf E = \frac{1}{\varepsilon_0}\left(\rho + \rho_{\mathrm{ef}}\right) = \frac{1}{\varepsilon_0}\left(\rho - \nabla\cdot\mathbf P\right). \qquad (3.31)$$



It is evidently tempting (and very convenient for applications!) to carry the dipole-related term of 
this equation over to the left-hand part of Eq. (31), and rewrite the resulting equality as the so-called 
macroscopic Maxwell equation 

$$\nabla\cdot\mathbf D = \rho, \qquad (3.32)$$

where a new vector, called the electric displacement, is defined as¹⁰ 



10 Note that the dimensionality of D in SI units is different from that of E. In contrast, in the Gaussian units the 
electric displacement is defined as D = E + 4πP, so that ∇·D = 4πρ (the relation ρ_ef = −∇·P remains the same as in 
SI units), and the dimensionalities of D and E coincide. Philosophically, this coincidence is a certain handicap, 
$$\mathbf D \equiv \varepsilon_0\mathbf E + \mathbf P. \qquad (3.33)$$



The comparison of Eqs. (32) and (1.27) shows that D may be interpreted as the "would-be" 
electric field that would be created by stand-alone charges in the absence of the dipole medium 
polarization. In contrast, E is the actual electric field - though, as was mentioned above, space-averaged 
over a volume much larger than that of an elementary dipole.¹¹ 

To complete the general analysis of the multi-dipole systems, let us rewrite the macroscopic 
Maxwell equation (32) in the integral form. Applying the divergence theorem to an arbitrary volume V 
limited by surface S, we get the following macroscopic Gauss law: 



$$\oint_S D_n\,d^2r = \int_V\rho\,d^3r \equiv Q, \qquad (3.34)$$

where Q is the total stand-alone charge inside volume V. 



Let me emphasize again that the key Eq. (27), and hence all the following equations of this 
section, are only valid for the macroscopic field, i.e. the electric field averaged over its rapid variations at the 
atomic space scale. Such a macroscopic description is valid as long as we are not concerned with the 
inter-atomic field variations, for whose description the classical physics is inadequate in any case. 



3.3. Linear dielectrics 

The general equations derived above are broadly used to describe any dielectrics, i.e. materials with 
bound electric charges (and hence with no dc electric conduction). The polarization properties of these 
materials may be described by the dependence between vectors P and E. In most materials, in the 
absence of an external electric field, the elementary dipoles p either equal zero or have random 
orientations in space, so that the net dipole moment of each macroscopic volume (still containing many 
such dipoles) equals zero: P = 0. 

Moreover, if the field changes are sufficiently slow, most materials may be characterized by a 
unique dependence of P on E. Then using the Taylor expansion of the function P(E), we may argue that in 
relatively low electric fields the function should be well approximated by a linear dependence between 
these two vectors. In an isotropic medium, the coefficient of proportionality should be just a scalar.¹² In 
SI units, this constant is defined by the following relation: 

$$\mathbf P = \chi_e\,\varepsilon_0\,\mathbf E, \qquad (3.35)$$

with the dimensionless constant χ_e called the electric susceptibility. However, it is much more common 
to use, instead of χ_e, another parameter, 

$$\varepsilon_r \equiv 1 + \chi_e, \qquad (3.36)$$

because it is frequently convenient to consider Cartesian components of E as a generalized force, and those of D 
as a generalized coordinate (see Sec. 6 below), and it is comforting to have their dimensionality different. 

11 Note, however, that such averaging does not include the inner-dipole fields, which are (approximately) described 
by the second term of Eq. (25). 

12 In anisotropic materials, such as crystals, a susceptibility tensor may be necessary to give an adequate 
description of the linear relation between vectors P and E. Fortunately, in most important crystals (such as silicon) the 
anisotropy of polarization is small, so that they may be reasonably well characterized by a scalar susceptibility. 



which is sometimes called the relative electric permittivity, but much more often, the dielectric 
constant.¹³ This parameter is very convenient, because combining Eqs. (35) and (36) gives 

$$\mathbf P = (\varepsilon_r - 1)\,\varepsilon_0\mathbf E, \qquad (3.37)$$

and then, plugging the resulting relation into the general Eq. (33), we get simply¹⁴ 

$$\mathbf D = \varepsilon\mathbf E, \quad\text{with}\quad \varepsilon \equiv \varepsilon_0\varepsilon_r \equiv \varepsilon_0\left(1 + \chi_e\right), \qquad (3.38)$$

where ε is called the electric permittivity of the material. Table 1 gives values of the dielectric constant 
for several representative materials. 



Table 3.1. Dielectric constants of a few representative (and/or practically important) dielectrics 

Material                                      ε_r 
Air (at ambient conditions)                   1.00054 
Teflon (polytetrafluoroethylene, CnF2n)       2.1 
Silicon dioxide (amorphous)                   3.9 
Glasses (of various compositions)             3.7-10 
Castor oil                                    4.5 
Silicon                                       11.7 
Water (at 100°C)                              55.3 
Water (at 20°C)                               80.1 
Barium titanate (BaTiO3 at 20°C)              ~1,600 
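To connect the table with Eqs. (35)-(38) and with the SI/Gaussian relation in footnote 14, the sketch below converts a few entries of Table 1 into susceptibilities in both unit systems. It also evaluates P and D for a given field. The vacuum permittivity value (CODATA 2018) and the 10⁶ V/m field used in the check are assumptions added for illustration.

```python
import math

EPS0 = 8.8541878128e-12  # F/m, vacuum permittivity (CODATA 2018 value)

def chi_e_si(eps_r):
    """Eq. (3.36): chi_e = eps_r - 1 (SI)."""
    return eps_r - 1.0

def chi_e_gaussian(eps_r):
    """Gaussian susceptibility: eps = 1 + 4 pi chi_e, so chi_e = (eps_r - 1)/(4 pi)."""
    return (eps_r - 1.0) / (4 * math.pi)

def polarization_and_displacement(eps_r, E):
    """Eqs. (3.37) and (3.33) for a field magnitude E in V/m; returns (P, D) in C/m^2."""
    P = (eps_r - 1.0) * EPS0 * E
    D = EPS0 * E + P
    return P, D

table = {"Air": 1.00054, "Teflon": 2.1, "Silicon": 11.7, "Water (20 C)": 80.1}
for name, eps_r in table.items():
    print(f"{name:14s} chi_e(SI) = {chi_e_si(eps_r):8.5f}  chi_e(Gaussian) = {chi_e_gaussian(eps_r):8.5f}")
```

As Eq. (38) requires, D computed from the definition (33) equals ε_rε₀E.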



In order to get some feeling for the physics behind these values, let us consider a very common 
model of a medium whose elementary dipoles do not interact, so that in the relation P = np the elementary 
dipole moments p may be calculated independently of each other. This means that in a linear dielectric, 
in which Eq. (35) holds, each induced dipole moment p has to be proportional to the applied field E as 
well. Let us write this dependence in the following traditional form, 

$$\mathbf p = 4\pi\varepsilon_0\,\alpha_{\mathrm{mol}}\,\mathbf E, \qquad (3.39)$$

where α_mol is called the molecular (or, sometimes, "atomic") polarizability, so that 

$$\mathbf P = n\mathbf p = 4\pi\varepsilon_0\,\alpha_{\mathrm{mol}}\,n\,\mathbf E. \qquad (3.40)$$

Comparing this relation with Eq. (35), we get χ_e = 4πα_mol n, so that Eq. (36) yields¹⁵ 



13 Note that in electrical engineering literature, the dielectric constant is often denoted by letters κ or K. 

14 In Gaussian units, χ_e is defined by the relation P = χ_e E, while ε is still defined by D = εE. Because of that, ε is 
dimensionless and equals (1 + 4πχ_e). Note that (ε)_Gaussian = (ε/ε_0)_SI = (ε_r)_SI, while (χ_e)_SI = 4π(χ_e)_Gaussian, sometimes 
creating confusion with the numerical values of the latter parameter. 
$$\varepsilon_r = 1 + 4\pi\alpha_{\mathrm{mol}}\,n. \qquad (3.41)$$

Now let us consider the following simple model of a dielectric: a set of similar conducting 
spheres of radius R, spread apart with a small density n ≪ 1/R³. At such a low density of the spheres, their 
electrostatic interaction is negligible, and we can use Eq. (11) for the induced dipole moment of a single 
sphere. Then the polarizability definition (39) yields α_mol = R³, so that χ_e = 4πR³n, and 

$$\varepsilon_r = 1 + 4\pi R^3 n. \qquad (3.42)$$

Let us use this result for a crude estimate of the dielectric constant of air at the so-called ambient 
conditions, meaning the normal atmospheric pressure and temperature T = 300 K. At these conditions 
the molecular density n may be, with a few-percent accuracy, found from the equation of state of an 
ideal gas:¹⁶ n ≈ P/k_BT ≈ (1.013×10⁵)/(1.38×10⁻²³×300) ≈ 2.5×10²⁵ m⁻³. The main component of air, 
molecular nitrogen N₂, has a van der Waals radius¹⁷ of 155 pm = 1.55×10⁻¹⁰ m. Using it for R, from our 
crude model we get ε_r ≈ 1.001. Comparing this number with the first line of Table 1, we see that our 
crude model gives surprisingly reasonable results: in order to get the exact experimental value, it is 
sufficient to decrease R by just ~25%. 

This result may encourage us to try using Eq. (42) for larger density n, i.e., beyond the range of 
its quantitative applicability. For example, as a crude model for solids and liquids let us assume that 
spheres form a simple cubic lattice with period a = 2R (i.e., the neighboring spheres almost touch). With 
this n = 1/a³ = 1/8R³, Eq. (42) yields ε_r = 1 + 4π/8 ≈ 2.6. Given the crude nature of this estimate, we 
may conclude that it provides a reasonable explanation for the values of ε_r listed in the first few lines of 
Table 1. Still, it is clear that such a model cannot even approximately describe the dielectric properties of 
either water or barium titanate (and similar materials), or their strong temperature dependence. 
Such high values may be explained by the molecular field effect: each elementary dipole is polarized not 
only by the external field (as in our current toy model), but by the field of neighboring dipoles as well. 
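Both estimates above reduce to a few lines of arithmetic with Eq. (42). The sketch below reproduces them. Using the exact SI value of k_B instead of 1.38×10⁻²³ J/K is a choice made here. It also computes the sphere radius that would reproduce the measured 1.00054 from Table 1. That radius comes out about 22% smaller than the van der Waals radius, in line with the rough ~25% quoted above.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)

def eps_r_sphere_model(radius, n):
    """Eq. (3.42): eps_r = 1 + 4 pi R^3 n for dilute conducting spheres."""
    return 1.0 + 4 * math.pi * radius**3 * n

# Air at ambient conditions, ideal-gas density n = P/(k_B T).
n_air = 1.013e5 / (K_B * 300.0)                      # m^-3
eps_air = eps_r_sphere_model(1.55e-10, n_air)

# Radius that would reproduce the measured 1.00054 (Table 1).
r_fit = (0.00054 / (4 * math.pi * n_air)) ** (1 / 3)

# Simple cubic lattice of touching spheres, n = 1/(2R)^3; eps_r = 1 + pi/2 for any R.
eps_lattice = eps_r_sphere_model(1.0, 1 / 8.0)

print(f"n_air = {n_air:.3e} m^-3, eps_r(air) = {eps_air:.5f}")
print(f"R needed: {r_fit:.3e} m ({100 * (1 - r_fit / 1.55e-10):.0f}% smaller)")
print(f"eps_r(lattice) = {eps_lattice:.3f}")
```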

Before analyzing this effect (in the next section), let us review how the most important 
results of electrostatics are modified by a uniform linear dielectric medium that obeys Eq. (38) with a space- 
independent dielectric constant ε_r. The simplest problem of this kind is a set of free charges of density 
ρ(r), inserted into the medium. For this case, we can combine Eqs. (32) and (38) to write 

$$\nabla\cdot\mathbf E = \frac{\rho}{\varepsilon}, \quad\text{i.e.}\quad \nabla^2\phi = -\frac{\rho}{\varepsilon}. \qquad (3.43)$$

For charges in vacuum, we had similar equations (1.27) and (1.41), but with a different constant, ε_0 = 
ε/ε_r. Hence all the results discussed in Chapter 1 are valid, with both E and φ reduced by the factor of ε_r. 
Thus, the most straightforward result of the induced polarization of a dielectric medium is the electric field 
reduction. This is a very important effect, especially taking into account the very high values of ε_r in 
such dielectrics as water (see Table 1). Indeed, it is the reduction of the attraction between positive 
and negative ions (called, respectively, cations and anions) in water that enables their substantial 


15 Note that for all materials listed in Table 1, ε_r > 1, i.e. α_mol > 0. Actually, this is true for all stable dielectrics. 
Let me postpone a discussion of this fact until Sec. 5.5 where I will compare physical mechanisms of the electric 
and magnetic polarization. 

16 See, e.g., SM Secs. 1.4 and 3.1. 

17 Such a radius is defined by the requirement that the volume of the corresponding sphere, used in the van der 
Waals equation (see, e.g., SM Sec. 4.1), gives the best fit to the experimental equation of state n = n(P, T). 






dissociation and hence almost all biochemical reactions, which are the basis of biological cell functions, 
and hence of life itself. 

Now, what if the electric field in a uniform dielectric is induced by charges located on 
conductors, with potentials rather than charge density fixed? Then, with the substitution of the 
electrostatic potential definition E = −∇φ, Eq. (43) in the space between the conductors is reduced to the 
Laplace equation, and the boundary problem remains exactly the same as formulated in Chapter 2 (see 
Eqs. (2.35)). Hence the potential distribution φ(r) is related to the conductor potentials in exactly the same 
way as in vacuum (see, e.g., any problem discussed in Chapter 2), without any effect of the medium 
polarization. However, in order to find, from that distribution, the density σ of charges on conductor 
surfaces, we need to use the macroscopic Gauss law (34). Applying this equation to a pillbox-shaped 
volume on the conductor surface, we get the following relation, 

$$\sigma = D_n = \varepsilon E_n = -\varepsilon\,\frac{\partial\phi}{\partial n}, \qquad (3.44)$$

which differs from Eq. (2.3) only by the replacement ε_0 → ε = ε_rε_0. Hence the charge density, calculated 
for the vacuum case, should be increased by the factor of ε_r, and that's it. In particular, this means that all 
the capacitances that had been calculated in vacuum should be increased by that factor. For example, 
for a planar capacitor filled with a linear dielectric ε_r, we get the well-known formula 

$$C = \frac{\varepsilon_r\varepsilon_0 A}{d} = \frac{\varepsilon A}{d}. \qquad (3.45)$$

(As a reminder, this increase of C by ε_r has already been used in Sec. 2.2 for capacitance estimates.) 

Now let us discuss more complex situations in which the dielectric medium is not uniform, for 
example when it contains a boundary separating two regions filled by different uniform dielectrics. (The 
analysis is clearly applicable to a dielectric/vacuum boundary as well, with one of the dielectric 
constants set to 1.) For that, let us apply the macroscopic Gauss law (34) to a pillbox formed at the 
interface between two dielectrics, with no surface charges (see the solid lines in Fig. 5). 




Fig. 3.5. Deriving boundary conditions on the 
interface between two dielectrics: a Gauss 
pillbox and a circulation contour; n and τ are 
the unit vectors which are, respectively, 
normal and tangential to the interface. 



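This immediately gives (D_n)_1 = (D_n)_2, so that Eq. (38) yields 

$$(\varepsilon E_n)_1 = (\varepsilon E_n)_2, \quad\text{i.e.}\quad \varepsilon_1\frac{\partial\phi_1}{\partial n} = \varepsilon_2\frac{\partial\phi_2}{\partial n}. \qquad (3.46)$$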
Now, what about the tangential component (E_τ) of the electric field? In dielectrics, the static electric 
field is still potential, hence we can still use Eq. (1.28). Applying this equation to a narrow 
contour stretched along the interface (see the dashed line in Fig. 5), i.e. requiring the circulation of E along it to vanish, we get 

$$(E_\tau)_1 = (E_\tau)_2, \quad\text{i.e.}\quad \frac{\partial\phi_1}{\partial\tau} = \frac{\partial\phi_2}{\partial\tau}. \qquad (3.47)$$



Note that this condition is compatible with (and may be derived from) the continuity of the 
electrostatic potential itself, φ_1 = φ_2, at each point of the interface. That relation may be derived from the 
definition of the electric field via the gradient of φ (see Eq. (1.33)). Indeed, if the potential leaped at the border, 
the electric field would be infinite. 

Let us apply the boundary conditions (46)-(47), for example, to two thin (t ≪ d) vacuum slits 
cut in a uniform dielectric with an initially uniform¹⁸ electric field E_0 (Fig. 6). In both cases, a slit with 
t → 0 cannot substantially modify the field distribution outside it. 






Fig. 3.6. Fields inside narrow 
slits cut in a linear dielectric. 



For slit (a), normal to the applied field, we may apply Eq. (46) to the "major" (broad) interfaces, 
shown horizontal in Fig. 6, to see that D should be continuous. But according to Eq. (33), this means 
that inside the gap (i.e. in the vacuum, with P = 0) the electric field equals D/ε_0. This field, and hence 
D, may be measured, showing that the electric displacement is not a purely mathematical construct. 
Superficially, this result violates the boundary condition (47) on the vertical ("minor") interfaces of the 
slit. Note, however, that the electric field within the gap is ε_r times higher than in the dielectric outside 
it. Hence the slit deforms the equipotential surfaces around it to concentrate the field inside itself. The 
curving of the surfaces near the minor interfaces takes care of the fulfillment of Eq. (47) at the minor 
interfaces. 

On the contrary, for slit (b), parallel to the applied field, we may apply Eq. (47) to the major (now vertical) interfaces of the slit, to see that it is the electric field $E$ that is continuous now, while the electric displacement $D = \varepsilon_0 E$ inside the gap is a factor of $\varepsilon_r$ lower than its value in the dielectric. (Any perturbation of the field uniformity, caused by the compliance with Eq. (46) at the minor interfaces, decays at distances $\sim t$ from these interfaces.)
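The two slit results can be condensed into a few lines. The sketch below is only an illustration under the assumptions of the text (a thin slit, $t \ll d$, fields evaluated far from the minor interfaces, SI units); the function `slit_fields` and its `orientation` argument are hypothetical names, not anything defined in these notes.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def slit_fields(eps_r, E0, orientation):
    """Fields (E_gap, D_gap) inside a thin vacuum slit cut in a linear dielectric.

    E0 is the uniform field in the dielectric; orientation is "normal"
    (slit a, Fig. 6a) or "parallel" (slit b, Fig. 6b) to that field.
    """
    D0 = eps_r * EPS0 * E0              # displacement in the dielectric, Eq. (38)
    if orientation == "normal":         # Eq. (46) on the major interfaces: D continuous
        D_gap = D0
        E_gap = D_gap / EPS0            # P = 0 in vacuum, so E = D/eps0 (= eps_r * E0)
    elif orientation == "parallel":     # Eq. (47) on the major interfaces: E continuous
        E_gap = E0
        D_gap = EPS0 * E_gap            # = D0 / eps_r
    else:
        raise ValueError("orientation must be 'normal' or 'parallel'")
    return E_gap, D_gap


# Usage: eps_r = 5 and E0 = 1 kV/m.
for orient in ("normal", "parallel"):
    E_gap, D_gap = slit_fields(5.0, 1e3, orient)
    print(orient, E_gap, D_gap)
```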



18 Actually, selecting the slit size $d$ much smaller than the characteristic scale of the field change, we can apply the following arguments to any external field distribution.
For problems with piecewise-constant $\varepsilon$ but more complex geometries we may need to apply the methods studied in Chapter 2. As in vacuum, in the simplest cases we can select a set of orthogonal coordinates in which the electrostatic potential depends on just one of them. Consider, for example, two types of plane capacitor filling with two different dielectrics - see Fig. 7.

In case (a), voltage $V$ between the electrodes is the same for each part of the capacitor, and at least far from the dielectric interface, the electric field is vertical, uniform, and the same in both parts ($E = V/d$). Hence the boundary condition (47) is satisfied even if this distribution is assumed to hold near the interface as well, i.e. at any point of the system. The only effect of different values of $\varepsilon$ in the two parts is that the electric displacement $D = \varepsilon E$, and hence the electrodes' surface charge density $\sigma = D$, are different in the two parts. Thus we can calculate the electrode charges $Q_{1,2}$ of the two parts independently, in each case using Eq. (44), and then add up the results to get the total capacitance

$$C = \frac{Q_1 + Q_2}{V} = \frac{1}{d}\left(\varepsilon_1 A_1 + \varepsilon_2 A_2\right). \qquad (3.48)$$

Note that this formula may be interpreted as the total capacitance of two separate capacitors connected (by conducting wires) in parallel. This is natural, because we may cut the system along the dielectric interface without any effect on the fields in either part, and then connect the corresponding electrodes by external wires, again without any effect on the system, except very close to the capacitor's edges.




Case (b) may be analyzed by applying Eq. (34) to a Gaussian pillbox with the lower lid inside (for example) the bottom electrode, and the top lid in any of the layers. From this we see that $D$ anywhere inside the system should be equal to the surface charge density $\sigma$ of the lower electrode, i.e. constant. Hence, in dielectric layer 1 the electric field is constant: $E_1 = D/\varepsilon_1 = \sigma/\varepsilon_1$, while in layer 2, similarly, $E_2 = D/\varepsilon_2 = \sigma/\varepsilon_2$. Integrating $E$ across the whole capacitor, we get



$$V = \int_0^{d_1 + d_2} E(z)\, dz = E_1 d_1 + E_2 d_2 = \left(\frac{d_1}{\varepsilon_1} + \frac{d_2}{\varepsilon_2}\right)\sigma, \qquad (3.49)$$



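so that the mutual capacitance per unit area

$$\frac{C}{A} \equiv \frac{\sigma}{V} = \left(\frac{d_1}{\varepsilon_1} + \frac{d_2}{\varepsilon_2}\right)^{-1}. \qquad (3.50)$$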



Note that this result is equivalent to the total capacitance of a series connection of two plane 
capacitors based on each of the layers. This is natural, because we could insert an uncharged thin 
conducting sheet (rather than a cut as in the previous case) at the layer interface, which is an 
equipotential surface, without changing the field distribution in the system. Then we could thicken the
conducting sheet as much as we like (turning it into a wire), also without changing the fields and hence 
the capacitance. 
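A short Python sketch comparing Eqs. (48) and (50) with the elementary circuit rules for parallel and series capacitors. It assumes SI units with $\varepsilon_i = \varepsilon_{r,i}\varepsilon_0$, ignores the edge effects mentioned above, and the function names are ours:

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def c_side_by_side(eps_r1, A1, eps_r2, A2, d):
    """Eq. (48): two dielectrics side by side between plates separated by d."""
    return EPS0 * (eps_r1 * A1 + eps_r2 * A2) / d


def c_stacked_per_area(eps_r1, d1, eps_r2, d2):
    """Eq. (50): two stacked layers of thickness d1 and d2, capacitance per unit area."""
    return 1.0 / (d1 / (EPS0 * eps_r1) + d2 / (EPS0 * eps_r2))


# Usage: compare with the circuit rules C = C1 + C2 and 1/C = 1/C1 + 1/C2.
d = 1e-3
C1, C2 = EPS0 * 2.0 * 1e-4 / d, EPS0 * 5.0 * 3e-4 / d
print(c_side_by_side(2.0, 1e-4, 5.0, 3e-4, d), C1 + C2)

c1, c2 = EPS0 * 2.0 / 1e-3, EPS0 * 5.0 / 2e-3     # per unit area, each layer alone
print(c_stacked_per_area(2.0, 1e-3, 5.0, 2e-3), 1.0 / (1.0 / c1 + 1.0 / c2))
```

For two identical layers, Eq. (50) reduces, as it should, to a single plane capacitor of thickness $d_1 + d_2$.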

In order to warm up for more complex problems, let us see how the last problem could be solved using the Laplace equation approach. Due to the symmetry of the system, the electrostatic potential in each layer may only depend on one (in Fig. 7b, vertical) coordinate $z$, so that the Laplace equation in each uniform part of the system is reduced to $d^2\phi/dz^2 = 0$. Hence in each layer the electrostatic potential changes linearly, though possibly with different coefficients: $\phi_1 = c_{11} z + c_{12}$, and $\phi_2 = c_{21} z + c_{22}$. Selecting the electrode potentials as $\phi(0) = 0$ and $\phi(d_1 + d_2) = V$, from those boundary conditions we get $c_{12} = 0$ and $c_{21}(d_1 + d_2) + c_{22} = V$, so that we need two more equations to find all four coefficients $c_{ij}$. These additional equations come from the conditions of continuity of the potential ($c_{11} d_1 + c_{12} = c_{21} d_1 + c_{22}$) and of the displacement ($\varepsilon_1 c_{11} = \varepsilon_2 c_{21}$) at the interface $z = d_1$. Solving these equations, we can readily find the electric field and displacement in both layers, and then the surface charge densities



$$\sigma(0) = D\big|_{z=0} = -\varepsilon_1 \frac{d\phi_1}{dz}\bigg|_{z=0}, \qquad \sigma(d_1 + d_2) = -D\big|_{z=d_1+d_2} = \varepsilon_2 \frac{d\phi_2}{dz}\bigg|_{z=d_1+d_2} \qquad (3.51)$$



(which in this case are equal and opposite) and finally the capacitance per unit area, with (of course) the 
same result (50). 
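The four linear equations for the coefficients $c_{ij}$ can also be solved numerically, which gives a handy check of Eq. (50). The following is a minimal sketch assuming NumPy, SI units, layer 1 occupying $0 < z < d_1$ and layer 2 occupying $d_1 < z < d_1 + d_2$; the function name and sample values are hypothetical.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def two_layer_capacitor(eps_r1, d1, eps_r2, d2, V):
    """Solve for phi_1 = c11*z + c12 and phi_2 = c21*z + c22; return (sigma(0), sigma(d1+d2))."""
    M = np.array([
        [0.0, 1.0, 0.0, 0.0],          # phi_1(0) = 0, i.e. c12 = 0
        [0.0, 0.0, d1 + d2, 1.0],      # phi_2(d1 + d2) = V
        [d1, 1.0, -d1, -1.0],          # phi_1(d1) = phi_2(d1)
        [eps_r1, 0.0, -eps_r2, 0.0],   # eps1*c11 = eps2*c21 (eps0 cancels out)
    ])
    rhs = np.array([0.0, V, 0.0, 0.0])
    c11, c12, c21, c22 = np.linalg.solve(M, rhs)
    sigma_bottom = -EPS0 * eps_r1 * c11   # Eq. (51) at z = 0
    sigma_top = EPS0 * eps_r2 * c21       # Eq. (51) at z = d1 + d2
    return sigma_bottom, sigma_top


eps_r1, d1, eps_r2, d2, V = 2.0, 1e-3, 5.0, 2e-3, 10.0
s0, s1 = two_layer_capacitor(eps_r1, d1, eps_r2, d2, V)
c_eq50 = 1.0 / (d1 / (EPS0 * eps_r1) + d2 / (EPS0 * eps_r2))
print(s0, s1, s1 / V, c_eq50)   # s0 = -s1, and s1/V agrees with Eq. (50)
```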

Let us apply the same approach to a more complex problem, shown in Fig. 8a (a sphere of radius $R$, made of a dielectric with constant $\varepsilon_r$, placed into a uniform external field $\mathbf{E}_0$), for which the Laplace equation is not one-dimensional, and hence invites the variable separation method discussed in Sec. 2.5. From that discussion we already know, in particular, the general solution (2.172) of the Laplace equation outside of the sphere. To satisfy the uniform-field condition at $r \to \infty$, it reduces to



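$$\phi_{r>R} = -E_0 r \cos\theta + \sum_{l=1}^{\infty} \frac{b_l}{r^{l+1}} P_l(\cos\theta). \qquad (3.52)$$

Inside the sphere we can only use the radial functions that are finite at $r \to 0$:

$$\phi_{r<R} = \sum_{l=1}^{\infty} a_l r^l P_l(\cos\theta). \qquad (3.53)$$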



Now, writing the boundary conditions (46) and (47) at $r = R$, we see that for all coefficients $a_l$ and $b_l$ with $l \geq 2$ we (just like for the conducting sphere in vacuum) get homogeneous equations that have only trivial solutions. Hence, all these terms may be dropped, while for the only surviving angular harmonic, proportional to $P_1(\cos\theta) = \cos\theta$, Eqs. (46)-(47) yield two equations:

$$-E_0 - \frac{2b_1}{R^3} = \varepsilon_r a_1, \qquad -E_0 R + \frac{b_1}{R^2} = a_1 R.$$

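Solving this simple system for $a_1$ and $b_1$, we get the final solution of the problem:

$$\phi_{r>R} = E_0\left(-r + \frac{\varepsilon_r - 1}{\varepsilon_r + 2}\,\frac{R^3}{r^2}\right)\cos\theta, \qquad (3.54)$$

$$\phi_{r<R} = -E_0 \frac{3}{\varepsilon_r + 2}\, r\cos\theta. \qquad (3.55)$$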



Figure 8b shows the equipotential surfaces given by this solution, for a particular value of the dielectric constant $\varepsilon_r$. Note that, just like for a conducting sphere, at $r > R$ the dielectric sphere produces (on top of the uniform external field) a purely dipole field, with $p = 4\pi\varepsilon_0 R^3 E_0 (\varepsilon_r - 1)/(\varepsilon_r + 2)$ - an evident generalization of Eq. (11), to which our result tends at $\varepsilon_r \to \infty$. By the way, this property is common: from the point of view of their electrostatic (but not transport!) properties, conductors may be adequately described as dielectrics with $\varepsilon_r \to \infty$.

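Another remarkable feature of Eq. (55) is that the electric field inside the sphere is uniform,¹⁹ with position-independent values

$$\mathbf{E} = \frac{3}{\varepsilon_r + 2}\mathbf{E}_0, \qquad \mathbf{D} = \varepsilon_r \varepsilon_0 \mathbf{E} = \varepsilon_0 \frac{3\varepsilon_r}{\varepsilon_r + 2}\mathbf{E}_0, \qquad \mathbf{P} = \mathbf{D} - \varepsilon_0 \mathbf{E} = 3\varepsilon_0 \frac{\varepsilon_r - 1}{\varepsilon_r + 2}\mathbf{E}_0. \qquad (3.56)$$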

In the limit $\varepsilon_r \to 1$ (the "vacuum sphere", i.e. no sphere at all), the electric field inside the sphere naturally tends to the external one, and its polarization disappears. In the opposite limit $\varepsilon_r \to \infty$, the electric field inside the sphere vanishes, and the field outside the sphere approaches the one we have found for the conducting sphere - see Eq. (2.176).
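The two $l = 1$ boundary equations above form a 2×2 linear system, so the solution (54)-(55) and the uniform interior fields (56) are easy to check numerically, including both limits just discussed. A minimal sketch assuming NumPy; the function name and the sample values of $\varepsilon_r$, $E_0$, and $R$ are arbitrary illustrations:

```python
import numpy as np


def sphere_l1_coefficients(eps_r, E0, R):
    """Solve the l = 1 boundary equations at r = R for (a1, b1).

    Row 1, Eq. (46):          eps_r*a1 + 2*b1/R**3 = -E0
    Row 2, continuity of phi: a1*R - b1/R**2 = -E0*R
    """
    M = np.array([[eps_r, 2.0 / R**3],
                  [R, -1.0 / R**2]])
    rhs = np.array([-E0, -E0 * R])
    a1, b1 = np.linalg.solve(M, rhs)
    return a1, b1


E0, R = 2.0, 0.5
for eps_r in (1.0, 2.5, 10.0, 1e6):
    a1, b1 = sphere_l1_coefficients(eps_r, E0, R)
    E_in = -a1                          # phi_in = a1 * z, so E_z = -a1, Eq. (56)
    P_over_eps0 = (eps_r - 1.0) * E_in  # P = (eps_r - 1) eps0 E
    E_self = E_in - E0
    print(eps_r, E_in, 3.0 * E0 / (eps_r + 2.0), E_self, -P_over_eps0 / 3.0)
```

The last two printed columns coincide for every $\varepsilon_r$, anticipating the relation between $\mathbf{E}_{\text{self}}$ and $\mathbf{P}$ noted next.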




To complete the discussion of this example, note a very curious result: the field $\mathbf{E}_{\text{self}}$, created by the dielectric sphere inside itself, is related to its polarization vector by a simple equation independent of either the dielectric constant or the sphere's size:

$$\mathbf{E}_{\text{self}} \equiv \mathbf{E} - \mathbf{E}_0 = -\frac{\varepsilon_r - 1}{\varepsilon_r + 2}\mathbf{E}_0 = -\frac{1}{3\varepsilon_0}\mathbf{P}, \qquad (3.57)$$

where the factor 3 stems from the sphere's dimensionality. (For a round cylinder in a normal external field, a similar relation is valid, but with factor 2.) This equality is just a particular manifestation of the general relation (24). Indeed, if summed over all $N = nV$ similar dipoles $\mathbf{p}$, distributed inside the sphere with constant density $n$ (so that the polarization vector $\mathbf{P} = n\mathbf{p}$ is constant), Eq. (24) yields



19 This is true for any ellipsoid, at arbitrary external field orientation.
$$\int_V \mathbf{E}_{\text{self}}(\mathbf{r})\, d^3r = -\frac{N\mathbf{p}}{3\varepsilon_0} \equiv -\frac{\mathbf{P}V}{3\varepsilon_0}, \qquad (3.58)$$

so that after division by $V$, and taking into account the field uniformity in our particular case, it coincides with Eq. (57).²⁰ We will use these results in the following section to discuss the molecular field effect.

Before doing that, let me briefly revisit the method of charge images that was discussed in Sec. 2.6, to find its new features pertaining to dielectrics. As the simplest example, consider a point charge $q$ near a dielectric half-space - see Fig. 9 (cf. Fig. 2.24).




Fig. 3.9. Charge images for a dielectric half-space. 



The Laplace equation in the upper half-space $z > 0$ (except at the charge's location $\rho = 0$, $z = d$) may still be satisfied using a single image charge $q'$ at point $\rho = 0$, $z = -d$, but now $q'$ may differ from $(-q)$. In addition, in contrast to the conducting plane case, we should also find the field inside the dielectric ($z < 0$). This field cannot be contributed by the image charge, because it would produce a potential divergence at its location. Thus, in that half-space we should try to use the real point source only, but maybe with a re-normalized charge $q''$ rather than the genuine charge $q$ - see Fig. 9. As a result, we may look for the potential distribution in the form



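$$\phi = \frac{1}{4\pi\varepsilon_0} \times \begin{cases} \dfrac{q}{[\rho^2 + (z-d)^2]^{1/2}} + \dfrac{q'}{[\rho^2 + (z+d)^2]^{1/2}}, & \text{for } z > 0, \\[2ex] \dfrac{q''}{\varepsilon_r\,[\rho^2 + (z-d)^2]^{1/2}}, & \text{for } z < 0, \end{cases} \qquad (3.59)$$

at this stage with unknown $q'$ and $q''$.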



20 The reader may wonder how we have managed to prove Eq. (24), at least for this particular case, using only the relations based on the dipole approximation (7) for the field, which does not cover the inter-dipole fields responsible for Eq. (24) - see Fig. 3 and its discussion. The reason is that according to Eq. (30), the additional field $\mathbf{E}_{\text{self}}$ inside the sphere may be considered as created by effective charges, of density $\rho_{\text{ef}}$, distributed over the sphere's surface. For these charges, field $\mathbf{E}_{\text{self}}$ is internal, similar to the field between two charges, shown in Fig. 3.
Plugging this solution into the boundary conditions (46) and (47) at $z = 0$ (with $\partial/\partial n = \partial/\partial z$), we see that they are indeed satisfied (so that Eq. (59) expresses the unique solution of the boundary problem) if the effective charges $q'$ and $q''$ obey the following relations:

$$q - q' = q'', \qquad q + q' = \frac{q''}{\varepsilon_r}. \qquad (3.60)$$

Solving this simple system of linear equations, we get 

$$q' = -\frac{\varepsilon_r - 1}{\varepsilon_r + 1}\, q, \qquad q'' = \frac{2\varepsilon_r}{\varepsilon_r + 1}\, q. \qquad (3.61)$$

If $\varepsilon_r \to 1$, then $q' \to 0$ and $q'' \to q$ - both facts very natural, because in this limit (no polarization!) we have to recover the unperturbed field of the initial point charge in both half-spaces. In the opposite limit $\varepsilon_r \to \infty$ (which, according to our discussion of the last problem, should correspond to a conducting plane), $q' \to -q$ (repeating the result we have discussed in much detail in Sec. 2.6) and $q'' \to 2q$. The last result may look a bit counter-intuitive, but note that factor $\varepsilon_r$ has already been incorporated in the denominator of the bottom line of Eq. (59), so that the field in the dielectric tends to zero in this limit, as it should.
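The image-charge solution can be checked numerically by evaluating Eq. (59) just above and just below the interface. The sketch below assumes units with $4\pi\varepsilon_0 = 1$ and arbitrary sample values of $q$, $d$, and $\varepsilon_r$ (all function names are ours); it tests the continuity of $\phi$ and of $D_z = -\varepsilon_r\varepsilon_0\,\partial\phi/\partial z$ with a central finite difference, so the second residual is only as small as the step `h` allows.

```python
import math


def image_charges(q, eps_r):
    """Eq. (61): the image charge q' and the renormalized charge q''."""
    q1 = -(eps_r - 1.0) / (eps_r + 1.0) * q
    q2 = 2.0 * eps_r / (eps_r + 1.0) * q
    return q1, q2


def phi_upper(rho, z, q, d, eps_r):
    """Eq. (59) for z > 0, in units with 4*pi*eps0 = 1."""
    q1, _ = image_charges(q, eps_r)
    return q / math.hypot(rho, z - d) + q1 / math.hypot(rho, z + d)


def phi_lower(rho, z, q, d, eps_r):
    """Eq. (59) for z < 0, in units with 4*pi*eps0 = 1."""
    _, q2 = image_charges(q, eps_r)
    return q2 / (eps_r * math.hypot(rho, z - d))


def interface_residuals(q=1.0, d=1.0, eps_r=4.0, h=1e-5):
    """Largest mismatch of phi and of D_z (in units of eps0) at z = 0."""
    worst_phi = worst_D = 0.0
    for rho in (0.0, 0.3, 1.0, 2.5):
        dphi = phi_upper(rho, 0.0, q, d, eps_r) - phi_lower(rho, 0.0, q, d, eps_r)
        dz_up = (phi_upper(rho, h, q, d, eps_r) - phi_upper(rho, -h, q, d, eps_r)) / (2 * h)
        dz_lo = (phi_lower(rho, h, q, d, eps_r) - phi_lower(rho, -h, q, d, eps_r)) / (2 * h)
        dD = dz_up - eps_r * dz_lo          # Eq. (46): eps_r = 1 above, eps_r below
        worst_phi = max(worst_phi, abs(dphi))
        worst_D = max(worst_D, abs(dD))
    return worst_phi, worst_D


print(image_charges(1.0, 4.0))     # (-0.6, 1.6)
print(interface_residuals())       # both residuals are tiny
```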

Finally, following the logic of Chapter 2, at this point it would be appropriate to discuss the Green's function method. However, due to time and space restrictions, I will skip this discussion, especially because the method's philosophy remains absolutely the same as for the vacuum case, so that its generalization to dielectrics is straightforward.



3.4. Molecular field effects

In 1850, O.-F. Mossotti and (probably independently, but almost 30 years later!) R. Clausius made an interesting experimental observation, now known, rather unfairly, as the Clausius-Mossotti relation: if the density $n$ of molecules in a chemical compound may be changed without changing its molecular structure, then the following ratio,

$$\frac{\varepsilon_r - 1}{\varepsilon_r + 2}, \qquad (3.62)$$

is approximately proportional to $n$. For $\varepsilon_r \to 1$, i.e. $n \to 0$, there is no surprise here: according to Eq. (41), for independent molecular dipoles $\varepsilon_r - 1 = 4\pi\alpha_{\text{mol}} n \propto n$. However, at larger density $n$, the effective field $\mathbf{E}_{\text{ef}}$ acting on each dipole includes not only the external field $\mathbf{E}_0$, but also a substantial "molecular field" $\mathbf{E}_{\text{mol}}$ of the surrounding dipoles:

$$\mathbf{E}_{\text{ef}} = \mathbf{E}_0 + \mathbf{E}_{\text{mol}}(0), \qquad (3.63)$$

where the position of the particular dipole we are discussing is taken for $\mathbf{r} = 0$. Let us calculate $\mathbf{E}_{\text{mol}}(0)$, using a very simple model: a regular cubic lattice of identical dipoles (Fig. 10). In a Cartesian coordinate system with axes directed along the lattice vectors, the coordinates of the dipoles are

$$x_{jkl} = aj, \qquad y_{jkl} = ak, \qquad z_{jkl} = al, \qquad (3.64)$$

where $j$, $k$, and $l$ are the integers numbering the dipoles.
Fig. 3.10. Cubic lattice of similar dipoles.



Now we can use the last form of Eq. (13), and the linear superposition principle, to calculate one of the Cartesian components (say, along axis $x$) of the molecular field induced by all other dipoles of the lattice:

$$E_{\text{mol},x}(0) = \frac{1}{4\pi\varepsilon_0 a^3} \sum_{j,k,l=-\infty}^{+\infty} \frac{3j(jp_x + kp_y + lp_z) - p_x(j^2 + k^2 + l^2)}{(j^2 + k^2 + l^2)^{5/2}}, \qquad (3.65)$$



with the term $j = k = l = 0$ excluded. The sums of all cross-terms, proportional to $jk$ and $jl$, vanish due to the system's symmetry, so that Eq. (65) reduces to

$$E_{\text{mol},x}(0) = \frac{p_x}{4\pi\varepsilon_0 a^3} \sum_{j,k,l=-\infty}^{+\infty} \frac{3j^2 - (j^2 + k^2 + l^2)}{(j^2 + k^2 + l^2)^{5/2}}. \qquad (3.66)$$

Since all the sums participating in this expression are equal,

$$\sum_{j,k,l=-\infty}^{+\infty} \frac{j^2}{(j^2+k^2+l^2)^{5/2}} = \sum_{j,k,l=-\infty}^{+\infty} \frac{k^2}{(j^2+k^2+l^2)^{5/2}} = \sum_{j,k,l=-\infty}^{+\infty} \frac{l^2}{(j^2+k^2+l^2)^{5/2}}, \qquad (3.67)$$

we get $E_{\text{mol},x}(0) = 0$. Due to the system symmetry, the same result is valid for all other components of the dipole field. Hence, $\mathbf{E}_{\text{mol}}(0) = 0$, and (due to the equivalence of all the dipoles of the system) the molecular field vanishes at the location of each dipole, so that Eq. (63) is reduced to $\mathbf{E}_{\text{ef}} = \mathbf{E}_0$.
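The symmetry argument behind Eqs. (66)-(67) can be checked with a brute-force partial sum. The sketch below assumes NumPy; the truncation to a cube $|j|, |k|, |l| \le N$ is our choice, made to preserve the cubic symmetry used in the text. It shows that the three sums in Eq. (67) coincide, so that the bracket in Eq. (66) sums to zero up to rounding, and that a cross-term sum vanishes as well.

```python
import numpy as np


def cubic_lattice_sums(N):
    """Partial sums over the cube |j|, |k|, |l| <= N, excluding j = k = l = 0."""
    r = np.arange(-N, N + 1)
    j, k, l = np.meshgrid(r, r, r, indexing="ij")
    s2 = (j**2 + k**2 + l**2).astype(float)
    w = np.zeros_like(s2)
    nonzero = s2 > 0                          # drop the dipole at the origin
    w[nonzero] = s2[nonzero] ** -2.5          # 1 / (j^2 + k^2 + l^2)^(5/2)
    S_j = np.sum(j**2 * w)
    S_k = np.sum(k**2 * w)
    S_l = np.sum(l**2 * w)
    bracket = np.sum((3 * j**2 - s2) * w)     # the sum in Eq. (66)
    cross = np.sum(j * k * w)                 # a cross-term of Eq. (65)
    return S_j, S_k, S_l, bracket, cross


for N in (2, 5, 10):
    print(N, cubic_lattice_sums(N))
```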

In order to relate the external field $\mathbf{E}_0$ and the average dipole²¹ field $\mathbf{E}$ in the medium, we may use Eq. (56) for a uniform, macroscopic sphere²² with a radius much larger than the inter-dipole distance $a$, so that our assumption of infinite limits of the rapidly converging sum (65) is not substantially affected:

$$\mathbf{E} = \frac{3}{\varepsilon_r + 2}\mathbf{E}_0 = \frac{3}{\varepsilon_r + 2}\mathbf{E}_{\text{ef}}. \qquad (3.68)$$

Now we may plug this relation into the general formula (37) for linear dielectrics:



21 This qualifier is important: E is the long-range (dipole field) average participating in the macroscopic Maxwell 
equations, rather than the exact average that would include the inner-dipole fields, for which Eq. (24) would be 
valid. 

22 This geometry, due to its isotropy, most fairly represents the relation between E and E 0 . 
$$\mathbf{P} = (\varepsilon_r - 1)\varepsilon_0 \mathbf{E} = 3\varepsilon_0 \frac{\varepsilon_r - 1}{\varepsilon_r + 2}\mathbf{E}_{\text{ef}}. \qquad (3.69)$$

This "macroscopic" relation has to give the same result as the "microscopic" Eq. (40), with the replacement $\mathbf{E} \to \mathbf{E}_{\text{ef}}$, which reflects the fact that in the general case each dipole is polarized by the effective field (63) rather than the average field $\mathbf{E}$:

$$\mathbf{P} = 4\pi\varepsilon_0 \alpha_{\text{mol}} n \mathbf{E}_{\text{ef}}. \qquad (3.70)$$

The comparison yields the so-called Lorentz-Lorenz formula,²³

$$\frac{\varepsilon_r - 1}{\varepsilon_r + 2} = \frac{4\pi}{3}\alpha_{\text{mol}} n, \qquad (3.71)$$

that complies with the Clausius-Mossotti observation, provided that the molecular polarizability $\alpha_{\text{mol}}$ is independent of density. (This is a good approximation at least for weak "molecular" bonding.)

It is somewhat surprising how many dielectric materials obey Eq. (71) rather well, given its approximate nature. Indeed, its derivation is based on the assumption of a specific crystal lattice and, more importantly, on the assumptions that the molecules are localized exactly at the lattice nodes, and that the field of each molecule may be expressed by the dipole approximation. In reality, atomic electrons, which participate in the dipole moment formation, are spread in space due to quantum-mechanical uncertainty on a scale that may be comparable with the distances between the molecules.

Solving Eq. (71) for the dielectric constant, we get 

$$\varepsilon_r = \frac{1 + 8\pi\alpha_{\text{mol}} n/3}{1 - 4\pi\alpha_{\text{mol}} n/3}. \qquad (3.72)$$

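If the dipole density is low, $\alpha_{\text{mol}} n \ll 1$, we get our old result (41), corresponding to independent dipoles, and hence to $\mathbf{E}_{\text{ef}} = \mathbf{E}$. However, at high dipole density and/or polarizability, the effective field acting on each dipole,

$$\mathbf{E}_{\text{ef}} = \frac{\varepsilon_r + 2}{3}\mathbf{E} = \frac{\mathbf{E}}{1 - 4\pi\alpha_{\text{mol}} n/3}, \qquad (3.73)$$

may be substantially larger than the average field $\mathbf{E}$, due to the molecular field contribution. Note that $\varepsilon_r$, the $E_{\text{ef}}/E$ ratio, and hence the electric susceptibility

$$\chi_e \equiv \frac{P}{\varepsilon_0 E} = \varepsilon_r - 1 = \frac{4\pi\alpha_{\text{mol}} n}{1 - 4\pi\alpha_{\text{mol}} n/3}, \qquad (3.74)$$

all diverge when the density-polarizability product approaches the critical value $\alpha_{\text{mol}} n = 3/4\pi$.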
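A minimal sketch of Eqs. (71)-(74), useful for seeing how fast $\varepsilon_r$ departs from the dilute-limit result (41) as $\alpha_{\text{mol}} n$ approaches the critical value $3/4\pi \approx 0.239$. The function names are ours, and the product $\alpha_{\text{mol}} n$ is treated as a single dimensionless parameter:

```python
import math

ALPHA_N_CRIT = 3.0 / (4.0 * math.pi)   # critical value of alpha_mol * n


def eps_r_lorentz_lorenz(alpha_n):
    """Eq. (72): dielectric constant for a given alpha_mol * n."""
    x = 4.0 * math.pi * alpha_n / 3.0
    if x >= 1.0:
        raise ValueError("alpha_mol*n must stay below the critical value 3/(4*pi)")
    return (1.0 + 2.0 * x) / (1.0 - x)


def eps_r_dilute(alpha_n):
    """Eq. (41): independent dipoles, eps_r = 1 + 4*pi*alpha_mol*n."""
    return 1.0 + 4.0 * math.pi * alpha_n


def alpha_n_from_eps_r(eps_r):
    """Eq. (71) solved for alpha_mol * n."""
    return ALPHA_N_CRIT * (eps_r - 1.0) / (eps_r + 2.0)


for fraction in (0.01, 0.3, 0.6, 0.9, 0.99):
    an = fraction * ALPHA_N_CRIT
    eps_r = eps_r_lorentz_lorenz(an)
    field_ratio = (eps_r + 2.0) / 3.0          # E_ef / E, Eq. (73)
    print(f"{fraction:5.2f}  eps_r = {eps_r:9.3f}  "
          f"dilute = {eps_r_dilute(an):6.3f}  E_ef/E = {field_ratio:7.3f}")
```

In this simple model, the dilute formula stays close to Eq. (72) only while $\alpha_{\text{mol}} n$ is a small fraction of the critical value.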

This is essentially a rudimentary²⁴ description of the transition from linear dielectrics to the so-called ferroelectrics, with self-sustained (spontaneous) polarization even in the absence of external



23 It was derived in 1869 by L. Lorenz and then (in 1878) independently by H. Lorentz. Actually, they discussed optical frequencies, at which $\varepsilon_r$ should be understood as the square of the refractive index at the wave frequency (see Chapter 7), but since the optical wavelengths ($\sim 10^{-6}$ m) are much longer than the interatomic distances ($a \sim 10^{-9}$ m), the derivation remains the same as in electrostatics.
electric field. These materials are typically recognized by the hysteretic behavior of their polarization as a function of the applied electric field - see, for example, Fig. 11.

Ferroelectric materials are being actively explored as the active materials for nonvolatile random-access memories (dubbed either FRAM or FeRAM).²⁵ In cells of this memory, binary information is stored in the form of one of two possible directions of spontaneous polarization at $E = 0$ (see, e.g., Fig. 11), and is read out by the effect of the average electric field on a nearby semiconductor field-effect transistor. Unfortunately, most materials suitable for the fabrication of ferroelectric thin films are rather complex and incompatible with standard processes of microelectronics. In addition, the time of spontaneous depolarization of ferroelectric thin films is typically well below 10 years (the industrial standard for data retention in nonvolatile memories), and this time may be decreased even more by "fatigue" from repeated polarization recycling. For these reasons, industrial production of FRAM is currently just a tiny, few-percent fraction of the nonvolatile memory market (which is currently dominated by floating-gate memories - see Sec. 4.2).



Fig. 3.11. Ferroelectric hysteretic loops: (a) for various material types (schematically), and (b) for several amplitudes of the applied ac electric field. (Panel b, showing recent (2013) experimental results by S.-W. Jung et al. for an inkjet-printed layer of organic semiconductor PC12TV12T, is adapted from http://etrij.etri.re.kr/etrij/journal/article/article.do?volume=35&issue=4&page=734.)



24 Any quantitative description of this transition should involve an account of thermal fluctuations of the molecular dipoles, which reduce the dipole-dipole ordering and hence suppress the transition to the ferroelectric phase until temperature has been lowered to a certain Curie temperature $T_c$, named after P. Curie (1859-1906). Right above that temperature, the dielectric remains linear, but has a high, temperature-dependent dielectric constant that diverges at $T \to T_c$. Such materials are frequently called paraelectric, and the paraelectric-to-ferroelectric transition at $T_c$ in crystals is a typical example of a continuous (or "second-order") phase transition - see, e.g., SM Sec. 4.4. (As will be discussed in Sec. 5.5 below, some magnetic materials exhibit a very similar phase transition between their ferromagnetic and paramagnetic phases.) Moreover, in non-crystalline materials, such as bulk ceramics and thin films, the ferroelectric behavior is further complicated by different, field-dependent directions of polarization $\mathbf{P}$ in individual "domains" of the sample, making the average hysteresis smoother (Fig. 11a) and dependent on the sample's polarization history - for example, on the amplitude of the applied ac electric field (Fig. 11b).

25 See, e.g., J. F. Scott, Ferroelectric Memories, Springer, 2000. 
Other polarization effects are also possible, e.g., antiferroelectricity or helielectricity. Unfortunately, we will not have time for a discussion of these exotic phenomena in this course;²⁶ the main reason I am mentioning them is to emphasize again that the "material relation" $\mathbf{P} = \mathbf{P}(\mathbf{E})$ is by no means exact or fundamental, though most materials, in practicable fields, behave as linear dielectrics.

3.5. Energy of electric field in a dielectric 

In Chapter 1, we have obtained two key results for the electrostatic energy: Eq. (1.54) for a charge interaction with an independent ("external") field, and a similarly structured formula (1.62), but with an additional factor ½, for the field produced by the charges under consideration. Both relations could be merged and rewritten in a "local" form involving energy density $u$ - see Eq. (1.67). These equations are of course always valid for dielectrics as well, if the charge density includes all charges (including those bound into dipoles), but it is convenient to recast them into a form depending on the density $\rho(\mathbf{r})$ of only "stand-alone" charges.

If a field is created only by the stand-alone charges under consideration, and is proportional to $\rho(\mathbf{r})$ (requiring that we deal with a linear dielectric!), we can repeat all the argumentation of the beginning of Sec. 1.3, and again arrive at Eq. (1.62), provided that $\phi$ is calculated correctly, i.e., with a due account of the dielectric. Now we can recast this result in terms of fields, essentially as was done in Eqs. (1.64)-(1.66), but now making a clear difference between the electric field $\mathbf{E}$ (that still equals $-\nabla\phi$) and the electric displacement field $\mathbf{D}$ that obeys the macroscopic Maxwell equation (32). Plugging $\rho(\mathbf{r})$, expressed from that equation, into Eq. (1.62), we get

$$U = \frac{1}{2}\int (\nabla\cdot\mathbf{D})\,\phi\, d^3r. \qquad (3.75)$$

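Using the fact²⁷ that for any differentiable functions $\phi$ and $\mathbf{D}$,

$$(\nabla\cdot\mathbf{D})\,\phi = \nabla\cdot(\phi\mathbf{D}) - (\nabla\phi)\cdot\mathbf{D}, \qquad (3.76)$$

we may rewrite Eq. (75) as

$$U = \frac{1}{2}\int \nabla\cdot(\phi\mathbf{D})\, d^3r - \frac{1}{2}\int (\nabla\phi)\cdot\mathbf{D}\, d^3r. \qquad (3.77)$$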

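The divergence theorem, applied to the first term, reduces it to a surface integral of $\phi D_n$. (As a reminder, in Eq. (1.65) the integral was of $\phi(\nabla\phi)_n \propto \phi E_n$.) If the surface of the volume we consider is sufficiently far from the charges, this surface integral vanishes. On the other hand, the gradient in the second term of Eq. (77) is just (minus) field $\mathbf{E}$, so that it gives

$$U = \frac{1}{2}\int \mathbf{E}\cdot\mathbf{D}\, d^3r = \frac{1}{2}\int \mathbf{E}(\mathbf{r})\cdot\varepsilon(\mathbf{r})\mathbf{E}(\mathbf{r})\, d^3r = \frac{\varepsilon_0}{2}\int \varepsilon_r(\mathbf{r})E^2(\mathbf{r})\, d^3r. \qquad (3.78)$$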



26 For a detailed coverage of ferroelectrics, I can recommend an encyclopedic monograph by M. Lines and A. Glass, Principles and Applications of Ferroelectrics and Related Materials, Oxford U. Press, 2001, and a collection of reviews of recent developments by K. M. Rabe, C. H. Ahn, and J.-M. Triscone (eds.), Physics of Ferroelectrics: A Modern Perspective, Springer, 2010.

27 See, e.g., MA Eq. (11.4a). 
This expression is a natural generalization of Eq. (1.67) and shows that we can, as we did in vacuum, present the electrostatic energy in a local form²⁸

$$U = \int u(\mathbf{r})\, d^3r, \qquad u = \frac{1}{2}\mathbf{E}\cdot\mathbf{D} = \frac{\varepsilon}{2}E^2 = \frac{D^2}{2\varepsilon}. \qquad (3.79)$$

Again, this expression is not valid for nonlinear dielectrics, because our starting point, Eq. (1.62), is only valid if $\phi$ is proportional to $\rho$. In order to make our calculation more general, we should intercept our calculations in Sec. 1.3 at an earlier stage, at which we have not yet used this proportionality. For example, Eq. (1.54) may be rewritten, in the continuous limit, as

$$\delta U = \int \phi(\mathbf{r})\, \delta\rho(\mathbf{r})\, d^3r, \qquad (3.80)$$

where symbol $\delta$ means a small variation of the function - e.g., its change in time, sufficiently slow to ignore the relativistic and magnetic-field effects. Applying such variation to Eq. (32), and plugging the resulting $\delta\rho = \nabla\cdot\delta\mathbf{D}$ into Eq. (80), we get

$$\delta U = \int (\nabla\cdot\delta\mathbf{D})\,\phi\, d^3r. \qquad (3.81)$$

(Note that in contrast to Eq. (75), this expression does not have factor ½.) Now repeating the same calculations as in the linear case, for the energy density variation we get a remarkably simple (and general!) expression,

$$\delta u = \mathbf{E}\cdot\delta\mathbf{D}. \qquad (3.82)$$

This is as far as we can go for the general dependence $\mathbf{D}(\mathbf{E})$. If the dependence is linear and isotropic, as in Eq. (38), then $\delta\mathbf{D} = \varepsilon\,\delta\mathbf{E}$ and

$$\delta u = \varepsilon\,\mathbf{E}\cdot\delta\mathbf{E}. \qquad (3.83)$$

Integration of this expression over variations, from zero field to a certain final distribution $\mathbf{E}(\mathbf{r})$, brings us back to Eq. (79).

Another important role of Eq. (82) is that it shows that the Cartesian components of $\mathbf{E}$ may be interpreted as generalized forces, and those of $\mathbf{D}$ as generalized coordinates (per unit volume).²⁹ This allows one to form the proper Gibbs potential energy³⁰ of a system inside some volume $V$, placed in an external electric field $\mathbf{E}_{\text{ext}}$:

$$\mathcal{G} = \int_V g(\mathbf{r})\, d^3r, \qquad g(\mathbf{r}) = u(\mathbf{r}) - \mathbf{E}_{\text{ext}}\cdot\mathbf{D}. \qquad (3.84)$$



28 Again, in Gaussian units this expression should be divided by $4\pi$.

29 This is the point where the SI units, prescribing fields E and D different dimensionalities, are more revealing 
than the Gaussian units. 

30 See, e.g., CM Sec. 1.5. Note that Eq. (84) clearly illustrates, once again, that the difference between the potential energies $\mathcal{G}$ and $U$, usually discussed in courses of statistical physics and/or thermodynamics as the difference between the Gibbs and Helmholtz free energies (see, e.g., SM Sec. 1.6), is important regardless of statistics or thermal motion.
As an analytic mechanics reminder, if a generalized external force (in our case, $\mathbf{E}_{\text{ext}}$) is fixed, the stable equilibrium of the system corresponds to the minimum of $\mathcal{G}$, rather than of the potential energy $U$ as such - in our case, that of the field:

$$U = \int_V u(\mathbf{r})\, d^3r, \qquad u(\mathbf{r}) = \int \mathbf{E}\cdot\delta\mathbf{D}. \qquad (3.85)$$

In order to illustrate this important point, let us return to the simple case of a system with linear dielectric(s), in which $\delta\mathbf{D} \propto \delta\mathbf{E} \propto \delta\mathbf{E}_{\text{ext}}$, so that Eq. (85) may be explicitly integrated over the external field variation, to reproduce the second of Eqs. (79):

$$u(\mathbf{r}) = \frac{1}{2}\mathbf{E}\cdot\mathbf{D}. \qquad (3.86)$$

In this case, Eq. (84) yields 

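$$g(\mathbf{r}) = \frac{1}{2}\mathbf{E}\cdot\mathbf{D} - \mathbf{E}_{\text{ext}}\cdot\mathbf{D} = \frac{\varepsilon}{2}E^2 - \varepsilon\,\mathbf{E}\cdot\mathbf{E}_{\text{ext}} = \frac{\varepsilon}{2}(\mathbf{E} - \mathbf{E}_{\text{ext}})^2 + \text{const}, \qquad (3.87)$$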

where the constant may depend on the external field, but not on the resulting field distribution. 

As a sanity check, let us apply this result to a volume $V$ well inside a long dielectric cylinder placed into a uniform external field $\mathbf{E}_{\text{ext}}$ parallel to the cylinder's axis. (This orientation allows us to ignore the geometric effects discussed in Sec. 3 - see, e.g., Fig. 6 and its discussion.) Then $\mathbf{E}$ has to be uniform in the dominating part of the cylinder, so that Eq. (84) may be explicitly integrated over the volume, giving

$$\mathcal{G} = \frac{\varepsilon}{2}(\mathbf{E} - \mathbf{E}_{\text{ext}})^2 V + \text{const}. \qquad (3.88)$$

The minimum of this function is achieved at the evidently correct result $\mathbf{E} = \mathbf{E}_{\text{ext}}$, in contrast to the unphysical result $\mathbf{E} = 0$ (meaning the electric field's expulsion from the volume) that we would get by minimizing $U$.
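The contrast between minimizing $\mathcal{G}$ and minimizing $U$ can be seen with a brute-force scan over trial values of the uniform field $E$ (parallel to $\mathbf{E}_{\text{ext}}$). This sketch assumes NumPy, drops the field-independent constant in Eq. (88), and uses arbitrary units and sample values; the function name is ours.

```python
import numpy as np


def gibbs_and_field_energy(E, E_ext, eps, V=1.0):
    """Return (G, U) for a uniform trial field E parallel to E_ext.

    G = V * (eps/2 * E**2 - eps * E * E_ext), i.e. Eq. (88) minus its constant;
    U = V * eps/2 * E**2, the field energy (79) alone.
    """
    G = V * (0.5 * eps * E**2 - eps * E * E_ext)
    U = V * 0.5 * eps * E**2
    return G, U


E_trial = np.linspace(-2.0, 3.0, 5001)     # trial field values, step 0.001
E_ext, eps = 1.0, 3.0
G, U = gibbs_and_field_energy(E_trial, E_ext, eps)
print("argmin G:", E_trial[np.argmin(G)])  # close to E_ext
print("argmin U:", E_trial[np.argmin(U)])  # close to 0
```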



3.6. Exercise problems 

3.1. An electric dipole is located above an infinite conducting plane (see Fig. on the right). Calculate:

(i) the distribution of the induced charge in the conductor,

(ii) the dipole-to-plane interaction energy, and

(iii) the force and the torque acting on the dipole.




3.2. Experimental plots in Fig. 11 show that the polarization of EuMn₂O₅, a typical ferroelectric/paraelectric material, becomes almost linear at 50 K. Use the plot to calculate (with an accuracy better than 10%) its dielectric constant $\varepsilon_r$ at this temperature.



Chapter 3 



Page 23 of 24 



Essential Graduate Physics 



EM: Classical Electrodynamics 



3.3. A long, round cylinder is made of a ferroelectric material with fixed, constant polarization P
perpendicular to cylinder's axis. Calculate the distribution of electric field both inside and outside the 
cylinder. 



3.4. A fixed dipole is placed in the center of a spherical cavity of radius $R$ inside a uniform dielectric (see Fig. on the right). Find the electric field distribution in the system (both for $r < R$ and $r > R$).



3.5. A uniform electric field $\mathbf{E}_0$ has been created (by external sources) inside a uniform linear dielectric. Find the change of the electric field created by cutting out a cavity in the shape of a round cylinder of radius $R$, with the axis perpendicular to the external field - see Fig. on the right.



3.6. A plane capacitor, with zero voltage between its conducting plates (as may be fixed, e.g., with an external wire - see Fig. on the right), is partly filled with a material with spontaneous, constant polarization $\mathbf{P}_0$.³¹ Find the distributions of the electric field, electric displacement, and the surface charge density of each plate.

31 In electrical engineering, such materials (typically, synthetic polymers) are frequently called electrets.
Chapter 4. DC Currents

In this chapter I discuss the laws governing the distribution of constant ("dc") currents inside conducting media, with a focus on the linear ("Ohmic") conductivity. In most cases, the partial differential equation governing the distribution may be reduced to the same Laplace and Poisson equations whose solution methods have been discussed in detail in Chapter 2. Due to this fact, this chapter is rather short.



4.1. Continuity equation and the Kirchhoff laws

Until this point, our discussion of conductors has been limited to the cases when they are separated by insulators (meaning either vacuum or dielectric media) preventing any continuous motion of charges from one conductor to another, even if there is a voltage difference (and hence an electric field) between them - see Fig. 1a.




Fig. 4.1. Two oppositely charged conductors: (a) at the electrostatic situation, (b) at charge relaxation 
through an additional narrow conductor ("wire"), and (c) a system sustaining dc current in the wire. 



Now let us connect two conductors galvanically, say with a wire - a thin, elongated conductor (Fig. 1b). Then the electric field causes the motion of charges in the wire, from the conductor with a higher electrostatic potential toward the one with a lower potential, until the potentials equilibrate. Such a process is called charge relaxation. The main equation governing this process may be obtained from the experimental fact (already mentioned in Sec. 1.1) that electric charges cannot appear or disappear (though opposite charges may recombine, with the conservation of the net charge). As a result, the charge $Q$ of one conductor may change only due to the current $I$ through the wire:¹




$$\frac{dQ}{dt} = -I. \qquad (4.1)$$



1 Just as a (hopefully, unnecessary :-) reminder, in the SI units the current is measured in amperes (A). In legal
metrology, the ampere (rather than the coulomb, which is defined as 1 C = 1 A × 1 s) is a primary unit. I will
mention its formal definition in the next chapter. In the Gaussian units, Eq. (1) remains the same, so that the
current's unit is the so-called statampere, defined as one statcoulomb per second.
Let us express this law in a differential form, introducing the notion of the current density vector
j(r). This vector may be defined via the following relation for the current dI crossing an elementary area dA
(Fig. 2):

$$dI = j\,dA\cos\theta = (j\cos\theta)\,dA = j_n\,dA, \qquad (4.2)$$

where θ is the angle between the normal to the surface and the carrier motion direction (which is taken
as the direction of vector j).




Fig. 4.2. Current density vector. 




With that definition, Eq. (1) may be rewritten as

$$\frac{d}{dt}\int_V \rho\, d^3r = -\oint_S j_n\, d^2r, \qquad (4.3)$$



where V is an arbitrary stationary volume bounded by the closed surface S. Applying to this volume the
same divergence theorem as was repeatedly used in previous chapters, we get

$$\int_V \left(\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j}\right) d^3r = 0. \qquad (4.4)$$

Since volume V is arbitrary, this equation may be true only if

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0. \qquad (4.5)$$



This is the fundamental continuity equation, which is valid for time-dependent phenomena as well. 2

The charge relaxation is of course a dynamic, time-dependent process. However, electric
currents may also exist in stationary situations, when a current source, for example a battery, replenishes
the conductor charges and hence sustains currents at a certain time-independent level - see Fig. 1c. (As
we will see below, in most cases this process requires a persistent replenishment of the electrostatic
energy from either a source or a storage of energy of a different kind, say, the chemical energy of the
battery.) Let us discuss the laws governing the distribution of such dc currents. In this case (∂/∂t = 0),
Eq. (5) reduces to a very simple equation

$$\nabla\cdot\mathbf{j} = 0. \qquad (4.6)$$

This equation acquires an even simpler form in the particular but important case of electric
circuits (Fig. 3), i.e. systems that may be presented as an electric connection of components of two types:



2 Similar differential relations are valid for the density of any conserved quantity, for example for mass in 
classical fluid dynamics (see, e.g., CM Sec. 8.2), and for the probability in statistical physics (SM Sec. 5.6) and 
quantum mechanics (QM Sec. 1.4). 
(i) small-size (lumped) circuit elements (also called "two-terminal devices"), meaning a passive
resistor, a current source, etc. - generally, any black box with two wires sticking out of it, and 

(ii) perfectly conducting wires, with negligible voltage drop along them, that are galvanically 
connected at certain points, called nodes (or "junctions"). 




Fig. 4.3. Typical system obeying the Kirchhoff 
laws. 



In the standard circuit theory, the electric charges of the nodes are considered negligible, and we 
may integrate Eq. (6) over the closed surface drawn around any node to get 

$$\sum_j I_j = 0, \qquad (4.7a)$$

where the summation is over all the wires (numbered with index j) connected at the node. On the other
hand, according to its definition (2.25), the voltage drop V_k across each circuit element may be presented as
the difference of potentials of the adjacent nodes, V_k = φ_k - φ_{k-1}. Summing such differences around any
closed loop of the circuit (Fig. 3), we get all terms cancelled, so that

$$\sum_k V_k = 0. \qquad (4.7b)$$

These relations are called, respectively, the 1st and 2nd Kirchhoff laws, or sometimes the node
rule and the loop rule. They may seem elementary, but the genuine power of the Kirchhoff approach is
in the fact that a set of Eqs. (7), covering every node and every circuit element of the system, gives a system
of equations sufficient for the calculation of all currents and voltages in it, provided that the relation
between current and voltage is known for each circuit element.

It is almost evident that in the absence of current sources, the system of equations (7) has only the
trivial solution I_j = 0, V_k = 0, with the exotic exception of superconductivity, to be discussed in Sec.
6.3. The current sources, which allow non-vanishing current flows, may be described by their
electromotive forces (e.m.f.) 𝒱_k, having the dimensionality of voltage, which have to be taken into
account in the corresponding terms V_k of sum (7b). Let me hope that the reader has some experience of
using Eqs. (7) for the analysis of simple circuits, say, consisting of several resistors and dc batteries,
so I may save time on a discussion of these simple problems.
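The Kirchhoff-law bookkeeping described above is easy to automate. Below is a minimal sketch, assuming only ideal resistors (obeying I = V/R, anticipating Eq. (21) of Sec. 3) and ideal e.m.f. sources. The node-potential formulation (commonly called modified nodal analysis) and all function names here are illustrative choices, not a method prescribed in these notes: using node potentials φ_k makes the 2nd law (7b) automatic, while the 1st law (7a) supplies one linear equation per node.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)  # assumes a non-singular system
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def analyze_circuit(n_nodes, resistors, sources):
    """Node potentials and source currents of a resistor network (hypothetical helper).

    resistors: list of (node_a, node_b, R); sources: list of (node_plus, node_minus, emf).
    Node 0 is the reference ("ground") node with phi = 0.
    """
    n_phi = n_nodes - 1
    size = n_phi + len(sources)
    a = [[Fraction(0)] * size for _ in range(size)]
    b = [Fraction(0)] * size

    def idx(node):  # matrix index of a node potential; None for the ground node
        return None if node == 0 else node - 1

    # 1st Kirchhoff law (4.7a) at each node: the currents leaving it sum to zero,
    # the resistor current (phi_a - phi_b)/R leaving node a.
    for na, nb, r in resistors:
        g = 1 / Fraction(r)
        for p, q in ((na, nb), (nb, na)):
            ip, iq = idx(p), idx(q)
            if ip is not None:
                a[ip][ip] += g
                if iq is not None:
                    a[ip][iq] -= g

    # Each source adds an unknown current I_s (entering node 'plus' from the source)
    # and the constraint phi_plus - phi_minus = emf.
    for s, (plus, minus, emf) in enumerate(sources):
        col = n_phi + s
        ip, im = idx(plus), idx(minus)
        if ip is not None:
            a[ip][col] -= 1
            a[col][ip] += 1
        if im is not None:
            a[im][col] += 1
            a[col][im] -= 1
        b[col] = Fraction(emf)

    x = solve_linear(a, b)
    return [Fraction(0)] + x[:n_phi], x[n_phi:]


# Example: a 12 V battery (node 1 to ground) feeding 2 ohm in series with (3 ohm || 6 ohm).
phi, i_src = analyze_circuit(3, [(1, 2, 2), (2, 0, 3), (2, 0, 6)], [(1, 0, 12)])
print("node potentials:", [str(p) for p in phi], "battery current:", [str(i) for i in i_src])
# prints: node potentials: ['0', '12', '6'] battery current: ['3']
```

Exact rational arithmetic is used only to keep the toy example free of rounding; for large circuits, a floating-point sparse solver would be the natural choice.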

4.2. The Ohm law 

As was mentioned above, the relations spelled out in Sec. 1 are sufficient for forming a closed 
system of equations for finding currents and electric field in a system only if they are complemented 



Chapter 4 



Page 3 of 14 



Essential Graduate Physics 



EM: Classical Electrodynamics 



with material equations relating the scalars I and V in each circuit element, i.e. the vectors j and E at each
point of the medium of such an element. The simplest, and most frequently met relation of this kind is
the famous Ohm law, whose differential form is

$$\mathbf{j} = \sigma\mathbf{E}, \qquad (4.8)$$

where σ is a constant called the conductivity. 3 Though this is not a fundamental relation, and is only
approximate for any conducting medium, we can argue that if:

(i) there is no current at E = 0 (mind superconductors!), 

(ii) the medium is isotropic or almost isotropic (a notable exception: some organic conductors), 

(iii) the mean free path l of current carriers is much smaller than the characteristic scale a of the
spatial variations of j and E, 

then the Ohm law may be viewed as a result of the Taylor expansion of the local relation j(E) in 
relatively small fields, and thus is very common. 

Table 1 gives the experimental values of dc conductivity for some practically important (or just
representative) materials. The reader can see that the range of its values is very broad, covering more
than 30 orders of magnitude, even without going to such extremes as very pure metallic crystals at very
low temperatures, where σ may reach ~10^12 S/m.



Table 4.1. Ohmic conductivities for some representative (or practically important) materials at 20°C.

Material                                   σ (S/m)
Teflon ([C2F4]n)                           10^-22 to 10^-24
Silicon dioxide                            10^-16 to 10^-19
Various glasses                            10^-10 to 10^-14
Deionized water                            ~10^-6
Sea water                                  5
Silicon n-doped to 10^16 cm^-3             2.5×10^2
Silicon n-doped to 10^19 cm^-3             1.6×10^4
Silicon p-doped to 10^19 cm^-3             1.1×10^4
Nichrome (alloy 80% Ni + 20% Cr)           0.9×10^6
Aluminum                                   3.8×10^7
Copper                                     6.0×10^7
Zinc crystal along a-axis                  1.65×10^7
Zinc crystal along c-axis                  1.72×10^7


3 In SI units, the conductivity is measured in siemens per meter, where one siemens (S) is the reciprocal of one
ohm: 1 S = (1 Ω)^-1 = 1 A / 1 V. The constant reciprocal to conductivity, 1/σ, is called the resistivity, and is commonly
denoted by the letter ρ. I will, however, try to avoid using this notion, because I am already overusing this letter.
In order to get some feeling for what these values mean, let us consider a very simple system
(Fig. 4): a plane capacitor of area A ≫ d², filled with a material that has not only a dielectric constant ε_r,
but also some Ohmic conductivity σ, with plate electrodes of much higher conductivity.




Fig. 4.4. "Leaky" plane capacitor. 



Assuming that these properties are compatible with each other, 4 we may assume that the
distribution of the electric potential (not too close to the capacitor edges) still obeys Eq. (2.39), so that the
electric field is vertical and uniform, with E = V/d. Then, according to the Ohm law (8), the current density is also
uniform, j = σE = σV/d. From here, the total current between the plates is

$$I = jA = \sigma EA = \sigma\frac{V}{d}A. \qquad (4.9)$$

On the other hand, from Eqs. (2.26) and (3.45), the instantaneous value of the plate charge is Q = CV =
(ε_rε_0A/d)V. Plugging these relations into Eq. (1), we see that the speed of charge (and voltage)
relaxation does not depend on the geometric parameters A and d:

$$\frac{dV}{dt} = -\frac{V}{\tau_r}, \qquad \tau_r \equiv \frac{\varepsilon_r\varepsilon_0}{\sigma}, \qquad (4.10)$$

where the parameter τ_r has the sense of the relaxation time constant. As we know (see Table 3.1), for most
practical materials the dielectric constant is within one order of magnitude of 10, so that the
numerator of Eq. (10) is of the order of 10^-10 (in SI units). As a result, according to Table 1, the charge relaxation
time ranges from ~10^14 s (more than a million years!) for the best insulators like Teflon, to ~10^-18 s for the
least resistive metals.
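These estimates are easy to reproduce. A minimal sketch, assuming the rough value ε_r = 10 used in the text for every material (a simplification: real dielectric constants differ, e.g. for Teflon or water), with conductivities taken from Table 1:

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def relaxation_time(sigma, eps_r=10.0):
    """Charge relaxation time tau_r = eps_r * eps0 / sigma, Eq. (4.10), in seconds."""
    return eps_r * EPS0 / sigma


def plate_voltage(t, v0, sigma, eps_r=10.0):
    """Solution of Eq. (4.10): V(t) = V0 * exp(-t / tau_r)."""
    return v0 * math.exp(-t / relaxation_time(sigma, eps_r))


# Conductivities (S/m) from Table 4.1; eps_r = 10 is only the text's rough typical value.
for name, sigma in [("Teflon (best case)", 1e-24), ("Sea water", 5.0), ("Copper", 6.0e7)]:
    print(f"{name:20s} tau_r ~ {relaxation_time(sigma):.1e} s")
```

Note that the result is independent of the capacitor's A and d, exactly as Eq. (10) states.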

What is the physics behind these values of σ, and why, for some materials, does Table 1 give them
with such a large uncertainty? If charge carriers move as classical particles (e.g., in plasmas or non-degenerate
semiconductors), a reasonable description of conductivity is given by the famous Drude
formula. 5 In Drude's picture, a weak electric field accelerates the charge carriers in its direction
(possibly on top of their random motion in all directions, i.e. a motion with a vanishing average velocity
vector):

$$\frac{d\mathbf{v}}{dt} = \frac{q}{m}\mathbf{E}, \qquad (4.11)$$

and as a result their velocity acquires the average value

$$\langle\mathbf{v}\rangle = \tau\frac{d\mathbf{v}}{dt} = \frac{q\tau}{m}\mathbf{E}, \qquad (4.12)$$


4 As will be discussed in Chapter 6, such simple analysis is only valid if σ is not too high.

5 It was suggested by P. Drude in 1900. 
where the phenomenological parameter τ = l/v (l being the mean free path; not to be confused with τ_r!) may be understood as the
effective average time between carrier scattering events. From here, the current density is

$$\mathbf{j} = qn\langle\mathbf{v}\rangle = \frac{q^2 n\tau}{m}\mathbf{E}, \quad\text{i.e.}\quad \sigma = \frac{q^2 n\tau}{m}, \qquad (4.13a)$$

where n is the carrier density. (Notice the independence of σ of the carrier charge sign.) Another form of the same result, more popular
in the physics of semiconductors, is

$$\sigma = |q|n\mu, \quad\text{with}\quad \mu \equiv \frac{|q|\tau}{m}, \qquad (4.13b)$$

where the parameter μ, defined by the relation |⟨v⟩| = μE, is called the charge carrier mobility.

Most good conductors (e.g., metals) are essentially degenerate Fermi gases (or liquids), in which
the average thermal energy of a particle, k_BT, is much lower than the Fermi energy ε_F. In this case, a
quantum theory is needed for the calculation of σ. Such a theory was developed by the godfather of quantum physics,
A. Sommerfeld, in 1927 (and is sometimes called the Drude-Sommerfeld model). I have no
time to discuss it in this course, 6 and here I will only notice that for an ideal, isotropic Fermi gas the
result is reduced to Eq. (13), with a certain effective value of τ, so that it may be used for estimates of σ,
with due respect to the quantum theory of scattering. In a typical metal, n is very high (~10^23 cm^-3) and
is fixed by the atomic structure, so that the sample quality may affect σ only via the scattering time τ.

At room temperature, the scattering of electrons by thermally-excited lattice vibrations
(phonons) dominates, so that τ and σ are high but finite, and do not change much from one sample to
another. (Hence the more accurate values given for metals in Table 1.) On the other hand, at T → 0, a
perfect crystal should not exhibit scattering at all, and its conductivity should be infinite. In practice, this is
never true (for example, due to electron scattering from imperfect boundaries of finite-size samples),
and the effective conductivity σ is infinite (or practically infinite, at least above the measurable value
~10 2 S/m) only in superconductors. 7 
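Equation (13a) may also be read backwards, to estimate the effective scattering time from a measured conductivity. A rough sketch for copper, assuming about one conduction electron per atom (n ≈ 8.5×10^28 m^-3) and the free-electron mass for m; as noted above, the result is only an effective Drude-Sommerfeld parameter:

```python
Q_E = 1.602e-19  # elementary charge, C
M_E = 9.109e-31  # free-electron mass, kg


def drude_scattering_time(sigma, n, q=Q_E, m=M_E):
    """Invert the Drude formula (4.13a), sigma = q^2 n tau / m, for tau (seconds)."""
    return m * sigma / (q**2 * n)


# Assumption: copper, ~one conduction electron per atom, n ~ 8.5e28 m^-3 (8.5e22 cm^-3).
tau_cu = drude_scattering_time(6.0e7, 8.5e28)
print(f"effective scattering time in Cu at 20 C: {tau_cu:.1e} s")
```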

On the other hand, the conductivity of quasi-insulators (including deionized water) and
semiconductors depends mostly on the carrier density n, which is much lower than in metals. From the
point of view of quantum mechanics, this happens because the ground states of charge
carriers are localized within an atom (or molecule), and separated from the excited states, with space-extended
wavefunctions, by a large energy gap (called the bandgap). For example, in SiO2 the bandgap
approaches 9 eV, equivalent to ~10^5 K. This is why, even at room temperature, the density of
thermally-excited free charge carriers in good insulators is negligible. In these materials, n is determined
by impurities and vacancies, and may depend on a particular chemical synthesis or other fabrication
technology, rather than on fundamental properties of the material. (In contrast, the carrier mobility
μ in these materials is almost technology-independent.)
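Similarly, Eq. (13b) lets one extract carrier mobilities from the silicon entries of Table 1. This sketch assumes that every dopant atom is ionized, so that the carrier density equals the doping level; the lower mobility inferred at heavy doping is consistent with additional carrier scattering by the impurities themselves.

```python
Q_E = 1.602e-19  # elementary charge, C


def mobility_cm2(sigma, n_cm3, q=Q_E):
    """Carrier mobility mu = sigma / (|q| n) from Eq. (4.13b), returned in cm^2/(V s).

    sigma is in S/m; the carrier density n is given in cm^-3 and converted to m^-3.
    """
    mu_si = sigma / (abs(q) * n_cm3 * 1e6)  # m^2/(V s)
    return mu_si * 1e4


# Assumption: full dopant ionization, so n equals the doping level.
for label, sigma, n in [("n-Si, 1e16 cm^-3", 2.5e2, 1e16),
                        ("n-Si, 1e19 cm^-3", 1.6e4, 1e19),
                        ("p-Si, 1e19 cm^-3", 1.1e4, 1e19)]:
    print(f"{label}: mu ~ {mobility_cm2(sigma, n):.0f} cm^2/(V s)")
```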

The practical importance of the technology may be illustrated by the following example. In cells
of the so-called floating-gate memories, in particular the flash memories, which currently dominate the
nonvolatile digital memory technology, data bits are stored as small electric charges (Q ~ 10^-16 C) of



6 For such a discussion see, e.g., SM Sec. 6.3. 

7 Electrodynamic properties of superconductors are so interesting (and important) that I will discuss them in more 
detail in Chapter 6. 
highly doped silicon islands (so-called floating gates) separated from the rest of the integrated circuit
by a ~10-nm-thick layer of silicon dioxide, SiO2. Such layers are fabricated by high-temperature
oxidation of virtually perfect silicon crystals. The conductivity of the resulting high-quality (though
amorphous) material is so low, <j ~ 10" 1 S/m, that the relaxation time z r , defined by Eq. (10), is well 
above 10 years, the industrial standard for data retention in non-volatile memories. In order to
appreciate how good this technology is, the cited value should be compared with the typical
conductivity σ ~ 10^-16 S/m of the usual, bulk SiO2 ceramics. 8
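Within the simple Ohmic model of Eq. (10), the 10-year retention requirement translates directly into an upper bound on the oxide conductivity. A back-of-the-envelope sketch, assuming ε_r ≈ 3.9 for SiO2 and ignoring all other leakage mechanisms:

```python
EPS0 = 8.854e-12    # vacuum permittivity, F/m
YEAR = 3.156e7      # seconds in a year
EPS_R_SIO2 = 3.9    # assumed dielectric constant of SiO2


def max_conductivity(retention_time, eps_r=EPS_R_SIO2):
    """Largest sigma compatible with tau_r = eps_r * eps0 / sigma >= retention_time (Eq. (4.10))."""
    return eps_r * EPS0 / retention_time


sigma_max = max_conductivity(10 * YEAR)
tau_bulk = EPS_R_SIO2 * EPS0 / 1e-16  # bulk SiO2 ceramics, sigma ~ 1e-16 S/m (from the text)
print(f"10-year retention needs sigma below ~{sigma_max:.1e} S/m")
print(f"bulk SiO2 would lose the charge in ~{tau_bulk / 86400:.0f} days")
```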



4.3. Boundary problems 

For an Ohmic conducting medium, we may combine Eqs. (6) and (8) to get the following differential
equation:

$$\nabla\cdot(\sigma\nabla\phi) = 0. \qquad (4.14)$$

For a uniform conductor (σ = const), Eq. (14) is reduced to the Laplace equation for the electrostatic
potential φ. As we already know from Chapters 2 and 3, its solution depends on the boundary
conditions. These conditions depend on the interface type.

(i) Conductor-conductor interface. Applying the continuity equation (6) to a Gauss-type pillbox
at the interface of two different conductors (Fig. 5), we get

$$(j_n)_1 = (j_n)_2, \qquad (4.15)$$

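so that if the Ohm law is valid inside each medium, then

$$\sigma_1\left(\frac{\partial\phi}{\partial n}\right)_1 = \sigma_2\left(\frac{\partial\phi}{\partial n}\right)_2. \qquad (4.16)$$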



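Also, since the electric field should be finite, its potential φ has to be continuous across the
interface, a condition that may also be written as

$$\left(\frac{\partial\phi}{\partial\tau}\right)_1 = \left(\frac{\partial\phi}{\partial\tau}\right)_2, \qquad (4.17)$$

where τ is any coordinate tangential to the interface.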



8 Unfortunately, these notes are not the appropriate platform to discuss details of the floating-gate memory 
technology. However, I think that every educated physicist should know its basics, because such memories are 
presently the driver of all semiconductor integrated circuit technology development. Perhaps, the best available 
book is J. Brewer and M. Gill (eds.), Nonvolatile Memory Technologies with Emphasis on Flash, IEEE, 2008. 
Both these conditions (and hence the solutions of the boundary problems using them) are similar to
those for the interface between two dielectrics - cf. Eqs. (3.46)-(3.47).

Note that, using the Ohm law, Eq. (17) may be rewritten as

$$\frac{1}{\sigma_1}(j_\tau)_1 = \frac{1}{\sigma_2}(j_\tau)_2. \qquad (4.18)$$

Comparing it with Eq. (15), we see that, generally, the current density magnitude changes at the
interface: j_1 ≠ j_2. It is also curious that if σ_1 ≠ σ_2, the slope of the current lines changes at the interface (Fig. 5),
qualitatively similar to the refraction of light rays in optics - see Chapter 7.
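Combining Eqs. (15) and (18) gives the current-line analog of the refraction law: since tan θ = j_τ/j_n (with θ measured from the interface normal), tan θ_1/tan θ_2 = σ_1/σ_2. A minimal sketch with illustrative numbers:

```python
import math


def current_across_interface(jn1, jt1, sigma1, sigma2):
    """(normal, tangential) current density just across a conductor-conductor interface:
    Eq. (4.15) keeps j_n continuous, Eq. (4.18) keeps j_tau / sigma continuous."""
    return jn1, jt1 * sigma2 / sigma1


def angle_from_normal(jn, jt):
    """Angle of the current line measured from the interface normal, in radians."""
    return math.atan2(jt, jn)


jn2, jt2 = current_across_interface(1.0, 1.0, sigma1=1.0, sigma2=3.0)
theta1 = angle_from_normal(1.0, 1.0)
theta2 = angle_from_normal(jn2, jt2)
print(f"theta1 = {math.degrees(theta1):.1f} deg, theta2 = {math.degrees(theta2):.1f} deg")
print(f"|j1| = {math.hypot(1.0, 1.0):.3f}, |j2| = {math.hypot(jn2, jt2):.3f}")
```

The second printed line also illustrates the statement j_1 ≠ j_2 made above.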

(ii) Conductor-electrode interface. An electrode, or a "perfect conductor", is defined as a
medium with σ → ∞. Then, at a fixed current density at the interface, the electric field in the electrode
tends to zero, and hence it may be described by the equation

$$\phi = \phi_j = \text{const}, \qquad (4.19)$$

where the constants φ_j may be different for different electrodes (numbered with index j). Note that with
such boundary conditions the Laplace boundary problem becomes exactly the same as in electrostatics -
see Eq. (2.35) - and hence we can use all the methods (and some solutions :-) of Chapter 2 for finding
dc current distributions.

(iii) Conductor-insulator interface. For the description of an insulator, we can use σ = 0, so that
Eq. (16) yields the following boundary condition for the potential derivative inside the conductor:

$$\frac{\partial\phi}{\partial n} = 0. \qquad (4.20)$$

From the Ohm law we see that this is just the very
natural requirement for the dc current not to flow into an insulator.

Now, note that this condition makes the Laplace problem inside the conductor completely well-defined,
and independent of the potential distribution in the adjacent insulator. On the contrary, due to
the continuity of the electrostatic potential at the border, its distribution in the insulator has to follow
that inside the conductor. Let us discuss this conceptual issue using the following (apparently trivial)
example: a dc current in a long wire with a constant cross-section area A. The reader certainly knows the
answer:



$$I = \frac{V}{R}, \quad\text{where}\quad R \equiv \frac{V}{I} = \frac{l}{\sigma A}, \qquad (4.21)$$

where l is the wire length, and the constant R is called the resistance. 9 However, let us get this result
formally from our theoretical framework. For the ideal geometry shown in Fig. 6a, this is easy to do.
Here the potential evidently has a linear 1D distribution

$$\phi = -\frac{V}{l}x + \text{const} \qquad (4.22)$$



9 The first of Eqs. (21) is essentially the integral form of the Ohm law (8), and is valid not only for a uniform
wire, but for any Ohmic conductor with a geometry in which I and V may be clearly defined.
both in the conductor and the surrounding free space, with both boundary conditions (16) and (17)
satisfied at the conductor-insulator interfaces, and condition (19) satisfied at the conductor-electrode
interfaces. As a result, the electric field is constant and has only one component, E_x = V/l, so that inside
the conductor

$$j_x = \sigma E_x, \qquad I = j_xA, \qquad (4.23)$$

giving us the well-known Eq. (21).
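The same result can be recovered numerically, which also illustrates that Eq. (14) with σ = const, the electrode condition (19), and the insulator condition (20) form a closed problem inside the conductor. The sketch below is a simple 2D finite-difference (Gauss-Seidel) solver for a rectangular strip; the grid sizes, tolerance, and the mirror-node treatment of Eq. (20) are illustrative choices. It should reproduce R = l/σA of Eq. (21), with A replaced by the strip width times a unit thickness.

```python
def strip_resistance(nx=21, ny=11, h=1e-3, sigma=1.0, v=1.0, tol=1e-12, max_iter=200_000):
    """Resistance (times thickness) of a uniform 2D strip between two electrodes.

    Grid: nx columns along the current (x), ny rows across the strip (y), spacing h.
    Column 0 is an electrode at phi = v and column nx - 1 one at phi = 0 (Eq. (4.19));
    rows 0 and ny - 1 border an insulator, where d(phi)/dn = 0 (Eq. (4.20)).
    """
    phi = [[0.0] * nx for _ in range(ny)]
    for row in phi:
        row[0] = v
    for _ in range(max_iter):
        delta = 0.0
        for j in range(ny):
            # mirror ("ghost") neighbours at the insulating edges enforce zero normal derivative
            up = j + 1 if j + 1 < ny else j - 1
            down = j - 1 if j > 0 else j + 1
            row = phi[j]
            for i in range(1, nx - 1):
                new = 0.25 * (row[i - 1] + row[i + 1] + phi[up][i] + phi[down][i])
                delta = max(delta, abs(new - row[i]))
                row[i] = new
        if delta < tol:
            break
    # current per unit thickness through the middle cross-section (trapezoid rule across y)
    mid = (nx - 1) // 2
    current = 0.0
    for j in range(ny):
        weight = 0.5 if j in (0, ny - 1) else 1.0
        e_x = (phi[j][mid] - phi[j][mid + 1]) / h  # E_x = -d(phi)/dx
        current += weight * h * sigma * e_x         # j_x = sigma * E_x, Eq. (4.8)
    return v / current


length, width = 20 * 1e-3, 10 * 1e-3  # (nx - 1) h and (ny - 1) h for the default grid
print(strip_resistance(), length / (1.0 * width))
```

For a strip of varying width, the same solver (with the insulating boundary following the strip's edges) would give the resistance where no elementary formula exists; the uniform case is shown only because its answer is known.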




However, what about the geometry shown in Fig. 6b? In this case the field distribution in the
insulator is dramatically different, but according to the boundary problem defined by Eqs. (14) and (20),
inside the conductor the solution is exactly the same as it was in the former case. Now the Laplace
equation in the surrounding insulator has to be solved with the boundary values of the electrostatic
potential "dictated" by the distribution of the current (and hence of the potential) in the conductor.

Let us solve a problem in which this conduction hierarchy may be followed analytically to the very
end. Consider an empty spherical cavity cut in a conductor with an initially uniform current flow with
constant density j_0 = n_z j_0 (Fig. 7a). Following the conduction hierarchy, we have to solve the boundary
problem in the conducting part of the system, i.e. outside the sphere (r > R), first. Since the problem is
evidently axially-symmetric, we already know the general solution of the Laplace equation - see Eq.
(2.172). Moreover, we know that in order to match the uniform field at r → ∞, all coefficients a_l but
one (a_1 = -E_0 = -j_0/σ) have to be zero, and that the boundary conditions at r = R will give zero solutions
for all coefficients b_l but one (b_1), so that

$$\phi = -\frac{j_0}{\sigma}r\cos\theta + \frac{b_1}{r^2}\cos\theta, \quad\text{for } r \ge R. \qquad (4.24)$$



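In order to find the coefficient b_1, we have to use the boundary condition (20) at r = R:

$$\left.\frac{\partial\phi}{\partial r}\right|_{r=R} = \left(-\frac{j_0}{\sigma} - \frac{2b_1}{R^3}\right)\cos\theta = 0. \qquad (4.25)$$

This gives b_1 = -j_0R^3/2σ, so that, finally,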






$$\phi = -\frac{j_0}{\sigma}\left(r + \frac{R^3}{2r^2}\right)\cos\theta, \quad\text{for } r \ge R. \qquad (4.26)$$



(Note that this potential distribution corresponds to that of a dipole, with the coefficient b_1 = -E_0R^3/2, where
E_0 = j_0/σ. It is easy to check that if the empty sphere was cut in a dielectric, the potential distribution outside the cavity would be
similar, with b_1 = -E_0R^3(ε_r - 1)/(2ε_r + 1). In the limit ε_r → ∞, these two results coincide, despite the rather
different type of the problem: in the dielectric case, there is no current at all.)

Fig. 4.7. Spherical cavity in a uniform conductor: (a) the problem's geometry, and (b) the equipotential
surfaces, as given by Eq. (26) for r > R and Eq. (28) for r < R.



Now, as the second step in the conductivity hierarchy, we may find the electrostatic potential
distribution φ(r, θ) in the insulator, in this particular case inside the cavity (r < R). It should also satisfy
the Laplace equation, with the boundary condition at r = R "dictated" by distribution (26):

$$\phi(R,\theta) = -\frac{3j_0}{2\sigma}R\cos\theta. \qquad (4.27)$$

We could again solve this problem by formal variable separation (keeping in the general solution
(2.172) only the term proportional to a_1, which does not diverge at r → 0), but if we notice that boundary
condition (27) depends on just one Cartesian coordinate, z = R cos θ, the solution may be just guessed:

$$\phi(r,\theta) = -\frac{3j_0}{2\sigma}z = -\frac{3j_0}{2\sigma}r\cos\theta, \quad\text{at } r \le R. \qquad (4.28)$$

It evidently satisfies the Laplace equation and the boundary condition (27), and corresponds to a
constant vertical electric field equal to 3j_0/2σ (see Fig. 7b).
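Solutions (26) and (28) can be verified numerically. The sketch below, with arbitrary illustrative values of j_0, σ, and R, checks the boundary condition (20) and the continuity of φ at r = R. It also evaluates the tangential current density just outside the cavity surface, which Eq. (26) gives as j_θ = -(3/2)j_0 sin θ, i.e. 1.5 times the unperturbed value at the cavity's equator:

```python
import math

J0, SIGMA, R = 1.0, 2.0, 0.5  # arbitrary illustrative values (consistent units)


def phi_outside(r, theta):
    """Eq. (4.26): potential in the conductor, r >= R."""
    return -(J0 / SIGMA) * (r + R**3 / (2 * r**2)) * math.cos(theta)


def phi_inside(r, theta):
    """Eq. (4.28): potential in the cavity, r <= R."""
    return -(3 * J0 / (2 * SIGMA)) * r * math.cos(theta)


def surface_checks(theta, h=1e-6):
    # Eq. (4.20): the radial derivative of phi must vanish on the cavity surface
    dphi_dr = (phi_outside(R + h, theta) - phi_outside(R - h, theta)) / (2 * h)
    # Eq. (4.27): phi must be continuous across r = R
    jump = phi_outside(R, theta) - phi_inside(R, theta)
    # tangential current density just outside the surface: j_theta = -(sigma / R) d(phi)/d(theta)
    dphi_dtheta = (phi_outside(R, theta + h) - phi_outside(R, theta - h)) / (2 * h)
    j_theta = -(SIGMA / R) * dphi_dtheta
    return dphi_dr, jump, j_theta


for theta in (0.3, 1.0, math.pi / 2):
    print(theta, surface_checks(theta))
```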

The conductivity hierarchy says that static electrical fields and charges outside conductors (e.g., 
electric wires) do not affect currents flowing in the wires, and it is physically clear why. For example, if 
a charge in vacuum is slowly moved close to a wire, it (in accordance with the linear superposition 
principle) will only induce an additional surface charge (see Chapter 2) that screens the external 
charge's field, without participating in (or disturbing) the current flow inside the conductor. 
Besides the conceptual discussion, the two examples given above may be considered as a
demonstration of the application of the first two methods described in Chapter 2 (orthogonal
coordinates (Fig. 6) and variable separation (Fig. 7)) to dc current distribution problems. Continuing this
review of the methods we know, let us discuss the analog of the method of charge images. Let us
consider the spherically-symmetric distribution of the electrostatic potential, similar to that
given by Eq. (1.35):

$$\phi = \frac{c}{r}. \qquad (4.29)$$

As we know from Chapter 1, this is a particular solution of the 3D Laplace equation at all points but r =
0, and hence is a legitimate solution in a current-carrying conductor as well. In vacuum, this distribution
would correspond to a point charge q = 4πε_0c, but what about the conductor? Calculating the
corresponding electric field and current density,

$$\mathbf{E} = -\nabla\phi = \frac{c}{r^3}\mathbf{r}, \qquad \mathbf{j} = \sigma\mathbf{E} = \sigma\frac{c}{r^3}\mathbf{r}, \qquad (4.30)$$

we see that the total current flowing from the point at the origin through a sphere of an arbitrary radius r
does not depend on the radius:

$$I = Aj = 4\pi r^2 j = 4\pi\sigma c. \qquad (4.31)$$

Plugging the resulting c into Eq. (29), we get

$$\phi = \frac{I}{4\pi\sigma r}. \qquad (4.32)$$

Hence the Coulomb-type distribution of the electric potential in a conductor is possible (at least
at some distance from the singular point r = 0), and describes a dc current I flowing out of a small-size
electrode, or into such a point if coefficient c is negative. Such current injection may be readily
implemented experimentally; think, for example, about an insulated wire with a small bare end, inserted
into a poorly conducting soil, an important method in geophysical research. 10

Now let the injection point r' be close to a plane interface between the conductor and an
insulator (Fig. 8). In this case, besides the Laplace equation, we should satisfy the boundary condition

$$j_n = \sigma E_n = -\sigma\frac{\partial\phi}{\partial n} = 0. \qquad (4.33)$$

It is clear that this can be done by replacing the insulator with a conductor (of the same conductivity) with an additional
current injection point at the mirror image point r''. Note, however, that in contrast to the charge
images, the sign of the image current has to be the same as, not opposite to, the initial one, so that the
total electrostatic potential inside the conducting semi-space is

$$\phi = \frac{I}{4\pi\sigma}\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|} + \frac{1}{|\mathbf{r}-\mathbf{r}''|}\right). \qquad (4.34)$$



10 Such situations are even more natural in 2D geometries: think, for example, about a wire soldered, in a small spot,
to a thin metallic foil. (Note that here the current density distribution law is different, j ∝ 1/r rather than 1/r².)
(Note that the image current's sign would be opposite if we discussed an interface between a conductor
with a moderate conductivity and a perfect conductor ("electrode") whose potential should be virtually 
constant.) 




Fig. 4.8. Method of images at dc conduction. 



This result may be readily used, for example, to calculate the current density at the conductor's
surface as a function of the distance ρ from point 0 (the surface point closest to the current injection point), see
Fig. 8. Denoting by d the distance from the injection point r' to the surface, at the surface Eq. (34) yields

$$j = \frac{I}{2\pi}\frac{\rho}{(\rho^2 + d^2)^{3/2}}, \qquad (4.35)$$

so that the current density is independent of σ.

Deviations from Eqs. (35) and (36), which are valid for a uniform medium, may be used to find
and characterize conductivity inhomogeneities, say, those due to mineral deposits in the Earth's crust. 11
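The image solution (34) is easy to test by finite differences. This sketch places the insulator at z > 0 and the injection point at depth d below the surface; the numbers are hypothetical. It checks that ∂φ/∂z vanishes on the surface, as required by Eq. (33), and that the numerically differentiated surface current density agrees with Eq. (35) for different values of σ:

```python
import math


def phi_with_image(x, y, z, current, sigma, depth):
    """Eq. (4.34): conductor at z <= 0, insulator at z > 0; current injected at (0, 0, -depth)
    plus an image source of the SAME sign at (0, 0, +depth)."""
    r1 = math.sqrt(x**2 + y**2 + (z + depth) ** 2)
    r2 = math.sqrt(x**2 + y**2 + (z - depth) ** 2)
    return current / (4 * math.pi * sigma) * (1 / r1 + 1 / r2)


def surface_current_density(rho, current, depth):
    """Eq. (4.35): radial current density on the surface, at distance rho from point 0."""
    return current / (2 * math.pi) * rho / (rho**2 + depth**2) ** 1.5


def numerical_surface_values(rho, current, sigma, depth, h=1e-5):
    def phi(x, z):
        return phi_with_image(x, 0.0, z, current, sigma, depth)

    dphi_dz = (phi(rho, h) - phi(rho, -h)) / (2 * h)                    # must vanish, Eq. (4.33)
    j_rho = -sigma * (phi(rho + h, 0.0) - phi(rho - h, 0.0)) / (2 * h)  # j = -sigma d(phi)/d(rho)
    return dphi_dz, j_rho


# Hypothetical numbers: 1 A injected 2 m below the surface, two different conductivities.
for sigma in (0.01, 5.0):
    for rho in (0.5, 2.0, 10.0):
        dphi_dz, j_num = numerical_surface_values(rho, 1.0, sigma, 2.0)
        print(sigma, rho, dphi_dz, j_num, surface_current_density(rho, 1.0, 2.0))
```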



4.4. Dissipation power 

Let me conclude this brief chapter with an ultra-short discussion of energy dissipation in
conductors. In contrast to the electrostatic situations in insulators (vacuum or dielectrics), at dc
conduction the electrostatic energy U is "dissipated" (i.e. transferred to heat) at a certain rate 𝒫 =
-dU/dt, called the dissipation power. 12 This rate may be evaluated by calculating the power of the electric field's
work on a single moving charge:

$$\mathcal{P}_1 = \mathbf{F}\cdot\mathbf{v} = q\mathbf{E}\cdot\mathbf{v}. \qquad (4.37)$$



11 In practice, the current injection may be produced, due to electrochemical reactions, by an ore mass itself, so 
that one need only measure (and interpret :-) the resulting potential distribution - the so-called self-potential 
method - see, e.g., Sec. 6.1 in the monograph by W. Telford et al., Applied Geophysics, 2nd ed., Cambridge U. Press,
1990. 

12 Since the electric field and hence the electrostatic energy are time-independent, this means that the energy is 
replenished at the same rate from the current source(s). 
After the summation over all charges, Eq. (37) gives us the dissipation power. If the carrier
density n is uniform, multiplying both parts of this equation by it, and taking into account that qn⟨v⟩ = j,
for the power 𝒫 dissipated per unit volume we get the Joule law

$$\mathcal{P} = \mathbf{j}\cdot\mathbf{E}. \qquad (4.38)$$

In the particular case of the Ohmic conductivity, this expression may also be rewritten in two
other forms:

$$\mathcal{P} = \sigma E^2 = \frac{j^2}{\sigma}. \qquad (4.39)$$



At dc conduction, the energy is permanently replenished by a flow of power from the current source(s). 
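For a uniform wire, integrating Eqs. (38)-(39) over the wire volume Al recovers the familiar circuit forms 𝒫 = VI = I²R. A short numeric check, with a hypothetical copper wire (σ from Table 1):

```python
def joule_power_forms(sigma, area, length, current):
    """Total power dissipated in a uniform Ohmic wire, computed four equivalent ways."""
    j = current / area                   # uniform current density
    e = j / sigma                        # Ohm law (4.8)
    volume = area * length
    p_jE = j * e * volume                # Joule law (4.38), integrated over the volume
    p_sigmaE2 = sigma * e**2 * volume    # Eq. (4.39), first form
    p_j2 = j**2 / sigma * volume         # Eq. (4.39), second form
    r = length / (sigma * area)          # resistance, Eq. (4.21)
    p_circuit = current**2 * r           # the circuit form I^2 R
    return p_jE, p_sigmaE2, p_j2, p_circuit


# Hypothetical example: 1 m of copper wire with 1 mm^2 cross-section carrying 1 A.
powers = joule_power_forms(6.0e7, 1e-6, 1.0, 1.0)
print(["%.4e W" % p for p in powers])
```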



4.5. Exercise problems 



4.1. Find the resistance between two large conductors separated
by a very thin, plane, insulating partition with a circular hole of
radius R in it - see Fig. on the right.

Hint. You may like to use the degenerate ellipsoidal coordinates 
that have been used in Sec. 2.4 to find the self-capacitance of a round 
disk in vacuum. 




4.2. Calculate the effective (average) conductivity σ_ef of a
medium with many empty spherical cavities of radius R, carved at
random points in a uniform Ohmic conductor (see Fig. on the right), in
the limit of low density n ≪ R^-3 of the spheres.

Hint: Try to use the analogy with a dipole media (Sec. 3.2). 




4.3. Calculate the voltage drop V across a uniform, wide
resistive slab of thickness t, at distance l from the points of
injection/ejection of dc current I that is passed across the slab - see
Fig. on the right.

Hint: Try to use the dc current analog of the charge image 
method. 
Chapter 5. Magnetism

Despite the fact that we are now starting to discuss a completely new type of electromagnetic
interaction, its coverage (for the stationary case) will take just one chapter, because we will be able to
recycle many ideas and methods of electrostatics, though with a twist or two.



5.1. Magnetic interaction of currents 

DC currents in conductors usually leave them electroneutral, ρ(r) = 0, with very good
precision, because any virtual imbalance of positive and negative charge density results in extremely
strong Coulomb forces that restore the balance by an additional shift of free carriers. 1 This is why we may
start the discussion of magnetic interactions with the simplest case of two spatially-separated,
current-carrying, electroneutral conductors (Fig. 1).




Fig. 5.1. Magnetic interaction of two 
currents. 



According to the Coulomb law, there should be no force between them. However, several
experiments carried out in the early 1820s 2 proved that such non-Coulomb forces do exist, and are the
manifestation of another, magnetic interaction between the currents. In the contemporary notation used in this
course, their results may be summarized by one formula, in SI units expressed as: 3



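$$\mathbf{F} = -\frac{\mu_0}{4\pi}\int_V d^3r\int_{V'} d^3r'\,\big(\mathbf{j}(\mathbf{r})\cdot\mathbf{j}'(\mathbf{r}')\big)\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \qquad (5.1)$$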

Here the coefficient μ_0/4π (where μ_0 is called either the magnetic constant or the free space permeability),
by definition, equals exactly 10^-7 SI units, thus relating the electric current (and hence electric charge)
definition to that of force - see below.

Note that the Coulomb law (1.1), with account of the linear superposition principle, may be
presented in a very similar form:






1 The most important case when the electroneutrality does not hold is the motion of electrons in vacuum. In this 
case, magnetic forces coexist with (typically, stronger) electrostatic forces - see Eq. (3) below and its discussion. 
In some semiconductor devices, local violations of electroneutrality also play an important role. 

2 Most notably, by H. C. Ørsted, J.-B. Biot and F. Savart, and A.-M. Ampère.

3 In the Gaussian units, coefficient μ_0/4π is replaced with 1/c² (i.e., implicitly with μ_0ε_0), where c is the speed of
light, in modern metrology considered exactly known - see, e.g., appendix CA: Selected Physical Constants.
$$\mathbf{F} = \frac{1}{4\pi\varepsilon_0}\int_V d^3r\int_{V'} d^3r'\,\rho(\mathbf{r})\rho'(\mathbf{r}')\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \qquad (5.2)$$

Besides the different coefficient and sign, the "only" difference of Eq. (1) from Eq. (2) is the scalar 
product of current densities, evidently necessary because of the vector character of the current density. 
We will see that this difference will bring certain complications in applying the electrostatics 
approaches, discussed in the previous chapters, to magnetostatics. 

Before going to their discussion, let us have one more glance at the coefficients in Eqs. (1) and 
(2). To compare them, let us consider two objects with uncompensated charge distributions p{r) and 
p'(r), each moving parallel to each other as a whole certain velocities v and v', as measured in an 
inertial "lab" frame. In this case, j(r) = p(r)\, j(r)-j '(r) = p{r)p\r)vv', and the integrals in Eqs. (1) and 
(2) become functionally similar, and differ only by the factor 

^magnetic = ftw' , 1 w' „ „ 

^electnc _ " I* 4*S„ C 2 ' 

(This expression holds in any consistent system of units.) We immediately see that magnetism is an 
essentially relativistic phenomenon, very weak in comparison with the electrostatic interaction at 
human-scale velocities, v ≪ c, and may dominate only if the latter interaction vanishes - as it does in 
electroneutral systems. 4 
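As a quick numerical sanity check of Eq. (3), the short Python sketch below compares the ratio built directly from the SI prefactors of Eqs. (1) and (2) with the form -vv'/c². The constant values are CODATA 2018 recommended values entered by hand (an assumption of this sketch, not taken from the text), so the two functions should agree up to the rounding of those constants.

```python
import math

# CODATA 2018 recommended SI values, entered by hand for this sketch
C = 299_792_458.0         # speed of light, m/s (exact by definition)
MU0 = 1.25663706212e-6    # magnetic constant, N/A^2
EPS0 = 8.8541878128e-12   # electric constant, F/m


def ratio_from_prefactors(v: float, v_prime: float) -> float:
    """F_magnetic / F_electric for parallel moving charges, built from the
    prefactors mu0/4pi and 1/(4pi eps0) of Eqs. (5.1) and (5.2)."""
    magnetic = MU0 / (4 * math.pi)
    electric = 1.0 / (4 * math.pi * EPS0)
    return -magnetic * v * v_prime / electric


def ratio_relativistic(v: float, v_prime: float) -> float:
    """The same ratio written as -v v' / c^2, Eq. (5.3)."""
    return -v * v_prime / C**2


if __name__ == "__main__":
    for v in (1.0, 1.0e6, 1.0e8):  # velocities in m/s
        print(f"v = {v:.0e} m/s: {ratio_from_prefactors(v, v):.6e} "
              f"vs {ratio_relativistic(v, v):.6e}")
```

Even at v = 10⁶ m/s the magnetic interaction is only about 10⁻⁵ of the electrostatic one, in line with the statement above.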

Also, Eq. (3) points at an interesting paradox. Consider two electron beams moving parallel to 
each other, with the same velocity v with respect to a lab frame. Then, according to Eq. (3), the net force 
of their total (electric and magnetic) interaction is proportional to (1 - v²/c²), and tends to zero in the 
limit v → c. However, in the reference frame moving together with the electrons, they are not moving at 
all, i.e. v = 0. Hence, from the point of view of such a moving observer, the electron beams should 
interact only electrostatically, with a repulsive force independent of velocity v. Historically, this had 
been one of several paradoxes that led to the development of special relativity; I will discuss its 
resolution in Chapter 9. 

Returning to Eq. (1), in some simple cases the double integration in it may be carried out 
analytically. First of all, let us simplify this expression for the case of two thin, long conductors (wires) 
separated by a distance much larger than their thickness. In this case we may integrate the products j d³r 
and j' d³r' over the wires' cross-sections first, neglecting the corresponding change of (r - r'). Since the 
integrals of the current density over the cross-sections of the wires are just the currents I and I' in the 
wires, and cannot change along their lengths (l and l', respectively), they may be taken out of the 
remaining integrals, reducing Eq. (1) to 

$$\mathbf{F} = -\frac{\mu_0 I I'}{4\pi}\oint_l\oint_{l'} (d\mathbf{r}\cdot d\mathbf{r}')\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \qquad (5.4)$$



4 The discovery and initial studies of such a subtle, relativistic phenomenon as magnetism in the early 19th 
century were much facilitated by the relative abundance of natural ferromagnets, materials with spontaneous 
magnetic polarization, whose strong magnetic field may be traced back to relativistic effects (such as spin) in 
atoms. (The electrostatic analogs of such materials, electrets, are much rarer.) I will briefly discuss 
ferromagnetism in Sec. 5 below. 






As the simplest example, consider two straight, parallel wires (Fig. 2), separated by distance d, 
with length l ≫ d. In this case, due to symmetry, the vector of magnetic interaction force has to: 

(i) lie in the same plane as the currents, and 

(ii) be perpendicular to the wires - see Fig. 2. 

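Hence we can limit our calculations to just one component of the force. Using the fact that with the 
coordinate choice shown in Fig. 2, dr·dr' = dx dx', we get 

$$F = -\frac{\mu_0 I I'}{4\pi}\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}dx'\,\frac{\sin\theta}{d^2+(x-x')^2} = -\frac{\mu_0 I I'}{4\pi}\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}dx'\,\frac{d}{[d^2+(x-x')^2]^{3/2}}. \qquad (5.5)$$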



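Introducing, instead of x', a new dimensionless variable ξ = (x - x')/d, we may reduce the internal 
integral to a table integral which we have already met in this course: 

$$F = -\frac{\mu_0 I I'}{4\pi d}\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}\frac{d\xi}{(1+\xi^2)^{3/2}} = -\frac{\mu_0 I I'}{2\pi d}\int_{-\infty}^{+\infty}dx. \qquad (5.6)$$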



The integral over x is formally diverging, but this means merely that the interaction force per unit length 
of the wires is constant: 



$$\frac{F}{l} = -\frac{\mu_0 I I'}{2\pi d}. \qquad (5.7)$$



Note that the force drops rather slowly (only as 1/d) as the distance d between the wires is increased, 
and is attractive (rather than repulsive as in the Coulomb law) if the currents are of the same sign. 
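The only non-trivial step in Eqs. (5)-(7) is the table integral of (1 + ξ²)^(-3/2) over the whole real axis, which equals 2. A minimal Python sketch can check it numerically with the substitution ξ = tan t (our choice: it maps the infinite range onto (-π/2, π/2) and turns the integrand into cos t), and then evaluate Eq. (7). The value μ0 = 4π×10⁻⁷ N/A² is an assumption of the sketch; it agrees with the current SI value to about one part in 10⁹.

```python
import math

MU0 = 4e-7 * math.pi  # N/A^2, assumed value (pre-2019 SI definition)


def table_integral(n: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of (1 + xi^2)^(-3/2) over all xi.

    With xi = tan(t): d(xi) = dt / cos^2(t) and (1 + xi^2)^(3/2) = 1 / cos^3(t),
    so the integrand becomes cos(t) on the finite interval (-pi/2, pi/2)."""
    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / n
    return h * sum(math.cos(a + (k + 0.5) * h) for k in range(n))


def force_per_length(i1: float, i2: float, d: float) -> float:
    """F/l between two long parallel wires, Eqs. (5.6)-(5.7), in N/m.

    A negative value means attraction (currents of the same sign)."""
    return -MU0 * i1 * i2 / (4 * math.pi * d) * table_integral()


if __name__ == "__main__":
    print(table_integral())                   # approximately 2
    print(force_per_length(1.0, 1.0, 1.0))    # approximately -2e-7 N/m
```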




Fig. 5.2. Magnetic force between two 
straight parallel currents. 



This is an important result, 5 but again, the problems solvable so simply are few and far between, 
and it is intuitively clear that we would strongly benefit from the same approach as in electrostatics, i.e., 
from breaking Eq. (1) into a product of two factors via the introduction of a suitable field. Such a 
decomposition may be done as follows: 



$$\mathbf{F} = \int_V\mathbf{j}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\,d^3r, \qquad (5.8)$$

where vector B is called the magnetic field (in our particular case, induced by current j'): 

$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_{V'}\mathbf{j}'(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r'. \qquad (5.9)$$

5 In particular, Eq. (7) is used for the legal definition of the SI unit of current, one ampere (A), via the SI unit of 
force (the newton, N), with coefficient μ0 fixed as listed above. 



The last equation is called the Biot-Savart law, while F expressed by Eq. (8) is sometimes called the 
Lorentz force. 7 However, more frequently the latter term is reserved for the full force, 



$$\mathbf{F} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}), \qquad (5.10)$$

exerted by electric and magnetic fields on a point charge q, moving with velocity v. (The 
equivalence of Eq. (8) and the magnetic part of Eq. (10) follows from the summation of all forces acting 
on n particles in a unit volume, moving with the same velocity v, so that j = qnv.) 
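Eq. (10) is straightforward to evaluate for given vectors. The Python sketch below (plain tuples, standard library only; the function names are ours) computes the Lorentz force on a point charge, and checks the remark above that, for n identical particles per unit volume, the magnetic force density j × B with j = qnv is just n times the magnetic force on one particle.

```python
Vector = tuple[float, float, float]


def cross(a: Vector, b: Vector) -> Vector:
    """Vector product a x b in Cartesian components."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lorentz_force(q: float, e: Vector, v: Vector, b: Vector) -> Vector:
    """F = q (E + v x B), Eq. (5.10), in SI units."""
    v_cross_b = cross(v, b)
    return tuple(q * (e[i] + v_cross_b[i]) for i in range(3))


def magnetic_force_density(q: float, n: float, v: Vector, b: Vector) -> Vector:
    """Force per unit volume j x B, with j = q n v, as in Eq. (5.8)."""
    j = tuple(q * n * vi for vi in v)
    return cross(j, b)


if __name__ == "__main__":
    # A positive unit charge moving along x in a field along z is pushed along -y.
    print(lorentz_force(1.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```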

Now we have to prove that the new formulation (8)-(9) is equivalent to Eq. (1). At first 
glance, this seems unlikely. Indeed, first of all, Eqs. (8) and (9) involve vector products, while Eq. (1) is 
based on a scalar product. More profoundly, in contrast to Eq. (1), Eqs. (8) and (9) do not satisfy the 3rd 
Newton law, applied to elementary current components j d³r and j' d³r', if these vectors are not parallel 
to each other. Indeed, consider the situation shown in Fig. 3. Here vector j' is perpendicular to vector 
(r - r'), and hence, according to Eq. (9), produces a nonvanishing contribution dB' to the magnetic field, 
directed (in Fig. 3) perpendicular to the plane of drawing, i.e. perpendicular to vector j. Hence, 
according to Eq. (8), this field provides a nonvanishing contribution to F. On the other hand, if we 
calculate the reciprocal force F' by swapping indices in Eqs. (8) and (9), the latter equation immediately 
shows that dB(r') ∝ j × (r - r') = 0, because the two operand vectors are parallel (Fig. 3). Hence, the 
current component j' d³r' does exert a force on its counterpart, while j d³r does not. 



Fig. 5.3. Apparent violation of the 3rd Newton law in magnetism. 



Despite this apparent problem, let us still go ahead and plug Eq. (9) into Eq. (8): 




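$$\mathbf{F} = \frac{\mu_0}{4\pi}\int_V d^3r\int_{V'} d^3r'\,\mathbf{j}(\mathbf{r})\times\left[\mathbf{j}'(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right]. \qquad (5.11)$$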



6 The SI unit of the magnetic field is called tesla, T - after N. Tesla, an electrical engineering pioneer. In the 
Gaussian units, the already discussed constant 1/c² in Eq. (1) is equally divided between Eqs. (8) and (9), so that 
in them both, the constant before the integral is 1/c. The resulting Gaussian unit of field B is called gauss (G); 
taking into account the difference of units of electric charge and length, and hence current density, 1 G equals 
exactly 10⁻⁴ T. Note also that in some textbooks, especially old ones, B is called either the magnetic induction or 
the magnetic flux density, while the term "magnetic field" is reserved for vector H that will be introduced in Sec. 5 
below. 

7 Named after H. Lorentz, who received a Nobel Prize for his explanation of the Zeeman effect, but is 
more famous for his numerous contributions to the development of special relativity - see Chapter 9. To 
be fair, the magnetic part of the Lorentz force was correctly calculated first by O. Heaviside. 
This double vector product may be transformed into two scalar products, using the vector algebraic identity 
called the bac minus cab rule, a × (b × c) = b(a·c) - c(a·b). 8 Applying this relation, with a = j, b = j', and 
c = R ≡ r - r', to Eq. (11), we get 

$$\mathbf{F} = \frac{\mu_0}{4\pi}\int_{V'} d^3r'\,\mathbf{j}'(\mathbf{r}')\int_V d^3r\,\mathbf{j}(\mathbf{r})\cdot\frac{\mathbf{R}}{R^3} - \frac{\mu_0}{4\pi}\int_V d^3r\int_{V'} d^3r'\,[\mathbf{j}(\mathbf{r})\cdot\mathbf{j}'(\mathbf{r}')]\,\frac{\mathbf{R}}{R^3}. \qquad (5.12)$$



The second term in the right-hand part of this equation coincides with the right-hand part of Eq. (1), 
while the first term equals zero, because its internal integral vanishes. Indeed, we may break 
volumes V and V' into narrow current tubes, the stretched sub-volumes whose walls are not crossed by 
current lines (j_n = 0). As a result, the (infinitesimal) current in each tube, dI = j dA = j d²r, is the same 
along its length, and, just as in a thin wire, j d³r may be replaced with dI dr. Because of this, each tube's 
contribution to the internal integral in the first term of Eq. (12) may be presented as 

$$dI\oint_l d\mathbf{r}\cdot\frac{\mathbf{R}}{R^3} = -dI\oint_l d\mathbf{r}\cdot\nabla\frac{1}{R} = -dI\oint_l dr\,\frac{\partial}{\partial r}\frac{1}{R}, \qquad (5.13)$$

where operator ∇ acts in the r space, and the integral is taken along the tube's length l. Due to the current 
continuity, each tube should follow a closed contour, and an integral of a full differential of a scalar 
function (in our case, 1/R) along such a contour equals zero. 
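Both steps of the argument above can be spot-checked numerically. The Python sketch below (hypothetical helper names, standard library only) verifies the bac minus cab rule for random vectors, and evaluates the loop integral of dr·R/R³, with R = r - r', around a discretized circle for an observation point r' off the loop. As argued above, that integral should vanish, because the integrand is the full differential -d(1/R).

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bac_cab_residual(a, b, c):
    """Largest component of a x (b x c) - [b (a.c) - c (a.b)]; should be ~0."""
    lhs = cross(a, cross(b, c))
    rhs = [b[i] * dot(a, c) - c[i] * dot(a, b) for i in range(3)]
    return max(abs(l - r) for l, r in zip(lhs, rhs))


def closed_loop_integral(r_prime, radius=1.0, n=2000):
    """Midpoint sum of dr . R / R^3, with R = r - r', around a circle
    of the given radius lying in the z = 0 plane."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        r = (radius * math.cos(t), radius * math.sin(t), 0.0)
        dr = (-radius * math.sin(t) * dt, radius * math.cos(t) * dt, 0.0)
        big_r = tuple(r[i] - r_prime[i] for i in range(3))
        total += dot(dr, big_r) / dot(big_r, big_r) ** 1.5
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    vecs = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(3)]
    print(bac_cab_residual(*vecs))                # rounding-level
    print(closed_loop_integral((0.3, 0.2, 0.5)))  # rounding-level
```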

So we have recovered Eq. (1). Returning for a minute to the paradox illustrated with Fig. 3, we 
may conclude that the apparent violation of the 3rd Newton law was an artifact of our interpretation of 
Eqs. (8) and (9) as sums of independent elementary components. In reality, due to the dc current 
continuity expressed by Eq. (4.6), these components are not independent. For the whole currents, Eqs. 
(8)-(9) do obey the 3rd law - as follows from their already proved equivalence to Eq. (1). 

Thus we have been able to break the magnetic interaction into two effects: the creation of the 
magnetic field B by one current (in our notation, j'), and the effect of this field on the other current (j). 
Now comes an additional experimental fact: other elementary components j d³r' of current j also 
contribute to the magnetic field (9) acting on component j d³r. 9 This fact allows us to drop the prime after j 
in Eq. (9), and rewrite Eqs. (8) and (9) as 

$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_{V'}\mathbf{j}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r', \qquad (5.14)$$

$$\mathbf{F} = \int_V\mathbf{j}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\,d^3r. \qquad (5.15)$$

Again, the field observation point r and the field source point r' have to be clearly distinguished. We 
immediately see that these expressions are similar to, but still different from, the corresponding relations 
of electrostatics, namely Eq. (1.8), 



8 See, e.g., MA Eq. (7.5). 

9 Just as in electrostatics, one needs to exercise due caution when passing from these expressions to the limit of 
discrete classical particles, or to extended wavefunctions in quantum mechanics, in order to avoid the (non-existing) 
magnetic interaction of a charged particle with itself. 
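$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int_{V'}\rho(\mathbf{r}')\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r', \qquad (5.16)$$

and the distributed version of Eq. (1.6): 

$$\mathbf{F} = \int_V\rho(\mathbf{r})\mathbf{E}(\mathbf{r})\,d^3r. \qquad (5.17)$$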

(Note that the sign difference has disappeared, at the cost of the replacement of scalar-by-vector 
multiplications in electrostatics with cross-products of vectors in magnetostatics.) 

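For the frequent case of a field of a thin wire of length l', Eq. (14) may be rewritten as 

$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0 I}{4\pi}\oint_{l'} d\mathbf{r}'\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \qquad (5.18)$$

Let us see how the last formula works for the simplest case of a straight wire (Fig. 4a). The 
magnetic field contribution dB due to any small fragment dr' of the wire's length is directed along the 
same line (perpendicular to both the wire and the perpendicular d dropped from the observation point to 
the wire line), and its magnitude is 

$$dB = \frac{\mu_0 I}{4\pi}\,\frac{d\,dx'}{(d^2+x'^2)^{3/2}}, \qquad (5.19)$$

where x' is the coordinate of the fragment along the wire, counted from the foot of the perpendicular. 
Integrating over the whole (very long) wire, with the same substitution as in Eq. (6), we get 

$$B = \frac{\mu_0 I}{4\pi d}\int_{-\infty}^{+\infty}\frac{d\xi}{(1+\xi^2)^{3/2}} = \frac{\mu_0 I}{2\pi d}. \qquad (5.20)$$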




Fig. 5.4. Magnetic fields of: (a) a straight current, and (b) a current loop. 



This is a simple but very important result. (Note that it is only valid for very long (l ≫ d), 
straight wires.) It is especially crucial to note the "vortex" character of the field: its lines go around the 
wire, forming round rings centered on the current line. This is in sharp contrast to the 
electrostatic field lines, which can only begin and end on electric charges and never form closed loops 
(otherwise the Coulomb force qE would not be conservative). In the magnetic case, the vortex field may 
be reconciled with the potential character of magnetic forces, which is evident from Eq. (1), due to the 
vector products in Eqs. (14)-(15). 
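Eq. (20) can be checked by summing Biot-Savart contributions from a long but finite straight segment. In this Python sketch (our own discretization and names; μ0 = 4π×10⁻⁷ N/A² is assumed), the wire of length 2L lies along the x axis and the field is evaluated at distance d from its midpoint. The sum should match the elementary finite-segment closed form (μ0I/2πd)·L/(L² + d²)^(1/2), included here for comparison, and approach μ0I/2πd as L/d grows.

```python
import math

MU0 = 4e-7 * math.pi  # N/A^2, assumed value


def segment_field(current: float, d: float, half_length: float, n: int = 100_000) -> float:
    """|B| at distance d from the midpoint of a straight segment of length
    2*half_length, by midpoint summation of
    dB = (mu0 I / 4 pi) d dx' / (d^2 + x'^2)^(3/2)."""
    h = 2 * half_length / n
    total = 0.0
    for k in range(n):
        x = -half_length + (k + 0.5) * h
        total += d / (d * d + x * x) ** 1.5
    return MU0 * current / (4 * math.pi) * total * h


def segment_field_exact(current: float, d: float, half_length: float) -> float:
    """Closed form for the same finite segment."""
    return MU0 * current / (2 * math.pi * d) * half_length / math.hypot(half_length, d)


def infinite_wire_field(current: float, d: float) -> float:
    """Eq. (5.20): B = mu0 I / (2 pi d)."""
    return MU0 * current / (2 * math.pi * d)


if __name__ == "__main__":
    print(segment_field(1.0, 1.0, 1000.0), infinite_wire_field(1.0, 1.0))
```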
Now we may use Eq. (15), or rather its thin-wire version 

$$\mathbf{F} = I\oint_l d\mathbf{r}\times\mathbf{B}(\mathbf{r}), \qquad (5.21)$$

to apply Eq. (20) to the two-wire problem (Fig. 2). Since for the second wire vectors dr and B are 
perpendicular to each other, we immediately arrive at our previous result (7). 

The next important application of the Biot-Savart law (14) is the magnetic field on the axis of a 
circular current loop (Fig. 4b). Due to the problem's symmetry, the net field B has to be directed along the 
axis, but each of its components dB is tilted by angle θ = arctan(z/R) to this axis, so that its axial 
component is 

$$dB_z = dB\cos\theta = \frac{\mu_0 I}{4\pi}\,\frac{dr'}{R^2+z^2}\,\frac{R}{(R^2+z^2)^{1/2}}. \qquad (5.22)$$

Since the denominator of this expression remains the same for all wire components dr', in this case the 
integration is trivial (∮dr' = 2πR), giving finally 

$$B = \frac{\mu_0 I}{2}\,\frac{R^2}{(R^2+z^2)^{3/2}}. \qquad (5.23)$$

Note that the magnetic field in the loop's center (i.e., for z = 0), 

$$B = \frac{\mu_0 I}{2R}, \qquad (5.24)$$

is π times higher than that due to a similar current in a straight wire, at distance d = R from it. This 
increase is readily understandable, since all elementary components of the loop are at the same distance 
R from the observation point, while in the case of a straight wire, all its points but one are separated from 
the observation point by a distance larger than d. 

Another notable fact is that at large distances (z² ≫ R²), field (23) is proportional to z⁻³: 

$$B \approx \frac{\mu_0 I R^2}{2z^3} = \frac{\mu_0}{4\pi}\,\frac{2m}{z^3}, \qquad (5.25)$$

just like the electric field of a dipole (along its direction), with the replacement of the electric dipole 
moment magnitude p with m = IA, where A = πR² is the loop area. This is the best example of a 
magnetic dipole, with dipole moment m - the notions to be discussed in more detail in Sec. 5 below. 
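Eqs. (23)-(25) can be cross-checked against a direct discretization of the thin-wire Biot-Savart law. In the Python sketch below (our own discretization and names; μ0 = 4π×10⁻⁷ N/A² assumed), the loop of radius R lies in the z = 0 plane and carries a counterclockwise current; the axial field is summed from the z-components of dr' × (r - r')/|r - r'|³ and compared with Eq. (23), its center value (24), and the dipole limit (25).

```python
import math

MU0 = 4e-7 * math.pi  # N/A^2, assumed value


def loop_axis_field_numeric(current: float, radius: float, z: float, n: int = 400) -> float:
    """B_z on the axis of a circular loop, summing the Biot-Savart law over
    n short elements dr' (only the z-component of dr' x (r - r') survives)."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        sx, sy = radius * math.cos(t), radius * math.sin(t)                  # source point r'
        dlx, dly = -radius * math.sin(t) * dt, radius * math.cos(t) * dt      # element dr'
        rx, ry, rz = -sx, -sy, z                                              # r - r', r = (0, 0, z)
        cz = dlx * ry - dly * rx                                              # (dr' x (r - r'))_z
        total += cz / (rx * rx + ry * ry + rz * rz) ** 1.5
    return MU0 * current / (4 * math.pi) * total


def loop_axis_field(current: float, radius: float, z: float) -> float:
    """Eq. (5.23)."""
    return MU0 * current * radius**2 / (2 * (radius**2 + z**2) ** 1.5)


def dipole_axis_field(current: float, radius: float, z: float) -> float:
    """Eq. (5.25): (mu0 / 4 pi) 2m / z^3 with m = I pi R^2."""
    m = current * math.pi * radius**2
    return MU0 / (4 * math.pi) * 2 * m / z**3
```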



5.2. Vector-potential and the Ampere law 

The reader can see that the calculations of the magnetic field using Eq. (14) or (18) are still 
cumbersome even for the very simple systems we have examined. As we saw in Chapter 1, similar 
calculations in electrostatics, at least for several important systems of high symmetry, could be 
substantially simplified using the Gauss law (1.16). A similar relation exists in magnetostatics as well, 
but has a different form, due to the vortex character of the magnetic field. To derive it, let us notice that, 
by analogy with the scalar case, the vector product under integral (14) may be transformed as 
$$\mathbf{j}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} = \nabla\times\frac{\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}, \qquad (5.26)$$



where operator ∇ acts in the r space. (This equality may be readily verified by writing out its Cartesian components, 
noting that the current density is a function of r' and hence its components are independent of r.) 
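Instead of writing out Cartesian components by hand, identity (26) can also be spot-checked numerically. This Python sketch (central finite differences with an arbitrarily chosen step h; all names are ours) evaluates the curl of j/|r - r'| at a sample point, for fixed j and r', and compares it with j × (r - r')/|r - r'|³.

```python
import math

Vec = tuple[float, float, float]


def curl_fd(field, r: Vec, h: float = 1e-5) -> Vec:
    """Curl of a vector field at point r, by central differences."""
    def partial(i: int, j: int) -> float:  # dF_i / dx_j
        rp, rm = list(r), list(r)
        rp[j] += h
        rm[j] -= h
        return (field(rp)[i] - field(rm)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


def lhs_26(j: Vec, r_src: Vec, r: Vec) -> Vec:
    """Left-hand side of Eq. (5.26): j x (r - r') / |r - r'|^3."""
    d = [r[k] - r_src[k] for k in range(3)]
    n3 = math.sqrt(sum(c * c for c in d)) ** 3
    return ((j[1] * d[2] - j[2] * d[1]) / n3,
            (j[2] * d[0] - j[0] * d[2]) / n3,
            (j[0] * d[1] - j[1] * d[0]) / n3)


def rhs_26(j: Vec, r_src: Vec, r: Vec) -> Vec:
    """Right-hand side of Eq. (5.26): curl over r of j / |r - r'|, with j, r' fixed."""
    def field(x):
        dist = math.sqrt(sum((x[k] - r_src[k]) ** 2 for k in range(3)))
        return tuple(jk / dist for jk in j)
    return curl_fd(field, r)
```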
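Plugging Eq. (26) into Eq. (14), and moving operator ∇ out of the integral over r', we see that the 
magnetic field may be presented as the curl of another vector field, 10 namely the so-called vector-potential: 

$$\mathbf{B}(\mathbf{r}) = \nabla\times\mathbf{A}(\mathbf{r}), \qquad (5.27)$$

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_{V'}\frac{\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'. \qquad (5.28)$$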



Please note a wonderful analogy between Eqs. (27)-(28) and, respectively, Eqs. (1.33) and (1.38). This 
analogy implies that vector-potential A plays, for the magnetic field, essentially the same role as the 
scalar potential φ plays for the electric field (hence the name "potential"), with due respect to the vortex 
character of A. I will discuss this notion in detail below. 

Now let us see what equations we may get for the spatial derivatives of the magnetic field. First, 
vector algebra says that the divergence of any curl is zero. 11 In application to Eq. (27), this means that 

$$\nabla\cdot\mathbf{B} = 0. \qquad (5.29)$$

Comparing this equation with Eq. (1.27), we see that Eq. (29) may be interpreted as the absence of a 
magnetic analog of an electric charge on which magnetic field lines could originate or end. Numerous 
searches for such hypothetical magnetic charges, called magnetic monopoles, using very sensitive and 
sophisticated experimental setups, have never given convincing evidence of their existence in Nature. 

Proceeding to the alternative, vector derivative of the magnetic field (i.e., its curl), and using Eq. 
(28), we get 

$$\nabla\times\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_{V'}\nabla\times\left[\nabla\times\frac{\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\right]d^3r'. \qquad (5.30)$$

This expression may be simplified by using the following general vector identity: 12 

$$\nabla\times(\nabla\times\mathbf{c}) = \nabla(\nabla\cdot\mathbf{c}) - \nabla^2\mathbf{c}, \qquad (5.31)$$

applied to vector c(r) = j(r')/|r - r'|: 

$$\nabla\times\mathbf{B} = \frac{\mu_0}{4\pi}\nabla\int_{V'}\mathbf{j}(\mathbf{r}')\cdot\nabla\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,d^3r' - \frac{\mu_0}{4\pi}\int_{V'}\mathbf{j}(\mathbf{r}')\,\nabla^2\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'. \qquad (5.32)$$

As was already discussed during our study of electrostatics, 



10 In the Gaussian units, Eq. (27) remains the same, and hence in Eq. (28), coefficient μ0/4π is replaced with 1/c. 

11 See, e.g., MA Eq. (11.2). 

12 See, e.g., MA Eq. (11.3). 
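$$\nabla^2\frac{1}{|\mathbf{r}-\mathbf{r}'|} = -4\pi\delta(\mathbf{r}-\mathbf{r}'), \qquad (5.33)$$

so that the last term of Eq. (32) is just μ0 j(r). On the other hand, inside the first integral we can replace 
∇ with (-∇'), where the prime means differentiation in the space of radius-vector r'. Integrating that term by 
parts, we get 

$$\nabla\times\mathbf{B} = -\frac{\mu_0}{4\pi}\nabla\oint_{S'}\frac{j_n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^2r' + \frac{\mu_0}{4\pi}\nabla\int_{V'}\frac{\nabla'\cdot\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r' + \mu_0\mathbf{j}(\mathbf{r}). \qquad (5.34)$$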



Applying this equation to the volume V' limited by a surface S' sufficiently distant from the current 
concentration (or with no current crossing it), we may neglect the first term in the right-hand part of Eq. 
(34), while the second term always equals zero in statics, due to the dc charge continuity - see Eq. (4.6). 
As a result, we arrive at a very simple differential equation: 13 

$$\nabla\times\mathbf{B} = \mu_0\mathbf{j}. \qquad (5.35)$$



This is the dc form of the inhomogeneous Maxwell equation, which in magnetostatics plays a 
role similar to that of the Poisson equation (1.27) in electrostatics. Let me display, for the first time in this 
course, this fundamental system of equations (at this stage, for statics only), and give the reader a minute 
to stare at their beautiful symmetry - which has inspired so much of 20th-century physics: 

$$\begin{aligned} \nabla\times\mathbf{E} &= 0, & \nabla\times\mathbf{B} &= \mu_0\mathbf{j},\\ \nabla\cdot\mathbf{E} &= \frac{\rho}{\varepsilon_0}, & \nabla\cdot\mathbf{B} &= 0. \end{aligned} \qquad (5.36)$$



Their only asymmetry, the two zeros in the right-hand parts (for the magnetic field's divergence and the electric 
field's curl), is due to the absence in Nature of, respectively, magnetic monopoles and their currents. 
I will discuss these equations in more detail in Sec. 6.7, after the equations for field curls have been 
generalized to their full (time-dependent) versions. 

Returning now to the more mundane but important task of calculating the magnetic field induced by 
simple current configurations, we can benefit from an integral form of Eq. (35). For that, let us integrate 
this equation over an arbitrary surface S limited by a closed contour C, applying to it the Stokes 
theorem. 14 The resulting expression, 



$$\oint_C\mathbf{B}\cdot d\mathbf{r} = \mu_0\int_S j_n\,d^2r \equiv \mu_0 I, \qquad (5.37)$$

where I is the net electric current crossing surface S, is called the Ampere law. 

As the first example of its application, let us return to a current in a straight wire (Fig. 4a). With 
the Ampere law in our arsenal, we can readily pursue an even more ambitious goal - calculate the 
magnetic field both outside and inside of a wire of arbitrary radius R, with an arbitrary (albeit 
axially-symmetric) current distribution j(ρ) - see Fig. 5. Selecting two contours C in the form of rings of some 



13 As in all earlier formulas for the magnetic field, in the Gaussian units the coefficient μ0 in this relation has to be 
replaced with 4π/c. 

14 See, e.g., MA Eq. (12.1) with f = B. 
radius ρ in the plane perpendicular to the wire axis z, we have B·dr = Bρ dφ, where φ is the azimuthal 
angle, so that the Ampere law (37) yields: 

$$2\pi\rho B = \mu_0\times\begin{cases} 2\pi\int_0^{\rho} j(\rho')\rho'\,d\rho', & \text{for } \rho < R,\\ 2\pi\int_0^{R} j(\rho')\rho'\,d\rho' \equiv I, & \text{for } \rho > R. \end{cases} \qquad (5.38)$$



Thus we have not only recovered our previous result (20), with the notation replacement d → ρ, in a 
much simpler way, but could also find the magnetic field distribution inside the wire. (In the most 
common case when the wire conductivity σ is constant, and hence the current is uniformly distributed 
over its cross-section, j(ρ) = const, the first of Eqs. (38) immediately yields B ∝ ρ for ρ < R.) 
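For the uniform-current case just mentioned, j = I/πR², Eq. (38) gives B = μ0Iρ/2πR² inside the wire and B = μ0I/2πρ outside. The Python sketch below (function names and μ0 = 4π×10⁻⁷ N/A² are our assumptions) encodes this profile and checks, with a finite-difference version of the cylindrical curl (1/ρ)d(ρB)/dρ, that it satisfies Eq. (35): the curl equals μ0j inside the wire and zero outside.

```python
import math

MU0 = 4e-7 * math.pi  # N/A^2, assumed value


def wire_field(current: float, radius: float, rho: float) -> float:
    """Azimuthal field B(rho) of a wire with uniform current density, from Eq. (5.38)."""
    if rho < radius:
        return MU0 * current * rho / (2 * math.pi * radius**2)
    return MU0 * current / (2 * math.pi * rho)


def curl_z(current: float, radius: float, rho: float, h: float = 1e-7) -> float:
    """z-component of curl B, i.e. (1/rho) d(rho B)/d(rho), by central differences."""
    def rho_b(x: float) -> float:
        return x * wire_field(current, radius, x)
    return (rho_b(rho + h) - rho_b(rho - h)) / (2 * h * rho)
```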



Fig. 5.5. The simplest application of the Ampere 
law: dc current in a straight wire. 



Another important example is a straight, long solenoid (Fig. 6a), with dense winding: n²A ≫ 1, 
where n is the number of wire turns per unit length and A is the area of the solenoid's cross-section - not 
necessarily circular. 




From the symmetry of this problem, the magnetic field may have only one (in Fig. 6a, vertical) 
component B that may only depend on the horizontal position ρ of the observation point. First taking a 
plane Ampere contour C1, with both long sides outside the solenoid, we get B(ρ1) - B(ρ2) = 0, because 
the total current piercing the contour equals zero. This is only possible if the field equals zero at any ρ 
outside of the (infinitely long!) solenoid. With this result on hand, from contour C2 we get the following 
relation for the internal field: 

$$Bl = \mu_0 N I, \qquad (5.39)$$

where N is the number of wire turns passing through the contour of length l. This means that regardless 
of the exact position of the contour's internal side, the result is the same: 

$$B = \mu_0\frac{N}{l}I \equiv \mu_0 n I. \qquad (5.40)$$

Thus, the field inside an infinitely long solenoid is uniform; in this sense, a long solenoid is a magnetic 
analog of a wide plane capacitor. 

As should be clear from its derivation, the obtained result, especially the vanishing field outside of the 
solenoid, is conditional on the solenoid length being very large in comparison with its lateral 
size. (From Eq. (25), we may predict that for a solenoid of a finite length l, the external field is only a 
factor of ~A/l² lower than the internal one.) Much better suppression of this external ("fringe") field may 
be obtained using the toroidal solenoid (Fig. 6b). The application of the Ampere law to this geometry shows 
that, in the limit of dense winding (N ≫ 1), there is no fringe field at all - for any relation between the two 
radii of the torus, while inside the solenoid, at distance ρ from the center, 

$$B = \frac{\mu_0 N I}{2\pi\rho}. \qquad (5.41)$$

We see that a possible drawback of this system for practical applications is that the internal field depends on 
ρ, i.e. is not quite uniform; however, if the torus is thin, this problem is minor. 
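A small Python sketch (hypothetical function names; μ0 = 4π×10⁻⁷ N/A² assumed) makes the comparison between Eqs. (40) and (41) concrete. At the mean radius ρ0 of the torus, Eq. (41) coincides with the long-solenoid result μ0nI for n = N/2πρ0, while the ratio of the fields at the inner and outer walls, ρ_out/ρ_in, quantifies the non-uniformity, which is small for a thin torus.

```python
import math

MU0 = 4e-7 * math.pi  # N/A^2, assumed value


def solenoid_field(turns_per_length: float, current: float) -> float:
    """Eq. (5.40): B = mu0 n I inside a long solenoid."""
    return MU0 * turns_per_length * current


def toroid_field(total_turns: int, current: float, rho: float) -> float:
    """Eq. (5.41): B = mu0 N I / (2 pi rho) inside a toroidal solenoid."""
    return MU0 * total_turns * current / (2 * math.pi * rho)


def toroid_nonuniformity(rho_in: float, rho_out: float) -> float:
    """Relative excess of the field at the inner wall over that at the outer wall."""
    return rho_out / rho_in - 1.0


if __name__ == "__main__":
    # A thin torus: mean radius 0.1 m, tube half-width 1 mm
    print(toroid_nonuniformity(0.099, 0.101))  # about 2 percent
```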

How should we solve the problems of magnetostatics for systems whose low symmetry does not 
allow getting easy results from the Ampere law? (The examples are of course too numerous to list; for 
example, we cannot use this approach even to reproduce Eq. (23) for a round current loop.) From the 
deep analogy with electrostatics, we may expect that in this case we could recover the field from the 
solution of a certain boundary problem for the field's potential, in this case the vector-potential 
A defined by Eq. (28). However, despite the similarity of this formula and Eq. (1.38) for φ, which was 
emphasized above, there are two additional issues we should tackle in the magnetic case. 

First, finding the vector-potential distribution means determining three scalar functions (say, A_x, A_y, 
and A_z), rather than one (φ). Second, generally the differential equation satisfied by A is more complex 
than the Poisson equation for φ. Indeed, plugging Eq. (27) into Eq. (35), we get 

$$\nabla\times(\nabla\times\mathbf{A}) = \mu_0\mathbf{j}. \qquad (5.42)$$

If we wrote the left-hand part of this equation in (say, Cartesian) components, we would see that they 
are much more interwoven than in the Laplace operator, and hence much less convenient for using the 
orthogonal coordinate approach or the variable separation method. In order to remedy the situation, let 
us apply to Eq. (42) the now-familiar identity (31). The result is 

$$\nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A} = \mu_0\mathbf{j}. \qquad (5.43)$$

We see that if we could kill the first term in the left-hand part, for example if ∇·A = 0, the second term 
would give us a set of independent Poisson equations for each Cartesian component of vector A. 
In this context, let us discuss what freedom we have in the choice of the potential. In electrostatics, 
we might add to φ not only an arbitrary constant, but also an arbitrary function of time, without 
affecting the electric field: 

$$-\nabla[\phi + f(t)] = -\nabla\phi = \mathbf{E}. \qquad (5.44)$$

Similarly, using the fact that the curl of the gradient of any scalar function equals zero, 15 we may add to A 
not only a constant, but even a gradient of an arbitrary function χ(r, t), because 

$$\nabla\times(\mathbf{A} + \nabla\chi) = \nabla\times\mathbf{A} + \nabla\times(\nabla\chi) = \nabla\times\mathbf{A} = \mathbf{B}. \qquad (5.45)$$

Such additions, keeping the actual (observable) fields intact, are called gauge transformations. 16 Let us 
see what such a transformation does to ∇·A: 

$$\nabla\cdot(\mathbf{A} + \nabla\chi) = \nabla\cdot\mathbf{A} + \nabla^2\chi. \qquad (5.46)$$



Hence we can choose a function χ in such a way that the divergence of the transformed vector-potential, 
A' = A + ∇χ, would vanish, so that the new vector-potential would satisfy the vector Poisson equation 

$$\nabla^2\mathbf{A}' = -\mu_0\mathbf{j}, \qquad (5.47)$$

together with the so-called Coulomb gauge condition: 

$$\nabla\cdot\mathbf{A}' = 0. \qquad (5.48)$$


This gauge is very convenient; one should, however, remember that the resulting solution A'(r) may 
differ from the function given by Eq. (28) - while field B remains the same. 17 

In order to get a better feeling for the vector-potential's distribution in space, let us solve Eq. (47) for 
the same straight wire problem (Fig. 5). As Eq. (28) shows, in this case vector A has just one component 
(along the axis z). Moreover, due to the problem's axial symmetry, its magnitude may only depend on 
the distance from the axis: A = n_z A(ρ). Hence, the gradient of A is directed across axis z, so that Eq. (48) 
is satisfied even for this vector, i.e. the Poisson equation (47) is satisfied even for the original vector A. 
For our symmetry (∂/∂φ = ∂/∂z = 0), the Laplace operator, written in cylindrical coordinates, has just 
one term, 18 reducing Eq. (47) to 

$$\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{dA}{d\rho}\right) = -\mu_0 j(\rho). \qquad (5.49)$$



Multiplying both parts of this equation by ρ and integrating them over this coordinate once, we get 

$$\rho\frac{dA}{d\rho} = -\mu_0\int_0^{\rho} j(\rho')\rho'\,d\rho' + \text{const}. \qquad (5.50)$$



15 See, e.g., MA Eq. (11.1). 

16 The use of the term "gauge" (originally meaning "a measure" or "a scale") in this context is purely historical, so the 
reader should not try to find too much hidden sense in it. 

17 Since most equations for A are valid for A' as well, I will follow the common (possibly, bad) tradition, and in 
many cases use the same notation, A, for both functions. 

18 See, e.g., MA Eq. (10.3). 
Since in the cylindrical coordinates, for our symmetry, 19 B = -dA/dρ, Eq. (50) is nothing else than our 
old result (38) for the magnetic field. 20 However, let us continue the integration, at least for the region 
outside the wire, where the function A(ρ) depends only on the full current I rather than on the current 
distribution inside the wire. Dividing both parts of Eq. (50) by ρ, and integrating them over that 
coordinate again, we get 

$$A(\rho) = -\frac{\mu_0 I}{2\pi}\ln\rho + \text{const}, \quad \text{where } I = 2\pi\int_0^R j(\rho)\rho\,d\rho. \qquad (5.51)$$

As a reminder, we had a similar logarithmic behavior for the electrostatic potential outside a 
uniformly charged straight line. This is natural, because the Poisson equations for both cases are similar. 
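Eq. (51) can be checked against the field: by the relation B = -dA/dρ used above, its derivative must return B = μ0I/2πρ outside the wire. A minimal Python sketch (our names; the reference radius rho_ref, which fixes the arbitrary constant, and μ0 = 4π×10⁻⁷ N/A² are assumptions) does this with a central difference.

```python
import math

MU0 = 4e-7 * math.pi  # N/A^2, assumed value


def vector_potential_outside(current: float, rho: float, rho_ref: float = 1.0) -> float:
    """Eq. (5.51) with the constant chosen so that A(rho_ref) = 0."""
    return -MU0 * current / (2 * math.pi) * math.log(rho / rho_ref)


def field_from_potential(current: float, rho: float, h: float = 1e-6) -> float:
    """B = -dA/d(rho), by central difference."""
    return -(vector_potential_outside(current, rho + h)
             - vector_potential_outside(current, rho - h)) / (2 * h)
```

The choice of rho_ref only shifts A by a constant and does not affect the field, in line with the freedom of the potential choice discussed above.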

Now let us find the vector-potential for the long solenoid (Fig. 6a), with its uniform magnetic 
field. Since Eq. (28) prescribes vector A to follow the direction of the current, we can start by looking 
for it in the form A = n_φ A(ρ). (This is especially natural if the solenoid's cross-section is circular.) With 
this orientation of A, the same general expression for the curl operator in cylindrical coordinates yields 
∇×A = n_z (1/ρ) d(ρA)/dρ. According to the definition (27) of A, this expression should be equal to B, in 
our case equal to n_z B, with constant B - see Eq. (40). Integrating this equality, and selecting the 
integration constant so that A(0) is finite, we get 

$$A = \frac{B\rho}{2}, \quad \text{i.e.} \quad \mathbf{A} = \mathbf{n}_\varphi\frac{B\rho}{2}. \qquad (5.52)$$



Plugging this result into the general expression for the Laplace operator in the cylindrical coordinates, 21 
we see that the Poisson equation (47) with j = 0 (i.e. the Laplace equation) is satisfied again - which is 
natural since for this distribution, ∇·A = 0. However, Eq. (52) is not the unique (or even the simplest) 
solution of the problem. Indeed, using the well-known expression for the curl operator in Cartesian 
coordinates, 22 it is straightforward to check that either function A' = n_y Bx, or function A'' = -n_x By, or any 
of their weighted sums, for example A''' = (A' + A'')/2 = B(-n_x y + n_y x)/2, also give the same magnetic 
field, and also evidently satisfy the Laplace equation. If such solutions do not look very natural due to 
their anisotropy in the [x, y] plane, please consider the fact that they represent the uniform magnetic 
field regardless of its source (e.g., of the shape of the long solenoid's cross-section). Such choices of vector-potential 
may be very convenient for some problems, for example for the analysis of the 2D motion of a 
charged quantum particle in a perpendicular magnetic field, giving the famous Landau energy levels. 23 
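The statement that A' = n_yBx, A'' = -n_xBy, and A''' = B(-n_xy + n_yx)/2 all describe the same uniform field is easy to spot-check. In the Python sketch below (finite-difference curl, divergence, and gradient with an arbitrary step H; all names and the test value B0 are ours), each candidate gives curl A = n_zB and zero divergence; in addition, A' - A'' equals the gradient of the gauge function χ = Bxy, so the first two are related by a gauge transformation of the type (45). Note that A''' is n_φBρ/2 written in Cartesian components.

```python
B0 = 2.0   # magnitude of the uniform field, T (arbitrary test value)
H = 1e-4   # finite-difference step; differences are exact up to rounding for linear fields


def a_prime(r):
    """A' = n_y B x."""
    return (0.0, B0 * r[0], 0.0)


def a_double_prime(r):
    """A'' = -n_x B y."""
    return (-B0 * r[1], 0.0, 0.0)


def a_triple_prime(r):
    """A''' = B(-n_x y + n_y x)/2, i.e. n_phi B rho / 2 in Cartesian form."""
    return (-B0 * r[1] / 2, B0 * r[0] / 2, 0.0)


def chi(r):
    """Gauge function whose gradient equals A' - A''."""
    return B0 * r[0] * r[1]


def shifted(r, j, s):
    p = list(r)
    p[j] += s
    return p


def deriv(f, i, j, r):
    """dF_i/dx_j by central differences; use i=None for a scalar function f."""
    fp, fm = f(shifted(r, j, H)), f(shifted(r, j, -H))
    if i is None:
        return (fp - fm) / (2 * H)
    return (fp[i] - fm[i]) / (2 * H)


def curl(f, r):
    return (deriv(f, 2, 1, r) - deriv(f, 1, 2, r),
            deriv(f, 0, 2, r) - deriv(f, 2, 0, r),
            deriv(f, 1, 0, r) - deriv(f, 0, 1, r))


def div(f, r):
    return sum(deriv(f, k, k, r) for k in range(3))


def grad(g, r):
    return tuple(deriv(g, None, k, r) for k in range(3))


if __name__ == "__main__":
    point = (0.7, -1.3, 0.4)
    for a in (a_prime, a_double_prime, a_triple_prime):
        print(a.__name__, curl(a, point), div(a, point))
```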

5.3. Magnetic energy, flux, and inductance 

Now let us discuss the energy related to magnetic interactions. If we consider the currents 
flowing in a system as generalized coordinates, magnetic forces (1) between them are their unique 
functions, and in this sense the magnetic interaction energy U may be considered a potential energy of 



19 See, e.g., MA Eq. (10.5) with ∂/∂φ = ∂/∂z = 0. 

20 Since the magnetic field at the wire axis has to be zero (otherwise, being perpendicular to the axis, where would 
it be directed?), the integration constant in Eq. (50) should be zero. 

21 See, e.g., MA Eq. (10.6). 

22 See, e.g., MA Eq. (8.5). 

23 See, e.g., QM Sec. 3.2. 
the system. Perhaps the simplest way to calculate the energy is to use the analogy between Eq. (1) and 
its electrostatic analog, Eq. (2). As we know from Chapter 1, if these densities describe the distribution 
of the same charge, i.e. if ρ'(r) = ρ(r), then the self-interaction of its elementary components corresponds 
to the potential energy expressed by Eq. (1.61): 

$$U = \frac{1}{4\pi\varepsilon_0}\,\frac{1}{2}\int d^3r\int d^3r'\,\frac{\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}. \qquad (5.53)$$

Using the analogy, for the magnetic interaction between components of the same current, with density 
j'(r) = j(r), we may write 

$$U = \frac{\mu_0}{4\pi}\,\frac{1}{2}\int d^3r\int d^3r'\,\frac{\mathbf{j}(\mathbf{r})\cdot\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}, \qquad (5.54)$$

while for independent currents the coefficient ½ should be removed. 

Due to the importance of this relation, let us rewrite it in several other forms, beneficial for 
different applications. First of all, just as in electrostatics, Eq. (54) may be recast into a potential-based 
form. Indeed, using definition (28) of the vector-potential A(r), Eq. (54) becomes 24 




$$U = \frac{1}{2}\int\mathbf{j}(\mathbf{r})\cdot\mathbf{A}(\mathbf{r})\,d^3r. \qquad (5.55)$$



This formula, which is a clear magnetic analog of Eq. (1.62) of electrostatics, is very popular among 
theoretical physicists, because it is very handy for field-theory manipulations. However, for many 
calculations it is more convenient to have a direct expression of energy via the magnetic field. Again, 
this may be done very similarly to what we have done in Sec. 1.3 for electrostatics, i.e. by plugging into Eq. 
(55) the current density expressed from Eq. (35), to transform it as 25 

$$U = \frac{1}{2}\int\mathbf{j}\cdot\mathbf{A}\,d^3r = \frac{1}{2\mu_0}\int\mathbf{A}\cdot(\nabla\times\mathbf{B})\,d^3r = \frac{1}{2\mu_0}\int\mathbf{B}\cdot(\nabla\times\mathbf{A})\,d^3r - \frac{1}{2\mu_0}\int\nabla\cdot(\mathbf{A}\times\mathbf{B})\,d^3r. \qquad (5.56)$$



Now, using the divergence theorem, the second integral may be transformed into a surface integral of the product $(\mathbf{A}\times\mathbf{B})_n$. Equations (27)-(28) show that if the current distribution $\mathbf{j}(\mathbf{r})$ is localized, this product drops with distance $r$ faster than $1/r^2$, so that if the integration volume is large enough, the surface integral is negligible. In the remaining first integral, we may use Eq. (27) to recast $\nabla\times\mathbf{A}$ into the magnetic field. As a result, we get a very simple and fundamental formula:



$$U = \frac{1}{2\mu_0}\int B^2\,d^3r. \tag{5.57a}$$



Just as with the electric field, this expression may be interpreted as a volume integral of the magnetic 
energy density u: 



$$U = \int u(\mathbf{r})\,d^3r, \qquad \text{with} \qquad u(\mathbf{r}) = \frac{1}{2\mu_0}B^2(\mathbf{r}), \qquad \text{(magnetic field energy)} \tag{5.57b}$$



24 This relation remains the same in the Gaussian units, because in those units both Eq. (28) and Eq. (54) should be stripped of their $\mu_0/4\pi$ coefficients.

25 For that, we may use MA Eq. (11.7) with f = A and g = B, giving $\mathbf{A}\cdot(\nabla\times\mathbf{B}) = \mathbf{B}\cdot(\nabla\times\mathbf{A}) - \nabla\cdot(\mathbf{A}\times\mathbf{B})$.
clearly similar to Eq. (1.67). 26 Again, the conceptual choice between the spatial localizations of magnetic energy, either at the location of electric currents only, as implied by Eqs. (54) and (55), or in all regions where the magnetic field exists, as apparent from Eq. (57b), cannot be made within the framework of magnetostatics, and only electrodynamics gives the decisive preference for the latter choice.

For the practically important case of currents flowing in several thin wires, Eq. (54) may be first integrated over the cross-section of each wire, just as was done in the derivation of Eq. (4). Again, since the integral of the current density over the k-th wire's cross-section is just the current $I_k$ in the wire, and cannot change along its length, it may be taken out of the remaining integrals, giving

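$$U = \frac{\mu_0}{4\pi}\,\frac{1}{2}\sum_{k,k'} I_k I_{k'} \oint_{l_k}\oint_{l_{k'}} \frac{d\mathbf{r}_k\cdot d\mathbf{r}_{k'}}{|\mathbf{r}_k-\mathbf{r}_{k'}|}, \tag{5.58}$$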

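where $l_k$ is the full length of the k-th wire loop. Note that Eq. (58) is also valid for independent currents, because the double sum counts each pair of different currents twice, compensating the coefficient 1/2 in front of the sum. It is useful to decompose this relation as

$$U = \frac{1}{2}\sum_{k,k'} L_{kk'} I_k I_{k'}, \tag{5.59}$$

where

$$L_{kk'} = \frac{\mu_0}{4\pi}\oint_{l_k}\oint_{l_{k'}} \frac{d\mathbf{r}_k\cdot d\mathbf{r}_{k'}}{|\mathbf{r}_k-\mathbf{r}_{k'}|}. \qquad \text{(mutual inductance coefficients)} \tag{5.60}$$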



The coefficient $L_{kk'}$ in the quadratic form (59), with $k \neq k'$, is called the mutual inductance between current loops k and k', while the diagonal coefficient $L_k \equiv L_{kk}$ is called the self-inductance (or just inductance) of the k-th loop. 27 From the symmetry of Eq. (60) with respect to the index swap, $k \leftrightarrow k'$, it is evident that the matrix of coefficients $L_{kk'}$ is symmetric: 28

$$L_{kk'} = L_{k'k}, \tag{5.61}$$

so that for the practically important case of two interacting currents $I_1$ and $I_2$, Eq. (59) reads

$$U = \frac{1}{2}L_1 I_1^2 + M I_1 I_2 + \frac{1}{2}L_2 I_2^2, \tag{5.62}$$

where $M \equiv L_{12} = L_{21}$ is the mutual inductance coefficient.
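Eq. (60) lends itself to direct numerical evaluation. The sketch below is a minimal illustration under assumptions not taken from the text: two coaxial circular loops of radii a and b, a distance d apart, each cut into n equal straight elements (midpoint rule). For a, b ≪ d the result should approach $\mu_0\pi a^2 b^2/2d^3$, i.e. the flux of the on-axis field $\mu_0 I a^2/2d^3$ of the first loop through the area $\pi b^2$ of the second; swapping a and b should leave it unchanged, in accordance with Eq. (61).

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical SI value)


def neumann_coaxial_loops(a, b, d, n=200):
    """Mutual inductance of two coaxial circular loops from a discretized Eq. (5.60).

    Loop 1: radius a in the plane z = 0; loop 2: radius b in the plane z = d.
    Each loop is split into n equal angular elements (midpoint rule).
    """
    dphi = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        p1 = (i + 0.5) * dphi
        x1, y1 = a * math.cos(p1), a * math.sin(p1)
        dx1, dy1 = -a * math.sin(p1) * dphi, a * math.cos(p1) * dphi
        for j in range(n):
            p2 = (j + 0.5) * dphi
            x2, y2 = b * math.cos(p2), b * math.sin(p2)
            dx2, dy2 = -b * math.sin(p2) * dphi, b * math.cos(p2) * dphi
            dist = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + d ** 2)
            total += (dx1 * dx2 + dy1 * dy2) / dist  # dr_k . dr_k' / |r_k - r_k'|
    return MU0 / (4.0 * math.pi) * total


def far_field_estimate(a, b, d):
    """M ~ mu0 * pi * a^2 * b^2 / (2 d^3), valid for a, b << d."""
    return MU0 * math.pi * a**2 * b**2 / (2.0 * d**3)


if __name__ == "__main__":
    a, b, d = 0.05, 0.03, 1.0  # hypothetical geometry, m
    print(neumann_coaxial_loops(a, b, d), far_field_estimate(a, b, d))
```

The brute-force double sum is chosen for transparency; since the integrand is smooth and periodic, even a modest n gives many correct digits here.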

These formulas clearly show the importance of self- and mutual inductances, so I will demonstrate their calculation for at least a few basic geometries. Before doing that, however, let me recast Eq. (58) into one more form that may facilitate such calculations. Namely, let us notice that for the vector-potential of the magnetic field induced by current $I_k$ in a thin wire, Eq. (28) is reduced to



26 The transfer to the Gaussian units in Eqs. (57)-(58) may be accomplished by the usual replacement $\mu_0 \to 4\pi$, thus giving, in particular, $u = B^2/8\pi$.

27 As evident from Eq. (60), these coefficients depend only on the geometry of the system. Moreover, in the Gaussian units, in which Eq. (60) is valid without the factor $\mu_0/4\pi$, the inductance coefficients have the dimension of length (centimeters). The SI unit of inductance is called the henry, abbreviated H, after J. Henry (1797-1878), who in particular discovered the effect of electromagnetic induction (see Sec. 6.1) independently of M. Faraday.

28 Note that the matrix of the mutual inductances $L_{kk'}$ is very much similar to the matrix of reciprocal capacitance coefficients $p_{kk'}$; for example, compare Eq. (62) with Eq. (2.21).
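$$\mathbf{A}_k(\mathbf{r}) = \frac{\mu_0}{4\pi} I_k \oint_{l_k} \frac{d\mathbf{r}_k}{|\mathbf{r}-\mathbf{r}_k|}, \tag{5.63}$$

so that Eq. (58) may be rewritten as

$$U = \frac{1}{2}\sum_{k,k'} I_k \oint_{l_k} \mathbf{A}_{k'}(\mathbf{r}_k)\cdot d\mathbf{r}_k. \tag{5.64}$$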



But according to the same Stokes theorem that was used earlier in this chapter to derive the Ampere law, and Eq. (27), such an integral is nothing more than the magnetic field flux (more frequently called just the magnetic flux) through a surface S limited by the contour l: 29

$$\oint_l \mathbf{A}\cdot d\mathbf{r} = \int_S (\nabla\times\mathbf{A})_n\,d^2r = \int_S B_n\,d^2r \equiv \Phi. \qquad \text{(magnetic flux)} \tag{5.65}$$

As a result, Eq. (64) may be rewritten as

$$U = \frac{1}{2}\sum_{k,k'} I_k\,\Phi_{kk'}, \tag{5.66}$$

where $\Phi_{kk'}$ is the flux of the field induced by the k'-th current through the loop of the k-th current. Comparing this expression with Eq. (59), we see that



$$\Phi_{kk'} = \int_{S_k} (\mathbf{B}_{k'})_n\,d^2r = L_{kk'} I_{k'}. \qquad \text{(magnetic flux from currents)} \tag{5.67}$$



This expression not only gives us one more means for calculating the coefficients $L_{kk'}$, but also shows their physical sense: the mutual inductance characterizes how much field (colloquially, "how many field lines") induced by current $I_{k'}$ penetrates the loop of current $I_k$, and vice versa. Due to the linear superposition principle, the total flux piercing the k-th loop may be presented as

$$\Phi_k \equiv \sum_{k'} \Phi_{kk'} = \sum_{k'} L_{kk'} I_{k'}. \tag{5.68}$$



For example, for the system of two currents this expression is reduced to a clear analog of Eqs. (2.19):

$$\Phi_1 = L_1 I_1 + M I_2, \qquad \Phi_2 = M I_1 + L_2 I_2. \tag{5.69}$$

For the even simpler case of a single current,

$$\Phi = L I, \tag{5.70}$$

so that the magnetic energy of the current may be presented in several equivalent forms:

$$U = \frac{L}{2}I^2 = \frac{1}{2}I\Phi = \frac{1}{2L}\Phi^2. \qquad (\Phi \text{ and } U \text{ of a single current}) \tag{5.71}$$



29 The SI unit of magnetic flux is called the weber, abbreviated Wb, after W. Weber, who in particular co-invented (with C. Gauss) the electromagnetic telegraph, and in 1856 was first, together with R. Kohlrausch, to notice that the value of (in modern terms) $1/(\varepsilon_0\mu_0)^{1/2}$, derived from electrostatic and magnetostatic measurements, coincides with the independently measured speed of light c, giving an important motivation for Maxwell's theory.
These relations, similar to Eqs. (2.14)-(2.15) of electrostatics, show that the self-inductance L of a current loop may be considered as a measure of the system's magnetic energy at fixed current.
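A short numeric sketch of Eqs. (59) and (66)-(71): the energy of a set of currents computed through the inductance matrix, through the fluxes (68), and, for two currents, through Eq. (62). The matrix entries and currents below are arbitrary hypothetical numbers, not derived from any geometry; the only property used is the symmetry (61).

```python
import math


def fluxes(L, currents):
    """Phi_k = sum_k' L_kk' I_k', Eq. (5.68)."""
    return [sum(L_kk * I for L_kk, I in zip(row, currents)) for row in L]


def energy_from_matrix(L, currents):
    """U = (1/2) sum_{k,k'} L_kk' I_k I_k', Eq. (5.59)."""
    n = len(currents)
    return 0.5 * sum(L[k][kp] * currents[k] * currents[kp]
                     for k in range(n) for kp in range(n))


def energy_from_fluxes(L, currents):
    """U = (1/2) sum_k I_k Phi_k, i.e. Eq. (5.66) with the sum over k' done first."""
    return 0.5 * sum(I * phi for I, phi in zip(currents, fluxes(L, currents)))


L1, L2, M = 2.0e-6, 3.0e-6, 0.5e-6            # hypothetical inductances, H
L = [[L1, M], [M, L2]]                         # symmetric, Eq. (5.61)
I1, I2 = 1.5, -2.0                             # hypothetical currents, A
U_62 = 0.5 * L1 * I1**2 + M * I1 * I2 + 0.5 * L2 * I2**2   # Eq. (5.62)

L_single, I_single = 1.0e-3, 2.0               # a single hypothetical loop
Phi_single = L_single * I_single               # Eq. (5.70)
forms = (L_single * I_single**2 / 2,           # the three forms of Eq. (5.71)
         I_single * Phi_single / 2,
         Phi_single**2 / (2 * L_single))

if __name__ == "__main__":
    print(energy_from_matrix(L, [I1, I2]), energy_from_fluxes(L, [I1, I2]), U_62)
    print(all(math.isclose(f, forms[0]) for f in forms))
```

Note that with the negative $I_2$ chosen here the mutual term lowers the energy, while the total stays positive, as it must for a positive-definite inductance matrix.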

Now we are well equipped for the calculation of inductances, having three options. The first one is to use Eq. (60) directly. 30 The second one is to calculate the magnetic field energy from Eq. (57) as a function of the currents $I_k$ in the system, and then use Eq. (59) to find all coefficients $L_{kk'}$. For example, for a system with just one current, Eq. (71) yields

$$L = \frac{U}{I^2/2}. \tag{5.72}$$

Finally, if the system consists of thin wires, so that the loop areas $S_k$ and hence the fluxes are well defined, we may calculate them from Eq. (65), and then use Eq. (67) to find the inductances.

Actually, the first two options may have advantages over the third one even for systems of thin wires for which the notion of magnetic flux is not quite clear. As an important example, let us find the inductance of a long solenoid; see Fig. 6a. We have already calculated the magnetic field inside it (see Eq. (40)), so that, due to the field uniformity, the magnetic flux piercing each wire turn is just

$$\Phi_1 = BA = \mu_0 n I A, \tag{5.73}$$

where A is the area of the solenoid's cross-section, for example $\pi R^2$ for a round solenoid, though Eq. (40) is more general. Comparing Eqs. (73) and (67), one might wrongly conclude that $L = \Phi_1/I = \mu_0 n A$ [WRONG!], i.e. that the solenoid's inductance is independent of its length. Actually, the magnetic flux $\Phi_1$ pierces each wire turn, so that the total flux through the whole current loop, consisting of $N = nl$ turns, is

$$\Phi = N\Phi_1 = \mu_0 n^2 l A I, \tag{5.74}$$

and the correct expression for the solenoid's inductance is

$$L = \frac{\Phi}{I} = \mu_0 n^2 l A, \tag{5.75}$$

i.e. the inductance per unit length is constant: $L/l = \mu_0 n^2 A$. Since this reasoning may seem a bit flimsy, it is prudent to verify it by using Eq. (57) to calculate the full magnetic energy inside the solenoid (neglecting minor fringe and external field contributions):

$$U = \frac{1}{2\mu_0}B^2 A l = \frac{1}{2\mu_0}(\mu_0 n I)^2 A l = \mu_0 n^2 l A\,\frac{I^2}{2}. \tag{5.76}$$

Plugging this result into Eq. (72) immediately confirms result (75). 
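The two routes to Eq. (75), via the flux (73)-(74) and via the field energy (76) with Eq. (72), can be put side by side in a few lines. This is a minimal sketch; the turn density, length, and cross-section used in the usage line are hypothetical, and fringe fields are neglected exactly as in the text.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def inductance_via_flux(n, length, area):
    """Eqs. (5.73)-(5.75): flux through one turn times N = n*l turns, divided by I."""
    current = 1.0                                # any value; it cancels
    flux_per_turn = MU0 * n * current * area     # Eq. (5.73), uniform field of Eq. (5.40)
    total_flux = (n * length) * flux_per_turn    # Eq. (5.74)
    return total_flux / current                  # Eq. (5.75)


def inductance_via_energy(n, length, area, current=1.0):
    """Eqs. (5.76) and (5.72): field energy inside the solenoid, fringe fields neglected."""
    b_field = MU0 * n * current
    energy = b_field**2 / (2 * MU0) * area * length
    return energy / (current**2 / 2)


def naive_wrong_inductance(n, area):
    """The [WRONG!] guess L = Phi_1 / I, which ignores the number of turns."""
    return MU0 * n * area


if __name__ == "__main__":
    n, length, area = 1000.0, 0.2, math.pi * 0.01**2   # hypothetical solenoid
    print(inductance_via_flux(n, length, area), inductance_via_energy(n, length, area))
```

The ratio of the correct result to the naive one is just the number of turns, $N = nl$.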

The use of the first two options for inductance calculation becomes inevitable for continuously distributed currents. As an example, let us calculate the self-inductance L of a long coaxial cable with the cross-section shown in Fig. 7. 31



30 Numerous applications of this Neumann formula to electrical engineering problems may be found, for example, 
in the classical text F. Grover, Inductance Calculations, Dover, 1946. 

31 As a reminder, the mutual capacitance C between the conductors of such a system was calculated in Sec. 2.3. 
As will be discussed in Chapter 7 below, the pair of parameters L and C define the propagation of the most 
important, TEM mode of electromagnetic waves along the cable. 
Let us assume that the current is uniformly distributed over the cross-sections of both conductors. (As we know from the previous chapter, such a distribution indeed takes place if both the internal and external conductors are made of a uniform resistive material.) First, we should calculate the radial distribution of the magnetic field (which of course has only one, azimuthal component, because of the axial symmetry of the problem). This distribution may be immediately found from the application of the Ampere law to circles of radii $\rho$ within four different ranges:

$$2\pi\rho B = \mu_0 I_{\text{piercing the circle area}} = \mu_0 I \times \begin{cases} \rho^2/a^2, & \text{for } \rho < a,\\ 1, & \text{for } a < \rho < b,\\ (c^2-\rho^2)/(c^2-b^2), & \text{for } b < \rho < c,\\ 0, & \text{for } c < \rho. \end{cases} \tag{5.77}$$



Now, an elementary integration yields the magnetic energy per unit length of the cable: 



$$\frac{U}{l} = \frac{1}{2\mu_0}\int B^2\,d^2r = \frac{\pi}{\mu_0}\int_0^\infty B^2\rho\,d\rho = \frac{\mu_0 I^2}{4\pi}\left[\ln\frac{b}{a} + \frac{c^2}{c^2-b^2}\left(\frac{c^2}{c^2-b^2}\ln\frac{c}{b} - \frac{1}{2}\right)\right]. \tag{5.78}$$



From here, and Eq. (72), we get the final answer: 



$$\frac{L}{l} = \frac{\mu_0}{2\pi}\left[\ln\frac{b}{a} + \frac{c^2}{c^2-b^2}\left(\frac{c^2}{c^2-b^2}\ln\frac{c}{b} - \frac{1}{2}\right)\right]. \tag{5.79}$$



Note that for the particular case of a thin outer conductor, $c - b \ll b$, this expression reduces to

$$\frac{L}{l} = \frac{\mu_0}{2\pi}\left(\ln\frac{b}{a} + \frac{1}{4}\right), \tag{5.80}$$



where the first term in the parentheses may be traced back to the contribution of the magnetic field energy in the free space between the conductors. This distinction is important for some applications, because in superconductor cables, as well as in resistive-metal cables at high frequencies (to be discussed in the next chapter), the field does not penetrate the conductor bulk, so that Eq. (80) is valid without the last term, 1/4, in the parentheses, which is due to the magnetic field energy inside the wire.
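The closed form (79) can be cross-checked by integrating the piecewise field (77) numerically. This sketch assumes SI units and a hypothetical set of radii, and uses a plain composite Simpson rule written out by hand (rather than any library integrator), applied separately on each radial range so that it never straddles the kinks of B(ρ) at ρ = a and ρ = b.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def field(rho, a, b, c, current=1.0):
    """Azimuthal field B(rho) from the Ampere law, Eq. (5.77)."""
    if rho <= 0.0:
        return 0.0
    if rho < a:
        enclosed = current * (rho / a) ** 2
    elif rho < b:
        enclosed = current
    elif rho < c:
        enclosed = current * (c**2 - rho**2) / (c**2 - b**2)
    else:
        enclosed = 0.0
    return MU0 * enclosed / (2.0 * math.pi * rho)


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def inductance_per_length_numeric(a, b, c):
    """L/l = 2 (U/l) / I^2 with U/l = (pi/mu0) * integral of B^2 rho d rho; I = 1 A."""
    integrand = lambda rho: field(rho, a, b, c) ** 2 * rho
    energy_per_length = (math.pi / MU0) * sum(
        simpson(integrand, lo, hi) for lo, hi in ((0.0, a), (a, b), (b, c)))
    return 2.0 * energy_per_length  # Eq. (5.72) with I = 1 A


def inductance_per_length_formula(a, b, c):
    """Closed form, Eq. (5.79)."""
    x = c**2 / (c**2 - b**2)
    return MU0 / (2.0 * math.pi) * (math.log(b / a) + x * (x * math.log(c / b) - 0.5))


if __name__ == "__main__":
    a, b, c = 1.0e-3, 3.0e-3, 3.5e-3  # hypothetical cable radii, m
    print(inductance_per_length_numeric(a, b, c), inductance_per_length_formula(a, b, c))
```

Evaluating the formula for c only slightly larger than b reproduces the thin-shell limit (80), with the residual correction shrinking as the shell thickness goes to zero.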
As the last example, let us calculate the mutual inductance between a long straight wire and a round wire loop adjacent to it (Fig. 8), neglecting the thickness of both wires.




Fig. 5.8. Study case for the 
mutual inductance calculation. 



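Here there is no problem with using the last formalism, based on the magnetic flux calculation. Indeed, in the Cartesian coordinates shown in Fig. 8, Eq. (20) reads $B_1 = \mu_0 I_1/2\pi y$, giving the following magnetic flux through the round wire loop:

$$\Phi_{21} = \frac{\mu_0 I_1}{2\pi}\int_{-R}^{+R} dx \int_{R-(R^2-x^2)^{1/2}}^{R+(R^2-x^2)^{1/2}} \frac{dy}{y} = \frac{\mu_0 I_1 R}{\pi}\int_0^1 \ln\frac{1+(1-\xi^2)^{1/2}}{1-(1-\xi^2)^{1/2}}\,d\xi. \tag{5.81}$$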



This is a table integral equal to $\pi$, 32 so that $\Phi_{21} = \mu_0 I_1 R$, and the final answer for the mutual inductance $M = L_{12} = L_{21} = \Phi_{21}/I_1$ is finite (and very simple):

$$M = \mu_0 R, \tag{5.82}$$



despite the magnetic field's divergence at the lowest point of the loop (y = 0). Note that in contrast with the finite mutual inductance of this system, the self-inductances of both wires are formally infinite in the thin-wire limit; see, e.g., Eq. (80), which in the limit $b/a \gg 1$ describes a thin straight wire. However, since this divergence is very weak (logarithmic), it is quenched by any deviation from this perfect geometry. For example, a good estimate of the inductance of a wire of a large but finite length l may be obtained from Eq. (80) via the replacement of b with l:

$$L \approx \frac{\mu_0 l}{2\pi}\ln\frac{l}{a}. \tag{5.83}$$



(Note, however, that the exact result depends on where from/to the current flows beyond that segment.) A close estimate, with l replaced with $2\pi R$, and b replaced with R, is valid for the self-inductance of the round loop. A more exact calculation of this inductance, asymptotically correct in the limit $a \ll R$, is a very useful exercise, which is highly recommended to the reader. 33
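The value $\pi$ of the table integral in Eq. (81) is easy to confirm numerically. The sketch below substitutes $\xi = \sin\theta$ (a choice made only for this illustration), so that $(1-\xi^2)^{1/2} = \cos\theta$ and the logarithm becomes $-2\ln\tan(\theta/2)$, and then applies a midpoint rule, which never evaluates the integrable logarithmic singularity at $\theta = 0$. The loop radius in the usage line is hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def table_integral(n=100_000):
    """J = int_0^1 ln[(1 + sqrt(1 - xi^2)) / (1 - sqrt(1 - xi^2))] dxi, from Eq. (5.81).

    After xi = sin(theta): J = int_0^{pi/2} [-2 ln tan(theta/2)] cos(theta) dtheta,
    evaluated with the midpoint rule.
    """
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        total += -2.0 * math.log(math.tan(theta / 2)) * math.cos(theta)
    return total * h


def mutual_inductance(radius):
    """M = Phi_21 / I_1 = (mu0 R / pi) * J, which should reduce to Eq. (5.82)."""
    return MU0 * radius * table_integral() / math.pi


if __name__ == "__main__":
    R = 0.1  # hypothetical loop radius, m
    print(table_integral(), math.pi)
    print(mutual_inductance(R), MU0 * R)
```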



32 See, e.g., MA Eq. (6.13) for a = 1.

33 Its solution may be found, for example, just after Sec. 34 of L. Landau et al., Electrodynamics of Continuous Media, 2nd ed., Butterworth-Heinemann, 1984.
5.4. Magnetic dipole moment, and magnetic dipole media

The most natural way of describing magnetic media parallels that described in Chapter 3 for dielectrics, and is based on the properties of magnetic dipoles. To introduce this notion quantitatively, let us consider, just as in Sec. 3.1, a spatially-localized system with current distribution $\mathbf{j}(\mathbf{r}')$, whose magnetic field is measured at relatively large distances $r \gg r'$ (Fig. 9).




Applying the truncated Taylor expansion (3.4) to definition (28) of the vector potential, we get 



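$$\mathbf{A}(\mathbf{r}) \approx \frac{\mu_0}{4\pi}\left[\frac{1}{r}\int \mathbf{j}(\mathbf{r}')\,d^3r' + \frac{1}{r^3}\int (\mathbf{r}\cdot\mathbf{r}')\,\mathbf{j}(\mathbf{r}')\,d^3r'\right]. \tag{5.84}$$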



Due to the vector character of this potential, we have to depart slightly from the approach of Sec. 3.1 
and use the following vector algebra identity: 34 



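$$\int \left[f\,(\mathbf{j}\cdot\nabla g) + g\,(\mathbf{j}\cdot\nabla f)\right] d^3r = 0, \tag{5.85}$$

which is valid for any pair of smooth (differentiable) scalar functions $f(\mathbf{r})$ and $g(\mathbf{r})$, and any vector function $\mathbf{j}(\mathbf{r})$ that, like the dc current density, satisfies the continuity condition $\nabla\cdot\mathbf{j} = 0$ and whose normal component vanishes on the surface of the integration volume.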

First, let us use Eq. (85) with f = 1 and g equal to any component of the radius-vector r: $g = r_i$ (i = 1, 2, 3). Then it yields

$$\int (\mathbf{j}\cdot\nabla r_i)\,d^3r = \int j_i\,d^3r = 0, \tag{5.86}$$

so that for the vector as a whole

$$\int \mathbf{j}\,d^3r = 0, \tag{5.87}$$



showing that the first term in the right-hand part of Eq. (84) equals zero. Next, let us use Eq. (85) with $f = r_i$, $g = r_{i'}$ (i, i' = 1, 2, 3); then it yields

$$\int (r_i\,j_{i'} + r_{i'}\,j_i)\,d^3r = 0, \tag{5.88}$$

so that the i-th Cartesian component of the second integral in Eq. (84) may be transformed as



34 See, e.g., MA Eq. (12.3) with the additional condition $j_n|_S = 0$, pertinent for space-restricted currents.
$$\int (\mathbf{r}\cdot\mathbf{r}')\,j_i\,d^3r' = \sum_{i'=1}^{3} r_{i'}\int r'_{i'}\,j_i\,d^3r' = \frac{1}{2}\sum_{i'=1}^{3} r_{i'}\int \left(r'_{i'}\,j_i - r'_i\,j_{i'}\right) d^3r' = -\frac{1}{2}\left[\mathbf{r}\times\int (\mathbf{r}'\times\mathbf{j})\,d^3r'\right]_i. \tag{5.89}$$

As a result, Eq. (84) may be rewritten as



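$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\,\frac{\mathbf{m}\times\mathbf{r}}{r^3}, \qquad \text{(magnetic dipole and its potential)} \tag{5.90}$$

where the vector m, defined as 35

$$\mathbf{m} \equiv \frac{1}{2}\int \mathbf{r}\times\mathbf{j}(\mathbf{r})\,d^3r, \tag{5.91}$$

is called the magnetic dipole moment of our system, which itself, within approximation (90), is called the magnetic dipole.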

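Note a close analogy between m and the angular momentum of a non-relativistic particle with mass $m_k$,

$$\mathbf{L}_k = \mathbf{r}_k\times\mathbf{p}_k = m_k\,\mathbf{r}_k\times\mathbf{v}_k, \tag{5.92}$$

where $\mathbf{p}_k = m_k\mathbf{v}_k$ is its mechanical momentum. Indeed, for a continuum of such particles with the same electric charge q, with the spatial density n, $\mathbf{j} = qn\mathbf{v}$, and Eq. (91) yields

$$\mathbf{m} = \int \frac{1}{2}\,\mathbf{r}\times\mathbf{j}\,d^3r = \int \frac{nq}{2}\,\mathbf{r}\times\mathbf{v}\,d^3r, \tag{5.93}$$

while the total angular momentum of such a continuous system of particles of the same mass ($m_k = m_0$) is

$$\mathbf{L} = \int n m_0\,\mathbf{r}\times\mathbf{v}\,d^3r, \tag{5.94}$$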



so that we get a very straightforward relation

$$\mathbf{m} = \frac{q}{2m_0}\,\mathbf{L}. \qquad (\mathbf{m} \text{ vs. } \mathbf{L}) \tag{5.95}$$

For the orbital motion, this classical relation survives in quantum mechanics for operators and hence for eigenvalues, in which the angular momentum is quantized in units of Planck's constant $\hbar$, so that for an electron, the orbital magnetic moment is always a multiple of the so-called Bohr magneton

$$\mu_B \equiv \frac{e\hbar}{2m_e}, \qquad \text{(Bohr magneton)} \tag{5.96}$$

where $m_e$ is the free electron mass. 36 However, for particles with spin, such a universal relation between vectors m and L is no longer valid. For example, the electron's spin s = 1/2 gives a contribution $\hbar/2$ to the mechanical angular momentum, but its contribution to the magnetic moment is still very close to $\mu_B$. 37




35 In the Gaussian units, definition (91) is kept valid, so that Eq. (90) is stripped of the factor $\mu_0/4\pi$.
The next important example of a magnetic dipole is a planar wire loop limiting area A (of an arbitrary shape), carrying current I, for which m has a surprisingly simple form,

$$\mathbf{m} = I\mathbf{A}, \tag{5.97}$$

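where the modulus of vector A equals the area A, and its direction is perpendicular to the loop's plane. This formula may be readily proved by noticing that if we select the coordinate origin on the plane of the loop (Fig. 10), then the elementary component of the magnitude of integral (91),

$$|d\mathbf{m}| = \frac{I}{2}\,\left|\mathbf{r}\times d\mathbf{r}\right|, \tag{5.98}$$

is just I times the elementary area $dA = \frac{1}{2}|\mathbf{r}\times d\mathbf{r}| = r^2 d\varphi/2$ of the thin triangle formed by the origin and the wire element $d\mathbf{r}$.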
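Eq. (97) can also be checked numerically by applying the thin-wire form of Eq. (91), $\mathbf{m} = (I/2)\oint \mathbf{r}\times d\mathbf{r}$, to a polygonal loop. In this sketch (the polygon representation and function names are illustrative assumptions), each straight side from $\mathbf{r}_1$ to $\mathbf{r}_2$ contributes exactly $\mathbf{r}_1\times\mathbf{r}_2$, so $m_z$ reduces to I times the "shoelace" area, regardless of where the origin is placed in the loop's plane.

```python
import math


def loop_moment_z(vertices, current=1.0):
    """z-component of m = (I/2) * closed integral of r x dr, Eq. (5.91), for a thin
    planar polygonal loop in the xy-plane, traversed in the order of `vertices`.
    For each straight side from r1 to r2, the integral of r x dr is exactly r1 x r2.
    """
    mz = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        mz += 0.5 * current * (x1 * y2 - x2 * y1)
    return mz


def ellipse(a, b, n, center=(0.0, 0.0)):
    """Counterclockwise polygon with n vertices inscribed in an ellipse with semi-axes a, b."""
    cx, cy = center
    return [(cx + a * math.cos(2 * math.pi * k / n), cy + b * math.sin(2 * math.pi * k / n))
            for k in range(n)]


if __name__ == "__main__":
    # a unit square far from the origin: m_z should still equal I * 1
    print(loop_moment_z([(10, 10), (11, 10), (11, 11), (10, 11)], current=2.0))
    # a fine polygonal ellipse: m_z should approach I * pi * a * b
    print(loop_moment_z(ellipse(2.0, 1.0, 10_000, center=(5.0, -7.0)), current=3.0), 3.0 * math.pi * 2.0)
```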




The combination of Eqs. (96) and (97) allows a useful estimate of the scale of atomic currents, by finding what current I should flow in a circular loop of the atomic size scale (the Bohr radius) $r_B \approx 0.5\times10^{-10}$ m, i.e. of area $A \sim 10^{-20}\ \text{m}^2$, to produce a magnetic moment equal to $\mu_B$. 38 The result is surprisingly macroscopic: $I \sim 1$ mA (quite comparable to the currents driving your earbuds :-). Though this estimate should not be taken too literally, due to the quantum-mechanical spread of electrons' wavefunctions, it is very useful for getting a feeling for how significant atomic magnetism is, and hence why ferromagnets may provide such a strong field.
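The estimate above is easy to redo with explicit numbers. The constants below are standard SI (CODATA) values, and the loop radius is the rounded Bohr radius; the circular-loop geometry is, of course, only the crude picture described in the text.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
HBAR = 1.054571817e-34       # reduced Planck constant, J*s
M_E = 9.1093837015e-31       # free electron mass, kg
R_BOHR = 0.529e-10           # Bohr radius, m (rounded)

mu_B = E_CHARGE * HBAR / (2 * M_E)   # Bohr magneton, Eq. (5.96)
area = math.pi * R_BOHR**2           # loop of atomic size
current = mu_B / area                # from m = I A, Eq. (5.97)

print(f"mu_B = {mu_B:.3e} J/T, A = {area:.2e} m^2, I = {current * 1e3:.2f} mA")
```

The printed current comes out at about one milliampere, in line with the order-of-magnitude estimate in the text and with footnote 36's value of $\mu_B$.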

After these illustrations, let us return to Eq. (90). Plugging it into the general formula (27), we 
may calculate the magnetic field of a magnetic dipole: 



$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\,\frac{3\mathbf{r}(\mathbf{r}\cdot\mathbf{m}) - \mathbf{m}r^2}{r^5}. \tag{5.99}$$
The structure of this formula exactly duplicates that of Eq. (3.15) for the electric dipole field. Because of this similarity, the energy of a dipole in an external field, and hence the torque and force exerted on it by the field, are also absolutely similar to the expressions for an electric dipole; see Eqs. (3.15)-(3.18):

$$U = -\mathbf{m}\cdot\mathbf{B}_{\text{ext}}, \tag{5.100}$$



36 In SI units, $m_e \approx 0.91\times10^{-30}$ kg, so that $\mu_B \approx 0.93\times10^{-23}$ J/T.

37 See, e.g., QM Sec. 4.1 and beyond.

38 Another way to arrive at the same estimate is to take $I \sim ef = e\omega/2\pi$, with $\omega$ being the frequency of radiation due to atomic interlevel quantum transitions.
and as a result,

$$\boldsymbol{\tau} = \mathbf{m}\times\mathbf{B}_{\text{ext}}, \tag{5.101}$$

$$\mathbf{F} = \nabla(\mathbf{m}\cdot\mathbf{B}_{\text{ext}}). \tag{5.102}$$
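As a consistency check of Eq. (99), one may compare it with a finite-difference curl of the dipole vector-potential (90), following the same route, via Eq. (27), that the text uses. The sketch assumes arbitrary test values of m and r (away from the origin) and a hand-picked step h; it is an illustration, not a production field solver.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def vector_potential(m, r):
    """Dipole vector-potential, Eq. (5.90)."""
    r3 = dot(r, r) ** 1.5
    return tuple(MU0 / (4 * math.pi) * c / r3 for c in cross(m, r))


def dipole_field(m, r):
    """Dipole field, Eq. (5.99)."""
    r2 = dot(r, r)
    pref = MU0 / (4 * math.pi) / r2**2.5
    rm = dot(r, m)
    return tuple(pref * (3 * ri * rm - mi * r2) for ri, mi in zip(r, m))


def curl(field, r, h=1e-5):
    """Central-difference curl of a vector field at point r."""
    def partial(i, j):  # d field_i / d x_j
        plus, minus = list(r), list(r)
        plus[j] += h
        minus[j] -= h
        return (field(plus)[i] - field(minus)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


m_test = (0.3, -0.2, 1.0)   # hypothetical dipole moment, A*m^2
r_test = (0.7, 0.4, -0.5)   # hypothetical observation point, m
B_fd = curl(lambda p: vector_potential(m_test, p), r_test)
B_exact = dipole_field(m_test, r_test)

if __name__ == "__main__":
    print(B_fd)
    print(B_exact)
```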

Now let us consider a system of many magnetic dipoles (e.g., atoms or molecules), distributed in space with density n. Then we can use Eq. (90) (generalized in the evident way for an arbitrary position, r', of a dipole), and the linear superposition principle, to calculate the "macroscopic" component of the vector-potential A, in other words, the dipoles' potential averaged over short-scale variations on the inter-dipole distances:

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int \frac{\mathbf{M}(\mathbf{r}')\times(\mathbf{r}-\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r', \tag{5.103}$$



where $\mathbf{M} = n\mathbf{m}$ is the macroscopic (average) magnetization, i.e. the magnetic moment per unit volume. Transforming this integral just as Eq. (3.27) was transformed into Eq. (3.29), we get:

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int \frac{\nabla'\times\mathbf{M}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'. \tag{5.104}$$

Comparing this result with Eq. (28), we see that $\nabla\times\mathbf{M}$ is equivalent, in its effect, to the density $\mathbf{j}_{\text{ef}}$ of a certain effective "magnetization current". Just as the electric-polarization "charge" $\rho_{\text{ef}}$ discussed in Sec. 3.2 (see Fig. 3.3), $\mathbf{j}_{\text{ef}} = \nabla\times\mathbf{M}$ may be interpreted as the uncompensated part of the vortex currents representing single magnetic dipoles (Fig. 11).




Fig. 5.11. Cartoon illustrating the physical nature of the "magnetization current" $\mathbf{j}_{\text{ef}} = \nabla\times\mathbf{M}$.



Now, using Eq. (28) to add the possible contribution from "stand-alone" currents j, not included in the currents of microscopic dipoles, we get the general equation for the vector-potential of the macroscopic field:

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int \frac{\mathbf{j}(\mathbf{r}') + \nabla'\times\mathbf{M}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'. \tag{5.105}$$

Repeating the calculations that have led us from Eq. (28) to the Maxwell equation (35), with the account of the magnetization current term, for the macroscopic magnetic field B we get 39

$$\nabla\times\mathbf{B} = \mu_0(\mathbf{j} + \nabla\times\mathbf{M}). \tag{5.106}$$

Following the same philosophy as in Sec. 3.2, we may recast this equation as 

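$$\nabla\times\mathbf{H} = \mathbf{j}, \tag{5.107}$$

where a new field, defined as

$$\mathbf{H} \equiv \frac{\mathbf{B}}{\mu_0} - \mathbf{M}, \qquad \text{(magnetic field H)} \tag{5.108}$$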



for historic reasons (and very unfortunately) is also called the magnetic field. 40 It is crucial to remember that the physical sense of field H is very different from that of field B. In order to understand the difference better, let us use Eq. (107) to complete a macroscopic analog of system (36), called the macroscopic Maxwell equations (again, so far for the stationary case $\partial/\partial t = 0$):

$$\nabla\times\mathbf{E} = 0, \qquad \nabla\cdot\mathbf{D} = \rho, \qquad \nabla\times\mathbf{H} = \mathbf{j}, \qquad \nabla\cdot\mathbf{B} = 0. \qquad \text{(stationary macroscopic Maxwell equations)} \tag{5.109}$$



One can clearly see that the roles of vector fields D and H are very similar: they could be called "would-be" fields, which would be induced by stand-alone charges and currents if the medium had not modified them by its dielectric and/or magnetic polarization.

Despite this similarity, let me note an important difference of signs in the relation (3.33) between E, D, and P, on one hand, and relation (108) between B, H, and M, on the other hand. It is not just a matter of definition. Indeed, due to the similarity of Eqs. (3.15) and (100), including similar signs, the electric and magnetic fields both try to orient the corresponding dipole moments along the field. Hence, in the media that allow such orientation (and as we will see momentarily, for magnetic media it is not always the case), the induced polarizations P and M are directed along, respectively, vectors E and B.



39 Similarly to the situation with the electric dipoles (see Eq. (3.24) and its discussion), it may be shown that the magnetic field of any closed current loop (or any system of such loops) satisfies the following equality:

$$\int_{r<R} \mathbf{B}(\mathbf{r})\,d^3r = \frac{2}{3}\mu_0\mathbf{m},$$

where the integral is over any sphere confining all the currents. On the other hand, for field (99), derived from the asymptotic approximation (90), such an integral vanishes. In order to get a coarse-grain description of the magnetic field of a small system located at r = 0, which would be valid everywhere (though at r ~ a, only approximately), Eq. (99) should be modified as follows:

$$\mathbf{B}_{\text{cg}}(\mathbf{r}) = \frac{\mu_0}{4\pi}\left[\frac{3\mathbf{r}(\mathbf{r}\cdot\mathbf{m}) - \mathbf{m}r^2}{r^5} + \frac{8\pi}{3}\,\mathbf{m}\,\delta(\mathbf{r})\right].$$

Hence, strictly speaking, the macroscopic field B participating in Eq. (106) and beyond is the average long-range field of the magnetic dipoles (plus that of the stand-alone currents j) rather than the genuine average magnetic field.
40 This confusion is exacerbated by the fact that in Gaussian units, Eq. (108) has the form $\mathbf{H} = \mathbf{B} - 4\pi\mathbf{M}$, and hence fields B and H have the same dimensionality (and are equal in free space!), though the unit of H has a different name (oersted, abbreviated as Oe). Mercifully, in the SI units, the dimensionalities of B and H are different, with the unit of H being called ampere per meter.
According to Eq. (3.33), if the would-be field D is fixed (say, by a fixed stand-alone charge distribution $\rho(\mathbf{r})$), such polarization reduces the genuine average electric field $\mathbf{E} = (\mathbf{D} - \mathbf{P})/\varepsilon_0$. On the other hand, Eq. (108) shows that in a magnetic medium with a fixed would-be field H, magnetic polarization with $\mathbf{M} \uparrow\uparrow \mathbf{B}$ enhances the average magnetic field $\mathbf{B} = \mu_0(\mathbf{H} + \mathbf{M})$. This difference may be traced back to the sign difference in the initial relations (1.1) and (5.1), i.e. to the basic fact that charges of the same sign repel, while currents of the same direction attract each other.

In order to form a complete system of differential equations, the macroscopic Maxwell equations (109) have to be complemented with "material relations" D ↔ E, j ↔ E, and B ↔ H. In the previous two chapters we have already discussed, in brief, two of them; let us proceed to the last one.



5.5. Magnetic materials 



A major difference between the dielectric and magnetic material equations D(E) and B(H) is that while a typical dielectric medium reduces the external electric field, magnetic media may either reduce or enhance it. In order to quantify this fact, let us consider the so-called linear magnetics, in which M (and hence H) are proportional to B. Just as in dielectrics, in materials without spontaneous magnetization, such linearity at relatively low fields follows from the Taylor expansion of the function M(B). For isotropic materials, this proportionality is characterized by a scalar: either the magnetic permeability $\mu$, defined by the following relation:

$$\mathbf{B} = \mu\mathbf{H}, \qquad \text{(magnetic permeability)} \tag{5.110}$$

or the magnetic susceptibility $\chi_m$, defined as 41

$$\mathbf{M} = \chi_m\mathbf{H}. \qquad \text{(magnetic susceptibility)} \tag{5.111}$$

Plugging these relations into Eq. (108), we see that these two parameters are not independent, but are related as

$$\mu = (1 + \chi_m)\,\mu_0. \qquad (\chi_m \text{ vs. } \mu) \tag{5.112}$$
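Relations (110)-(112) are consistent with the definition (108) of H, which can be checked in a few lines. The sketch below uses a few susceptibility values from Table 1 and treats these materials as linear, which is an assumption of the illustration; the function names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def linear_magnet(chi_m, H):
    """Return (mu, M, B) for an isotropic linear magnetic with susceptibility chi_m in field H (A/m)."""
    mu = (1 + chi_m) * MU0   # Eq. (5.112)
    M = chi_m * H            # Eq. (5.111)
    B = mu * H               # Eq. (5.110)
    return mu, M, B


def kind(chi_m):
    """Classification used in the text: chi_m > 0 for paramagnets, chi_m < 0 for diamagnets."""
    if chi_m > 0:
        return "paramagnet"
    if chi_m < 0:
        return "diamagnet"
    return "nonmagnetic"


if __name__ == "__main__":
    H = 1.0e3  # hypothetical would-be field, A/m
    for name, chi in (("Aluminum", 2e-5), ("Water", -9e-6), ("Bismuth", -1.7e-4)):
        mu, M, B = linear_magnet(chi, H)
        # Eq. (5.108) rearranged: B = mu0 (H + M)
        print(name, kind(chi), mu / MU0, math.isclose(B, MU0 * (H + M)))
```

For these weak magnetics $\mu/\mu_0$ differs from 1 only in the fifth decimal place or so, which is why the para/dia distinction matters far more than the magnitude in most everyday situations.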



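Note that despite the superficial similarity between Eqs. (110)-(111) and relations (3.35)-(3.38) for linear dielectrics:

$$\mathbf{D} = \varepsilon\mathbf{E}, \qquad \mathbf{P} = \chi_e\varepsilon_0\mathbf{E}, \qquad \varepsilon = (1 + \chi_e)\,\varepsilon_0, \tag{5.113}$$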



there is an important conceptual difference between them. Namely, while vector E in the right-hand parts of Eqs. (113) is the real (average) electric field, vector H in the right-hand parts of Eqs. (110)-(111) represents a "would-be" magnetic field, in all aspects similar to vector D rather than E. For relatively dense media, whose polarization may affect the genuine fields substantially, this difference between


41 According to Eq. (110) (i.e. in SI units), $\chi_m$ is dimensionless, while $\mu$ has the same dimensionality as $\mu_0$. In the Gaussian units, $\mu$ is dimensionless, $(\mu)_{\text{Gaussian}} = (\mu)_{\text{SI}}/\mu_0$, and $\chi_m$ is also introduced differently, as $\mu = 1 + 4\pi\chi_m$. Hence, just as for the electric susceptibilities, these dimensionless coefficients are different in the two systems: $(\chi_m)_{\text{SI}} = 4\pi(\chi_m)_{\text{Gaussian}}$. Note also that $\chi_m$ is formally called the volume magnetic susceptibility, in order to distinguish it from the molecular susceptibility $\chi$ defined by a similar relation, $\bar{\mathbf{m}} = \chi\mathbf{H}$, where $\bar{\mathbf{m}}$ is the average induced magnetic moment of a single dipole, e.g., a molecule. Evidently, in a dilute medium, i.e. in the absence of substantial dipole-dipole interaction, $\chi_m = n\chi$, where n is the dipole density.
parameters $\varepsilon$ and $\mu$ may make their properties (e.g., the Kramers-Kronig relations, to be discussed in Sec. 7.3) rather different.

Another difference between parameters $\varepsilon$ and $\mu$ (and hence between $\chi_e$ and $\chi_m$) is evident from Table 1, which lists the values of magnetic susceptibility for several materials. It shows that in contrast to linear dielectrics, whose susceptibility $\chi_e$ is always positive, i.e. the dielectric constant $\varepsilon_r = \chi_e + 1$ is always larger than 1 (see Table 3.1), linear magnetics may be either paramagnets ($\chi_m > 0$, i.e. $\mu > \mu_0$) or diamagnets ($\chi_m < 0$, i.e. $\mu < \mu_0$).



Table 5.1. Magnetic susceptibility $(\chi_m)_{\text{SI}}$ of a few representative (and/or important) materials (a)

Material                                                  (χm)SI
"Mu-metal" (75% Ni + 15% Fe + a few % of Cu and Mo)       ~20,000 (b)
Permalloy (80% Ni + 20% Fe)                               ~8,000 (b)
"Soft" (or "transformer") iron                            ~4,000 (b)
Nickel                                                    ~100
Aluminum                                                  +2×10⁻⁵
Diamond                                                   -2×10⁻⁵
Copper                                                    -7×10⁻⁵
Water                                                     -9×10⁻⁶
Bismuth (the strongest non-superconducting diamagnet)     -1.7×10⁻⁴

(a) The table does not include bulk superconductors, which in a crude ("macroscopic") approximation may be described as perfect diamagnets (with B = 0, i.e. $\chi_m = -1$), though the actual physics of this phenomenon is more complex; see Sec. 6.3 below.

(b) The exact values of $\chi_m$ for soft ferromagnetic materials depend not only on their exact composition, but also on their thermal processing (annealing). Moreover, due to unintentional vibrations, the extremely high $\chi_m$ of such materials may somewhat decrease with time, though it may be restored to approach the original value by new annealing.

The reason for this difference is that in dielectrics, two different polarization mechanisms (schematically illustrated by Fig. 12) lead to the same sign of the average polarization. The first of them takes place in atoms without their own spontaneous polarization. A crude classical image of such an atom is an isotropic cloud of negatively charged electrons surrounding a positively charged nucleus; see Fig. 12a. The external electric field shifts the positive charge in the direction of E, and negative charges in the opposite direction, thus creating a dipole with aligned vectors p and E, and hence a positive polarizability $\alpha_{\text{mol}}$; see Eq. (3.39). As a result, the electric susceptibility is also positive; see Eqs. (3.41) or (3.71).

In the second case (Fig. 12b) of a gas or liquid consisting of polar molecules, each molecule has its own, spontaneous dipole moment $\mathbf{p}_0$ even in the absence of an external electric field. (A typical example is a water molecule H₂O, with the negative oxygen ion positioned out of the line connecting two positive hydrogen ions, thus producing a spontaneous dipole with moment's magnitude $p_0 \approx e\times0.38\times10^{-10}$ m.) However, in the absence of the applied electric field, the orientation of such dipoles is random, so that the average polarization $\mathbf{P} = n\langle\mathbf{p}_0\rangle$ equals zero. A weak applied field does not change the magnitude of the dipole moments significantly, but creates their preferential orientation along the field (in order to decrease the potential energy $U = -\mathbf{p}_0\cdot\mathbf{E}$), thus creating a nonvanishing vector average $\langle\mathbf{p}_0\rangle$ directed along E. If the applied field is not too high ($p_0E \ll k_BT$), the induced polarization $\mathbf{P} = n\langle\mathbf{p}_0\rangle$ is proportional to E, again giving a positive polarizability $\alpha_{\text{mol}}$. 42



Fig. 5.12. Cartoons of two types of induced electrical polarization (each panel compares E = 0 and E ≠ 0): (a) elementary dipole induction, $\mathbf{p} \propto \mathbf{E}$, and (b) partial ordering of spontaneous elementary dipoles, $\langle\mathbf{p}_0\rangle \propto \mathbf{E}$.



Returning to magnetics, the second of the above mechanisms, i.e. the ordering of spontaneous dipoles by the applied field, is responsible for paramagnetism. Again, now according to Eq. (100), such a field tends to align the dipoles along its direction, so that the average direction of the spontaneous elementary moments $\mathbf{m}_0$, and hence the direction of M, is the same as that of the average field B (i.e., for a dilute medium, of $\mathbf{H} \approx \mathbf{B}/\mu_0$), resulting in a positive susceptibility $\chi_m$. However, in contrast to the electric polarization, there is a mechanism of magnetic polarization, called the orbital (or "Larmor" 43) diamagnetism, which gives $\chi_m < 0$. As its simplest model, let us consider the orbital motion of an atomic electron as a classical particle of mass $m_0$, with electric charge q, about an immobile attractive center modeling the atomic nucleus. As classical mechanics tells us, the central attractive force does not change the particle's angular momentum $\mathbf{L} = m_0\,\mathbf{r}\times\mathbf{v}$, but the applied magnetic field B (which may be taken uniform on the atomic scale) does, due to the torque (101) it applies:

$$\frac{d\mathbf{L}}{dt} = \boldsymbol{\tau} = \mathbf{m}\times\mathbf{B} = \frac{q}{2m_0}\,\mathbf{L}\times\mathbf{B}. \tag{5.114}$$

From the vector diagram shown in Fig. 13, it is clear that in the limit of a relatively weak field, when the magnitude of the angular momentum L may be considered constant, this equation describes



42 The proportionality of $|\langle\mathbf{p}_0\rangle|$ (and hence P) to E is a result of a dynamic balance between the dipole-orienting torque (101) and disordering thermal fluctuations. A quantitative description of such balances is one of the main tasks of statistical mechanics; see, e.g., SM Chapters 2 and 4. However, the very fact of proportionality $\mathbf{P} \propto \mathbf{E}$ in low fields may be readily understood as the result of the Taylor expansion of the function P(E) at E → 0.

43 After J. Larmor (1857-1947), who first described the torque-induced precession mathematically.
the rotation (called the torque-induced precession 44) of vector $\mathbf{L}$ about the direction of vector $\mathbf{B}$, with the rate $|d\mathbf{L}/dt| = (|q|/2m_0) L B \sin\theta$. Let me leave it for the reader to use Eq. (114) to check that, irrespective of the angle $\theta$ and the sign of the charge $q$, the resulting additional magnetic moment $\Delta\mathbf{m}$ is directed opposite to vector $\mathbf{B}$, and hence $\chi_m$ is negative, leading to the Larmor diamagnetism. 45
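The precession described by Eq. (114) is easy to see numerically. Here is a minimal sketch (plain Python, a standard fourth-order Runge-Kutta integrator) in hypothetical dimensionless units with $q/2m_0 = 1$ and $B = 1$ along the z-axis; the helper names `cross` and `precess` are illustrative, not from the text. It shows that $L$ and $L_z$ stay constant while the transverse part of $\mathbf{L}$ rotates about $\mathbf{B}$ at the angular rate $(q/2m_0)B$, consistent with $|d\mathbf{L}/dt| = (q/2m_0)LB\sin\theta$; it does not by itself settle the sign of $\Delta\mathbf{m}$, which is left as the reader's check.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def precess(L0, B, q_over_2m0, t_end, n_steps=2000):
    """Integrate dL/dt = (q/2m0) L x B, Eq. (5.114), with classical RK4 steps."""
    def rhs(L):
        return tuple(q_over_2m0 * c for c in cross(L, B))

    def shifted(L, k, h):  # L + h*k, component-wise
        return tuple(x + h * y for x, y in zip(L, k))

    L, dt = tuple(L0), t_end / n_steps
    for _ in range(n_steps):
        k1 = rhs(L)
        k2 = rhs(shifted(L, k1, dt / 2))
        k3 = rhs(shifted(L, k2, dt / 2))
        k4 = rhs(shifted(L, k3, dt))
        L = tuple(x + dt * (a + 2 * b + 2 * c + d) / 6
                  for x, a, b, c, d in zip(L, k1, k2, k3, k4))
    return L

theta = 0.7                                    # angle between L and B, radians
L0 = (math.sin(theta), 0.0, math.cos(theta))   # |L| = 1 in these units
L = precess(L0, B=(0.0, 0.0, 1.0), q_over_2m0=1.0, t_end=math.pi / 2)
print(L)  # L_z is unchanged; the transverse part has turned by 90 degrees
```

For $q > 0$ the transverse component turns clockwise when viewed from the tip of $\mathbf{B}$, because $\mathbf{L}\times\hat{\mathbf{z}} = (L_y, -L_x, 0)$.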



An important conceptual question is what exactly prevents the initial magnetic moment $\mathbf{m}$, which according to Eq. (95) is associated with the angular momentum $\mathbf{L}$ of the electron, from turning along the magnetic field, just as in the second polarization mechanism illustrated by Fig. 12b, thus decreasing the potential energy (100) of the system. The answer is the same as for the usual mechanical top: it "wants" to fall in the gravity field, but cannot do that due to its mechanical inertia. In classical physics, even a small friction (dissipation) eventually drains the top's rotational kinetic energy, and it falls. However, in quantum mechanics the ground-state "motion" of electrons in an atom is not subject to friction, because they cannot be brought to full rest due to Heisenberg's uncertainty principle. Somewhat counter-intuitively, the magnetic moments due to such a fully quantum effect as spin are much more susceptible to interaction with the environment, so that in atoms with uncompensated spins, the magnetic dipole orientation mechanism prevails over the orbital diamagnetism, and materials incorporating such atoms usually exhibit net paramagnetism - see Table 1.

Due to possible strong interactions between elementary dipoles, the magnetism of materials is an extremely rich field of physics, with numerous interesting phenomena and elaborate theories. Unfortunately, all this physics is well outside the framework of this course, and I have to refer the interested reader to special literature, 46 but I still need to mention its key notions.

Most importantly, a sufficiently strong dipole-dipole interaction may lead to the spontaneous ordering of the dipoles, even in the absence of an applied field. This ordering may correspond to either parallel alignment of the atomic dipoles (ferromagnetism) or anti-parallel alignment of the adjacent dipoles (antiferromagnetism). Evidently, the external effects of ferromagnetism are stronger, because this phase corresponds to a substantial spontaneous magnetization $M$. (This value is frequently called the



44 For a more detailed discussion of the effect see, e.g., CM Sec. 6.5. 

45 The quantum-mechanical treatment (see, e.g., QM Sec. 6.4) confirms this qualitative result, while giving quantitative corrections to the classical result for $\chi_m$.

46 See, e.g., D. J. Jiles, Introduction to Magnetism and Magnetic Materials, 2nd ed., CRC Press, 1998, or R. C. O'Handley, Modern Magnetic Materials, Wiley, 1999.



[Fig. 5.13. Torque-induced precession of a charged particle in a magnetic field. Diagram labels: $\mathbf{B}$, $L\sin\theta$.]
saturation magnetization, $M_s$, while the corresponding magnitude of $B = \mu_0 M$ is called either the saturation magnetic field or the remanence field, $B_r$.) The direction of $\mathbf{B}_r$ may be switched by the application of an external magnetic field with a magnitude above a certain value $H_c$, called the coercivity, 47 leading to the well-known hysteresis loops on the [B, H] plane - see Fig. 14 for a typical example.




[Fig. 5.14. Experimental magnetization curves of specially processed (cold-rolled) transformer steel, i.e. a solid solution of ~10% C and ~6% Si in Fe. (Adapted from www.thefullwiki.org/Hysteresis.)]



In relatively low fields, $H \ll H_c$, such materials may be described as hard (or "permanent") ferromagnets; in this approximate treatment, the magnetization $\mathbf{M}$ is considered constant. On the other hand, the theory needed for a fair description of phenomena at $H \sim H_c$ is rather complicated. Indeed, the direction of magnetization of crystals may be affected by the anisotropy of the crystal lattice. Because of that, typical non-crystalline ferromagnetic materials (like steel, permalloy, "mu-metal", etc.) consist of randomly oriented magnetic domains, each with a certain spontaneous magnetization direction. The magnetic interaction of each domain with its neighbors and the external field determines the evolution of its magnetization, and hence the average magnetic properties of the ferromagnet. In particular, such interaction explains why the hysteresis loop shape depends on the cycled field amplitude and the cycling history - see Fig. 14. A very important class of multi-domain materials is the so-called soft ferromagnets, whose coercivity is relatively low. At low cycled field amplitudes, soft ferromagnets behave, on average, as linear magnetics with very high values of $\chi_m$, and hence $\mu$ (see the top rows of Table 1, and Fig. 14), that are highly dependent on the material's fabrication technology and its post-fabrication thermal and mechanical treatments.

High values of $\chi_m$ are also pertinent to magnetics in which the molecular dipole interaction is relatively weak, so that their ferromagnetic ordering may be destroyed by thermal fluctuations if the temperature is increased above the so-called Curie temperature $T_C$. At $T > T_C$, such materials behave as paramagnets, with the susceptibility obeying the Curie-Weiss law

$$\chi_m \propto \frac{1}{T - T_C}. \qquad (5.115)$$

(At vanishing moment interaction, $T_C \to 0$, and Eq. (115) is reduced to the Curie law $\chi_m \propto 1/T$, typical for weak paramagnets.) The transition between the ferromagnetic and paramagnetic phases at $T = T_C$ is the classical example of continuous phase transitions, similar to that between the paraelectric and



47 Materials with very high coercivity $H_c$ are frequently called hard ferromagnets or permanent magnets.
ferroelectric phases of a dielectric. In both cases, the "macroscopic" (average) polarization, either $\mathbf{M}$ or $\mathbf{P}$, plays the role of the so-called order parameter that (in the absence of external fields) appears at $T = T_C$ and increases gradually as the temperature is reduced further. 48
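Eq. (115) suggests a common way to extract $T_C$ from susceptibility data: above $T_C$, $1/\chi_m$ is a linear function of $T$, so a straight-line fit gives both the slope and the temperature intercept. The sketch below assumes the explicit form $\chi_m = C/(T - T_C)$, with a proportionality constant $C$ that the text does not name, and uses synthetic, noise-free values rather than measured ones; `fit_curie_weiss` is a hypothetical helper.

```python
def fit_curie_weiss(T, chi):
    """Least-squares line through the points (T, 1/chi).

    For chi = C/(T - T_C) one has 1/chi = T/C - T_C/C, so the slope is 1/C
    and the intercept is -T_C/C. Returns the pair (C, T_C).
    """
    y = [1.0 / c for c in chi]
    n = len(T)
    T_mean, y_mean = sum(T) / n, sum(y) / n
    slope = (sum((t - T_mean) * (v - y_mean) for t, v in zip(T, y))
             / sum((t - T_mean) ** 2 for t in T))
    intercept = y_mean - slope * T_mean
    return 1.0 / slope, -intercept / slope

# Synthetic, noise-free values with hypothetical C = 2.0 and T_C = 300 (arbitrary units)
T = [400.0 + 50.0 * k for k in range(9)]
chi = [2.0 / (t - 300.0) for t in T]
C, T_C = fit_curie_weiss(T, chi)
print(C, T_C)
```

Only points with $T > T_C$ should be used: below $T_C$ the material is ferromagnetic and Eq. (115) no longer applies.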

Before returning to magnetostatics per se, I have to mention the large practical role played by hard ferromagnetic materials (well beyond refrigerator magnets :-). Indeed, despite decades of the exponential (Moore's-law) progress of semiconductor electronics, most computer data storage systems are still based on hard disk drives whose active medium is a submicron-thin ferromagnetic layer, with bits stored in the form of the direction of the spontaneous magnetization of small film spots. This technology has reached a fantastic sophistication, 49 with recording data density approaching $10^{12}$ bits per square inch. Only recently has it started to be seriously challenged by the so-called solid-state drives based on the flash semiconductor memories already mentioned in Chapter 3.
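For a feeling of the scale implied by an areal density of $10^{12}$ bits per square inch, one may convert it into the side of an assumed square bit cell. Real recorded bits are not square, so this is only an order-of-magnitude sketch; `bit_cell_side` is a hypothetical helper name.

```python
INCH = 0.0254  # metres per inch (exact by definition)

def bit_cell_side(bits_per_square_inch):
    """Side length, in metres, of a square bit cell at the given areal density."""
    return (INCH ** 2 / bits_per_square_inch) ** 0.5

print(f"{bit_cell_side(1e12) * 1e9:.1f} nm")
```

The result, about 25 nm, is of the same order as the head flying height quoted in footnote 49.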



5.6. Systems with magnetics 

Similarly to the electrostatics of linear dielectrics, the magnetostatics of linear magnetics is very simple in the particular case when the stand-alone currents are deeply embedded into a medium with a constant permeability $\mu$. Indeed, in this case, boundary conditions on the distant surface of the medium do not affect the solution of the boundary problem described by the magnetic equations of the macroscopic Maxwell system (109). Now let us assume that we know the solution $\mathbf{B}_0(\mathbf{r})$ of the magnetic pair of the genuine ("microscopic") Maxwell equations (36) in free space, i.e. when the genuine current density $\mathbf{j}$ coincides with that of the stand-alone currents. Then the macroscopic equations and the material equation (110) are completely satisfied by the pair of functions

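$$\mathbf{H}(\mathbf{r}) = \frac{\mathbf{B}_0(\mathbf{r})}{\mu_0}, \qquad \mathbf{B}(\mathbf{r}) = \mu\mathbf{H}(\mathbf{r}) = \frac{\mu}{\mu_0}\mathbf{B}_0(\mathbf{r}). \qquad (5.116)$$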

Hence the only effect of completely filling a system of fixed currents with a uniform, linear magnetic is an increase of the magnetic field $\mathbf{B}$ at all points by the same constant factor $\mu/\mu_0 \equiv 1 + \chi_m$. (As a reminder, a similar filling of a system of fixed charges with a uniform, linear dielectric leads to a reduction of the electric field $\mathbf{E}$ by the factor $\varepsilon/\varepsilon_0 \equiv \kappa = 1 + \chi_e$.)

However, this simple result is generally invalid in the case of non-uniform (or piecewise-uniform) magnetic samples. Theoretical analyses of the magnetic field distribution in such non-uniform systems may be facilitated by two additional tools. First, integrating the macroscopic Maxwell equation (107) over a smooth surface $S$ limited by a closed contour $C$, and using the Stokes theorem, we get the macroscopic version of the Ampere law (37):



$$\oint_C H_l\, dl = I. \qquad (5.117)$$ [Macroscopic Ampere law]



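This is an exact replica of the "microscopic" equation (37), with the replacement $\mathbf{B}/\mu_0 \to \mathbf{H}$; note that $I$ here is the net stand-alone current piercing the surface $S$, with no contribution from magnetization currents.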



48 A discussion of such transitions may be found, in particular, in SM Chapter 4. 

49 "A magnetic head slider [the read/write head - KKL] flying over a [rather uneven - KKL] disk surface with a flying height of 25 nm with a relative speed of 20 meters/second is equivalent to an aircraft flying at a physical spacing of 0.2 μm at 900 kilometers/hour." B. Bhushan, as quoted in a (generally good) book by G. Hadjipanayis, Magnetic Storage Systems Beyond 2000, Springer, 2001.
Let us apply this relation to a boundary between two regions with constant but different $\mu$, with no stand-alone currents on the border, similarly to how this was done for the field $\mathbf{E}$ in Sec. 3.4 - see Fig. 3.5. The result is similar as well:

$$H_\tau = \text{const}. \qquad (5.118)$$

On the other hand, the integration of the Maxwell equation (29) over a Gaussian pillbox enclosing a border fragment (again similar to that shown in Fig. 3.5) yields a result similar to Eq. (3.46):

$$B_n = \text{const}, \quad \text{i.e.} \quad \mu H_n = \text{const}. \qquad (5.119)$$
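Conditions (118) and (119) fully determine the field just across a current-free boundary once it is known on one side. A minimal sketch, with vectors as plain 3-tuples, a unit normal `n`, and permeabilities in units of $\mu_0$ (the function name and the numbers are hypothetical):

```python
def field_across_boundary(H1, mu1, mu2, n):
    """Field H2 just across a current-free boundary, given the field H1 on side 1.

    n is a unit normal to the boundary (3-tuple). Eq. (118) keeps the tangential
    part of H; Eq. (119) keeps B_n = mu*H_n, so that H2_n = (mu1/mu2) * H1_n.
    """
    H1_n = sum(h * c for h, c in zip(H1, n))              # normal component
    H1_t = tuple(h - H1_n * c for h, c in zip(H1, n))     # tangential part
    return tuple(t + (mu1 / mu2) * H1_n * c for t, c in zip(H1_t, n))

# Field hitting a high-permeability medium at 45 degrees to its surface;
# permeabilities are given in units of mu_0 (illustrative values).
H2 = field_across_boundary((1.0, 1.0, 0.0), mu1=1.0, mu2=1000.0, n=(0.0, 1.0, 0.0))
print(H2)
```

Because $H_\tau$ is kept while $H_n$ is divided by $\mu_2/\mu_1$, the angles $\theta_{1,2}$ between the field and the normal obey $\tan\theta_2/\tan\theta_1 = \mu_2/\mu_1$, so the field lines entering a high-$\mu$ material are turned almost parallel to its surface.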

Let us use these boundary conditions, first, to see what happens with a thin sheet of magnetic material (or any other strongly elongated sample) placed parallel to a uniform external field $\mathbf{H}_0$. Such a sample cannot noticeably disturb the field in the free space outside it: $\mathbf{H}_{\rm ext} = \mathbf{H}_0$, $\mathbf{B}_{\rm ext} = \mu_0\mathbf{H}_{\rm ext} = \mu_0\mathbf{H}_0 \equiv \mathbf{B}_0$. Now applying Eq. (118) to the dominating, large-area interfaces, we get $\mathbf{H}_{\rm int} = \mathbf{H}_0$, i.e., $\mathbf{B}_{\rm int} = (\mu/\mu_0)\mathbf{B}_0$. 50 The constancy of the field $\mathbf{H}$ in this geometry explains why this field is used as the horizontal axis in plots like Fig. 14: such measurements are typically carried out by placing an elongated sample of the material into a uniform field, say the one produced by a long solenoid.

Samples of other geometries may create strong perturbations of the external field, extending to distances of the order of the transverse dimensions of the sample. In order to analyze such problems, we may benefit from a simple partial differential equation for a scalar function, e.g., the Laplace equation, because in Chapter 2 we have learned how to solve it for many simple geometries. In magnetostatics, the introduction of a scalar potential is generally impossible due to the vortex-like magnetic field lines, but if there are no stand-alone currents within the region we are interested in, then the Maxwell equation (32) for the field $\mathbf{H}$ is reduced to $\nabla\times\mathbf{H} = 0$, and we may introduce the scalar potential of the magnetic field, $\phi_m$, using a relation similar to Eq. (1.33):

$$\mathbf{H} = -\nabla\phi_m. \qquad (5.120)$$

Combining it with the homogeneous Maxwell equation for the magnetic field, $\nabla\cdot\mathbf{B} = 0$, we arrive at the familiar differential equation,

$$\nabla\cdot(\mu\nabla\phi_m) = 0, \qquad (5.121)$$

which, for a uniform medium ($\mu$ = const), is reduced to our beloved Laplace equation. Moreover, Eqs. (118) and (119) give very familiar boundary conditions:

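$$\frac{\partial\phi_m}{\partial\tau} = \text{const}, \qquad (5.122a)$$

which is equivalent to

$$\phi_m = \text{const}, \qquad (5.122b)$$

and

$$\mu\frac{\partial\phi_m}{\partial n} = \text{const}. \qquad (5.123)$$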



50 The reader is highly encouraged to carry out an analysis of the fields inside narrow gaps cut in a linear magnetic, similar to the one carried out for linear dielectrics in Sec. 3.3 - see Fig. 3.6 and its discussion.
Note that these boundary conditions are similar to Eqs. (3.46) and (3.47) of electrostatics, with the replacement $\varepsilon \to \mu$. 51

Let us analyze the geometric effects on magnetization, using the (too?) familiar structure: a sphere, made of a linear magnetic material, in a uniform external field. Since the differential equation and boundary conditions are similar to those of the corresponding electrostatics problem (see Fig. 3.8), we can use the above analogy to recycle the solution we have already obtained - see Eqs. (3.55)-(3.56):

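$$(\phi_m)_{r \ge R} = H_0\left(-r + \frac{\mu - \mu_0}{\mu + 2\mu_0}\frac{R^3}{r^2}\right)\cos\theta, \qquad (\phi_m)_{r \le R} = -H_0\frac{3\mu_0}{\mu + 2\mu_0}r\cos\theta, \qquad (5.125)$$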

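so that substantial perturbations of the external field are indeed extended to distances of the order of the sample's radius $R$. On the contrary, the internal field is perfectly uniform:

$$\mathbf{H}_{\rm int} = \frac{3\mu_0}{\mu + 2\mu_0}\mathbf{H}_0, \qquad \mathbf{B}_{\rm int} = \mu\mathbf{H}_{\rm int} = \frac{3\mu}{\mu + 2\mu_0}\mu_0\mathbf{H}_0. \qquad (5.126)$$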

Note that $\mathbf{H}$ inside the sphere is not equal to the external field $\mathbf{H}_0$. This example shows that the interpretation of $\mathbf{H}$ as the "would-be" magnetic field generated by external currents $\mathbf{j}$ should not be exaggerated into saying that its distribution is independent of the magnetic bodies in the system. 52

In the limit $\mu \gg \mu_0$, Eqs. (126) yield $H_{\rm int}/H_0 \ll 1$, $B_{\rm int}/H_0 \to 3\mu_0$, the factor 3 being specific to the particular geometry of the sphere. If a sample is stretched along the applied field, this limitation of the field concentration is gradually removed, and $B_{\rm int}$ tends to its maximum value $\mu H_0 \gg B_{\rm ext}$, as was discussed above. This effect of "magnetic line concentration" in high-$\mu$ materials is used in such practically important devices as transformers, in which two multi-turn coils are wound on a ring-shaped (e.g., toroidal, see Fig. 6b) core made of a soft ferromagnetic material (such as the transformer steel, see Table 1) with $\mu \gg \mu_0$. This minimizes the number of "stray" field lines, and makes the magnetic flux $\Phi$ piercing each wire turn (of either coil) virtually the same, an equality important for the secondary voltage induction - see the next chapter.
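Differentiating the internal potential in Eq. (125) gives $\mathbf{H}_{\rm int} = 3\mu_0\mathbf{H}_0/(\mu + 2\mu_0)$, and hence $\mathbf{B}_{\rm int} = \mu\mathbf{H}_{\rm int}$. The short sketch below (the function name is illustrative) tabulates both ratios versus the relative permeability $\mu/\mu_0$, showing how $B_{\rm int}/\mu_0 H_0$ saturates at 3 while $H_{\rm int}/H_0$ tends to zero.

```python
def sphere_internal_fields(mu_r):
    """Ratios H_int/H_0 and B_int/(mu_0 H_0) inside a linear magnetic sphere.

    mu_r = mu/mu_0; follows from (phi_m)_in of Eq. (5.125):
    H_int = 3 mu_0 H_0 / (mu + 2 mu_0), B_int = mu H_int.
    """
    h = 3.0 / (mu_r + 2.0)
    return h, mu_r * h

for mu_r in (1.0, 10.0, 1e3, 1e5):
    h, b = sphere_internal_fields(mu_r)
    print(f"mu/mu_0 = {mu_r:9.0f}:  H_int/H_0 = {h:.5f},  B_int/(mu_0 H_0) = {b:.5f}")
```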

The second theoretical tool, frequently useful for problem solution, is a macroscopic expression for the magnetic field energy $U$. For a system with linear magnetic materials, we may repeat the transformation of Eq. (55), made in Sec. 3, but with due respect to the magnetization, i.e. taking $\mathbf{j}$ not from Eq. (56) but from Eq. (107). As a result, instead of Eq. (57) we get




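$$U = \int u(\mathbf{r})\, d^3r, \qquad u(\mathbf{r}) = \frac{\mathbf{H}\cdot\mathbf{B}}{2} = \frac{B^2}{2\mu} \equiv \frac{\mu H^2}{2}. \qquad (5.127)$$ [Field energy in a linear magnetic]

This result is evidently similar to Eq. (3.79) of electrostatics.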



51 This similarity may seem strange, because earlier we have seen that the parameter $\mu$ is physically more similar to $1/\varepsilon$. The reason for this paradox is that in magnetostatics, the introduced potential $\phi_m$ is traditionally used to describe the "would-be field" $\mathbf{H}$, while in electrostatics, the potential $\phi$ describes the real (average) electric field $\mathbf{E}$. (This tradition persists from the old days when $\mathbf{H}$ was perceived as a genuine magnetic field.)

52 From the standpoint of mathematics, this happens because the solution to a boundary problem is determined not only by the differential equation inside the system (in our case, the Laplace equation for the potential $\phi_m$), but also by the boundary conditions, which are affected by magnetics - see Eqs. (118)-(119).
For the general case of nonlinear magnetics, calculations similar to those resulting in Eq. (3.82) give the following analog of that relation:

$$\delta u = \mathbf{H}\cdot\delta\mathbf{B}, \qquad (5.128)$$ [General magnetic energy variation]

for a linear magnetic yielding Eq. (127). Similarly to the electrostatics of dielectrics, we may argue that, according to Eq. (128), in systems with magnetic media, $\mathbf{H}$ plays the role of the generalized force, and $\mathbf{B}$ that of the generalized coordinate (per unit volume). 53 As a result, the Gibbs potential energy, whose minimum corresponds to the stable equilibrium of the system in an external field $\mathbf{H}_{\rm ext}$, is



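$$\mathcal{G} = \int g(\mathbf{r})\, d^3r, \qquad \text{with} \quad \delta g(\mathbf{r}) = \left[\mathbf{H}(\mathbf{r}) - \mathbf{H}_{\rm ext}\right]\cdot\delta\mathbf{B}(\mathbf{r}), \qquad (5.129)$$ [Gibbs potential energy]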



the expression to be compared with Eq. (3.84). Similarly, for a system with linear magnetics, the latter of these expressions may be integrated over the variations to give

$$g(\mathbf{r}) = \frac{1}{2\mu}\mathbf{B}\cdot\mathbf{B} - \mathbf{H}_{\rm ext}\cdot\mathbf{B} = \frac{1}{2\mu}\left(\mathbf{B} - \mu\mathbf{H}_{\rm ext}\right)^2 + \text{const}, \qquad (5.130)$$

with similar consequences for the external magnetic field penetration into a system with magnetics. As a sanity check, for a uniform system with negligible fringe fields, such as a long solenoid filled with a uniform, linear magnetic material, Eq. (130) may be readily integrated over the sample volume $V$ to give

$$\mathcal{G} = \frac{1}{2\mu}\left(\mathbf{B} - \mu\mathbf{H}_{\rm ext}\right)^2 V + \text{const}, \qquad (5.131)$$

so that the minimum of the Gibbs potential energy, i.e. the stable equilibrium of the system, corresponds to the result that has already been derived at the beginning of this section: $\mathbf{B} = \mu\mathbf{H}_{\rm ext}$, i.e. $\mathbf{H} = \mathbf{H}_{\rm ext}$.

For the important particular case of a long solenoid (Fig. 6a) filled with a linear magnetic material, we may find the field $H$ from Eq. (117), just as we used Eq. (37) in Sec. 2 for finding $B$ in a similar empty solenoid, getting

$$H = nI, \quad \text{and hence} \quad B = \mu nI. \qquad (5.132)$$

Now we may plug this result into Eq. (127) to calculate the magnetic energy stored in the solenoid:

$$U = uV = \frac{B^2}{2\mu}lA = \frac{\mu n^2 I^2}{2}lA, \qquad (5.132)$$

and then use Eq. (72) to calculate its self-inductance:

$$L = \frac{U}{I^2/2} = \mu n^2 lA \qquad (5.133)$$



53 Note that in this respect, the analogy with electrostatics is incomplete. Indeed, according to Eq. (3.82), in 
electrostatics the role of a generalized coordinate is played by would-be field D, and that of the generalized force, 
by the real (average) electric field E. This difference may be traced back to the fact that electric field E may 
perform work on a moving charged particle, while the magnetic part of the Lorentz force (10), vxB, is always 
perpendicular to particle's velocity, and its work equals zero. However, this difference does not affect the full 
analogy of expressions (3.79) and (127) for field energy density in linear media. 
- an evident generalization of Eq. (75). This result explains why filling solenoids with soft ferromagnets with $\mu \gg \mu_0$ is so popular in electrical engineering practice, where large self- and mutual inductances are frequently needed in systems with size and/or weight restrictions.
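As a numerical illustration of Eq. (133), here is a sketch with hypothetical coil parameters (1000 turns on a 10-cm length with a 1-cm² cross-section). Like Eq. (133) itself, it neglects fringe fields, and it treats $\mu$ as a constant, which for a real soft ferromagnet holds only at low cycled fields; `solenoid_inductance` is an illustrative name.

```python
import math

MU_0 = 4e-7 * math.pi  # H/m; the 2019 SI value differs from this by < 1e-9 (relative)

def solenoid_inductance(mu_r, n_turns, length, area):
    """Self-inductance L = mu * n**2 * l * A of a long filled solenoid, Eq. (5.133).

    n = n_turns/length is the number of turns per unit length; fringe fields
    are neglected, and mu = mu_r * mu_0 is assumed field-independent.
    """
    n = n_turns / length
    return mu_r * MU_0 * n ** 2 * length * area

# Hypothetical coil: 1000 turns, 10 cm long, 1 cm^2 cross-section
print(solenoid_inductance(1.0, 1000, 0.1, 1e-4))     # empty: about 1.26e-3 H
print(solenoid_inductance(1000.0, 1000, 0.1, 1e-4))  # mu = 1000 mu_0: about 1.26 H
```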

Now, let us use these two tools to discuss a curious (and practically important) approach to systems with ferromagnetic cores. First, let us find the magnetic flux $\Phi$ in a system with a relatively thin, closed magnetic core made of sections of (possibly different) soft ferromagnets, with cross-section areas $A_k$ much smaller than the squared lengths $l_k^2$ of the sections - see Fig. 15.




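[Fig. 5.15. Deriving the "magnetic Ohm law" (135). Sketch labels: coil current $I$, number of turns $N$, core sections with $\mu_k \gg \mu_0$, $l_k$, $A_k$, and contour $C$.]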



If all $\mu_k \gg \mu_0$, virtually all field lines are confined to the interior of the core. Then, applying the macroscopic Ampere law (117) to the contour $C$, which follows a magnetic field line inside the core (see the dashed line in Fig. 15), we get the following approximate expression (exactly valid only in the limit $\mu_k/\mu_0,\ l_k^2/A_k \to \infty$):

$$\oint_C H_l\, dl \approx \sum_k l_k H_k = \sum_k l_k\frac{B_k}{\mu_k} = NI. \qquad (5.134)$$

Moreover, since the magnetic field lines stay in the core, the magnetic flux $\Phi_k \approx B_k A_k$ should be the same ($= \Phi$) for each section, so that $B_k = \Phi/A_k$. Plugging this condition into Eq. (134), we get



$$\Phi = \frac{NI}{\sum_k \mathcal{R}_k}, \qquad \text{where} \quad \mathcal{R}_k \equiv \frac{l_k}{\mu_k A_k}. \qquad (5.135)$$ [Magnetic Ohm law and reluctance]



Note a close analogy of the first of these equations with the Ohm law for several resistors connected in series, with the magnetic flux playing the role of the electric current, and the product $NI$ that of the voltage applied to the resistor chain. This analogy is fortified by the fact that the second of Eqs. (135) is similar to the expression $R = l/\sigma A$ for the resistance of a long uniform conductor, with the magnetic permeability $\mu$ playing the role of the electric conductivity $\sigma$. (In order to sound similar to, but still different from, the resistance $R$, the parameter $\mathcal{R}$ is called the reluctance.) This is why Eq. (135) is called the magnetic Ohm law; it is very useful for approximate analyses of systems like ac transformers, magnetic energy storage systems, etc.
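The magnetic Ohm law makes series "magnetic circuits" easy to estimate. In the sketch below (hypothetical dimensions, with $\mu = 1000\,\mu_0$ taken as an illustrative value for the core), a 1-mm air gap, 200 times shorter than the core, has 5 times its reluctance and therefore dominates the total. Treating the gap as one more section of the core assumes that its thickness is much smaller than the transverse size of the core, so that the field lines cannot stray far; the helper names are illustrative.

```python
import math

MU_0 = 4e-7 * math.pi  # H/m

def reluctance(length, mu, area):
    """R_k = l_k / (mu_k * A_k): second of Eqs. (5.135)."""
    return length / (mu * area)

def core_flux(NI, sections):
    """Phi = NI / sum_k R_k for sections in series: first of Eqs. (5.135).

    sections: list of (length [m], permeability [H/m], area [m^2]) tuples.
    """
    return NI / sum(reluctance(l, mu, A) for l, mu, A in sections)

A = 1e-4                        # 1 cm^2 cross-section (hypothetical)
core = (0.2, 1000 * MU_0, A)    # 20 cm of soft ferromagnet, mu = 1000 mu_0 (illustrative)
gap = (1e-3, MU_0, A)           # 1 mm air gap
print(reluctance(*gap) / reluctance(*core))                      # gap vs core reluctance
print(core_flux(100.0, [core]), core_flux(100.0, [core, gap]))  # flux in Wb at NI = 100 A
```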

The role of the "magnetic e.m.f." $NI$ may also be played by a permanent-magnet section of the core. Indeed, for relatively low fields we may use the Taylor expansion of the nonlinear function $B(H)$ near $H = 0$ to write
$$B \approx \pm\mu_0 M_s + \mu_d H, \qquad \mu_d \equiv \left.\frac{dB}{dH}\right|_{H=0}, \qquad (5.136)$$

where $M_s$ is the spontaneous magnetization magnitude at $H = 0$, the $\pm$ sign corresponds to the two possible directions of the magnetization, and the parameter $\mu_d$ is called the differential (or "dynamic") permeability. Expressing $H$ from this relation, and using it in one of the terms of the sum (134), we again get a result similar to Eq. (135):

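$$\Phi = \frac{(NI)_{\rm ef}}{\sum_k \mathcal{R}_k}, \qquad \text{with} \quad \mathcal{R}_H \equiv \frac{l_H}{\mu_d A_H}, \qquad (5.137)$$

where $l_H$ and $A_H$ are the geometric dimensions of the hard-ferromagnet section, and the product $NI$ is replaced with its effective value

$$(NI)_{\rm ef} = \pm\frac{\mu_0}{\mu_d}M_s l_H. \qquad (5.138)$$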

This result may be used for a semi-quantitative explanation of the well-known short-range forces 
acting between permanent magnets (or between them and soft ferromagnets) at their mechanical contact 
(Fig. 16). 




Fig. 5.16. Short-range interaction between magnets. 



Indeed, considering the free-space gaps between them as sections of the core (which is approximately correct, because due to the small gap thickness $d$ the magnetic field lines cannot stray far from the contact area), and neglecting the reluctance of the bulk material (due to its larger cross-section), we get



$$\Phi \propto \left(\frac{2d}{\mu_0} + \frac{l}{\mu_d}\right)^{-1}, \qquad (5.139)$$



so that, according to Eq. (127), the magnetic energy of the system (disregarding the constant energy of 
the permanent magnetization) is 



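$$U \propto \left(\frac{2d}{\mu_0} + \frac{l}{\mu_d}\right)B^2 \propto \left(\frac{2d}{\mu_0} + \frac{l}{\mu_d}\right)^{-1} \propto \frac{1}{d + d_0}, \qquad d_0 = \frac{\mu_0}{2\mu_d}\,l \ll l. \qquad (5.140)$$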



Hence the magnet attraction force,

$$F = -\frac{\partial U}{\partial d} \propto \frac{1}{(d + d_0)^2}, \qquad (5.141)$$



behaves almost as the divergent function $1/d^2$, truncated at a short distance $d_0 \ll l$. Due to this truncation, the force is finite at $d = 0$; this is exactly the force you need to apply to detach the two magnets.

Finally, let us briefly discuss a related effect in experiments with thin and long hard ferromagnetic samples ("needles"), like those used in magnetic compasses. Using the definition (108) of the field $\mathbf{H}$, the Maxwell equation (29) takes the form



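$$\nabla\cdot\mathbf{B} = \mu_0\nabla\cdot(\mathbf{H} + \mathbf{M}) = 0, \qquad (5.142)$$

and may be rewritten as

$$\nabla\cdot\mathbf{H} = -\nabla\cdot\mathbf{M}. \qquad (5.143)$$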



While this relation is general, it is especially convenient in hard ferromagnets, where M is virtually 
fixed by the saturation, so that the right-hand part of Eq. (143) may be considered as a fixed magnetic 
field source. Now let us consider a thin, long needle made of a hard ferromagnet (Fig. 17a). 



[Fig. 5.17. (a) "Magnetic charges" at the ends of a thin ferromagnetic needle with magnetization $\mathbf{M}$, and (b) the result of its breaking into two parts (schematically).]



Inside the needle, $\mathbf{M} = \mathbf{M}_s$ = const, while outside it $\mathbf{M} = 0$, so that the right-hand part of Eq. (143) is substantially different from zero only in two small areas at the needle's ends, and at much larger distances we can use the following approximation:

$$\nabla\cdot\mathbf{H} = -q_m\delta(\mathbf{r} - \mathbf{r}_1) + q_m\delta(\mathbf{r} - \mathbf{r}_2), \qquad (5.155)$$

where $\mathbf{r}_{1,2}$ are the ends' positions, and $q_m = M_s A$, with $A$ being the needle's cross-section area. This equation is completely similar to Eq. (1.27) for the electric field created by two equal and opposite point charges.

In particular, if two ends of two needles are held at an intermediate distance $r$ ($A^{1/2} \ll r \ll l$, where $l$ is the needle length; see Fig. 17b), the ends interact in accordance with the magnetic Coulomb law

$$\mathbf{F} = \pm\frac{\mu_0}{4\pi}\frac{q_m^2}{r^2}\frac{\mathbf{r}}{r}. \qquad (5.156)$$
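A rough sketch of the force magnitude given by the magnetic Coulomb law for two needle ends, $|\mathbf{F}| = \mu_0 q_m^2/4\pi r^2$ with $q_m = M_s A$. The magnetization, cross-section, and distance below are hypothetical values chosen only to satisfy $A^{1/2} \ll r$; `pole_force` is an illustrative name.

```python
import math

MU_0 = 4e-7 * math.pi  # H/m; the 2019 SI value differs from this by < 1e-9 (relative)

def pole_force(M_s, area, r):
    """Magnitude of the force between two needle ends, mu_0 q_m^2 / (4 pi r^2),
    with q_m = M_s * area; meaningful only for area**0.5 << r << needle length."""
    q_m = M_s * area
    return MU_0 * q_m ** 2 / (4 * math.pi * r ** 2)

# Hypothetical needles: M_s = 1e6 A/m, A = 1 mm^2, ends 1 cm apart
print(pole_force(1e6, 1e-6, 1e-2))  # force in newtons
```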
The "only" (but conceptually, a very significant!) difference from electrostatics is that the "magnetic charges" $\pm q_m$ cannot be fully separated. For example, if we break a magnetic needle in the middle in an attempt to bring its two ends further apart, two new "charges" appear - see Fig. 17b. There are several solid-state systems where more flexible structures, close to the magnetic needles, may be implemented. First of all, certain ("type-II") superconductors may sustain so-called Abrikosov vortices: crudely, flexible tubes with field-suppressed superconductivity, each carrying one magnetic flux quantum $\Phi_0 = h/2e \approx 2\times10^{-15}$ Wb - see Sec. 6.3. Ending on the superconductor's surface, these tubes let the field lines spread into the surrounding space, essentially forming a magnetic monopole analog (of course, with an equal and opposite "monopole" at the other end of the line). Such flux tubes are not only flexible but readily stretchable, resulting in several peculiar effects. 54 Another, recently found, example of paired "monopoles" is given by spin chains in so-called spin ices, crystals with paramagnetic ions arranged into a specific (pyrochlore) lattice, such as dysprosium titanate Dy$_2$Ti$_2$O$_7$. 55



5.7. Exercise problems 

5.1. Calculate the magnetic field distribution along the axis of a straight solenoid (Fig. 6a) with a finite length $l$ and a round cross-section of radius $R$. Assume that the solenoid has many wire turns ($N \gg 1$) that are uniformly distributed along its length.



5.2. Calculate the (self-) inductance of a toroidal solenoid (Fig. 6) with the cross-section shown in the figure on the right ($r \sim R$), filled with a material of magnetic permeability $\mu$, with many ($N \gg 1, R/r$) wire turns uniformly distributed along the perimeter. Check your results by analyzing the limit $r \ll R$.

Hint: You may like to use the following table integral: 56

$$\int_{-1}^{+1}\frac{(1 - \xi^2)^{1/2}}{a + \xi}\,d\xi = \pi\left[a - (a^2 - 1)^{1/2}\right], \qquad \text{for } a > 1.$$



5.3. Estimate the values of magnetic susceptibility due to

(i) orbital diamagnetism, and

(ii) spin paramagnetism,

for a dilute medium with negligible interaction between molecular dipoles.

Hint: For task (i), you may use the classical model described by Eq. (114) (see Fig. 13), while for task (ii), assume the mechanism of ordering of spontaneous magnetic dipoles $\mathbf{m}_0$, similar to the one sketched for electric dipoles in Fig. 12b, with a magnitude of the order of the Bohr magneton $\mu_B$ - see Eq. (96).



54 A detailed discussion of the Abrikosov vortices may be found, for example, in Chapter 5 of M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, 1996.

55 See, e.g., L. Jaubert and P. Holdsworth, J. Phys.: Condens. Matter 23, 164222 (2011), and references therein.

56 See, e.g., MA (6.13).
5.4. A round cylindrical shell, made of a soft ferromagnet, is placed into a uniform external field $\mathbf{H}_0$ perpendicular to its axis - see the figure on the right. Find the distribution of the magnetic field everywhere in the system, and discuss its efficiency as a "magnetic shield".



5.5. Calculate the distribution of magnetic field around a sphere made of a hard ferromagnet with a permanent, uniform magnetization $\mathbf{M}$ = const.
Chapter 6. Time-Dependent Electromagnetism



This chapter discusses two major new effects that appear if the electric and magnetic fields are changing in time: the "electromagnetic induction" of electric field by a changing magnetic field, and the reciprocal effect of "displacement currents", i.e. the induction of magnetic field by a changing electric field. These two phenomena, which make the time-dependent electric and magnetic fields inseparable, contribute to the system of four Maxwell equations, and make it valid for arbitrary electromagnetic processes. On the way, I will pause for a brief review of the electrodynamics of superconductivity, which (besides its own significance) provides a perfect platform for a discussion of the gauge invariance.



As Eqs. (5.36) and (5.109) show, in static situations ($\partial/\partial t = 0$) the Maxwell equations describing the electric and magnetic fields are independent, and are coupled only implicitly, via the continuity equation (4.5) relating their right-hand parts $\rho$ and $\mathbf{j}$. (In statics, this relation imposes a restriction only on the vector $\mathbf{j}$.) In dynamics, when the fields change in time, the situation is different.

Historically, the first discovered explicit coupling between the electric and magnetic fields was the effect of electromagnetic induction. 1 The summary of Faraday's numerous experiments has turned out to be very simple: if the magnetic flux, defined by Eq. (5.65),

$$\Phi \equiv \int_S B_n\, d^2r, \qquad (6.1)$$

through a surface $S$ limited by a contour $C$, changes in time for whatever reason (e.g., due to a change of the magnetic field $\mathbf{B}$, or the contour's motion, or its deformation), it induces an additional, vortex-like electric field $\mathbf{E}_{\rm ind}$, similar in its topology to the magnetic field induced by a current. The exact distribution of $\mathbf{E}_{\rm ind}$ in space depends on the details of the system's geometry and may be rather complex, but its integral along the contour $C$, called the inductive electromotive force (e.m.f.), obeys a very simple Faraday induction law: 2

$$V_{\rm ind} \equiv \oint_C \mathbf{E}_{\rm ind}\cdot d\mathbf{r} = -\frac{d\Phi}{dt}. \qquad (6.2)$$



It is straightforward (and hence left for the reader's exercise :-) to show that the e.m.f. may be measured, for example, either by inserting a voltmeter into a conducting loop following contour $C$, or by measuring the current $I = V_{\rm ind}/R$ it induces in a thin wire with Ohmic resistance $R$, whose shape follows that contour. The minus sign in Eq. (2) corresponds to the so-called Lenz rule: the magnetic field of the induced Ohmic current provides a partial compensation of the change of the original flux $\Phi$ in time.
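The integral law (6.2) covers flux changes due to the contour's motion as well. A minimal sketch, assuming a hypothetical flat loop of area $A$ rigidly rotating with angular velocity $\omega$ in a uniform, constant field $B$ perpendicular to the rotation axis, so that $\Phi(t) = BA\cos\omega t$: the e.m.f. computed as $-d\Phi/dt$ by a central finite difference agrees with the analytic value $BA\omega\sin\omega t$. The function names and numbers are illustrative only.

```python
import math

def flux_rotating_loop(t, B, area, omega):
    """Flux (6.1) through a flat loop of the given area rotating at angular
    velocity omega about an axis normal to a uniform, constant field B."""
    return B * area * math.cos(omega * t)

def emf_from_flux(flux, t, dt=1e-6, **params):
    """Faraday law (6.2), V_ind = -dPhi/dt, evaluated by a central finite difference."""
    return -(flux(t + dt, **params) - flux(t - dt, **params)) / (2 * dt)

params = dict(B=0.1, area=1e-2, omega=2 * math.pi * 50)   # 0.1 T, 100 cm^2, 50 Hz
t = 3e-3
numeric = emf_from_flux(flux_rotating_loop, t, **params)
exact = params["B"] * params["area"] * params["omega"] * math.sin(params["omega"] * t)
print(numeric, exact)
```

The sign of the result refers to the contour direction tied, by the right-hand rule, to the normal used in computing $\Phi$.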

In order to recast Eq. (2) in a differential form, let us apply, to the above definition of $V_{\rm ind}$, the same Stokes theorem that was repeatedly used in Chapter 5. 3 The result is



1 It was discovered independently by J. Henry and M. Faraday, but it was a brilliant series of experiments by the latter physicist, carried out in 1831, that led to a virtually instant recognition.

2 In Gaussian units, the right-hand part of this formula has the additional coefficient $1/c$.

3 If necessary, see MA Eq. (12.1) again. 
$$V_{\rm ind} = \int_S \left(\nabla\times\mathbf{E}_{\rm ind}\right)_n d^2r. \qquad (6.3)$$



Now combining Eqs. (1)-(3), for a contour $C$ whose shape does not change in time (so that the integration along it is interchangeable with the time derivative), 4 we get



$$\int_S \left(\nabla\times\mathbf{E}_{\rm ind} + \frac{\partial\mathbf{B}}{\partial t}\right)_n d^2r = 0. \qquad (6.4)$$



Since the induced electric field is additional to the field (1.33) created by electric charges, for the net field we should write $\mathbf{E} = \mathbf{E}_{\rm ind} - \nabla\phi$. However, since the curl of any gradient field is zero, 5 $\nabla\times(\nabla\phi) = 0$, Eq. (4) is valid for the net field $\mathbf{E}$ as well. Since this equation should be correct for any surface $S$, we may conclude that



$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \qquad (6.5)$$ [Differential form of the Faraday law]



at any point. This is the final (time-dependent) form of this Maxwell equation. Superficially, it may seem that Eq. (5) is less general than Eq. (2). For example, it may seem not to describe any electric field, and hence any e.m.f., in a moving loop if field B is constant in time, even though flux (1) does change in time. However, this is not true: in Chapter 9 we will see that such an e.m.f. does appear in the reference frame moving with the loop.
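A short numerical illustration of Eqs. (2)-(5) may be useful here. It is a sketch under assumptions that are not in the text. The field is taken to be uniform, B(t) = B_0 sin(Ωt) n_z, with illustrative values of B_0, Ω, and the loop radius a. The induced field is taken as E_ind = −(1/2)(dB/dt) n_z × r, which satisfies Eq. (5) with no charge contribution. The code compares the e.m.f. computed as the line integral in Eq. (3) with −dΦ/dt from Eq. (2).

```python
import math

# Illustrative (assumed) parameters: B(t) = B0*sin(Omega*t) n_z, circular loop of radius a.
B0, Omega, a, t = 0.1, 2 * math.pi * 50.0, 0.05, 1e-3

def B(tt):
    return B0 * math.sin(Omega * tt)

def flux(tt):
    """Eq. (1) for a uniform field normal to a flat circular loop."""
    return math.pi * a**2 * B(tt)

def emf_line_integral(dB_dt, n=2000):
    """Midpoint-rule closed-loop integral of E_ind . dr around the circle,
    with E_ind = -(dB/dt)/2 * (n_z x r) = -(dB/dt)/2 * (-y, x, 0)."""
    total, dphi = 0.0, 2 * math.pi / n
    for k in range(n):
        phi = (k + 0.5) * dphi
        x, y = a * math.cos(phi), a * math.sin(phi)
        ex, ey = 0.5 * dB_dt * y, -0.5 * dB_dt * x
        dx, dy = -y * dphi, x * dphi          # dr along the counterclockwise contour
        total += ex * dx + ey * dy
    return total

dB_dt = B0 * Omega * math.cos(Omega * t)
V_ind = emf_line_integral(dB_dt)
h = 1e-7
dPhi_dt = (flux(t + h) - flux(t - h)) / (2 * h)   # central difference
print(f"line integral: {V_ind:.6f} V,  -dPhi/dt: {-dPhi_dt:.6f} V")
```

The two numbers coincide, and the negative sign at this instant (B growing) is the Lenz rule at work.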

Now let us reformulate Eq. (5) in terms of the vector-potential. Since the induction effect does not alter the fundamental relation ∇·B = 0, we may still present the magnetic field as prescribed by Eq. (5.27), i.e. as B = ∇×A. Plugging this expression into Eq. (5), we get

$$ \nabla\times\left(\mathbf{E} + \frac{\partial\mathbf{A}}{\partial t}\right) = 0. \qquad (6.6) $$



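Hence we can use the argumentation of Sec. 1.3 (there applied to vector E alone) to present the expression in parentheses as −∇φ, so that

$$ \mathbf{E} = -\frac{\partial\mathbf{A}}{\partial t} - \nabla\phi. \qquad (6.7) $$
[Electric field vs. potentials]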

It is tempting to interpret the first term of the right-hand part as describing the electromagnetic induction alone, and the second term as representing a purely electric field induced by electric charges. However, the separation of these two terms is, to a certain extent, conditional. Indeed, let us consider the gauge transformation already mentioned in Sec. 5.2,

$$ \mathbf{A} \to \mathbf{A} + \nabla\chi, \qquad (6.8) $$



4 Let me admit that from the beginning of the course, I was carefully sweeping under the rug a very important question: in exactly which reference frame(s) are all the equations of electrodynamics valid? I promise to discuss this issue in detail later in the course (in Chapter 9), and for now would like to get away with a very short answer: all the formulas discussed so far are valid in any inertial reference frame, as defined in classical kinematics - see, e.g., CM Chapter 1. It is crucial, however, to have fields E and B measured in the same reference frame.

5 See, e.g., MA Eq. (11.1). 
that, as we already know, does not change the magnetic field. According to Eq. (7), in order to keep the full electric field intact (gauge-invariant) as well, the scalar electric potential has to be transformed simultaneously, as

$$ \phi \to \phi - \frac{\partial\chi}{\partial t}, \qquad (6.9) $$

leaving the choice of a time-independent addition to φ restricted only by the Laplace equation, since the full φ should satisfy the Poisson equation (1.41) with a gauge-invariant right-hand part. We will return to the discussion of gauge invariance in Sec. 3.
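The gauge invariance expressed by Eqs. (7)-(9) is easy to check numerically. In the sketch below, the potentials φ and A and the gauge function χ are arbitrary smooth functions invented only for this illustration, and the gradient and time derivative of χ are coded by hand. The fields E from Eq. (7) and B = ∇×A are computed by central finite differences at one sample point, before and after the transformation (8)-(9).

```python
import math

# Hypothetical potentials and gauge function (any smooth functions would do).
def phi(x, y, z, t):
    return x * y * math.cos(t)

def A(x, y, z, t):
    return (y * z * t, x * t**2, math.sin(x + t))

def chi(x, y, z, t):                  # gauge function; its derivatives are coded below
    return x**2 * y * math.sin(2 * t) + z * t

def grad_chi(x, y, z, t):
    return (2 * x * y * math.sin(2 * t), x**2 * math.sin(2 * t), t)

def dchi_dt(x, y, z, t):
    return 2 * x**2 * y * math.cos(2 * t) + z

# Gauge-transformed potentials, Eqs. (8)-(9).
def A_new(x, y, z, t):
    return tuple(a + g for a, g in zip(A(x, y, z, t), grad_chi(x, y, z, t)))

def phi_new(x, y, z, t):
    return phi(x, y, z, t) - dchi_dt(x, y, z, t)

STEP = 1e-5

def d(f, i, p):
    """Central-difference derivative of scalar f(x, y, z, t) w.r.t. argument i at p."""
    pp, pm = list(p), list(p)
    pp[i] += STEP
    pm[i] -= STEP
    return (f(*pp) - f(*pm)) / (2 * STEP)

def fields(phi_f, A_f, p):
    """E = -dA/dt - grad(phi) [Eq. (7)] and B = curl(A) at point p = (x, y, z, t)."""
    Ak = [lambda *q, k=k: A_f(*q)[k] for k in range(3)]
    E = [-d(Ak[k], 3, p) - d(phi_f, k, p) for k in range(3)]
    B = [d(Ak[2], 1, p) - d(Ak[1], 2, p),
         d(Ak[0], 2, p) - d(Ak[2], 0, p),
         d(Ak[1], 0, p) - d(Ak[0], 1, p)]
    return E, B

p = (0.3, -0.7, 1.1, 0.4)
E1, B1 = fields(phi, A, p)
E2, B2 = fields(phi_new, A_new, p)
print(E1, E2)
print(B1, B2)
```

Both pairs of vectors agree to finite-difference accuracy, although the potentials themselves differ.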

Now let us discuss whether Eqs. (2) or (5), describing the electromagnetic induction, represent some completely new facts, on top of all the equations of electrostatics and magnetostatics discussed in the previous five chapters. The answer is no. To demonstrate that, let us consider a thin wire loop with current I, placed in a magnetic field (Fig. 1). According to Eq. (5.21), the magnetic force exerted by the field upon a small fragment of the wire is

$$ d\mathbf{F} = I\,(d\mathbf{r}\times\mathbf{B}) = -I\,(\mathbf{B}\times d\mathbf{r}), \qquad (6.10) $$

where dr is a small vector, tangential to the loop's contour and directed along current I. Now let the wire be slightly (and slowly) deformed so that this particular fragment is displaced by a small distance δr. (Let me hope that Fig. 1 makes the difference between the elementary vectors dr and δr absolutely clear.)




Since the wire's acceleration (if any) is negligibly small, external (non-magnetic) forces should balance force (10), i.e. provide an equal and opposite force. This is why the work of these external forces at the displacement δr, i.e. the change of the magnetic field energy U, is

$$ \delta(dU) = -d\mathbf{F}\cdot\delta\mathbf{r} = I\,\delta\mathbf{r}\cdot(\mathbf{B}\times d\mathbf{r}). \qquad (6.11) $$

Let us apply to this mixed product the general operand rotation rule of vector algebra,⁶ so that vector B comes out of the vector product:

$$ \delta(dU) = I\,\mathbf{B}\cdot(d\mathbf{r}\times\delta\mathbf{r}). \qquad (6.12) $$

But the magnitude of this vector product is nothing more than the area δ(d²r) ≡ δ(dS) swept by the wire's fragment at the deformation (Fig. 1), while its direction is perpendicular to this elementary area dS, along the "proper" normal vector n = (dr/dr)×(δr/δr). The scalar multiplication of B by this vector is



6 See, e.g., MA Eq. (7.6). 
equivalent to taking its normal component. Hence, integrating Eq. (12) over the whole wire length, we get the following result for the total variation of the magnetic energy:

$$ \delta U = I\oint_C B_n\,\delta(d^2r). \qquad (6.13) $$

If B does not change at the wire deformation, the variation sign may be moved out of the integral, and Eq. (13) yields⁷

$$ \delta U = I\,\delta\Phi, \qquad (6.14) $$

where Φ is the magnetic flux through the loop.
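The step from Eq. (11) to Eq. (12) relies only on the cyclic-permutation (operand rotation) property of the mixed product, a·(b×c) = b·(c×a). A minimal numerical check with random vectors follows; the seed and sample count are arbitrary choices.

```python
import random

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(1)
max_err = 0.0
for _ in range(1000):
    B = [random.uniform(-1, 1) for _ in range(3)]
    dr = [random.uniform(-1, 1) for _ in range(3)]
    delta_r = [random.uniform(-1, 1) for _ in range(3)]
    lhs = dot(delta_r, cross(B, dr))   # delta_r . (B x dr), as in Eq. (11)
    rhs = dot(B, cross(dr, delta_r))   # B . (dr x delta_r), as in Eq. (12)
    max_err = max(max_err, abs(lhs - rhs))
print(f"largest discrepancy: {max_err:.1e}")
```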

Now let the work δW = δU, necessary for this energy change, come from a generator of voltage V_ext, inserted somewhere in the loop. In order for the system to be in quasi-equilibrium, this voltage should counterbalance the electromagnetic induction's e.m.f. V_ind. The work of the voltage at the transfer of charge δQ = I δt, during the elementary deformation's duration δt, is

$$ \delta W = \mathcal{V}_{\rm ext}\,\delta Q = -\mathcal{V}_{\rm ind}\,\delta Q = -\mathcal{V}_{\rm ind}\,I\,\delta t. \qquad (6.15) $$

Comparing Eqs. (14) and (15), we arrive at the Faraday induction law (2). 

Moreover, some authors derive Eq. (2) in this way, implying that there is no new information in the induction law at all. Note, however, that the simple derivation given above has used the assumption that the magnetic field is independent of the deformation. A removal of this limitation would require using the Lorentz field transform (which will be discussed only in Chapter 9), and a very careful argumentation to exclude a faulty logic loop, because the transform itself is typically derived from Maxwell equations - including Eq. (5) that we are trying to prove. Personally, I am happy that Dr. Faraday did his thorough work so early, placing the electromagnetic induction law on a firm experimental basis.



6.2. Quasistationary approximation and skin effect 

As we will see later in this chapter, the interplay of the electromagnetic induction with one more time-dependent effect (the so-called displacement currents) enables electromagnetic waves propagating with speed c = 1/(ε_0μ_0)^{1/2} in free space, and with a comparable speed v = 1/(εμ)^{1/2} in dielectric and/or magnetic materials. For phenomena whose spatial scale is much smaller than the wavelength λ = 2πv/ω, the displacement current effects are negligible, and time-dependent phenomena may be described by using Eq. (5) together with three other macroscopic Maxwell equations in their unmodified form:⁸



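$$ \nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0, \qquad \nabla\times\mathbf{H} = \mathbf{j}, \qquad \nabla\cdot\mathbf{D} = \rho, \qquad \nabla\cdot\mathbf{B} = 0. \qquad (6.16) $$
[Quasistationary approximation]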


These equations define the so-called quasistationary approximation of electromagnetism, and 
are sufficient to describe many important phenomena. Let us use them first of all for an analysis of the 



7 Actually, Eq. (14) is just an integral version of Eq. (5.128). 

8 Actually, the absence of time-dependent corrections to the other Maxwell equations in the quasistationary approximation should be considered as an additional experimental fact.
so-called skin effect, the phenomenon of self-shielding of alternating (ac) magnetic fields by currents flowing in a conductor.

In order to form a complete system of equations, Eqs. (16) should be augmented by material 
equations describing the medium. Let us take them, for a conductor, in the simplest (and simultaneously, 
most common) linear and isotropic form: 



$$ \mathbf{j} = \sigma\mathbf{E}, \qquad \mathbf{B} = \mu\mathbf{H}. \qquad (6.17) $$



If the conductor is uniform, i.e. coefficients σ and μ are constant inside it, the whole system of equations (16)-(17) may be reduced to a single equation. Indeed, a sequential substitution of these equations into each other yields:



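$$ \frac{\partial\mathbf{B}}{\partial t} = -\nabla\times\mathbf{E} = -\frac{1}{\sigma}\nabla\times\mathbf{j} = -\frac{1}{\sigma}\nabla\times(\nabla\times\mathbf{H}) = -\frac{1}{\sigma\mu}\nabla\times(\nabla\times\mathbf{B}) = -\frac{1}{\sigma\mu}\left[\nabla(\nabla\cdot\mathbf{B}) - \nabla^2\mathbf{B}\right] = \frac{1}{\sigma\mu}\nabla^2\mathbf{B}. \qquad (6.18) $$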



Thus we have arrived, without any further assumptions, at a very simple partial differential equation. Let us use it to analyze the skin effect in the simplest geometry (Fig. 2a). Here an external source (which, at this point, does not need to be specified) produces, near a plane surface of a bulk conductor, a spatially uniform ac magnetic field H^(0)(t) parallel to the surface.




Fig. 6.2. (a) Skin effect in the simplest, planar geometry, and (b) two Ampere contours for deriving the "microscopic" (contour C_1) and the "macroscopic" (contour C_2) boundary conditions for H.



Selecting the coordinate system as shown in Fig. 2, we may express this condition as

$$ \mathbf{H}\big|_{x=-0} = H^{(0)}(t)\,\mathbf{n}_y. \qquad (6.19) $$



The translational symmetry of our simple problem within the surface plane {y, z} implies that inside the conductor ∂/∂y = ∂/∂z = 0 as well, and H = H(x, t) n_y even at x > 0. As a result, Eq. (18) for the conductor's interior is reduced to a differential equation for just one scalar function H(x, t) = B(x, t)/μ:⁹

$$ \frac{\partial H}{\partial t} = \frac{1}{\sigma\mu}\frac{\partial^2 H}{\partial x^2}, \qquad \text{for } x > 0. \qquad (6.20) $$



9 Due to the simple linear relation between fields B and H, it does not matter too much which of them is used for 
the solution of this problem. A slight preference is for H, due to the simplicity of the boundary condition (5.118). 
This equation may be further simplified by noticing that, due to its linearity, we may use the linear superposition principle for the time dependence of the field. Namely, we may expand the field, as well as the external field (19), into the Fourier series,

$$ H(x,t) = \sum_\omega H_\omega(x)\,e^{-i\omega t}, \quad \text{for } x > 0, \qquad H^{(0)}(t) = \sum_\omega H^{(0)}_\omega e^{-i\omega t}, \quad \text{for } x = -0, \qquad (6.21) $$

If we know the solution for each frequency component, the whole field may then be found through the elementary summation (21) of these solutions. For each single-frequency component, Eq. (20) is immediately reduced to an ordinary differential equation for the complex amplitude H_ω(x):

$$ -i\omega H_\omega = \frac{1}{\sigma\mu}\frac{d^2H_\omega}{dx^2}. \qquad (6.22) $$

From the theory of linear differential equations we know that Eq. (22) has the following general 
solution: 

$$ H_\omega(x) = H_+ e^{\kappa_+ x} + H_- e^{\kappa_- x}, \qquad (6.23) $$

where constants κ_± are the roots of the characteristic equation. This equation may be obtained by substituting either of these two exponents into the initial differential equation. For our particular case, the characteristic equation following from Eq. (22) is

$$ -i\omega = \frac{\kappa^2}{\sigma\mu}, \qquad (6.24) $$

and its roots are complex constants

$$ \kappa_\pm = \pm(-i\mu\sigma\omega)^{1/2} = \pm\frac{1-i}{\sqrt{2}}\,(\mu\sigma\omega)^{1/2}. \qquad (6.25) $$
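The roots (25) and their relation to the skin depth can be checked with complex arithmetic. The conductivity value below (roughly that of copper at room temperature) and the 60 Hz frequency are assumptions made only for this sketch.

```python
import cmath
import math

MU = 4e-7 * math.pi          # assume mu ~ mu_0 (H/m)
SIGMA = 5.96e7               # assumed conductivity, S/m (roughly copper)
omega = 2 * math.pi * 60.0   # angular frequency for 60 Hz

kappa_plus = cmath.sqrt(-1j * MU * SIGMA * omega)   # principal root of Eq. (24)
kappa_minus = -kappa_plus                           # the decaying root, Re < 0
kappa_eq25 = -(1 - 1j) / math.sqrt(2) * math.sqrt(MU * SIGMA * omega)

delta_s = math.sqrt(2 / (MU * SIGMA * omega))       # Eq. (27b)

print(kappa_minus, kappa_eq25)                      # the same number
print(kappa_minus**2 / (SIGMA * MU), -1j * omega)   # both sides of Eq. (24)
print(-1 / kappa_minus.real, delta_s)               # decay length equals delta_s
```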

For our problem, the field cannot grow exponentially at x → +∞. Hence only one of the coefficients, namely H_-, corresponding to the decaying exponent with Re κ < 0 (namely κ = κ_-), may be nonvanishing, so that H_ω(x) = H_ω(0) exp{κ_- x}. In order to find the constant factor H_ω(0), we can integrate the Maxwell equation ∇×H = j along a pre-surface contour - say, contour C_1 shown in Fig. 2b. The right-hand part's integral is negligible, because j does not contain any "genuinely surface" currents, localized at a depth much smaller than 1/|κ_-|. As a result, we get the "microscopic" boundary condition,¹⁰ similar to Eq. (5.118) for the stationary magnetic field: H_τ = const at x = 0, giving

$$ H(0,t) = H^{(0)}(t), \quad \text{i.e.} \quad H_\omega(0) = H^{(0)}_\omega, \qquad (6.26) $$

so that the final solution of the problem may be presented as 



10 This common name is awkward, because Eq. (26) results from macroscopic Maxwell equations (16), but is 
justified as the counterpart to the "macroscopic" boundary condition (30), to be discussed in a minute. 
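$$ H_\omega(x)\,e^{-i\omega t} = H^{(0)}_\omega \exp\left\{-\frac{x}{\delta_s}\right\} \exp\left\{i\left(\frac{x}{\delta_s} - \omega t\right)\right\}, \qquad (6.27a) $$

where constant δ_s is called the skin depth:

$$ \delta_s \equiv \left(\frac{2}{\mu\sigma\omega}\right)^{1/2}. \qquad (6.27b) $$
[Skin depth]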



This solution describes the skin effect: the ac magnetic field of frequency ω penetrates into a conductor only to a depth of the order of δ_s. A couple of examples of the skin depth: for copper at room temperature, δ_s ≈ 1 cm at the ac power distribution frequency of 60 Hz, and is of the order of just 1 μm at a few GHz, i.e. at typical frequencies of cell phone signals and kitchen microwave magnetrons. For modestly salted water, δ_s is close to 250 m at 1 Hz (with big implications for radio communications with submarines), and is of the order of 1 cm at a few GHz (explaining the nonuniform heating of a soup bowl in a microwave oven).
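The numbers quoted above follow directly from Eq. (27b). In this sketch the conductivities are assumed typical values, not taken from the text: about 5.96×10^7 S/m for room-temperature copper and about 4 S/m for sea water, with μ ≈ μ_0 in both cases.

```python
import math

MU0 = 4e-7 * math.pi  # H/m; mu ~ mu_0 assumed for both materials

def skin_depth(sigma, f, mu=MU0):
    """Eq. (27b): delta_s = (2 / (mu * sigma * omega))**0.5 with omega = 2*pi*f."""
    omega = 2 * math.pi * f
    return math.sqrt(2.0 / (mu * sigma * omega))

SIGMA_CU = 5.96e7   # assumed conductivity of copper at room temperature, S/m
SIGMA_SEA = 4.0     # assumed conductivity of sea water, S/m

cases = [("copper, 60 Hz", SIGMA_CU, 60.0),
         ("copper, 3 GHz", SIGMA_CU, 3e9),
         ("sea water, 1 Hz", SIGMA_SEA, 1.0)]
for label, sigma, f in cases:
    print(f"{label:16s} delta_s = {skin_depth(sigma, f):.3g} m")
```

With these assumed conductivities the script gives roughly 8.4 mm, 1.2 μm, and 250 m, consistent with the estimates in the text.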

In order to complete the skin effect discussion, let us consider what happens to the ac current and the electric field in this effect. When deriving our basic equation (18), we have used, in particular, the relations j = ∇×H = μ^{-1}∇×B and E = j/σ. Since a spatial differentiation of an exponent yields a similar exponent, the electric field and current density have the same spatial dependence as the magnetic field. That is, they penetrate inside the conductor by distances of the order of δ_s(ω), but their vectors are directed perpendicularly to B, while still being parallel to the conductor surface:¹¹

$$ \mathbf{j}_\omega(x) = \kappa_- H_\omega(x)\,\mathbf{n}_z, \qquad \mathbf{E}_\omega(x) = \frac{\kappa_-}{\sigma} H_\omega(x)\,\mathbf{n}_z. \qquad (6.28) $$



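By the way, integrating the first of these relations with the help of Eqs. (26)-(27), we may find that the linear density J of the surface currents (measured in A/m) is simply and fundamentally related to the applied magnetic field:

$$ \mathbf{J}_\omega \equiv \int_0^\infty \mathbf{j}_\omega(x)\,dx = \kappa_- H^{(0)}_\omega\,\mathbf{n}_z\int_0^\infty e^{\kappa_- x}\,dx = -H^{(0)}_\omega\,\mathbf{n}_z. \qquad (6.29) $$

Since this relation does not have frequency-dependent factors, we may sum it up for all frequencies and get a universal relation

$$ \mathbf{J}(t) = -H^{(0)}(t)\,\mathbf{n}_z = (-\mathbf{n}_x)\times H^{(0)}(t)\,\mathbf{n}_y = \mathbf{n}\times\mathbf{H}^{(0)}(t), \qquad (6.30) $$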



where n = −n_x is the outer normal to the surface - see Fig. 2b. This simple relation (whose last form is independent of the reference frame choice) is not accidental. Indeed, Eq. (30) may be readily obtained from the Ampere law (5.37), regardless of the exact law of the field penetration. It suffices to apply that law to a contour drawn around a fragment of the surface, but extending under it much deeper than the skin depth - see contour C_2 in Fig. 2b. Relation (30) is frequently called the "macroscopic" boundary condition for the magnetic field near the conductor's surface, to distinguish it from the "microscopic" boundary condition (26).
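The frequency independence of the surface current density (29) can be confirmed by integrating j_ω(x) = κ_- H_ω(x) from Eq. (28) numerically. The sketch assumes a copper-like conductivity and an arbitrary complex amplitude H_ω^(0), both illustrative and not from the text, and truncates the integral at 40δ_s, where the integrand is negligible. For every frequency the result is the same, −H_ω^(0): a surface current of the same magnitude as the applied field, flowing along −n_z in the coordinates of Fig. 2.

```python
import cmath
import math

MU0 = 4e-7 * math.pi
SIGMA = 5.96e7        # assumed conductivity, S/m
H0 = 1.0 + 0.5j       # arbitrary complex amplitude H_omega^(0), A/m (illustrative)

def surface_current(f, n=20000, depth_in_deltas=40.0):
    """Trapezoidal integral of j_omega(x) = kappa_- * H0 * exp(kappa_- x), Eqs. (27)-(28),
    over 0 < x < depth_in_deltas * delta_s; returns the n_z component of J_omega."""
    omega = 2 * math.pi * f
    kappa_m = -cmath.sqrt(-1j * MU0 * SIGMA * omega)     # decaying root of Eq. (24)
    delta_s = math.sqrt(2 / (MU0 * SIGMA * omega))
    L = depth_in_deltas * delta_s
    h = L / n

    def j(x):
        return kappa_m * H0 * cmath.exp(kappa_m * x)

    interior = sum(j(k * h) for k in range(1, n))
    return h * (0.5 * (j(0.0) + j(L)) + interior)

for f in (60.0, 1e3, 1e6, 3e9):
    print(f, surface_current(f))
```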



11 Notice that vectors j and E are parallel and have the same time dependence. This means that the time average of the power dissipation j·E is finite. We will return to its discussion later in this chapter.
For the skin effect, the fundamental relation between the surface current density and the external magnetic field means that the effect does not require a dedicated ac magnetic field source. For example, it takes place in any wire that carries ac current, and leads to current concentration in a surface sheet of thickness ~δ_s. (Of course, the quantitative analysis of this problem in a wire with an arbitrary cross-section may be technically complicated, because it requires solving Eq. (18) for a 2D geometry; even for the round cross-section, the solution involves Bessel functions.) In this case, the ac magnetic field outside the conductor, which still obeys Eq. (30), is better understood as the effect, rather than the reason, of the ac current flow.
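Solution (27) can also be cross-checked by integrating the diffusion equation (20) directly in the time domain. The sketch below is only an illustration, built on the following assumptions:
- an explicit finite-difference (forward-time, centered-space) scheme;
- dimensionless units with 1/(σμ) = 1 and ω = 2, so that δ_s = 1;
- the surface drive H(0,t) = cos ωt, per Eq. (26);
- H = 0 at a depth of 8δ_s.

After several periods, the oscillation amplitude at x = δ_s should approach the value e^{-1} ≈ 0.368 predicted by Eq. (27a).

```python
import math

D = 1.0                        # 1/(sigma*mu) in dimensionless units (assumption)
omega = 2.0                    # drive frequency, chosen so that delta_s = 1
delta_s = math.sqrt(2 * D / omega)

dx = 0.1
n_x = int(round(8 * delta_s / dx)) + 1   # grid points from x = 0 to x = 8*delta_s
dt = 0.4 * dx**2 / D                     # explicit scheme is stable for D*dt/dx**2 <= 1/2
r = D * dt / dx**2

period = 2 * math.pi / omega
n_steps = int(round(10 * period / dt))   # ten drive periods
n_last = int(round(period / dt))         # number of steps in the final period
probe = int(round(delta_s / dx))         # grid index of x = delta_s

H = [0.0] * n_x
amp = 0.0
for n in range(1, n_steps + 1):
    new = H[:]
    for i in range(1, n_x - 1):
        new[i] = H[i] + r * (H[i + 1] - 2 * H[i] + H[i - 1])
    new[0] = math.cos(omega * n * dt)    # surface field, Eq. (26)
    new[-1] = 0.0                        # field has died out deep inside
    H = new
    if n > n_steps - n_last:
        amp = max(amp, abs(H[probe]))

print(f"amplitude at x = delta_s: {amp:.3f}, Eq. (27a) gives {math.exp(-1):.3f}")
```

The explicit scheme was picked for brevity; an implicit (e.g., Crank-Nicolson) scheme would allow much larger time steps.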

Finally, the reader should mind the validity limits of these results - besides the universal Eq. (30). First, in order for the quasistationary approximation to be valid, frequency ω should not be too high, so that the skin depth (27) remains much smaller than the corresponding wavelength,

$$ \lambda = \frac{2\pi v}{\omega} = \left(\frac{4\pi^2}{\varepsilon\mu\omega^2}\right)^{1/2}, \qquad (6.31) $$

which decreases with ω faster than δ_s (27b). Note that the crossover frequency (at which δ_s and λ become comparable),

$$ \omega_r = \frac{\sigma}{\varepsilon} = \frac{\sigma}{\varepsilon_r\varepsilon_0}, \qquad (6.32) $$

is nothing else than the reciprocal charge relaxation time (4.10). As was discussed in Sec. 4.2, for good metals this frequency is extremely high (about 10^18 s^-1).
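Eq. (32) can be evaluated for copper. The sketch also computes the frequency at which δ_s and λ are exactly equal. From Eqs. (27b) and (31) this frequency is ω = 2π²σ/ε, larger than ω_r by a numerical factor of about 20. The sketch assumes σ ≈ 5.96×10^7 S/m, ε ≈ ε_0 and μ ≈ μ_0. At such frequencies the simple dc value of σ is no longer physical, so the result should be read only as an order-of-magnitude estimate.

```python
import math

EPS0 = 8.8541878128e-12   # F/m
MU0 = 4e-7 * math.pi      # H/m (approximately)
SIGMA = 5.96e7            # assumed dc conductivity of copper, S/m
EPS = EPS0                # assumption: epsilon ~ epsilon_0
MU = MU0                  # assumption: mu ~ mu_0

omega_r = SIGMA / EPS     # Eq. (32): reciprocal charge relaxation time

def skin_depth(omega):    # Eq. (27b)
    return math.sqrt(2 / (MU * SIGMA * omega))

def wavelength(omega):    # Eq. (31)
    return 2 * math.pi / (omega * math.sqrt(EPS * MU))

omega_x = 2 * math.pi**2 * SIGMA / EPS     # exact equality delta_s = lambda
print(f"omega_r = {omega_r:.2e} s^-1, omega_x = {omega_x:.2e} s^-1")
print(skin_depth(omega_x), wavelength(omega_x))
```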

A more practical upper limit on ω is that the skin depth δ_s should stay much larger than the mean free path l of charge carriers.¹² Beyond this point, a non-local relation between vectors j(r) and E(r) becomes essential. Both theory and experiment show that at δ_s < l, the skin effect still persists, but acquires a slightly different frequency dependence, δ_s ∝ ω^{-1/3}. Such an anomalous skin effect has useful applications, for example, for experimental measurements of the Fermi surface in metals.¹³



6.3. Electrodynamics of superconductivity and gauge invariance 

The effect of superconductivity¹⁴ takes place when temperature T is reduced below a certain critical temperature T_c, specific for each material. For most metallic superconductors, T_c is typically of the order of a few kelvins, though several exotic compounds (the so-called high-temperature superconductors) with T_c above 100 K have been found since 1987. The most notable property of superconductors is the absence, at T < T_c, of measurable resistance to not very high dc currents.

However, electromagnetic properties of superconductors cannot be described by just taking σ = ∞ in our previous results. Indeed, for this case, Eq. (27b) would give δ_s = 0, i.e. no ac magnetic field penetration at all, while for the dc field we would have the indeterminacy σω = ∞·0 = ? Experiment shows something substantially different. Weak magnetic fields do penetrate into superconductors, by a material-specific London penetration depth δ_L between ~10^-7 and ~10^-6 m.¹⁵ This depth is virtually frequency-independent until the skin depth δ_s, measured in the same material in its "normal" state (i.e. in the absence of superconductivity), becomes less than δ_L. (This crossover typically happens at frequencies ~10^13 s^-1.) The smallness of δ_L means that the magnetic field is pushed out of macroscopic samples at their transition into the superconducting state.

12 A brief discussion of the mean free path may be found, for example, in SM Chapter 6. In very clean metals at low temperatures, δ_s may approach l at frequencies as low as ~1 GHz, though at room temperature the crossover from the normal to the anomalous skin effect takes place at ~100 GHz.

13 See, e.g., A. A. Abrikosov, Introduction to the Theory of Normal Metals, Academic Press, 1972.

14 Discovered experimentally in 1911 by H. Kamerlingh Onnes.

This Meissner-Ochsenfeld effect, discovered experimentally in 1933,¹⁶ may be partly understood using the following classical reasoning. When we discussed the physics of conductivity in Sec. 4.2, we implied that the current (and electric field) frequency ω is either zero or sufficiently low. In the classical Drude reasoning (see Sec. 4.2), this is acceptable while ωτ ≪ 1, where τ is the effective carrier scattering time participating in Eqs. (4.12)-(4.13). If this condition is not satisfied, we should take into account the charge carrier inertia. Moreover, in the opposite limit ωτ ≫ 1 we may neglect scattering altogether. Classically, we can describe the charge carriers in such a "perfect conductor" as particles that are accelerated by the electric field in accordance with the 2nd Newton law (4.11) at all times,

$$ \dot{\mathbf{v}} = \frac{1}{m}\mathbf{F} = \frac{q}{m}\mathbf{E}, \qquad (6.33) $$

so that the current density j = qnv they create changes in time as

$$ \dot{\mathbf{j}} = \frac{q^2 n}{m}\mathbf{E}. \qquad (6.34) $$

In terms of the Fourier amplitudes (see the previous section), this means 

$$ -i\omega\,\mathbf{j}_\omega = \frac{q^2 n}{m}\mathbf{E}_\omega. \qquad (6.35) $$

Comparing this formula with the relation j_ω = σE_ω implied in the last section, we see that we can use all its results with the following replacement:

$$ \sigma \to i\,\frac{q^2 n}{m\omega}. \qquad (6.36) $$

This change replaces the characteristic equation (24) with 

$$ -i\omega = \frac{\kappa^2 m\omega}{i q^2 n\mu}, \qquad (6.37) $$

and hence replaces the skin effect with the field penetration by the following frequency-independent 
depth: 



$$ \delta = \left(\frac{m}{\mu q^2 n}\right)^{1/2}. \qquad (6.38) $$



Superficially, this means that the field decay into the superconductor does not depend on frequency: 



15 Named to acknowledge the pioneering theoretical work of brothers F. and H. London - see below. 

16 It is hardly fair to shorten the name to just the "Meissner effect", as it is frequently done, because of the reportedly crucial contribution made by R. Ochsenfeld, then W. Meissner's student, to the discovery.
$$ H(x,t) = H(0,t)\exp\left\{-\frac{x}{\delta}\right\}, \qquad (6.39) $$



explaining the Meissner-Ochsenfeld effect. 

However, there are two problems with this result. First, for the parameters typical for good metals (q = −e, n ~ 10^29 m^-3, m ~ m_e, μ ~ μ_0), Eq. (38) gives δ ~ 10^-8 m, a factor of 10 to 100 lower than the typical experimental values of δ_L. Experiment also shows that the penetration depth diverges at T → T_c, which is not predicted by Eq. (38). Another, much more fundamental problem with Eq. (38) is that it has been derived for ωτ ≫ 1. Even if we assume that somehow there are no collisions at all, i.e. τ = ∞, at ω → 0 both parts of the characteristic equation (37) vanish, and we cannot make any conclusion about κ. This is not just a mathematical artifact we could ignore. For example, let us place a non-magnetic metal at T > T_c into a static external magnetic field. The field will completely penetrate into the sample. Now let us cool it. As soon as the temperature drops below T_c, our calculations become valid, forbidding the penetration into the superconductor of any change of the field, so that the initial field would be "frozen" inside the sample. The experiment shows something completely different: as T is lowered below T_c, the initial field is pushed out of the sample.

The resolution of these contradictions has been provided by quantum mechanics. As was explained in 1957 in a seminal work by J. Bardeen, L. Cooper, and J. Schrieffer (commonly referred to as the BCS theory), superconductivity is due to the correlated motion of electron pairs, with opposite spins and nearly opposite momenta. Such Cooper pairs, each with the electric charge q = −2e and zero spin, may form only in a narrow energy layer near the Fermi surface, of a certain thickness Δ(T). Parameter Δ(T), which may be also considered as the binding energy of the pair, tends to zero at T → T_c. At T ≪ T_c it has a virtually constant value Δ(0) ≈ 3.5 k_B T_c, of the order of a few meV for most superconductors. This fact readily explains the relatively low spatial density of the Cooper pairs: n_p(T) ~ nΔ(T)/ε_F ~ 10^26 m^-3. With the correction n → n_p, our Eq. (38) for the penetration depth becomes

$$ \delta \to \delta_L = \left(\frac{m}{\mu q^2 n_p(T)}\right)^{1/2}. \qquad (6.40) $$
[London penetration depth]



This expression diverges at T → T_c, and generally fits the experimental data reasonably well. This is true at least for the so-called "clean" superconductors, with the mean free path l = v_F τ much longer than the Cooper pair size ξ (see below).
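The two estimates above can be reproduced in a few lines. The sketch takes μ ≈ μ_0 and the order-of-magnitude densities quoted in the text: n ~ 10^29 m^-3 for electrons and n_p ~ 10^26 m^-3 for Cooper pairs. The pair mass m = 2m_e is an assumption, since the text does not specify it.

```python
import math

MU0 = 4e-7 * math.pi          # H/m (approximately); mu ~ mu_0 assumed
E = 1.602176634e-19           # elementary charge, C
M_E = 9.1093837015e-31        # electron mass, kg

def penetration_depth(m, q, n, mu=MU0):
    """Eqs. (38) and (40): delta = (m / (mu * q**2 * n))**0.5."""
    return math.sqrt(m / (mu * q**2 * n))

# Eq. (38): classical "perfect conductor" with all conduction electrons.
delta_classical = penetration_depth(M_E, -E, 1e29)
# Eq. (40): Cooper pairs, q = -2e, with the assumed pair mass m = 2 m_e.
delta_L = penetration_depth(2 * M_E, -2 * E, 1e26)
print(f"Eq. (38): {delta_classical:.1e} m,  Eq. (40): {delta_L:.1e} m")
```

This gives about 2×10^-8 m and 4×10^-7 m, respectively, in line with the text.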

The smallness of the coupling energy Δ(T) is also a key factor in the explanation of the Meissner-Ochsenfeld effect, as well as several macroscopic quantum phenomena in superconductors. Because of Heisenberg's quantum uncertainty relation δr δp ~ ħ, the Cooper-pair size (the so-called coherence length) is relatively large: ξ ~ δr ~ ħ/δp ~ ħv_F/Δ(T) ~ 10^-6 m. As a result, n_p ξ³ ≫ 1, meaning that Cooper pairs strongly overlap in space. Now, due to their integer spin, Cooper pairs behave like bosons. In particular, at low temperature they exhibit the so-called Bose-Einstein condensation onto the same energy level.¹⁷ This means that the frequency ω = E/ħ of the time evolution



17 A qualitative discussion of the Bose-Einstein condensation of bosons may be found in SM Sec. 3.4, though the 
full theory of superconductivity is more complex, because it describes the condensation taking place 
simultaneously with the formation of effective bosons (Cooper pairs). For a more detailed coverage of physics of 
of each pair's wavefunction Ψ = ψ exp{−iωt} is the same, i.e. that the phases φ of the wavefunctions, defined by the equation

$$ \psi = |\psi|\,e^{i\varphi}, \qquad (6.41) $$

become equal. As a result, the current is carried not by individual Cooper pairs but rather by their Bose-Einstein condensate, described by a single wavefunction. In usual Fermi liquids of single electrons, quantum effects are masked by the statistical spread of phases φ. Due to this coherence, they become very explicit - "macroscopic".

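To illustrate this, let us write the well-known quantum-mechanical formula for the probability current of a free, non-relativistic particle,¹⁸

$$ \mathbf{j}_p = \frac{\hbar}{2mi}\left(\psi^*\nabla\psi - \text{c.c.}\right) = \frac{1}{2m}\left[\psi^*(-i\hbar\nabla)\psi + \text{c.c.}\right]. \qquad (6.42) $$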

Now let me borrow one result that will be proved later in the course (in Sec. 9.7) when we discuss the analytical mechanics of a charged particle moving in an electromagnetic field. Namely, in order to account for the magnetic field effects, the particle's kinetic momentum p, equal to mv (where v = dr/dt is the particle's velocity), has to be distinguished from its canonical momentum,¹⁹

$$ \mathbf{P} \equiv \mathbf{p} + q\mathbf{A}, \qquad (6.43) $$

where A is the vector-potential of the field - see Eq. (5.27). In contrast with the Cartesian components p_j = mv_j of momentum p, the canonical momentum components are the generalized momenta corresponding to components r_j of the radius-vector r. The latter are considered as generalized coordinates of the particle, so that P_j = ∂L/∂v_j, where L is the particle's Lagrangian function. According to the general rules of transfer from classical to quantum mechanics,²⁰ it is vector P whose operator (in the Schrödinger picture) equals −iħ∇. Hence, the operator of kinetic momentum p = P − qA is equal to −iħ∇ − qA. Therefore, in order to account for the magnetic field effects, we should make the following replacement:

$$ -i\hbar\nabla \to -i\hbar\nabla - q\mathbf{A}. \qquad (6.44) $$
In particular, Eq. (42) has to be replaced with 

$$ \mathbf{j}_p = \frac{1}{2m}\left[\psi^*(-i\hbar\nabla - q\mathbf{A})\psi + \text{c.c.}\right]. \qquad (6.45) $$

This expression becomes more transparent if we take the wavefunction in form (41): 



$$ \mathbf{j}_p = \frac{\hbar}{m}|\psi|^2\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right). \qquad (6.46) $$



superconductors, the reader may be referred, for example, to the already cited monograph by M. Tinkham, 
Introduction to Superconductivity, 2nd ed., McGraw-Hill, 1996.

18 See, e.g., QM Sec. 1.4, in particular Eq. (1.47). 

19 I am sorry to use traditional notations p and P for the momenta - the same symbols which were used for the 
electric dipole moment and polarization in Chapter 3. I hope there will be no confusion, because the latter notions 
are not used in this section. 

20 See, e.g., CM Sec. 10.1, in particular Eq. (10.26). 
This relation means, in particular, that in order to keep j_p invariant, the gauge transformation (8)-(9) has to be accompanied by a simultaneous transformation of the wavefunction phase:

$$ \varphi \to \varphi + \frac{q}{\hbar}\chi. \qquad (6.47) $$



It is fascinating that the quantum-mechanical wavefunction (more exactly, its phase) is not gauge-invariant - meaning that you may change it in your mind, at will! Again, this does not change any observable (such as j or the probability density ψψ*), i.e. any experimental results.
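The gauge invariance of the current can be checked numerically in one dimension. The sketch below evaluates Eq. (45) by finite differences for an arbitrary test wavefunction and vector potential. It then repeats the calculation after the transformation A → A + dχ/dx, ψ → ψ exp{iqχ/ħ} (i.e. Eqs. (8) and (47)), and also compares the result with the phase form (46). The wavefunction, A(x), χ(x), and the units ħ = m = 1, q = −2 are all hypothetical choices made only for this illustration.

```python
import cmath
import math

HBAR, M, Q = 1.0, 1.0, -2.0      # hypothetical units: hbar = m = 1, pair charge q = -2

def psi(x):                      # test wavefunction |psi| e^{i varphi}
    return (1 + 0.3 * math.cos(x)) * cmath.exp(1j * (0.7 * x + 0.2 * x**2))

def dphase(x):                   # d(varphi)/dx for the test wavefunction
    return 0.7 + 0.4 * x

def A(x):                        # x-component of a test vector potential
    return 0.5 * math.sin(3 * x)

def chi(x):                      # hypothetical gauge function and its derivative
    return x**3 - math.cos(x)

def dchi(x):
    return 3 * x**2 + math.sin(x)

STEP = 1e-5

def current(psi_f, A_f, x):
    """Eq. (45) in 1D: j = (1/2m)[psi* (-i hbar d/dx - q A) psi + c.c.]."""
    p = psi_f(x)
    dpsi = (psi_f(x + STEP) - psi_f(x - STEP)) / (2 * STEP)
    term = p.conjugate() * (-1j * HBAR * dpsi - Q * A_f(x) * p)
    return (term + term.conjugate()).real / (2 * M)

def current_phase_form(x):
    """Eq. (46): j = (hbar/m)|psi|^2 (d varphi/dx - (q/hbar) A)."""
    return HBAR / M * abs(psi(x))**2 * (dphase(x) - Q / HBAR * A(x))

def psi_new(x):                  # Eq. (47): varphi -> varphi + (q/hbar) chi
    return psi(x) * cmath.exp(1j * Q * chi(x) / HBAR)

def A_new(x):                    # Eq. (8) in 1D: A -> A + d(chi)/dx
    return A(x) + dchi(x)

for x in (-1.0, 0.2, 0.9):
    print(x, current(psi, A, x), current(psi_new, A_new, x), current_phase_form(x))
```

All three columns agree, while the wavefunctions themselves differ by a position-dependent phase factor.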



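For the electric current density of the whole superconducting condensate, Eq. (46) yields

$$ \mathbf{j} = q\,n_p(T)\,\frac{\hbar}{m}\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right). \qquad (6.48) $$
[Supercurrent density]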



This equation shows that this supercurrent may be induced by a dc magnetic field alone and does not require any electric field. Indeed, for the simplest, 1D geometry shown in Fig. 2a, j(r) = j(x) n_z, A(r) = A(x) n_z, and ∂/∂z = 0. Hence the Coulomb gauge condition (5.48) is satisfied for any choice of the gauge function χ(x), and for the sake of simplicity we can choose it to provide φ(r) = const,²¹ so that

$$ \mathbf{j} = -\frac{q^2 n_p(T)}{m}\mathbf{A}. \qquad (6.49) $$



This is the so-called London equation, proposed (in a different form) by brothers F. and H. London in 1935 for a phenomenological description of the Meissner-Ochsenfeld effect. Combining it with Eq. (5.47), generalized for an arbitrary uniform medium by the replacement μ_0 → μ, we get

$$ \nabla^2\mathbf{A} = \frac{\mu q^2 n_p(T)}{m}\mathbf{A}. \qquad (6.50) $$



This simple differential equation, similar in structure to Eq. (18), has a similar exponential solution, 



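$$ A(x) = A(0)\exp\left\{-\frac{x}{\delta_L}\right\}, \qquad B(x) = B(0)\exp\left\{-\frac{x}{\delta_L}\right\}, \qquad j(x) = j(0)\exp\left\{-\frac{x}{\delta_L}\right\}, \qquad (6.51) $$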



which shows that the magnetic field and supercurrent penetrate into a superconductor only to the London penetration depth δ_L given by Eq. (40), regardless of frequency.²² By the way, we may integrate the last result through the penetration layer, using Eqs. (34), (43) and the vector-potential definition B = ∇×A (which for our geometry gives B(x) = −dA(x)/dx = A(x)/δ_L). In this way we may check that the linear density J of the surface supercurrent still satisfies the universal relation (30).
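The last statement is easy to verify numerically for the exponential profile (51). The sketch assumes, as in Fig. 2a, A = A(x) n_z, so that B = ∇×A has only the y-component B_y = −dA_z/dx = A/δ_L. It integrates j(x) from Eq. (49) over the penetration layer, rewritten with Eq. (40) as j = −A/(μδ_L²). It then compares J with H(0) = B(0)/μ. The values of δ_L and A(0) are arbitrary illustrative choices. The integral comes out equal to −H(0), so the surface supercurrent has the same magnitude as the field at the surface.

```python
import math

MU = 4e-7 * math.pi     # assume mu ~ mu_0 (H/m)
DELTA_L = 1e-7          # assumed London penetration depth, m
A0 = 1e-9               # assumed surface value of A_z, T*m

def A(x):               # Eq. (51)
    return A0 * math.exp(-x / DELTA_L)

def j(x):               # Eq. (49), using q**2 n_p / m = 1 / (mu * DELTA_L**2) from Eq. (40)
    return -A(x) / (MU * DELTA_L**2)

def B(x):               # B_y = -dA_z/dx for this profile
    return A(x) / DELTA_L

n = 20000
L = 40 * DELTA_L        # deep enough for the integrand to be negligible
h = L / n
J = h * (0.5 * (j(0.0) + j(L)) + sum(j(k * h) for k in range(1, n)))
H_surface = B(0.0) / MU
print(f"J = {J:.2f} A/m,  H(0) = {H_surface:.2f} A/m")
```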

Let me hope that the physical intuition of the reader enables him or her to make the following semi-quantitative generalization of the quantitative solution (51) to a superconductor sample of arbitrary



21 This is the so-called London gauge which, for our geometry, is also the Coulomb gauge. 

22 Since not all electrons of a superconductor form Cooper pairs, at any frequency ω ≠ 0 they provide Joule losses which are not described by Eq. (48). These losses become very substantial when frequency ω becomes so high that the skin-effect length δ_s of the material becomes less than δ_L (with δ_s measured with superconductivity suppressed, say by a high magnetic field). For typical metallic superconductors, this happens at frequencies of a few hundred GHz, so that even for microwaves, Eq. (51) gives a fairly good description of the field penetration.
shape: B and j may only penetrate into the sample by distances of the order of δ_L. In particular, for samples much larger than δ_L, the London theory gives the following "macroscopic" description of the Meissner-Ochsenfeld effect: j = 0 and B = 0 everywhere inside a superconductor. In this coarse description, the bulk superconductor sample behaves as an ideal diamagnet, with μ = 0.²³ In particular, we can use this analogy and the first of Eqs. (5.125) to immediately obtain the magnetic field distribution outside a superconducting sphere:

$$ \mathbf{B} = \mu_0\mathbf{H} = -\mu_0\nabla\phi_m, \qquad \phi_m = -H_0\left(r + \frac{R^3}{2r^2}\right)\cos\theta. \qquad (6.52) $$



Figure 3 shows the corresponding surfaces of equal potential φ_m. It is evident that the magnetic field lines (normal to the equipotential surfaces) bend to become parallel to the superconductor's surface. By the way, this pattern illustrates the answer to a question that might arise when making assumption (19): what happens to superconductors in a magnetic field normal to their surface? The answer is: the field is deformed outside the superconductor to provide B_n = 0 at the surface. Otherwise, due to the continuity of B_n, the magnetic field would penetrate the superconductor, which is impossible. Of course, this answer should be taken with a grain of salt: strong magnetic fields do penetrate into superconductors, destroying superconductivity (completely or partly), thus violating the Meissner-Ochsenfeld effect. Such a penetration by itself features several interesting electrodynamic effects, for whose discussion we unfortunately do not have time.²⁴
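A quick check of solution (52) follows, with H = −∇φ_m in spherical coordinates. The normal component H_r should vanish on the sphere's surface r = R. The field at the equator should be enhanced to 3H_0/2, and far from the sphere the field should tend to the uniform H_0 n_z. The sketch uses illustrative units H_0 = R = 1 and central finite differences.

```python
import math

H0, R = 1.0, 1.0        # illustrative units (assumption)

def phi_m(r, theta):
    """Eq. (52): phi_m = -H0 (r + R**3 / (2 r**2)) cos(theta), for r >= R."""
    return -H0 * (r + R**3 / (2 * r**2)) * math.cos(theta)

STEP = 1e-6

def H_field(r, theta):
    """(H_r, H_theta) = -grad(phi_m) in spherical coordinates, by central differences."""
    H_r = -(phi_m(r + STEP, theta) - phi_m(r - STEP, theta)) / (2 * STEP)
    H_theta = -(phi_m(r, theta + STEP) - phi_m(r, theta - STEP)) / (2 * STEP * r)
    return H_r, H_theta

for theta in (0.3, 1.0, math.pi / 2, 2.5):
    print(theta, H_field(R, theta))
print("far away:", H_field(100 * R, 0.0))
```

At the equator, H_θ comes out as −1.5 H_0. The minus sign only reflects that the unit vector of θ points along −n_z there, so the field is 1.5 H_0 along n_z.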




Fig. 6.3. Surfaces of constant scalar potential φ_m of the magnetic field around a superconducting sphere of radius R ≫ δ_L, placed into a weak uniform, vertical magnetic field.



6.4. Electrodynamics of macroscopic quantum phenomena 

We have seen that for the ac magnetic field penetration, the quantum theory of superconductivity gives essentially the same result as the classical theory of a perfect conductor - cf. Eqs. (39) and (51). The "only" conceptual exception is that the former theory extends the effect to dc fields. However, the quantum theory of superconductors is much richer. For example, let us use Eq. (48) to derive the



23 Of course, this analogy sweeps under the rug the real physics of the Meissner-Ochsenfeld effect. In particular, in superconductors the role of the surface "magnetization currents" with effective density j_ef = ∇×M (see Fig. 5.11 and its discussion) is played by the real, persistent surface supercurrents (48).

24 The interested reader may be referred, e.g., to Chapter 5 of M. Tinkham's monograph cited above. 
Fig. 6.4. (a) Closed, flux-quantizing superconducting ring, (b) a ring cut with a narrow slit, and (c) a Superconducting QUantum Interference Device (SQUID).



From the last section's analysis, we know that deep inside the wire the supercurrent is exponentially small. Integrating Eq. (48) along any closed contour $C$ that does not approach the surface closer than a few $\delta_L$ at any point, we get



$$\oint_C \nabla\varphi\cdot d\mathbf{r} - \frac{q}{\hbar}\oint_C \mathbf{A}\cdot d\mathbf{r} = 0. \qquad (6.53)$$



The first integral, i.e. the difference of $\varphi$ in the initial and final points, has to be equal to either zero or an integer multiple of $2\pi$, because the change $\varphi \to \varphi + 2\pi n$ does not change the condensate's wavefunction:

$$\psi' = |\psi| e^{i(\varphi + 2\pi n)} = |\psi| e^{i\varphi} = \psi. \qquad (6.54)$$

On the other hand, the second integral in Eq. (53) is just the magnetic flux $\Phi$ (1) through the contour, and, due to the Meissner-Ochsenfeld effect, through the superconducting ring as a whole. As a result, we get



$$\Phi = n\Phi_0, \qquad \Phi_0 \equiv \frac{2\pi\hbar}{q} = \frac{h}{q}, \qquad n = 0, \pm 1, \pm 2, \ldots \qquad (6.55)$$



i.e. the magnetic flux can only take values that are multiples of the flux quantum $\Phi_0$. This effect, predicted in 1950 by the same Fritz London (who expected $q$ to be equal to the electron charge $-e$), was confirmed experimentally in 1961, 25 with $|q| = 2e$ (so that in superconductors $\Phi_0 = h/2e \approx 2.07\times 10^{-15}$ Wb). Historically, this observation gave decisive support to the BCS theory of Cooper pairs as the basis of superconductivity, which had been put forward just 4 years before. 26
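The numerical value quoted above is easy to reproduce. The sketch below is a minimal check that assumes the exact SI values of $h$ and $e$ (they are not given in the text); it also shows the twice larger value $h/e$ that London's single-electron expectation would have implied.

```python
# Exact SI values (2019 redefinition); these numbers are not given in the text.
H_PLANCK = 6.62607015e-34    # Planck constant h, J*s
E_CHARGE = 1.602176634e-19   # elementary charge e, C


def flux_quantum(q_abs):
    """Flux quantum h/|q| (in Wb) for carriers of charge magnitude q_abs (in C)."""
    return H_PLANCK / q_abs


phi0_london = flux_quantum(E_CHARGE)      # London's expectation, |q| = e
phi0_pairs = flux_quantum(2 * E_CHARGE)   # Cooper pairs, |q| = 2e

print(f"h/e  = {phi0_london:.3e} Wb")
print(f"h/2e = {phi0_pairs:.3e} Wb")
```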



25 Independently and virtually simultaneously by two groups: B. Deaver and W. Fairbank, and R. Doll and M. 
Nabauer, so that their reports were published back-to-back in Phys. Rev. Lett. 

26 Actually, the ring is not entirely necessary. In 1957, A. Abrikosov used the Ginsburg-Landau equations (see below) to explain the counter-intuitive behavior of the so-called type-II superconductors, known experimentally as the Shubnikov phase since the 1930s. He showed that a high magnetic field may penetrate into such superconductors, whose coherence length $\xi$ is smaller than the London penetration depth $\delta_L(T)$, in the form of



The flux quantization is just one of the so-called macroscopic quantum effects in superconductivity. Consider, for example, a superconducting ring interrupted with a very narrow slit (Fig. 4b). Integrating Eq. (48) along the current-free path from point 1 to point 2, along the dashed line in Fig. 4 (again, deeper than $\delta_L(T)$ from the surface), we get



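$$0 = \int_1^2 \left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right)\cdot d\mathbf{r} = \varphi_2 - \varphi_1 - \frac{q}{\hbar}\int_1^2 \mathbf{A}\cdot d\mathbf{r}. \qquad (6.56)$$

Using the flux quantum definition (55), this result may be rewritten as

$$\varphi \equiv \varphi_2 - \varphi_1 = \frac{2\pi}{\Phi_0}\int_1^2 \mathbf{A}\cdot d\mathbf{r} \approx 2\pi\frac{\Phi}{\Phi_0}, \qquad (6.57)$$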



where $\varphi$ is called the Josephson phase difference. In contrast to each of the phases $\varphi_{1,2}$, their difference $\varphi$ is gauge-invariant, because it is directly related to the gauge-invariant magnetic flux.

Can this $\varphi$ be measured? Yes, using the Josephson effect. 27 In order to understand his prediction, let us take two (for the argument's simplicity, similar) superconductors, connected with some sort of weak link, for example a tunnel barrier or a short normal-metal bridge, through which a small Cooper pair current can flow. (Such a system of two coupled superconductors is now called a Josephson junction.) Let us think about what this supercurrent $I$ may be a function of. For that, reverse thinking is helpful: let us imagine we can change the current from outside; what parameter of the superconducting condensate can it affect?

If the current is weak, it cannot perturb the superconducting condensate density, proportional to $|\psi|^2$; hence it may only change the Cooper condensate phases $\varphi_{1,2}$. However, according to Eq. (41), the phases are not gauge-invariant, while the current should be; hence $I$ may affect (or should be a function of) the phase difference $\varphi$ defined by Eq. (57). Moreover, just as has already been argued during the flux quantization discussion, a change of any of $\varphi_{1,2}$ (and hence of $\varphi$) by $2\pi$ or any of its multiples should not change the current. In addition, if the wavefunction is the same in both superconductors ($\varphi = 0$), the supercurrent should vanish due to the system's symmetry. Hence the function $I(\varphi)$ should satisfy the conditions

$$I(0) = 0, \qquad I(\varphi + 2\pi) = I(\varphi). \qquad (6.58)$$

With this understanding, we should not be terribly surprised by the following Josephson's result for the weak link provided by weak tunneling: 28

$$I(\varphi) = I_c \sin\varphi, \qquad (6.59)$$



where the constant $I_c$, which depends on the strength of the weak link and on temperature, is called the critical current.



self-formed tubes surrounded by vortex-shaped supercurrents - the so-called Abrikosov vortices, with the 
superconductivity suppressed near the middle of each tube. This suppression makes each flux tube topologically 
equivalent to a superconducting ring, with the magnetic flux through it equal to one flux quantum, and its ends 
being magnetically similar to monopoles - see Sec. 5.6 above. 

27 It was predicted in 1961 by B. Josephson (then a PhD student!), and observed experimentally by several groups 
soon after that. 

28 For some other types of weak links, the function $I(\varphi)$ may deviate from the sine form (59) rather considerably, still satisfying the general requirements (58).
Let me show how such an expression may be derived, for a narrow and short weak link made of a normal metal or a superconductor. 29 The microscopic theory of superconductivity shows that, within certain limits, the Bose-Einstein condensate of Cooper pairs may be described by the following nonlinear Schrödinger equation: 30

$$\frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2\psi + U(\mathbf{r})\psi = \varepsilon\psi + \psi\times(\text{a nonlinear function of } |\psi|^2). \qquad (6.60)$$

The first three terms of this equation are similar to those of the usual Schrödinger equation (which conserves the number of particles), while the nonlinear function in the last term describes the formation and dissolution of Cooper pairs, and in particular gives the equilibrium value of $n_s$ as a function of temperature. Now let the weak link size scale $a$ be much smaller than both the Cooper pair size $\xi$ and the London penetration depth $\delta_L$. The first of these relations ($a \ll \xi$) makes the first term in Eq. (60), which scales as $1/a^2$, much larger than all other terms, while the latter relation ($a \ll \delta_L$) allows one to neglect magnetic field effects, and hence to drop the term $(-q\mathbf{A})$ from the parentheses in Eq. (60), reducing it to just our familiar Laplace equation for the wavefunction:

$$\nabla^2\psi = 0. \qquad (6.61)$$

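Since the weak coupling cannot change $|\psi|$ in the bulk superconducting electrodes, Eq. (61) may be solved with the following simple boundary conditions:

$$\psi(\mathbf{r}) \to \begin{cases} |\psi| e^{i\varphi_1}, & \text{for } \mathbf{r} \to \mathbf{r}_1, \\ |\psi| e^{i\varphi_2}, & \text{for } \mathbf{r} \to \mathbf{r}_2, \end{cases} \qquad (6.62)$$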



where $\mathbf{r}_1$ and $\mathbf{r}_2$ are some points well inside the corresponding superconductors, i.e. at distances much larger than $a$ from the weak link's center. It is straightforward to verify that the solution of this boundary problem for the complex function $\psi$ may be expressed as follows,

$$\psi(\mathbf{r}) = |\psi| e^{i\varphi_1} f(\mathbf{r}) + |\psi| e^{i\varphi_2}\left[1 - f(\mathbf{r})\right], \qquad (6.63)$$

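via the real function $f(\mathbf{r})$ that satisfies the Laplace equation and the following boundary conditions:

$$f(\mathbf{r}) \to \begin{cases} 1, & \text{for } \mathbf{r} \to \mathbf{r}_1, \\ 0, & \text{for } \mathbf{r} \to \mathbf{r}_2. \end{cases} \qquad (6.64)$$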

The function $f(\mathbf{r})$ depends on the weak link geometry and may be rather complicated, but we do not need to know it to get the most important result. Indeed, plugging this solution into Eq. (48) (with the term $-q\mathbf{A}$ ignored as negligibly small), we get



29 This derivation belongs to L. Aslamazov and A. Larkin, JETP Lett. 9, 87 (1969). If the reader is not interested 
in this topic, he or she may safely skip it, jumping directly to the text following Eq. (68). 

30 At $T \to T_c$, where $n_s \to 0$, the Taylor expansion of the nonlinear function in Eq. (60) may be limited to just one term proportional to $|\psi|^2 \propto n_s$. In this limit, Eq. (60) is called the Ginsburg-Landau equation. Derived by V. Ginsburg and L. Landau in 1950 from phenomenological arguments (see, e.g., SM Sec. 4.3), i.e. before the advent of the BCS theory, this simple equation, solved together with Eq. (48) and the Maxwell equations, may describe a very broad range of macroscopic quantum effects including the Abrikosov vortices, critical fields and currents, etc.; see, e.g., M. Tinkham's monograph cited above.
$$\mathbf{j} = -\frac{\hbar q}{m}|\psi|^2\,\nabla f\,\sin\varphi, \qquad\text{so that}\qquad \mathbf{j} = -\frac{\hbar q\, n_p(T)}{m}\,\nabla f\,\sin\varphi. \qquad (6.65)$$

Integrating this relation over any cross-section S of the weak link, we arrive at Josephson's result (59), 
with the following critical current: 

$$I_c = -\frac{\hbar q\, n_p(T)}{m}\int_S (\nabla f)_n\, d^2r. \qquad (6.66)$$

This expression may be readily evaluated via the resistance of the same weak link in the "normal" (non-superconducting) state, say at $T > T_c$. Indeed, as we know from Sec. 4.3, the distribution of the electrostatic potential $\phi$ at normal conduction also obeys the Laplace equation, with boundary conditions that may be taken in the form

$$\phi(\mathbf{r}) \to \begin{cases} V, & \text{for } \mathbf{r} \to \mathbf{r}_1, \\ 0, & \text{for } \mathbf{r} \to \mathbf{r}_2. \end{cases} \qquad (6.67)$$

Comparing the boundary problem for $\phi(\mathbf{r})$ with that for the function $f(\mathbf{r})$, we get $\phi = Vf$. This means that the gradient $\nabla f$, which participates in Eq. (66), is just $(-\mathbf{E}/V) = (-\mathbf{j}/\sigma V)$. Hence the integral in that formula is just $-I/\sigma V = -1/\sigma R_n$, where $R_n$ is the resistance of the Josephson junction in its normal state. As a result, Eq. (66) yields

$$I_c = \frac{\hbar q\, n_p(T)}{m\sigma}\,\frac{1}{R_n}, \qquad (6.68)$$



showing that the $I_cR_n$ product does not depend on the junction geometry, though it does depend on temperature, vanishing, together with $n_p(T)$, at $T \to T_c$. (Well below the critical temperature, $I_cR_n$ of sufficiently short weak links is of the order of $\Delta(0)/e$, i.e. of the order of a few mV.)
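The step from Eq. (63) to Eq. (65) can be checked numerically in the simplest geometry. This is a minimal sketch under assumptions that are not in the text: a one-dimensional link of length $a$, for which the Laplace solution with $f \to 1$ at $\mathbf{r}_1$ and $f \to 0$ at $\mathbf{r}_2$ is simply $f(x) = 1 - x/a$; units with $\hbar = q = m = |\psi| = 1$; and the current density written as $j = \mathrm{Im}(\psi^*\partial\psi/\partial x)$, which for $\psi = |\psi|e^{i\varphi}$ reduces to Eq. (48) with the $\mathbf{A}$ term dropped. The helper names are hypothetical.

```python
import cmath
import math


def link_current_density(x, phi1, phi2, a=1.0, dx=1e-6):
    """j = Im(psi* dpsi/dx) for a 1D weak link, in units hbar = q = m = |psi| = 1.

    psi(x) = e^{i phi1} f(x) + e^{i phi2} [1 - f(x)], as in Eq. (63), with the
    assumed 1D Laplace solution f(x) = 1 - x/a, so that f(0) = 1 and f(a) = 0.
    """
    def psi(s):
        f = 1.0 - s / a
        return cmath.exp(1j * phi1) * f + cmath.exp(1j * phi2) * (1.0 - f)

    dpsi_dx = (psi(x + dx) - psi(x - dx)) / (2 * dx)   # central difference
    return (psi(x).conjugate() * dpsi_dx).imag


def eq65_prediction(phi1, phi2, a=1.0):
    """Eq. (65) in the same units: j = -(df/dx) sin(phi), phi = phi2 - phi1."""
    df_dx = -1.0 / a
    return -df_dx * math.sin(phi2 - phi1)


for phi in (0.0, math.pi / 6, math.pi / 2, math.pi):
    j_num = link_current_density(0.3, 0.0, phi)
    print(f"phi = {phi:.3f}: numeric j = {j_num:+.6f}, Eq. (65): {eq65_prediction(0.0, phi):+.6f}")
```

The computed current density does not depend on the position $x$ inside the link (as current conservation requires) and depends on the phases only through $\varphi = \varphi_2 - \varphi_1$, via $\sin\varphi$.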

Now let us see what happens if a Josephson junction is placed into the gap in a superconducting ring; see Fig. 4c. In this case, we can combine Eqs. (57) and (59), getting

$$I = I_c \sin 2\pi\frac{\Phi}{\Phi_0}. \qquad (6.69)$$



This effect of periodic dependence of the current on flux is called macroscopic quantum interference, 31 while the system shown in Fig. 4c is called a superconducting quantum interference device, abbreviated as SQUID (with all letters capital, please :-). The low value of the magnetic flux quantum $\Phi_0$, and hence the high sensitivity of $\varphi$ to the magnetic field, allows using SQUIDs as ultrasensitive magnetometers. Indeed, for a superconducting ring of area ~1 cm², one period of the change of supercurrent (69) is produced by a magnetic field change of the order of $10^{-11}$ T ($10^{-7}$ Gs), while sensitive electronics allows one to measure a tiny fraction of this period, limited by thermal noise at a level of the order of a few fT. This sensitivity allows measurements, for example, of the magnetic fields induced by the beating human heart, and even by brain activity, outside of the body.
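A quick check of the magnetometry estimate above: a minimal sketch assuming a loop of area 1 cm², a uniform field normal to it, and the exact SI values of $h$ and $e$ (not given in the text). As in Eq. (69), the loop's self-inductance is neglected.

```python
import math

# Exact SI values (not given in the text); |q| = 2e for Cooper pairs.
H_PLANCK = 6.62607015e-34            # J*s
E_CHARGE = 1.602176634e-19           # C
PHI_0 = H_PLANCK / (2 * E_CHARGE)    # superconducting flux quantum, Wb


def squid_current(flux, i_c):
    """Eq. (69), self-inductance neglected: I = I_c sin(2 pi Phi / Phi_0)."""
    return i_c * math.sin(2 * math.pi * flux / PHI_0)


def field_period(area):
    """Change of a uniform normal field B (T) that adds one Phi_0 to a loop of this area (m^2)."""
    return PHI_0 / area


dB = field_period(1e-4)   # 1 cm^2 loop
print(f"one period of Eq. (69): {dB:.2e} T = {dB * 1e4:.2e} Gs")
```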



31 The name is due to the deep analogy between this phenomenon and the interference between two waves, to be 
discussed in detail in Sec. 8.4. 
An important aspect of the quantum interference is the so-called Aharonov-Bohm (AB) effect. 32 Let the magnetic field lines be limited to the central part of the SQUID ring, so that no appreciable magnetic field ever touches the superconducting ring material. (This may be done experimentally with very good accuracy, for example using high-$\mu$ magnetic cores; see their discussion in Sec. 5.6.) As predicted by Eq. (69), and confirmed by several careful experiments carried out in the mid-1960s, 33 this restriction does not matter: the interference is observed anyway. This means that not only the magnetic field $\mathbf{B}$, but also the vector potential $\mathbf{A}$ represents physical reality, albeit quite a peculiar one; remember the gauge transformation?

Actually, the magnetic flux quantization (55) and the macroscopic quantum interference (69) are not completely different effects, but just two manifestations of a whole group of inter-related macroscopic quantum phenomena. In order to show that, one should note that if the critical current $I_c$ (or rather its product by the loop's self-inductance $L$) is high enough, the flux $\Phi$ in the SQUID loop is due not only to the external magnetic field flux $\Phi_{\rm ext}$, but also has a self-field component; cf. Eq. (5.61): 34

$$\Phi = \Phi_{\rm ext} - LI, \qquad\text{where}\qquad \Phi_{\rm ext} = \int_S (B_{\rm ext})_n\, d^2r. \qquad (6.70)$$

Now the relation between $\Phi$ and $\Phi_{\rm ext}$ may be found by solving this equation together with Eq. (69). Figure 5 shows this relation for several values of the dimensionless parameter $\beta_L \equiv 2\pi LI_c/\Phi_0$.




Fig. 6.5. Function $\Phi(\Phi_{\rm ext})$ for SQUIDs with various values of the $LI_c$ product. Dashed arrows show the flux leaps as the external field is changed. (The branches with $d\Phi/d\Phi_{\rm ext} < 0$ are unstable.)



32 For a more detailed discussion of the AB effect, which also takes place for single quantum particles, see, e.g., 
QM Sec. 3.2. 

33 Later, similar experiments were carried out with electron beams, and then even with "normal" (meaning non-superconducting) solid-state conducting rings. In this case, the effect is due to interference of the wavefunction of a single charged particle (an electron) with itself, and is of course much harder to observe than in SQUIDs. In particular, the ring size has to be very small, and the temperature low, to avoid "dephasing" effects due to unavoidable interactions of the particles with their environment.

34 The sign before $LI$ would be positive, as in Eq. (5.61), if $I$ were the current flowing into the inductance. However, in order to keep the sign in Eq. (69) intact, $I$ should mean the current flowing into the Josephson junction, i.e. from the inductance, thus changing the sign of the term.
These plots show that if the critical current (or the inductance) is low, $\beta_L \ll 1$, the self-field effects are negligible, and the total flux follows the external field (i.e., $\Phi_{\rm ext}$) quite faithfully. However, at $\beta_L > 1$, the dependence $\Phi(\Phi_{\rm ext})$ becomes hysteretic, and at $\beta_L \gg 1$ the positive-slope (stable) branches of this function are nearly flat, with the total flux values corresponding to Eq. (55). Thus, a superconducting ring closed by a high-$I_c$ Josephson junction exhibits nearly perfect flux quantization.
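The curves of Fig. 5 follow from eliminating $I$ between Eqs. (69) and (70). In the normalized variables $x = \Phi/\Phi_0$ and $x_{\rm ext} = \Phi_{\rm ext}/\Phi_0$ (our notation, not the text's), this gives $x_{\rm ext} = x + (\beta_L/2\pi)\sin 2\pi x$. The sketch below, a minimal illustration rather than the method used to draw the figure, finds all roots by a grid scan plus bisection and keeps only the stable ones, with $d\Phi/d\Phi_{\rm ext} > 0$, as stated in the caption of Fig. 5. The grid resolution is an arbitrary choice.

```python
import math


def stable_flux_states(x_ext, beta_l, n_grid=20000):
    """Stable normalized fluxes x = Phi/Phi_0 of a SQUID for x_ext = Phi_ext/Phi_0.

    Eliminating I between Eqs. (69) and (70) gives
        x_ext = x + (beta_l / 2 pi) sin(2 pi x),   beta_l = 2 pi L I_c / Phi_0.
    A root is stable if d(Phi)/d(Phi_ext) > 0, i.e. 1 + beta_l cos(2 pi x) > 0.
    """
    k = beta_l / (2 * math.pi)

    def g(x):
        return x + k * math.sin(2 * math.pi * x) - x_ext

    lo, hi = x_ext - k - 0.1, x_ext + k + 0.1     # every root obeys |x - x_ext| <= k
    grid = [lo + (hi - lo) * i / n_grid for i in range(n_grid + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        ga, gb = g(a), g(b)
        if ga == 0.0:
            roots.append(a)
        elif ga * gb < 0.0:
            for _ in range(60):                    # bisection on the bracketing cell
                mid = 0.5 * (a + b)
                if g(a) * g(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
    return [x for x in roots if 1 + beta_l * math.cos(2 * math.pi * x) > 0]


print(stable_flux_states(0.3, 0.5))   # beta_L < 1: a single branch
print(stable_flux_states(0.5, 3.0))   # beta_L = 3: two stable states (hysteresis)
```

For $\beta_L = 3$ and $\Phi_{\rm ext} = \Phi_0/2$ the function returns two stable states, near $0.14\Phi_0$ and $0.86\Phi_0$; the third root, at $\Phi_0/2$, lies on an unstable branch and is discarded. For $\beta_L < 1$ there is always exactly one root, hence no hysteresis.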

The self-field effects described by Eq. (70) create certain technical problems for SQUID magnetometry, but they are the basis for one more application of these devices: ultrafast computing. Indeed, Fig. 5 shows that at values of $\beta_L$ modestly above 1 (e.g., $\beta_L \approx 3$), within a certain range of applied field the SQUID has two stable flux states that differ by $\Delta\Phi \approx \Phi_0$ and may be used for coding binary 0 and 1. For practical superconductors (like Nb), the time of switching between these states (see the dashed arrows in Fig. 5) is of the order of a picosecond, while the energy dissipated at such an event may be as low as ~$10^{-19}$ J. (This bound is determined not by the device physics, but by the fundamental requirement for the energy barrier between the two states to be much higher than the thermal fluctuation energy scale $k_BT$, ensuring a sufficiently long information retention time.) While picosecond switching speed may also be achieved with some semiconductor devices, the power consumption of SQUID-based digital devices may be 5 to 6 orders of magnitude lower, enabling VLSI circuits with 100-GHz-scale clock frequencies and manageable power dissipation. Unfortunately, the range of practical application of these Rapid Single-Flux-Quantum (RSFQ) logic circuits is still narrow, due to the inconvenience of their deep refrigeration to temperatures below $T_c$. 35
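To put the quoted ~$10^{-19}$ J into perspective, it may be compared with $k_BT$. A minimal arithmetic sketch, assuming an operating temperature of 4.2 K (liquid helium), which the text does not specify:

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)

E_SWITCH = 1e-19     # J, order-of-magnitude energy quoted in the text
T_OP = 4.2           # K, assumed operating temperature (liquid helium), not from the text

ratio = E_SWITCH / (K_B * T_OP)
print(f"E / (k_B T) ~ {ratio:.0f}")
```

The ratio comes out of the order of $10^3$, in line with the requirement that the barrier be much higher than the thermal energy scale.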

Since we have already obtained the basic relations (57) and (59) describing the macroscopic quantum phenomena in superconductivity, let me briefly mention two other members of this group, called the Josephson effects. Differentiating Eq. (57) over time, and using the Faraday induction law (2), we get 36

$$\frac{d\varphi}{dt} = \frac{2\pi}{\Phi_0}V = \frac{2e}{\hbar}V. \qquad (6.71)$$



This famous phase-to-voltage relation should be valid regardless of how the voltage $V$ has been created, 37 so let us apply Eqs. (59) and (71) to the simplest circuit with a non-superconducting source of dc voltage; see Fig. 6.
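Before the analytical treatment, a minimal numerical sketch of Eq. (71) may be helpful: at a constant voltage $V_0$ the phase grows linearly in time, so the supercurrent (59) oscillates with the frequency $2eV_0/h$ and has no dc component. Assumptions not in the text: $|q| = 2e$, the exact SI values of $e$ and $h$, and an illustrative 1 μV bias.

```python
import math

# Exact SI values (not given in the text); |q| = 2e for Cooper pairs.
E_CHARGE = 1.602176634e-19   # C
H_PLANCK = 6.62607015e-34    # J*s
HBAR = H_PLANCK / (2 * math.pi)


def phase(t, v0, phi0=0.0):
    """Eq. (71) integrated for a constant voltage v0: phi(t) = (2e/hbar) v0 t + phi0."""
    return (2 * E_CHARGE / HBAR) * v0 * t + phi0


def supercurrent(t, v0, i_c, phi0=0.0):
    """Eq. (59) evaluated along this phase: I(t) = I_c sin(phi(t))."""
    return i_c * math.sin(phase(t, v0, phi0))


def oscillation_frequency(v0):
    """Frequency (2e/hbar) v0 / (2 pi) = 2 e v0 / h, in Hz."""
    return 2 * E_CHARGE * v0 / H_PLANCK


v0 = 1e-6                                  # 1 uV bias (illustrative)
f = oscillation_frequency(v0)
print(f"f / V0 = {f / 1e6:.1f} MHz per uV")

# Averaged over one full period, the supercurrent has no dc component.
n = 1000
period = 1.0 / f
mean_current = sum(supercurrent(k * period / n, v0, i_c=1.0) for k in range(n)) / n
print(f"time-averaged I / I_c = {mean_current:.1e}")
```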







Fig. 6.6. DC-voltage-biased Josephson junction.



35 For more on that technology, see the review paper by P. Bunyk et al., Int. J. High Speed Electron. Syst. 11, 257 (2001), and references therein.

36 Since the induced e.m.f. $\mathcal{E}_{\rm ind}$ cannot drop on the superconducting path between the Josephson junction electrodes 1 and 2 (Fig. 4), it should equal $(-V)$, where $V$ is the voltage across the junction.

37 It may also be obtained from simple Schrödinger equation arguments; see, e.g., QM Sec. 2.2.
If the current $I$ is below the critical value,

$$-I_c < I < +I_c, \qquad (6.72)$$

Eq. (59) allows the phase $\varphi$ to have a time-independent value

$$\varphi = \arcsin(I/I_c), \qquad (6.73)$$



and hence, according to Eq. (71), a vanishing voltage drop across the junction: $V = 0$. This dc Josephson effect is not very surprising: indeed, we have postulated from the very beginning that the Josephson junction may pass a certain supercurrent. Much more fascinating is the so-called ac Josephson effect that takes place if the voltage across the junction has a nonvanishing average (dc) component $V_0$. For simplicity, let us assume that this is the only voltage component: $V(t) = V_0 =$ const; 38 then Eq. (71) may be readily
integrated to give $\varphi = \omega_J t + \varphi_0$, where

$$\omega_J \equiv \frac{2\pi}{\Phi_0}V_0 = \frac{2e}{\hbar}V_0. \qquad (6.74)$$

This result, plugged into Eq. (59), shows that the supercurrent oscillates,

$$I(\varphi) = I_c \sin(\omega_J t + \varphi_0), \qquad (6.75)$$

with the Josephson frequency $\omega_J$ (74), which is proportional to the applied dc voltage. For practicable voltages, the frequency $f_J = \omega_J/2\pi$ falls into the GHz or even THz range, because the proportionality coefficient in Eq. (74) is very high: $f_J/V_0 = 2e/h \approx 483$ MHz/μV. 39 An important experimental fact is the universality of this coefficient. For example, in the mid-1980s, the group led by Prof. J. Lukens of our department proved that this factor is material-independent with a relative accuracy of at least $10^{-15}$. Very few experiments, especially in solid state physics, have ever reached such precision.

This fundamental nature of the Josephson voltage-to-frequency relation (74) allows an important application of the ac Josephson effect in metrology. Namely, by phase-locking the Josephson oscillations to an external microwave signal derived from an atomic frequency standard, one can get a more precise dc voltage than from any other source. At NIST and other metrological institutions around the globe, this effect is used for the calibration of simpler "secondary" voltage standards that can operate at room temperature. 40

38 In experiment, this condition is hard to implement, due to the relatively high inductance of the current leads providing the dc voltage supply. However, these complications do not change the main conclusion of the analysis.

39 This 1962 prediction by B. Josephson was confirmed experimentally: implicitly (by phase locking of the oscillations with an external oscillator) in 1963, and explicitly (by the detection of microwave radiation) in 1967.

40 For more on the Josephson effect and other macroscopic quantum phenomena in superconductivity, see, e.g., Chapters 6 and 7 in the monograph by M. Tinkham, which was cited above.

6.5. Inductors, transformers, and ac Kirchhoff laws

Let a wire coil (meaning either a single loop illustrated in Fig. 5.4b, or a series of such loops, such as one of the solenoids shown in Fig. 5.6) have a size $a$ that satisfies, at frequencies of our interest, the quasistationary limit condition $a \ll \lambda$. Moreover, let the coil's self-inductance $L$ be much larger than that of the wires connecting it to other components of our system: ac voltage sources, voltmeters, etc. (Since, according to Eqs. (5.75) and (5.113), $L$ scales as the number $N$ of wire turns squared, this is easier to achieve at $N \gg 1$.) Then in a system consisting of such lumped induction coils and external wires (and
other circuit elements such as resistors, capacitors, etc.), we may neglect the electromagnetic induction effects everywhere outside the coil, so that the electric field in those external regions is potential. Then the voltage $V$ between the coil's terminals may be defined (as in electrostatics) as the difference of values of the scalar potential $\phi$ between the terminals, i.e. as the integral

$$V = \int \mathbf{E}\cdot d\mathbf{r} \qquad (6.76)$$



between the coil terminals along any path outside the coil. This voltage has to be balanced by the induction e.m.f. (2) in the coil, so that if the Ohmic resistance of the coil is negligible, 41 we may write

$$V = \frac{d\Phi}{dt}, \qquad (6.77)$$



where $\Phi$ is the magnetic flux in the coil. If the flux is due to the current $I$ in the same coil only (i.e. if it is magnetically uncoupled from other coils), we may use Eq. (5.70) to get the well-known relation

$$V = L\frac{dI}{dt}, \qquad (6.78)$$

where compliance with the Lenz sign rule is achieved by selecting the relation between the assumed voltage polarity and current direction as shown in Fig. 7a.




Fig. 6.7. (a) Induction coil, (b) two inductively coupled coils, and (c) an ac transformer.



If similar conditions are satisfied for two magnetically coupled coils (Fig. 7b), then, in Eq. (77), we need to use Eqs. (5.69) instead, getting

$$V_1 = L_1\frac{dI_1}{dt} + M\frac{dI_2}{dt}, \qquad V_2 = L_2\frac{dI_2}{dt} + M\frac{dI_1}{dt}, \qquad (6.79)$$



where the index of the mutual inductance $M$ is dropped for notational simplicity. Such systems of inductively coupled coils have numerous applications in electrical engineering and physical experiment. 42 Probably the most important is the ac transformer (Fig. 7c), where both coils share a common soft-ferromagnetic core. As we already know, such a material (with $\mu \gg \mu_0$) tries not to let any magnetic field lines out, so that the magnetic flux $\Phi(t)$ in the core is nearly the same in each of its cross-sections. This gives

$$V_1 \approx N_1\frac{d\Phi}{dt}, \qquad V_2 \approx N_2\frac{d\Phi}{dt}, \qquad (6.80)$$



41 If the resistance is substantial, it may be represented, in calculations, by a separate lumped circuit element 
(resistor) connected in series with the coil. 

42 Starting from the pioneering experiments by M. Faraday - who invented this device. 
where $N_{1,2}$ is the number of wire turns in each coil, so that the voltage ratio is completely determined by the ratio $N_1/N_2$.

Now we may generalize, to the ac current case, the notion of an electric circuit, already discussed in Chapter 4; see Fig. 4.3 reproduced in Fig. 8a below. Let not only the wire inductances but also the wire capacitances be negligible in comparison with those of the compact (lumped) inductors and capacitors. Then we may present the circuit as a connection of lumped circuit elements with ideal (voltage- and charge-free) wires, with the list of its circuit elements now including not only resistors and current sources (as in the dc case), but also induction coils (including magnetically coupled ones) and capacitors; see Fig. 8b.




In the quasistationary limit, the current through each wire is conserved, so that the "node rule", i.e. the 1st Kirchhoff law (4.7),

$$\sum_j I_j = 0, \qquad (6.81)$$

remains valid. Also, if the electromagnetic induction effect is restricted to the interiors of lumped induction coils as discussed above, the voltage drops $V_k$ across each circuit element may still be presented, just as in dc circuits, as differences of potentials of the adjacent nodes, so that the "loop rule", i.e. the 2nd Kirchhoff law given by Eq. (4.8),

$$\sum_k V_k = 0, \qquad (6.82)$$

is also valid.



In contrast to the dc case, Eqs. (81) and (82) now become (ordinary) differential equations. However, if all circuit elements are linear (as in the examples presented in Fig. 8b), these equations may be readily reduced to linear algebraic equations using the Fourier expansion. (In the most common case of sinusoidal ac sources, the final stage of Fourier series summation is unnecessary.) I do not have time to discuss even the simplest examples of such circuits, such as LC, LR, RC, and LRC loops and periodic structures, 43 but my experience shows that the potential readers of these notes are well familiar with



43 Interestingly, these effects include wave propagation in periodic LC circuits, despite still staying within the quasistationary approximation! However, within this approximation, the speed $1/(LC)^{1/2}$ of these waves is much lower than the speed $1/(\varepsilon\mu)^{1/2}$ of electromagnetic waves in the surrounding medium; see the next chapter.
these problems from their undergraduate studies. Let me only emphasize again that the standard ac circuit theory is only valid within the quasistationary limit $a \ll \lambda$, and only under the condition of electric and magnetic field confinement inside lumped circuit elements.
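As an illustration of how Eqs. (81) and (82) reduce to algebraic equations for sinusoidal sources, here is a minimal sketch for the simplest LRC series loop driven by one source. Assumptions, not taken from the text: complex amplitudes with the time dependence $\exp(-i\omega t)$ (the engineering convention $\exp(+j\omega t)$ flips the signs of the reactive terms), ideal lumped elements with $V_L = L\,dI/dt$ and $V_C = Q/C$, and hypothetical component values.

```python
import math


def series_lrc_current(v_amp, omega, resistance, inductance, capacitance):
    """Complex current amplitude in a series LRC loop, V(t) = Re[v_amp exp(-i omega t)].

    With the exp(-i omega t) convention, d/dt -> -i omega, so the element laws become
      resistor:  V_R = R I
      coil:      V_L = L dI/dt          -> -i omega L I
      capacitor: V_C = Q/C, I = dQ/dt   -> (i / (omega C)) I
    and the loop rule (82), v_amp = V_R + V_L + V_C, is one algebraic equation.
    """
    impedance = resistance - 1j * omega * inductance + 1j / (omega * capacitance)
    return v_amp / impedance


# Hypothetical component values, for illustration only.
R, L, C = 10.0, 1e-3, 1e-6          # ohm, henry, farad
omega_0 = 1.0 / math.sqrt(L * C)    # resonance: the coil and capacitor terms cancel

i_res = series_lrc_current(1.0, omega_0, R, L, C)
i_off = series_lrc_current(1.0, 2 * omega_0, R, L, C)
print(abs(i_res), abs(i_off))
```

At $\omega_0 = (LC)^{-1/2}$ the coil and capacitor terms cancel and the current amplitude is limited by $R$ alone; away from resonance it drops.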



6.6. Displacement currents 

The electromagnetic induction is not the only new effect arising in non-stationary electrodynamics. Indeed, though Eqs. (16) are adequate for the description of quasistationary phenomena, a deeper analysis shows that one of these equations, namely $\nabla\times\mathbf{H} = \mathbf{j}$, cannot be exact. To see that, let us take the divergence of both sides of this equation:

$$\nabla\cdot(\nabla\times\mathbf{H}) = \nabla\cdot\mathbf{j}. \qquad (6.83)$$

But, as the divergence of any curl, 44 the left-hand side should equal zero. Hence we get

$$\nabla\cdot\mathbf{j} = 0. \qquad (6.84)$$

This is fine in statics, but in dynamics this equation forbids any charge accumulation, because according to the continuity relation (4.5),

$$\nabla\cdot\mathbf{j} = -\frac{\partial\rho}{\partial t}. \qquad (6.85)$$

This discrepancy was recognized by James Clerk Maxwell, who suggested, in 1864, a way out of this contradiction. If we generalize the equation for $\nabla\times\mathbf{H}$ by adding to the term $\mathbf{j}$ (that describes real currents) the so-called displacement current term,

$$\mathbf{j}_d \equiv \frac{\partial\mathbf{D}}{\partial t} \qquad (6.86)$$

(that of course vanishes in statics), then the equation takes the form

$$\nabla\times\mathbf{H} = \mathbf{j} + \mathbf{j}_d \equiv \mathbf{j} + \frac{\partial\mathbf{D}}{\partial t}. \qquad (6.87)$$

In this case, due to the equation $\nabla\cdot\mathbf{D} = \rho$, the divergence of the right-hand side equals zero due to the continuity equation, and the discrepancy is removed.
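Written out, the step behind this statement is the following chain (a sketch assuming that the time and space derivatives of $\mathbf{D}$ may be interchanged), where the last equality is the continuity equation (85):

$$\nabla\cdot\left(\mathbf{j} + \frac{\partial\mathbf{D}}{\partial t}\right) = \nabla\cdot\mathbf{j} + \frac{\partial}{\partial t}\left(\nabla\cdot\mathbf{D}\right) = \nabla\cdot\mathbf{j} + \frac{\partial\rho}{\partial t} = 0,$$

so that taking the divergence of both sides of Eq. (87) now gives $0 = 0$ instead of the contradiction (84).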

This conclusion, and Eq. (87), are so important that it is worthwhile to have one more look at its derivation using a particular "electrical engineering" model shown in Fig. 9. 45 Neglecting the fringe field effects, we may use Eq. (4.1) to describe the relation between the current $I$ flowing through a wire and the electric charge $Q$ of the capacitor: 46



44 Again, see MA Eq. (11.2) if you need.

45 No physicist should be ashamed of doing this. J. C. Maxwell himself arrived at his equations with heavy use of mechanical engineering arguments. (His main book, A Treatise on Electricity and Magnetism, is full of drawings of gears and levers.) More generally, the whole history of science teaches us that snobbishness toward engineering and other "not-a-real-physics" disciplines is a sure way toward producing nothing of either practical value or fundamental importance. In real science, any method leading to novel, correct results should be welcome.

46 This is of course just the integral form of the continuity equation (85). 
$$I = \frac{dQ}{dt}. \qquad (6.88)$$



Now let us consider a closed contour $C$ drawn around the wire. (Solid points in Fig. 9 show the places where the contour intersects the plane of drawing.) This contour may be seen either as the line limiting the surface $S_1$ (crossed by the wire) or as the line limiting the surface $S_2$ (avoiding such crossing by passing through the capacitor's gap). Applying the macroscopic Ampere law (5.117) to the former surface, we get

$$\oint_C \mathbf{H}\cdot d\mathbf{r} = \int_{S_1} j_n\, d^2r = I, \qquad (6.89)$$

while for the latter surface the same law gives a different result,

$$\oint_C \mathbf{H}\cdot d\mathbf{r} = \int_{S_2} j_n\, d^2r = 0, \qquad \text{[WRONG!]} \qquad (6.90)$$

for the same integral. This is just an integral-form manifestation of the discrepancy outlined above, but it 
shows clearly how serious the problem is (or rather it was - before Maxwell). 
Fig. 6.9. The Ampere law applied to a recharged capacitor.



Now let us see how the introduction of the displacement currents saves the day, considering for the sake of simplicity a plane capacitor of area $A$, with a constant electrode spacing. In this case, as we already know, the field inside it is uniform, with $D = \sigma$, so that the total capacitor's charge is $Q = A\sigma = AD$, and current (88) may be represented as

$$I = \frac{dQ}{dt} = A\frac{dD}{dt}. \qquad (6.91)$$



So, instead of Eq. (90), the modified Ampere law gives

$$\oint_C \mathbf{H}\cdot d\mathbf{r} = \int_{S_2}(j_d)_n\, d^2r = \int_{S_2}\frac{\partial D_n}{\partial t}\, d^2r = \frac{dD}{dt}A = I, \qquad (6.92)$$



i.e. the Ampere integral becomes independent of the choice of the (imaginary!) surface limited by 
contour C - as it should. 



6.7. Finally, full Maxwell equations 
This is a very special moment in the course: with the displacement current introduction, we have finally arrived at the full set of macroscopic Maxwell equations for time-dependent fields, 47

$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0, \qquad \nabla\times\mathbf{H} - \frac{\partial\mathbf{D}}{\partial t} = \mathbf{j}, \qquad (6.93a)$$

$$\nabla\cdot\mathbf{D} = \rho, \qquad \nabla\cdot\mathbf{B} = 0, \qquad (6.93b)$$


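whose validity has been confirmed by an enormous body of experimental data. 48 The most striking feature of these equations is that, even in the absence of (local) charges and currents, when all the equations become homogeneous,

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\times\mathbf{H} = \frac{\partial\mathbf{D}}{\partial t}, \qquad (6.94a)$$

$$\nabla\cdot\mathbf{D} = 0, \qquad \nabla\cdot\mathbf{B} = 0, \qquad (6.94b)$$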


they still describe something very non-trivial: electromagnetic waves, including light. 49 Indeed, one can interpret Eqs. (94a) in the following way: the change of the magnetic field creates, via the Faraday induction effect, a vortex (divergence-free) electric field, while the dynamics of the electric field, in turn, creates a vortex magnetic field via Maxwell's displacement currents.

We will carry out a detailed quantitative analysis of the waves in the next chapter, but it is easy (and very instructive) to use the Maxwell equations to estimate their velocity $v$ and the field amplitude ratio $E/H$ in a medium with $\mathbf{D} = \varepsilon\mathbf{E}$, $\mathbf{B} = \mu\mathbf{H}$, and $\mathbf{j} = 0$. Indeed, let the solution of these equations, in a uniform, linear medium, have a time period $T$, and hence the wavelength $\lambda = vT$. Then the magnitude of the left-hand side of the first of Eqs. (94a) is of the order of $E/\lambda \sim E/vT$, while that of its right-hand side may be estimated as $B/T = \mu H/T$. Using similar estimates for the second of Eqs. (94a), we arrive at the following two requirements for the $E/H$ ratio: 50

$$\frac{E}{H} \sim \mu v, \qquad \frac{E}{H} \sim \frac{1}{\varepsilon v}. \qquad (6.95)$$



In order to ensure the compatibility of these two relations, the wave speed should satisfy the estimate

$$v \sim \frac{1}{(\varepsilon\mu)^{1/2}}, \qquad (6.96)$$

reduced to $v \sim 1/(\varepsilon_0\mu_0)^{1/2} = c$ in free space, while the ratio of the electric and magnetic field amplitudes should be of the following order:



47 This vector form of the equations, magnificent in its symmetry and simplicity, was developed in 1884-85 by O. Heaviside, with substantial contributions by H. Lorentz. (Maxwell's original result, circa 1861, looked like a system of 20 equations for Cartesian components of the vector and scalar potentials.)

48 Despite numerous efforts, no other corrections (e.g., additional terms) to the Maxwell equations have ever been found, and these equations are still considered exact within the range of their validity, i.e. while the electric and magnetic fields may be considered classically. Moreover, even in the quantum case, these equations are believed to be strictly valid as relations between the Heisenberg operators of the electric and magnetic fields.

49 Let me emphasize that this is only possible due to the "displacement current" term ∂D/∂t.

50 The fact that T cancels shows (or rather hints) that these estimates are valid for waves of arbitrary frequency. 
$$\frac{E}{H} \sim \left(\frac{\mu}{\varepsilon}\right)^{1/2}. \qquad (6.97)$$



In the next chapter we will see that these are indeed the exact results for a plane electromagnetic wave. 

Now let me fulfill the promise given in Sec. 2 and establish the validity limits of the
quasistationary approximation (16). For that, let the spatial scale of our system be a, generally unrelated
to the wavelength λ = vT, and let it carry real currents j producing a certain magnetic field H. Then, according to
Eqs. (94a), this magnetic field Faraday-induces an electric field E ~ μHa/T, whose displacement currents,
in turn, produce an additional magnetic field with magnitude



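$$H' \sim \frac{\varepsilon a}{T}E \sim \frac{\varepsilon a}{T}\,\frac{\mu H a}{T} = \left(\frac{a}{vT}\right)^2 H \equiv \left(\frac{a}{\lambda}\right)^2 H. \qquad (6.98)$$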



Hence, at a ≪ λ, the displacement current effect is indeed negligible.
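To put numbers on Eq. (98), the sketch below evaluates the ratio (a/λ)² with the free-space wavelength λ = c/f. The circuit size a = 10 cm and the two driving frequencies are illustrative assumptions, and the result is only an order-of-magnitude estimate, just as Eq. (98) itself is.

```python
import math

C = 299_792_458.0  # speed of light in free space, m/s


def displacement_correction(a: float, frequency: float, v: float = C) -> float:
    """Order-of-magnitude ratio H'/H ~ (a/lambda)**2 of Eq. (98), with lambda = v/frequency."""
    wavelength = v / frequency
    return (a / wavelength) ** 2


# Hypothetical 10-cm circuit driven at 1 MHz (lambda ~ 300 m) and at 1 GHz (lambda ~ 0.3 m)
low = displacement_correction(0.1, 1e6)
high = displacement_correction(0.1, 1e9)
print(f"1 MHz: H'/H ~ {low:.1e};  1 GHz: H'/H ~ {high:.1e}")
```

At 1 MHz the correction is of the order of 10⁻⁷, so the quasistationary approximation is excellent, while at 1 GHz it reaches about 10 %, signaling that a full Maxwell-equation treatment is needed for a circuit of this size.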

Before proceeding to the analysis of the full Maxwell equations in particular situations (which will be
the main goal of all the following chapters of this course), let us have a look at the energy balance they yield
for a certain volume V that may include both charged particles and the electromagnetic field. Since,
according to Eq. (5.10), the magnetic field does no work on charged particles even if they move, the
total power 𝒫 being transferred from the field to the particles inside the volume is due to the electric
field alone:

$$\mathscr{P} = \int_V \mathbf{j}\cdot\mathbf{E}\; d^3r, \qquad (6.99)$$



where I have used Eq. (4.38). Expressing j from the corresponding Maxwell equation of system (93), 
and plugging it into Eq. (99), we get 



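$$\mathscr{P} = \int_V \left[\mathbf{E}\cdot(\nabla\times\mathbf{H}) - \mathbf{E}\cdot\frac{\partial\mathbf{D}}{\partial t}\right] d^3r. \qquad (6.100)$$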



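Let us pause here for a second, and transform the divergence of vector E×H using the well-known
vector algebra identity: 51

$$\nabla\cdot(\mathbf{E}\times\mathbf{H}) = \mathbf{H}\cdot(\nabla\times\mathbf{E}) - \mathbf{E}\cdot(\nabla\times\mathbf{H}). \qquad (6.101)$$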

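The last term in the right-hand part of this equation is exactly (up to the sign) the first term in the square brackets of Eq.
(100), so that we can rewrite that formula as

$$\mathscr{P} = \int_V \left[-\nabla\cdot(\mathbf{E}\times\mathbf{H}) + \mathbf{H}\cdot(\nabla\times\mathbf{E}) - \mathbf{E}\cdot\frac{\partial\mathbf{D}}{\partial t}\right] d^3r. \qquad (6.102)$$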



Now, according to the Maxwell equation for ∇×E, it is equal to −∂B/∂t, so that the second term in
the square brackets of Eq. (102) equals −H·∂B/∂t and, according to Eq. (5.128), is just the minus time
derivative of the magnetic energy per unit volume. Similarly, according to Eq. (3.82), the third term
under the integral, −E·∂D/∂t, is the minus time derivative of the electric energy per unit volume. Finally, we can use
the divergence theorem to transform the integral of the first term into a 2D integral over the surface S



51 See, e.g., MA Eq. (11.7) with f = E and g = H.



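limiting volume V. As a result, we get the so-called Poynting theorem 52 for the power balance in the
system:

$$\int_V \left(\frac{\partial u}{\partial t} + \mathbf{j}\cdot\mathbf{E}\right) d^3r + \oint_S S_n\, d^2r = 0, \qquad (6.103)$$

where u is the density of the total (electric plus magnetic) energy of the electromagnetic field, with

$$\delta u = \mathbf{E}\cdot\delta\mathbf{D} + \mathbf{H}\cdot\delta\mathbf{B}, \qquad (6.104)$$

and the Poynting vector S is defined as 53

$$\mathbf{S} \equiv \mathbf{E}\times\mathbf{H}. \qquad (6.105)$$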
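In a source-free region, Eq. (103) applied to an infinitesimal volume gives the local balance ∂u/∂t + ∇·S = 0, which can be checked numerically for a simple 1D field pattern. The sketch below works in arbitrary units and assumes a linear medium (so that Eq. (104) integrates to u = (εE² + μH²)/2) and a pulse with E_x = f(z − vt), H_y = f(z − vt)/Z, taking v = (εμ)^{-1/2} and Z = (μ/ε)^{1/2} from the estimates (96)-(97) as exact; the medium constants and the pulse shape are hypothetical.

```python
import math

EPS, MU = 4.0, 1.0                 # hypothetical medium constants, arbitrary units
V = 1.0 / math.sqrt(EPS * MU)      # wave speed, estimate (96) taken as exact
Z = math.sqrt(MU / EPS)            # E/H ratio, estimate (97) taken as exact


def shape(s):
    return math.exp(-s * s)        # arbitrary smooth pulse profile


def E_x(z, t):
    return shape(z - V * t)


def H_y(z, t):
    return shape(z - V * t) / Z


def u(z, t):
    """Field energy density of a linear medium: (eps*E**2 + mu*H**2)/2."""
    return 0.5 * (EPS * E_x(z, t) ** 2 + MU * H_y(z, t) ** 2)


def S_z(z, t):
    """z-component of the Poynting vector (105) for E along x and H along y."""
    return E_x(z, t) * H_y(z, t)


def local_balance(z, t, h=1e-5):
    """Central-difference estimates of du/dt and of the residual du/dt + dS_z/dz."""
    du_dt = (u(z, t + h) - u(z, t - h)) / (2 * h)
    dS_dz = (S_z(z + h, t) - S_z(z - h, t)) / (2 * h)
    return du_dt, du_dt + dS_dz


for z, t in [(0.3, 0.0), (-0.7, 1.2), (1.5, 2.0)]:
    du_dt, residual = local_balance(z, t)
    print(f"z = {z:+.1f}, t = {t:.1f}: du/dt = {du_dt:+.3e}, residual = {residual:+.1e}")
```

The energy density at each point changes at a rate of order unity, while the residual stays at the level of the finite-difference error, i.e. the local energy change is fully accounted for by the divergence of S.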



The first integral in Eq. (103) is evidently the net change of the energy of the system (particles +
field) per unit time, so that the second (surface) integral is the power flowing out of the
system through the surface, and it is tempting to interpret the Poynting vector S locally, as the power
flow density at the given point. 54 In many cases such a local interpretation of vector S is legitimate;
however, in some cases it may lead to wrong conclusions. Indeed, let us consider a simple system shown
in Fig. 10: a planar capacitor placed into a static and uniform external magnetic field, so that the electric
and magnetic fields are mutually perpendicular. In this static situation, no charges are moving, both j
and ∂/∂t are equal to zero, and there should be no power flow in the system. However, Eq. (105) shows that
the Poynting vector is not equal to zero inside the capacitor, being directed as shown in Fig. 10.




From the point of view of our only unambiguous corollary of the Maxwell equations, Eq. (103), 
there is no contradiction here, because the fluxes of vector S through the walls of any volume V, for 
example the side walls of the volume shown with dashed lines in Fig. 10, are equal and opposite (and 
they are zero for other faces of this rectilinear volume), so that the total flux of the Poynting vector 
equals zero, as it should. It is, however, useful to recall this example each time before giving the local 
interpretation to vector S. 

Finally, to complete the initial discussion of the Maxwell equations, 55 let us rewrite them in
terms of potentials A and φ, because this is more convenient for the solution of some (though not all!)



52 Named after J. Poynting, though this result was independently discovered by O. Heaviside, while a similar
expression for the intensity of mechanical elastic waves had been derived earlier by N. Umov.

53 Actually, the addition to S of the curl of an arbitrary vector function f(r, t) does not change Eq. (103). Indeed,
we may use the divergence theorem to transform the corresponding change of the surface integral in Eq. (103) into a
volume integral of the scalar function ∇·(∇×f), which equals zero at any point - see, e.g., MA Eq. (11.2).

54 Later in the course we will show that the Poynting vector is also directly related to the density of momentum of 
the electromagnetic field. 

55 We will return to their general discussion (in particular, to the analytical mechanics of the electromagnetic
field, and its stress tensor) in Sec. 9.8, after we have become equipped with the special relativity theory.
problems. Even for the more general system (93) of Maxwell equations, the relations of Eqs.
(7) and (5.27),

$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}, \qquad (6.106)$$

are still used as the definitions of the potentials. It is straightforward to verify that with these definitions, the two
homogeneous Maxwell equations (93b) are satisfied automatically. Plugging Eqs. (106) into the
inhomogeneous equations (93a), and considering, for simplicity, a linear, uniform medium with
frequency-independent ε and μ, we get



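$$\nabla^2\phi + \frac{\partial}{\partial t}(\nabla\cdot\mathbf{A}) = -\frac{\rho}{\varepsilon}, \qquad \nabla^2\mathbf{A} - \varepsilon\mu\frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla\left(\nabla\cdot\mathbf{A} + \varepsilon\mu\frac{\partial\phi}{\partial t}\right) = -\mu\mathbf{j}. \qquad (6.107)$$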



This is a more complex result than what we would like to get. However, let us select a special
gauge, frequently called (especially for the free-space case, when v = c) the Lorenz gauge
condition 56

$$\nabla\cdot\mathbf{A} + \varepsilon\mu\frac{\partial\phi}{\partial t} = 0, \qquad (6.108)$$

which is a natural generalization of the Coulomb gauge (5.48) to time-dependent phenomena. With this
condition, Eqs. (107) are reduced to a simpler, beautifully symmetric form: 57



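$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = -\frac{\rho}{\varepsilon}, \qquad \nabla^2\mathbf{A} - \frac{1}{v^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = -\mu\mathbf{j}, \qquad (6.109)$$

where v ≡ 1/(εμ)^{1/2}. 58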



56 This condition, named after L. Lorenz, should not be confused with the Lorentz invariance condition of the 
relativity theory, due to H. Lorentz (note the names' spelling) - see Sec. 9.4. 

57 Note that Eqs. (109) are essentially a set of 4 similar equations for 4 scalar functions (namely, φ and the three
Cartesian components of vector A) and thus clearly invite the 4-component vector formalism of the relativity
theory, which will be discussed in Chapter 9.

58 Here I have to mention in passing the so-called Hertz vector potentials Π_e and Π_m (whose introduction may be
traced to at least the 1904 work by E. Whittaker). They may be defined by the following relations:

$$\mathbf{A} = \mu\frac{\partial\boldsymbol{\Pi}_e}{\partial t} + \mu\nabla\times\boldsymbol{\Pi}_m, \qquad \phi = -\frac{1}{\varepsilon}\nabla\cdot\boldsymbol{\Pi}_e,$$

which make the Lorenz gauge condition (108) automatically satisfied. These potentials are especially convenient
for the solution of problems in which the electromagnetic field is excited by external sources characterized by
externally fixed electric and magnetic polarizations P_ext and M_ext, rather than fixed charge and current densities ρ
and j. Indeed, it is straightforward to check that both Π_e and Π_m satisfy equations similar to Eqs. (109), but with
the right-hand parts equal to, respectively, −P_ext and −M_ext.
If φ and A depend on just one spatial coordinate, say z, in a region without field sources (ρ = 0, j
= 0), Eqs. (109) are reduced to the well-known 1D wave equations

$$\frac{\partial^2\phi}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = 0, \qquad \frac{\partial^2\mathbf{A}}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = 0, \qquad (6.110)$$



describing waves propagating with velocity v. Note that due to the definitions of constants ε₀ and μ₀, in
free space v is just the speed of light:

$$v = \frac{1}{(\varepsilon_0\mu_0)^{1/2}} = c. \qquad (6.111)$$
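A quick numerical check of the free-space relation v = 1/(ε₀μ₀)^{1/2} = c, assuming the CODATA 2018 recommended values of ε₀ and μ₀ (since the 2019 SI revision, c is exact by definition, while ε₀ and μ₀ are measured quantities):

```python
import math

# CODATA 2018 recommended values (SI units)
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
MU0 = 1.25663706212e-6    # vacuum permeability, N/A^2 (= H/m)

c = 1.0 / math.sqrt(EPS0 * MU0)
print(f"1/sqrt(eps0*mu0) = {c:.6e} m/s")
```

The result coincides with the defined value c = 299 792 458 m/s to within the rounding of the tabulated constants.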



Historically, the experimental observation of relatively low-frequency (GHz-scale) electromagnetic waves,
and the proof that their speed in free space is equal to that of light, were the decisive confirmation of Maxwell's
theory. 59 A detailed study of this most important physical phenomenon is the main goal of the following
chapters of this course.



6.8 Exercise problems 

6.1 . Prove that the electromagnetic induction e.m.f. 𝒱_ind in a
conducting loop may be measured:

(i) by measuring the current I = 𝒱_ind/R induced in the closed loop with
Ohmic resistance R, or

(ii) using a voltmeter inserted into the loop - see Fig. on the right.







6.2 . Magnetic flux Φ that pierces a plane, round, uniform,
resistive ring is being changed in time, while the magnetic field outside
of the ring is negligibly small. A voltmeter is connected to a part of the ring
as shown in Fig. on the right. What would the voltmeter show?



6.3 . Use the electromagnetic induction law (5) to derive Eq. (5.128) for the magnetic field 
energy variation. 



6.4 . AC current of frequency ω is being passed through a long uniform wire with a round cross-
section of radius R that is comparable with the skin depth δ_s. In the quasi-stationary approximation, find
the current density distribution across the wire. Analyze the limits R ≪ δ_s and R ≫ δ_s.



59 This was first accomplished in 1886 by H. Hertz, using specially designed electronic circuits and antennas. 
6.5 . A small, planar loop made of a thin wire, carrying current I, is located far from a plane
surface of a superconductor. Within the "macroscopic" description of superconductivity (B = 0), find:

(i) the energy of the loop-superconductor interaction, 

(ii) the force and torque acting on the loop, 

(iii) the distribution of supercurrents on the superconductor surface. 

6.6 . Use the London equation to analyze the penetration of an external magnetic field into a thin (t ~
δ_L), planar superconductor film whose plane is parallel to the field.

6.7 . Use the London equation to find the distribution of supercurrent density j across the circular
cross-section (with radius R ~ δ_L) of a long, straight superconducting wire that carries dc current I.

6.8 . The Meissner-Ochsenfeld effect is used, in 
particular, to reduce self-inductance of superconducting wiring. 
Use the London equation to calculate the inductance (per unit 
length) of a long, uniform superconducting strip placed close to 
the surface of a similar superconductor - see Fig. on the right, 
which shows the structure's cross-section. 

Hint: Start by thinking about how the supercurrent is distributed along the surfaces of the strip and
the bulk superconductor.

6.9 . Use Eqs. (59) and (71) to calculate the energy of a Josephson junction, and the full energy of 
the SQUID shown in Fig. 4c. 



Chapter 7. Electromagnetic Wave Propagation

This chapter focuses on the most important effect that follows from the time-dependent Maxwell equations,
namely electromagnetic waves, at this stage avoiding a discussion of their origin, i.e. radiation. I start from
the simplest, plane waves in uniform, isotropic media. The next step is a discussion of non-uniform
systems, in particular those with sharp boundaries between different materials, which bring in such new
effects as wave reflection and refraction. Then I will proceed to the structure of electromagnetic waves
propagating along various long, cylindrical structures, called transmission lines, such as coaxial
cables, waveguides, and optical fibers.



7.1. Plane waves 



Let us start by considering a spatial region that does not contain field sources (ρ = 0, j = 0),
and is filled with a linear, uniform, isotropic medium, which obeys Eqs. (3.38) and (5.110):

$$\mathbf{D} = \varepsilon\mathbf{E}, \qquad \mathbf{B} = \mu\mathbf{H}. \qquad (7.1)$$

Moreover, let us assume for a minute that these material equations hold for all frequencies of interest.
As was already shown in Sec. 6.7, in this case the Lorenz gauge condition (6.108) allows the Maxwell
equations to be recast into wave equations (6.110) for the vector and scalar potentials. However, for
most of our purposes it is more convenient to use directly the homogeneous Maxwell equations (6.94) for
the electric and magnetic fields, which are independent of the gauge choice. After the elementary
elimination of D and B using Eq. (1), 1 these equations take a simple, symmetric form



$$\nabla\times\mathbf{E} + \mu\frac{\partial\mathbf{H}}{\partial t} = 0, \qquad \nabla\times\mathbf{H} - \varepsilon\frac{\partial\mathbf{E}}{\partial t} = 0, \qquad (7.2a)$$

$$\nabla\cdot\mathbf{E} = 0, \qquad \nabla\cdot\mathbf{H} = 0. \qquad (7.2b)$$



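Now, taking the curl (∇×) of each of Eqs. (2a), and using the vector algebra identity (5.31), whose first
term, for both E and H, vanishes due to Eqs. (2b), we get similar wave equations for the electric and
magnetic fields:

$$\left(\nabla^2 - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)\mathbf{E} = 0, \qquad \left(\nabla^2 - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)\mathbf{H} = 0, \qquad (7.3)$$

where parameter v is defined by the relation

$$v^2 = \frac{1}{\varepsilon\mu}, \qquad (7.4)$$

with v² = 1/ε₀μ₀ = c² in free space.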



1 Though B rather than H is the actual (microscopically-averaged) magnetic field, it is mathematically more
convenient (just as in Sec. 6.2) to use the latter vector in our current discussion, because at sharp media
boundaries, H obeys the boundary condition (5.118) similar to that for E - see Eq. (3.47).
Vector equations (3) are, of course, a set of 6 similar equations for the three Cartesian components
of the two vectors E and H. Each of these equations allows, in particular, the following solution,

$$f = f(z - vt), \qquad (7.5)$$

where z is the Cartesian coordinate along a certain (arbitrary) direction n. This solution describes a
specific type of a wave, i.e. a certain field pattern moving, without deformation, along axis z, with
velocity v. According to Eq. (5), each variable f has the same value in each plane perpendicular to the
direction n of wave propagation, hence the name plane wave.
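The claim that any profile f(z − vt) solves the wave equations (3) can be tested with finite differences. In this sketch the speed V = 2 and the waveform are arbitrary assumptions (arbitrary units); the same profile moving at 1.5v is included to show that the residual vanishes only at the speed fixed by Eq. (4).

```python
import math

V = 2.0  # hypothetical wave speed, arbitrary units


def waveform(s):
    return math.exp(-s * s) * math.cos(3.0 * s)   # arbitrary smooth profile


def traveling(z, t):
    return waveform(z - V * t)          # Eq. (5) with the speed fixed by Eq. (4)


def too_fast(z, t):
    return waveform(z - 1.5 * V * t)    # same profile, wrong speed


def wave_eq_residual(field, z, t, v=V, h=1e-4):
    """Central-difference value of d2f/dz2 - (1/v**2) d2f/dt2 at the point (z, t)."""
    d2z = (field(z + h, t) - 2 * field(z, t) + field(z - h, t)) / h ** 2
    d2t = (field(z, t + h) - 2 * field(z, t) + field(z, t - h)) / h ** 2
    return d2z - d2t / v ** 2


for z, t in [(0.2, 0.1), (-0.4, 0.3)]:
    print(f"(z, t) = ({z}, {t}): residual at v = {wave_eq_residual(traveling, z, t):+.1e}, "
          f"at 1.5v = {wave_eq_residual(too_fast, z, t):+.2f}")
```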

Despite the independence of the wave equations (3) for vectors E and H, Eqs. (2) show that their
plane-wave solutions are not independent. Indeed, plugging solution (5) into Eqs. (2a), we
get

$$\mathbf{H} = \frac{\mathbf{n}\times\mathbf{E}}{Z}, \quad \text{i.e.} \quad \mathbf{E} = Z\,\mathbf{H}\times\mathbf{n}, \qquad (7.6)$$

where constant Z is defined as

$$Z \equiv \frac{E}{H} = \left(\frac{\mu}{\varepsilon}\right)^{1/2}. \qquad (7.7)$$

The vector relation (6) means, first of all, that vectors E and H are perpendicular not only to 
vector n (such waves are called transverse), but also to each other (Fig. 1) - at any point of space and at 
any time instant. 







Fig. 7.1. Field vectors in a plane electromagnetic 
wave propagating along direction n. 



Second, the field magnitudes are related by constant Z, called the wave impedance of the
medium. Very soon we will see that the wave impedance plays a pivotal role in many problems, in
particular in wave reflection at the interface between two media. Since the dimensionality of E, in
SI units, is V/m, and that of H is A/m, Eq. (7) shows that Z has the dimensionality of V/A, i.e. ohms
(Ω). 2 In particular, in free space,

$$Z = Z_0 \equiv \left(\frac{\mu_0}{\varepsilon_0}\right)^{1/2} = \mu_0 c \approx 4\pi\times10^{-7}\,c \approx 377\ \Omega. \qquad (7.8)$$
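Both properties of relation (6), the mutual orthogonality of E, H, and n, and the magnitude ratio |E|/|H| = Z, can be checked for an arbitrary propagation direction. The sketch below assumes CODATA 2018 values of ε₀ and μ₀ to evaluate Z₀ from Eq. (8), and uses a hypothetical direction n and field vector (made transverse by subtracting its component along n); plain tuples stand in for vectors to keep it dependency-free.

```python
import math

EPS0 = 8.8541878128e-12      # F/m (CODATA 2018)
MU0 = 1.25663706212e-6       # H/m (CODATA 2018)
C = 299_792_458.0            # m/s
Z0 = math.sqrt(MU0 / EPS0)   # Eq. (8)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(factor, a):
    return tuple(factor * x for x in a)


# Hypothetical propagation direction, normalized to a unit vector n
raw_n = (1.0, 2.0, 2.0)
n = scale(1.0 / math.sqrt(dot(raw_n, raw_n)), raw_n)

# Hypothetical field vector, made transverse by removing its component along n
raw_E = (3.0, -1.0, 0.5)
E = tuple(e - dot(raw_E, n) * n_i for e, n_i in zip(raw_E, n))

H = scale(1.0 / Z0, cross(n, E))     # Eq. (6): H = n x E / Z
E_back = scale(Z0, cross(H, n))      # Eq. (6): E = Z H x n

print(f"Z0 = {Z0:.3f} Ohm, mu0*c = {MU0 * C:.3f} Ohm")
print(f"n.E = {dot(n, E):.1e}, n.H = {dot(n, H):.1e}, E.H = {dot(E, H):.1e}")
print(f"|E|/|H| = {math.sqrt(dot(E, E) / dot(H, H)):.3f} Ohm")
```

The second form of Eq. (6) is recovered from the first one because n·E = 0 and |n| = 1, so that (n×E)×n = E.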



Now plugging Eq. (6) into Eq. (6.105) for the Poynting vector, we get: 



2 In Gaussian units, E and H have the same dimensionality (in particular, in a free-space wave, E = H), making the
(very useful) notion of the wave impedance less manifestly exposed - and in some textbooks not mentioned at all.

3 Please note that the analogy between the wave relations Z = E/H and S = E²/Z, on one hand, and the Ohm-law
relations R = V/I and 𝒫 = V²/R, on the other hand, may be somewhat misleading. In an Ohmic resistor, power 𝒫
$$\mathbf{S} = \mathbf{E}\times\mathbf{H} = \mathbf{n}\frac{E^2}{Z} = \mathbf{n}ZH^2. \qquad (7.9)$$



In view of the Poynting vector paradox discussed in Sec. 6.7 (see Fig. 6.10), one may wonder
whether this expression may be interpreted as the actual density of power flow. In contrast to the static
situation shown in Fig. 6.10, which limits the electric and magnetic fields to a vicinity of their sources,
waves may travel far from them. As a result, they can form wave packets of finite length in free space -
see Fig. 2.



Fig. 7.2. Interpreting the Poynting vector
in an electromagnetic wave.



Let us apply the Poynting theorem (6.103) to the cylinder shown by dashed lines in Fig. 2, with
one lid inside the wave packet, and the other lid in the region already passed by the wave. Then,
according to Eq. (6.103), the rate of change of the full energy ℰ inside the volume is dℰ/dt = −SA (where
A is the lid area), so that S may indeed be interpreted as the power flow (per unit area) out of the volume.
Making a reasonable assumption that the finite length of a sufficiently long wave packet does not affect
the physics inside it, we may indeed interpret the S given by Eq. (9) as the power flow density inside a
plane electromagnetic wave.

As we will see later in this chapter, the free-space value Z₀ of the wave impedance, given by Eq.
(8), establishes the scale of wave impedances of virtually all wave transmission lines, so we may use it
and Eq. (9) to get some sense of how the electric and magnetic field amplitudes in the
waves compare with the scales of typical electrostatics and magnetostatics experiments. For example, according to
Eqs. (9), a wave of a modest intensity S = 1 W/m² (the power density we get from a usual electric bulb a
few meters away from it) has E ~ (SZ₀)^{1/2} ~ 20 V/m, quite comparable with the dc field created by an
AA battery right outside it. On the other hand, the wave's magnetic field is H = (S/Z₀)^{1/2} ≈ 0.05 A/m. For
this particular case, the relation following from Eqs. (1), (4), and (7),

$$B = \mu H = \mu\frac{E}{Z} = \frac{\mu E}{(\mu/\varepsilon)^{1/2}} = (\varepsilon\mu)^{1/2}E = \frac{E}{v}, \qquad (7.10)$$

gives B = μ₀H = E/c ~ 7×10⁻⁸ T, i.e. a magnetic field a thousand times weaker than the Earth's field, and about 8
orders of magnitude lower than the field of a typical permanent magnet. A possible interpretation of this
huge difference is that the scale of magnetic fields B ~ E/c in the waves is "normal" for


is the rate of the electric energy loss, i.e. its transfer to heat, while in the wave, power 𝒫 = SA (where A is the wave's
cross-section area) is the rate of the electromagnetic energy transfer through the medium rather than its loss.
electromagnetism, while that of permanent magnet fields is abnormally high, because they are due to the
ferromagnetic alignment of electron spins, which are essentially quantum objects - see the discussion in Sec. 5.5.
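The field estimates quoted above follow directly from Eqs. (9) and (10). A minimal check, assuming the CODATA 2018 values of Z₀ and μ₀ and the intensity S = 1 W/m² used in the text:

```python
import math

Z0 = 376.730313668        # free-space wave impedance, Ohm (CODATA 2018)
MU0 = 1.25663706212e-6    # H/m (CODATA 2018)
C = 299_792_458.0         # m/s

S = 1.0                   # W/m^2, the intensity used in the text
E = math.sqrt(S * Z0)     # from S = E**2/Z0, Eq. (9)
H = math.sqrt(S / Z0)     # from S = Z0*H**2, Eq. (9)
B = MU0 * H               # should coincide with E/c, Eq. (10)

print(f"E = {E:.1f} V/m, H = {H:.4f} A/m, B = {B:.2e} T, E/c = {E / C:.2e} T")
```

This gives E ≈ 19 V/m, H ≈ 0.05 A/m, and B = μ₀H = E/c ≈ 6.5×10⁻⁸ T, consistent with the order-of-magnitude values in the text.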

As long as ε and μ are simple constants, the wave speed v is also constant, and Eq. (5) is valid for
an arbitrary function f, defined by either initial or boundary conditions. In plain English, a medium
with frequency-independent ε and μ supports propagation of plane waves with an arbitrary waveform
without either decay (attenuation) or deformation (dispersion). However, for any real medium other than pure
vacuum, this approximation is valid only within limited frequency intervals. We will discuss the effects
of attenuation and dispersion in the next section and see that all our prior results remain valid even in
that general case, provided that we limit them to single-frequency (i.e. sinusoidal, or monochromatic)
waves. Such waves may be most conveniently presented as 4

$$f = \mathrm{Re}\left[f_\omega\, e^{i(kz - \omega t)}\right], \qquad (7.11)$$

where f_ω is the complex amplitude of the wave, and k is its wave number (the magnitude of the wave vector
k = nk), sometimes also called the spatial frequency. The last term is justified by the fact, evident from
Eq. (11), that k is related to the wavelength λ exactly as the usual ("temporal") frequency ω is related to
the time period T:




$$k = \frac{2\pi}{\lambda}, \qquad \omega = \frac{2\pi}{T}. \qquad (7.12)$$



Requiring Eq. (11) to be a particular form of Eq. (5), i.e. the argument (kz − ωt) ≡ k[z − (ω/k)t] to be
proportional to (z − vt), so that ω/k = v, we see that the wave number should equal

$$k = \frac{\omega}{v}, \qquad (7.13)$$

showing that in this "dispersion-free" case the dispersion relation ω(k) is linear.

Now note that Eq. (6) does not mean that vectors E and H retain their direction in space. (The
simple case when they do is called the linear polarization of the wave.) Indeed, nothing in the Maxwell
equations prevents, for example, joint rotation of this pair of vectors around the fixed vector n, while
still keeping all three vectors perpendicular to each other at all times. An arbitrary rotation law, or
even an arbitrary constant frequency of such rotation, however, would violate the single-frequency
(monochromatic) character of the elementary sinusoidal wave (11). In order to find the
most general type of polarization the wave may have without violating that condition, let us present the two
Cartesian components of one of these vectors (say, E) along any two fixed axes x and y, perpendicular to
each other and to axis z (i.e. vector n), in the same form as used in Eq. (11):

$$E_x = \mathrm{Re}\left[E_{\omega x}\, e^{i(kz - \omega t)}\right], \qquad E_y = \mathrm{Re}\left[E_{\omega y}\, e^{i(kz - \omega t)}\right]. \qquad (7.14)$$

In order to keep the wave monochromatic, complex amplitudes E_ωx and E_ωy must be constant; however,
they may have different magnitudes and an arbitrary phase shift between them.



4 Due to the linearity of Eqs. (2), the operator Re in Eq. (11) may be dropped until the end of almost any calculation.
Because of that, the exponential presentation of monochromatic variables is more convenient than manipulations
with sine and cosine functions. (See also CM Sec. 4.1.)
In the simplest case when the arguments of the complex amplitudes are equal,

$$E_{\omega x,y} = |E_{\omega x,y}|\,e^{i\varphi}, \qquad (7.15)$$

the real field components have the same phase:

$$E_{x,y} = |E_{\omega x,y}|\cos(kz - \omega t + \varphi), \qquad (7.16)$$

so that their ratio is constant in time - see Fig. 3a. This means that the wave is linearly polarized, within
the plane defined by the relation

$$\tan\theta = \left|\frac{E_{\omega y}}{E_{\omega x}}\right|. \qquad (7.17)$$
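A short sketch of the linear-polarization case: with complex amplitudes of equal arguments, the real field vector computed from Eq. (14) stays on the line at the angle θ given by Eq. (17), only changing its length and sign in time. The values φ = 0.4, |E_ωx| = 2, |E_ωy| = 1, and unit k and ω are arbitrary assumptions.

```python
import cmath
import math

# Hypothetical complex amplitudes with equal arguments, as in Eq. (15)
phi = 0.4
E_wx = 2.0 * cmath.exp(1j * phi)
E_wy = 1.0 * cmath.exp(1j * phi)
k, omega, z = 1.0, 1.0, 0.0   # arbitrary units


def real_field(t):
    """Real field components (E_x, E_y) of Eq. (14) at point z and time t."""
    wave = cmath.exp(1j * (k * z - omega * t))
    return (E_wx * wave).real, (E_wy * wave).real


theta = math.atan(abs(E_wy) / abs(E_wx))   # Eq. (17)
for t in (0.0, 0.7, 2.1, 3.0):
    ex, ey = real_field(t)
    # Component of E perpendicular to the line at angle theta; zero for linear polarization
    off_line = -ex * math.sin(theta) + ey * math.cos(theta)
    print(f"t = {t:.1f}: Ex = {ex:+.3f}, Ey = {ey:+.3f}, off-line component = {off_line:+.1e}")
```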




Another simple case is when the moduli of the complex amplitudes E_ωx and E_ωy are equal, but
their phases are shifted by +π/2 or −π/2:

$$E_{\omega x} = |E_\omega|\,e^{i\varphi}, \qquad E_{\omega y} = |E_\omega|\,e^{i(\varphi \pm \pi/2)}. \qquad (7.18)$$

In this case

$$E_x = |E_\omega|\cos(kz - \omega t + \varphi), \qquad E_y = |E_\omega|\cos\left(kz - \omega t + \varphi \pm \frac{\pi}{2}\right) = \mp|E_\omega|\sin(kz - \omega t + \varphi). \qquad (7.19)$$

This means that on the [x, y] plane, the end of vector E moves, with the wave's frequency ω, either
clockwise or counterclockwise around a circle - see Fig. 3b. At z = 0, its polar angle is

$$\theta(t) = \pm(\omega t - \varphi). \qquad (7.20)$$



Such waves are called circularly-polarized. 5 These particular solutions of the Maxwell equations 
are very convenient for quantum electrodynamics, because single electromagnetic field quanta with a 



5 The wave is called right-polarized if the field vector rotates clockwise for an observer facing the oncoming
wave, and left-polarized in the opposite case. Another popular term for these cases is the "waves of negative/positive
helicity".






certain (positive or negative) spin direction may be considered as elementary excitations of the 
corresponding circularly-polarized wave. (This fact does not exclude, from the quantization scheme, 
waves of other polarizations, because any monochromatic wave may be presented as a linear 
combination of two circularly-polarized waves with opposite helicities, just as Eqs. (14) present it as a 
linear combination of two linearly-polarized waves.) 

Finally, in the general case of arbitrary complex amplitudes E_ωx and E_ωy, the end of the electric field vector
moves along an ellipse on the [x, y] plane (Fig. 3c); such a wave is called elliptically polarized. The
eccentricity and orientation of the ellipse are completely described by one complex number, the ratio
E_ωx/E_ωy, i.e. by two real numbers: |E_ωx/E_ωy| and φ = arg(E_ωx/E_ωy).

The same information may be expressed via four so-called Stokes parameters s₀, s₁, s₂, s₃, which
are popular in optics because they may be used for the description of not only the completely coherent
waves that are discussed here, but also of partly coherent or even fully incoherent waves, including the
natural light emitted by thermal sources like our Sun. In contrast to coherent waves, whose
complex amplitudes are considered deterministic numbers, the instantaneous amplitudes of incoherent waves
should be treated as stochastic variables. 6
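For a coherent wave, the Stokes parameters are simple quadratic functions of the complex amplitudes. The sketch below uses one common convention, s₀ = |E_ωx|² + |E_ωy|², s₁ = |E_ωx|² − |E_ωy|², s₂ = 2Re(E*_ωx E_ωy), s₃ = 2Im(E*_ωx E_ωy); the sign of s₃ differs between texts, so here it only labels the two cases of Eq. (18) as "(+)" and "(-)" without attaching a handedness. The example amplitudes and the numerical tolerance are assumptions.

```python
import cmath
import math


def stokes(E_wx: complex, E_wy: complex):
    """Stokes parameters (s0, s1, s2, s3) of a coherent wave (s3 sign convention assumed)."""
    product = E_wx.conjugate() * E_wy
    s0 = abs(E_wx) ** 2 + abs(E_wy) ** 2
    s1 = abs(E_wx) ** 2 - abs(E_wy) ** 2
    s2 = 2 * product.real
    s3 = 2 * product.imag
    return s0, s1, s2, s3


def classify(E_wx: complex, E_wy: complex, tol: float = 1e-12) -> str:
    s0, _, _, s3 = stokes(E_wx, E_wy)
    if abs(s3) <= tol * s0:
        return "linear"                      # equal (or opposite) phases, Eq. (15)
    if abs(abs(s3) - s0) <= tol * s0:
        return "circular (+)" if s3 > 0 else "circular (-)"   # Eq. (18)
    return "elliptical"


examples = {
    "Eq. (15)": (2.0 * cmath.exp(0.4j), 1.0 * cmath.exp(0.4j)),
    "Eq. (18), +pi/2": (1.0 + 0j, cmath.exp(1j * math.pi / 2)),
    "Eq. (18), -pi/2": (1.0 + 0j, cmath.exp(-1j * math.pi / 2)),
    "general": (1.0 + 0j, 0.5 * cmath.exp(0.3j)),
}
for name, (ex, ey) in examples.items():
    s0, s1, s2, s3 = stokes(ex, ey)
    degree = math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2) / s0
    print(f"{name:16s} {classify(ex, ey):13s} degree of polarization = {degree:.6f}")
```

For any coherent wave, s₁² + s₂² + s₃² = s₀², so the degree of polarization equals 1; values below 1 appear only after averaging over the stochastic amplitudes of partly coherent or incoherent waves.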

7.2. Attenuation and dispersion 

Now let me show that any linear, isotropic medium may be characterized by a complex,
frequency-dependent electric permittivity ε(ω) and magnetic permeability μ(ω). Indeed, starting from
electric effects, the electric polarization of realistic elementary dipoles of the medium cannot follow the
applied electric field instantly if the field frequency ω is comparable with those of the internal processes,
say, transitions between atomic energy levels. Let us consider the most general law of time evolution
of polarization P(t) for the case of an arbitrary applied electric field E(t), 7 but for a sufficiently dilute
medium, so that the local electric field E_ef (3.63), acting on each elementary dipole, is essentially the
microscopically-averaged field E. 8 Then, due to the linear superposition principle, P(t) should be a
linear sum (integral) of the values of E(t') at all previous moments of time, t' < t, weighted by some
function of t and t':

$$P(t) = \int_{-\infty}^{t} E(t')\,G(t, t')\,dt'. \qquad (7.21)$$

The condition t' < t, which is implied by this relation, expresses a key principle of physics, the
causal relation between a cause (in our case, the electric field applied to each dipole) and its effect (the

6 For further reading about the Stokes parameters, as well as about many optics topics I will not have time to
cover (especially geometrical optics and the diffraction-imposed limits on optical imaging resolution), I can
recommend the classical text by M. Born et al., Principles of Optics, 7th ed., Cambridge U. Press, 1999.

7 In an isotropic medium, vectors E, P, and hence D = ε₀E + P, are all parallel, and for notational simplicity I will
drop the vector sign. I am also assuming that P at any point r depends only on the electric field at the same
point, and hence drop the term ikz from the exponent's argument. This assumption is valid if the wavelength λ is much
larger than the size a of the elementary dipoles of the medium. In most systems of interest, the scale of a is atomic (~10⁻¹⁰ m), so
that the last approximation is valid up to very high frequencies, ω ~ c/a ~ 10¹⁸ s⁻¹, corresponding to hard X-rays.

8 Note that this condition (which excludes, in particular, the molecular-field effects discussed in Sec. 3.5) is not
mentioned in most E&M textbooks. If the molecular fields are important, Eq. (21) and its corollaries are only
valid for the relation between P and the effective local electric field E_ef.
polarization it creates). Function G(t, t') is called the temporal Green's function for the electric
polarization. 9 In order to understand its physical sense, let us consider the case when the applied field
E(t) is a very short pulse at t = t₀, which may be approximated with Dirac's delta-function:

$$E(t) = \delta(t - t_0). \qquad (7.22)$$

Then Eq. (21) yields just P(t) = G(t, t₀), showing that the Green's function is just the polarization at
moment t, created by a unit δ-functional pulse of the applied field at moment t' (Fig. 4). Thus, the
temporal G is the exact time analog of the spatial Green's functions G(r, r') we have already studied in
electrostatics - see Sec. 2.7.
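The pulse-response interpretation can be illustrated numerically by replacing the δ-function (22) with a short rectangular pulse of unit area and evaluating Eq. (21) directly. The sketch assumes a time-invariant medium (so that G depends only on t − t', as discussed next) and a hypothetical exponentially decaying Green's function with relaxation time τ = 1; both the form of G and the pulse width are illustrative assumptions.

```python
import math

TAU = 1.0      # hypothetical relaxation time of the medium (arbitrary units)
WIDTH = 1e-3   # duration of the short unit-area pulse approximating delta(t - t0)


def G(theta):
    """Assumed causal Green's function: zero for theta < 0, exponential decay after."""
    return math.exp(-theta / TAU) / TAU if theta >= 0 else 0.0


def polarization_after_pulse(t, t0=0.0, m=200):
    """Eq. (21) for E(t') = 1/WIDTH on [t0, t0 + WIDTH) and zero elsewhere (midpoint rule)."""
    h = WIDTH / m
    total = 0.0
    for i in range(m):
        t_prime = t0 + (i + 0.5) * h
        total += (1.0 / WIDTH) * G(t - t_prime) * h
    return total


for t in (-0.5, 0.3, 1.0, 2.5):
    print(f"t = {t:+.1f}: P(t) = {polarization_after_pulse(t):.5f}, G(t - t0) = {G(t):.5f}")
```

Before the pulse the computed polarization is exactly zero (causality), and after it P(t) reproduces G(t − t₀) up to a relative correction of the order of WIDTH/τ.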



Fig. 7.4. Temporal Green's function for
electric polarization (schematically).



What are the general properties of the temporal Green's function? First, the function is evidently
real, since the dipole moment p and hence polarization P = np are real by definition - see Eq. (3.6).
Next, for systems without infinite internal memory, G should tend to zero at t − t' → ∞, although the
type of this approach (e.g., whether function G oscillates while approaching zero) depends on the medium.
Finally, if the parameters of the medium do not change in time, the polarization response to an electric field
pulse should depend not on its absolute timing, but only on the time difference θ ≡ t − t' between the
pulse and observation instants:

$$P(t) = \int_{-\infty}^{t} E(t')\,G(t - t')\,dt' = \int_0^{\infty} E(t - \theta)\,G(\theta)\,d\theta. \qquad (7.23)$$



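For a sinusoidal waveform, E(t) = Re[E_ω e^{−iωt}], this equation yields

$$P(t) = \mathrm{Re}\int_0^{\infty} E_\omega\, e^{-i\omega(t-\theta)}\,G(\theta)\,d\theta = \mathrm{Re}\left\{\left[E_\omega\int_0^{\infty} G(\theta)\,e^{i\omega\theta}\,d\theta\right]e^{-i\omega t}\right\}. \qquad (7.24)$$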



The expression in square brackets is of course nothing more than the complex amplitude P_ω of the
polarization. This means that even if the static relation (3.35) P = χ_e ε₀E is invalid for an arbitrary
time-dependent process, we may still keep its Fourier analog,

$$P_\omega = \chi_e(\omega)\,\varepsilon_0 E_\omega, \quad \text{with} \quad \chi_e(\omega) = \frac{1}{\varepsilon_0}\int_0^{\infty} G(\theta)\,e^{i\omega\theta}\,d\theta, \qquad (7.25)$$



9 A discussion of the temporal Green's functions in application to classical oscillations may be also found in CM 
Sec. 4.1. 
for each sinusoidal component of the process, using it as the definition of the frequency-dependent
electric susceptibility χ_e(ω). Similarly, the frequency-dependent electric permittivity may be defined
using the Fourier analog of Eq. (3.38):

$$D_\omega = \varepsilon(\omega)\,E_\omega. \qquad (7.26)$$

Then, according to Eq. (3.36), the permittivity is related to the temporal Green's function by the usual
Fourier transform:

$$\varepsilon(\omega) = \varepsilon_0 + \frac{P_\omega}{E_\omega} = \varepsilon_0 + \int_0^{\infty} G(\theta)\,e^{i\omega\theta}\,d\theta. \qquad (7.27)$$



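It is evident from this expression that ε(ω) may be complex,

$$\varepsilon(\omega) = \varepsilon'(\omega) + i\varepsilon''(\omega), \qquad \varepsilon'(\omega) = \varepsilon_0 + \int_0^{\infty} G(\theta)\cos\omega\theta\,d\theta, \qquad \varepsilon''(\omega) = \int_0^{\infty} G(\theta)\sin\omega\theta\,d\theta, \qquad (7.28)$$

and that its real part ε'(ω) is always an even function of frequency, while the imaginary part ε''(ω) is an
odd function of ω.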
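The parity properties of ε'(ω) and ε''(ω), and the Fourier-transform relation between G(θ) and ε(ω), can be checked numerically for a concrete Green's function. The sketch assumes units with ε₀ = 1 and a hypothetical damped-oscillation kernel G(θ) = K e^{−δθ} sin Ω₀θ, for which the integral in ε(ω) has the closed form KΩ₀/[(δ − iω)² + Ω₀²] (a Laplace transform of the sine); the integrals for ε'(ω) and ε''(ω) given above are evaluated with a composite Simpson rule.

```python
import math

EPS0 = 1.0                          # units with eps0 = 1 (assumption)
K, DELTA, OMEGA0 = 1.0, 0.5, 2.0    # hypothetical Green's-function parameters
UPPER = 80.0                        # G has decayed to ~exp(-40) here


def G(theta):
    """Assumed Green's function: a damped oscillation starting at theta = 0."""
    return K * math.exp(-DELTA * theta) * math.sin(OMEGA0 * theta)


def simpson(func, a, b, n=16000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def eps_real(omega):
    return EPS0 + simpson(lambda th: G(th) * math.cos(omega * th), 0.0, UPPER)


def eps_imag(omega):
    return simpson(lambda th: G(th) * math.sin(omega * th), 0.0, UPPER)


def eps_closed_form(omega):
    """Analytic value of eps0 + integral of G(theta) exp(i omega theta) for this particular G."""
    return EPS0 + K * OMEGA0 / ((DELTA - 1j * omega) ** 2 + OMEGA0 ** 2)


for w in (0.7, 1.9, 3.2):
    exact = eps_closed_form(w)
    print(f"omega = {w}: eps' = {eps_real(w):.6f} (closed form {exact.real:.6f}, at -omega {eps_real(-w):.6f}); "
          f"eps'' = {eps_imag(w):.6f} (closed form {exact.imag:.6f}, at -omega {eps_imag(-w):.6f})")
```

The evenness of ε' and oddness of ε'' follow simply from cos(−ωθ) = cos ωθ and sin(−ωθ) = −sin ωθ, because G(θ) itself is real.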

Absolutely similar arguments show that the linear magnetic properties may be characterized with 
complex, frequency-dependent permeability //(&>). Now rewriting Eqs. (1) for the complex amplitudes of 
the fields at a particular frequency, we may repeat all calculations of Sec. 1, and verify that all its results 
are valid for monochromatic waves even for a dispersive (but necessarily linear!) medium. In particular, 
Eqs. (7) and (13) now become 



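$$Z(\omega) = \left[\frac{\mu(\omega)}{\varepsilon(\omega)}\right]^{1/2}, \qquad k(\omega) = \omega\left[\varepsilon(\omega)\mu(\omega)\right]^{1/2}, \tag{7.28}$$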



so that the wave impedance and the wave number may both be complex functions of frequency.

This fact has important consequences for electromagnetic wave propagation. First, plugging the presentation of the complex wave number as the sum of its real and imaginary parts, $k(\omega) = k'(\omega) + ik''(\omega)$, into Eq. (11):

$$f = \mathrm{Re}\left[f_\omega e^{i[k(\omega)z - \omega t]}\right] = e^{-k''(\omega)z}\,\mathrm{Re}\left[f_\omega e^{i[k'(\omega)z - \omega t]}\right], \tag{7.29}$$

we see that $k''(\omega)$ describes the rate of wave attenuation in the medium at frequency $\omega$.$^{10}$ Second, if the waveform is not sinusoidal (and hence should be presented as a sum of several or many sinusoidal components), the frequency dependence of $k'(\omega)$ provides for wave dispersion, i.e. the waveform deformation during propagation, because the propagation velocities (4) of the component waves are now different.$^{11}$
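Below is a minimal Python sketch of Eqs. (28)-(29) for given complex $\varepsilon$ and $\mu$. The numerical values are arbitrary illustrations, not data for any particular material. The root of $\varepsilon\mu$ is chosen so that $k'' \ge 0$, i.e. so that a wave traveling toward $+z$ decays. The impedance is then taken as $Z = \omega\mu/k$, which coincides with $[\mu/\varepsilon]^{1/2}$ on that branch.

```python
import cmath
import math

C = 299_792_458.0                 # speed of light in vacuum, m/s
MU0 = 1.25663706212e-6            # vacuum permeability, H/m (CODATA 2018)
EPS0 = 1.0 / (MU0 * C**2)         # vacuum permittivity from c^2 = 1/(eps0 mu0)


def wave_parameters(omega, eps, mu):
    """Return (k, Z) for complex eps(omega), mu(omega), following Eqs. (28).

    The square root is taken with Im k >= 0 (a wave decaying toward +z),
    and Z = omega*mu/k, which equals (mu/eps)^(1/2) on the same branch.
    """
    k = omega * cmath.sqrt(eps * mu)
    if k.imag < 0:
        k = -k
    return k, omega * mu / k


def amplitude_factor(k, z):
    """Envelope exp(-k'' z) of Eq. (29)."""
    return math.exp(-k.imag * z)


omega = 2 * math.pi * 1e9                        # 1 GHz (illustrative)
k, Z = wave_parameters(omega, EPS0 * (4 + 0.1j), MU0)
print("k =", k, "1/m;  Z =", Z, "Ohm;  decay length 1/k'' =", 1 / k.imag, "m")
```

With a real negative $\varepsilon$ (as in a plasma below its plasma frequency, see below), the same function returns a purely imaginary $k$ and $Z$.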



10 It may be tempting to attribute this effect to wave absorption, i.e. the dissipation of the wave's energy, but we 
will see very soon that wave attenuation may be also due to effects different from absorption. 

11 The reader is probably familiar with the most noticeable effect of the dispersion, namely the difference between the group velocity $v_{gr} = d\omega/dk'$, giving the speed of the envelope of a wave packet with a narrow frequency spectrum, and the phase velocity $v_{ph} = \omega/k'$ of the component waves. The second-order dispersion effect, proportional to $d^2\omega/dk'^2$, leads to the deformation (gradual broadening) of the envelope itself. Following tradition,
Let us consider a simple but very important model of dispersive media.$^{12}$ In dilute atomic or molecular systems (including gases), electrons respond to the external electric field especially strongly when frequency $\omega$ is close to certain eigenfrequencies $\omega_j$ corresponding to the spectrum of quantum transitions of a single atom or molecule. An approximate, phenomenological description of this behavior may be obtained from a classical model of several externally driven harmonic oscillators with finite damping. For an oscillator driven by the electric field's force $F(t) = qE(t)$, we can write the 2nd Newton law as



$$m(\ddot{x} + 2\delta\dot{x} + \omega_0^2 x) = qE(t), \tag{7.30}$$



where $\omega_0$ is the own frequency of the oscillator, and $\delta$ its damping coefficient. For a sinusoidal field, $E(t) = \mathrm{Re}[E_\omega\exp\{-i\omega t\}]$, we can look for a particular, forced-oscillation solution in a similar form, $x(t) = \mathrm{Re}[x_\omega\exp\{-i\omega t\}]$.$^{13}$ Plugging this solution into Eq. (30), we can readily find the complex amplitude of these oscillations:

$$x_\omega = \frac{q}{m}\,\frac{E_\omega}{(\omega_0^2 - \omega^2) - 2i\omega\delta}. \tag{7.31}$$

Using this result to calculate the complex amplitude of the dipole moment as $p_\omega = qx_\omega$, and then the electric polarization $P_\omega = np_\omega$ of a dilute medium with $n$ independent oscillators per unit volume, for its frequency-dependent permittivity (27) we get

$$\varepsilon(\omega) = \varepsilon_0 + \frac{nq^2}{m}\,\frac{1}{(\omega_0^2 - \omega^2) - 2i\omega\delta}. \tag{7.32}$$
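The forced-oscillation amplitude (31) can be checked by direct numerical integration of Eq. (30). The Python sketch below uses arbitrarily chosen dimensionless parameters and assumes a real $E_\omega$, so that $E(t) = E_\omega\cos\omega t$. It starts the oscillator from rest and integrates with the classical fourth-order Runge-Kutta method long enough for the transient, which decays as $e^{-\delta t}$, to die out. It then compares the result with $\mathrm{Re}[x_\omega e^{-i\omega t}]$.

```python
import cmath
import math


def forced_amplitude(omega, omega0, delta, q=1.0, m=1.0, e_amp=1.0):
    """Complex amplitude x_omega of Eq. (31)."""
    return (q / m) * e_amp / (omega0**2 - omega**2 - 2j * omega * delta)


def integrate_oscillator(omega, omega0, delta, q=1.0, m=1.0, e_amp=1.0,
                         t_end=200.0, dt=0.01):
    """Integrate Eq. (30) with E(t) = e_amp*cos(omega t) by RK4, starting from rest.

    Returns the final time and the displacement at that time.
    """
    def accel(t, x, v):
        return q * e_amp * math.cos(omega * t) / m - 2 * delta * v - omega0**2 * x

    t, x, v = 0.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        k1x, k1v = v, accel(t, x, v)
        k2x, k2v = v + 0.5 * dt * k1v, accel(t + 0.5 * dt, x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, accel(t + 0.5 * dt, x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, accel(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        t += dt
    return t, x


omega, omega0, delta = 0.8, 1.0, 0.1
t, x_numeric = integrate_oscillator(omega, omega0, delta)
x_steady = (forced_amplitude(omega, omega0, delta) * cmath.exp(-1j * omega * t)).real
print(t, x_numeric, x_steady)
```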

This result may be readily generalized to the case when the system has several types of oscillators with different eigenfrequencies:

$$\varepsilon(\omega) = \varepsilon_0 + nq^2\sum_j \frac{f_j}{m_j}\,\frac{1}{(\omega_j^2 - \omega^2) - 2i\omega\delta_j}, \tag{7.33}$$

where $f_j \equiv n_j/n$ is the fraction of oscillators with eigenfrequency $\omega_j$, so that the sum of all $f_j$ equals 1. Figure 5 shows a typical behavior of the real and imaginary parts of the complex dielectric constant, described by Eq. (33), as functions of frequency. The effect of oscillator resonances is clearly visible, and dominates the medium's response at $\omega \approx \omega_j$, especially in the case of low damping, $\delta_j \ll \omega_j$. Note that in the low-damping limit, the imaginary part of the dielectric constant, $\varepsilon''$, and hence the wave attenuation $k''$, are negligibly small at all frequencies besides small vicinities of frequencies $\omega_j$, where the derivative $d\varepsilon'(\omega)/d\omega$ is negative.$^{14}$ Thus, for a system of weakly damped oscillators, Eq. (33) may be approximated, at most frequencies, as a sum of odd singularities ("poles"):



these effects are discussed in more detail in the quantum-mechanics part of my lecture notes (QM Sec. 2.1), because they are the crucial factor of Schrödinger's wave mechanics. (See also CM Sec. 5.3.)

12 This example is focused on the frequency dependence of $\varepsilon$, because electromagnetic waves interact with "usual" media via their electric field much more than via the magnetic field. However, as will be discussed in Sec. 7, forgetting about the possible dispersion of $\mu(\omega)$ might result in missing some remarkable opportunities for manipulating the waves.

13 If this point is not absolutely clear, please see CM Sec. 3.1. 

14 In optics, such behavior is called the anomalous dispersion. 
$$\varepsilon(\omega) \approx \varepsilon_0 + \frac{nq^2}{2}\sum_j \frac{f_j}{m_j\omega_j}\,\frac{1}{\omega_j - \omega}, \quad \text{for} \quad \delta_j \ll |\omega - \omega_j| \ll |\omega_j - \omega_{j'}|. \tag{7.34}$$

This result is especially important because, according to quantum mechanics, Eq. (34) is also valid for a set of non-interacting, similar quantum systems (whose dynamics may be completely different from that of a harmonic oscillator!), provided that $\omega_j$ are replaced with the frequencies of possible quantum interstate transitions, and coefficients $f_j$ are replaced with the so-called oscillator strengths of the transitions, which obey the same sum rule, $\sum_j f_j = 1$.$^{15}$




Fig. 7.5. Typical frequency dependence of the real and imaginary parts of the electric permittivity of a medium consisting of several classical dipole oscillators.



At $\omega \to 0$, the imaginary part of the permittivity also vanishes (for any $\delta_j$), while its real part approaches its electrostatic ("dc") value

$$\varepsilon(0) = \varepsilon_0 + nq^2\sum_j \frac{f_j}{m_j\omega_j^2}. \tag{7.35}$$

Note that according to Eq. (30), the denominator in Eq. (35) is just the effective spring constant $\kappa_j = m_j\omega_j^2$ of the oscillator, so that the oscillator masses $m_j$, as such, are quite naturally not involved in the static dielectric response.
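Here is a short Python sketch of Eq. (33). It uses a made-up set of three oscillator species: the values of $f_j$, $m_j$, $\omega_j$, $\delta_j$ below are arbitrary, in units with $\varepsilon_0 = n = q = 1$. The sketch checks the static limit (35), the positivity of $\varepsilon''$ at positive frequencies, and the high-frequency behavior $\varepsilon \approx \varepsilon_0 - nq^2\sum_j (f_j/m_j)/\omega^2$.

```python
# (f_j, m_j, omega_j, delta_j) for each oscillator species; illustrative values only
SPECIES = [(0.5, 1.0, 1.0, 0.05), (0.3, 1.0, 3.0, 0.1), (0.2, 2.0, 7.0, 0.2)]


def eps_oscillators(omega, species=SPECIES, n=1.0, q=1.0, eps0=1.0):
    """Complex permittivity of Eq. (33) for a list of (f_j, m_j, omega_j, delta_j)."""
    s = sum(f / m / (wj**2 - omega**2 - 2j * omega * dj) for f, m, wj, dj in species)
    return eps0 + n * q**2 * s


def eps_static(species=SPECIES, n=1.0, q=1.0, eps0=1.0):
    """Static value of Eq. (35): eps_0 + n q^2 sum_j f_j / (m_j omega_j^2)."""
    return eps0 + n * q**2 * sum(f / (m * wj**2) for f, m, wj, _ in species)


for w in (0.0, 0.9, 1.0, 5.0, 1e4):
    print(w, eps_oscillators(w))
```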

In the opposite limit, $\omega \gg \omega_j, \delta_j$, permittivity (33) also becomes real, and may be presented as

$$\varepsilon(\omega) \approx \varepsilon_0\left(1 - \frac{\omega_p^2}{\omega^2}\right), \quad \text{where} \quad \omega_p^2 \equiv \frac{nq^2}{\varepsilon_0}\sum_j \frac{f_j}{m_j}. \tag{7.36}$$

The last result is very important, because it is also valid at all frequencies if all $\omega_j$ and $\delta_j$ vanish, i.e. for a gas of free charged particles, in particular for plasmas (ionized atomic gases). (This is why the parameter $\omega_p$ defined by Eq. (36) is called the plasma frequency.) Typically, the plasma as a whole is neutral, i.e. the density $n$ of positive atomic ions is equal to that of the free electrons. Since the ratio



15 See, e.g., QM Chapters 5 and 9. 
$f_j/m_j$ for electrons is much higher than that for ions, the general formula (36) for the plasma frequency is usually well approximated by the following simple expression:

$$\omega_p^2 \approx \frac{ne^2}{\varepsilon_0 m_e}. \tag{7.37}$$

This expression has a simple physical sense: the effective spring constant $\kappa_{ef} \equiv m_e\omega_p^2 = ne^2/\varepsilon_0$ describes the Coulomb force that appears when the electron subsystem of a plasma is shifted, as a whole, from its positive-ion subsystem, thus violating the electroneutrality. Indeed, consider such a small shift, $\Delta x$, perpendicular to the plane surface of a broad, plane slab filled with plasma. The uncompensated charges, with equal and opposite surface densities $\sigma = \pm en\Delta x$, that appear at the slab surfaces, create inside it, according to Eq. (2.3), a uniform electric field $E_x = en\Delta x/\varepsilon_0$. This field exerts the force $eE_x = (ne^2/\varepsilon_0)\Delta x$ on each positively charged ion. According to the 3rd Newton law, the ions pull each electron back to its equilibrium position with the equal and opposite force $F = -eE_x = -(ne^2/\varepsilon_0)\Delta x$, justifying the above expression for $\kappa_{ef}$. Hence it is not surprising that $\varepsilon(\omega)$ described by the first of Eqs. (36) turns into zero at $\omega = \omega_p$: at this resonance frequency, finite free oscillations of charge (and hence of $D = \varepsilon E$) do not require a finite force (and hence $E$).

The behavior of electromagnetic waves in a medium that obeys Eq. (36) is very remarkable. If the wave frequency $\omega$ is above $\omega_p$, the dielectric constant, and hence the wave number (28), are positive and real, and waves propagate without attenuation, following the dispersion relation

$$k(\omega) = \omega\left[\varepsilon(\omega)\mu_0\right]^{1/2} = \frac{1}{c}\left(\omega^2 - \omega_p^2\right)^{1/2}, \tag{7.38}$$

which is shown in Fig. 6. (As we will see later in this chapter, many wave transmission systems obey such a dispersion law as well.)



Fig. 7.6. Plasma dispersion law (solid line) in comparison with the linear dispersion of the free space (dashed line).
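The relation between the two velocities mentioned in footnote 11 is easy to explore for the plasma dispersion law (38). The Python sketch below uses units with $c = 1$ and arbitrary values of $\omega_p$ and $k$. It solves Eq. (38) for $\omega(k) = (\omega_p^2 + c^2k^2)^{1/2}$ and computes $v_{ph} = \omega/k$ and $v_{gr} = d\omega/dk$, the latter by a central finite difference. The result shows that $v_{ph} > c > v_{gr}$, with $v_{ph}v_{gr} = c^2$ for this particular dispersion law.

```python
import math


def omega_of_k(k, omega_p, c=1.0):
    """Plasma dispersion law (38) solved for the frequency: omega^2 = omega_p^2 + c^2 k^2."""
    return math.sqrt(omega_p**2 + (c * k) ** 2)


def phase_and_group_velocity(k, omega_p, c=1.0, dk=1e-6):
    """Return (v_ph, v_gr) = (omega/k, d omega/dk), the latter by a central difference."""
    v_ph = omega_of_k(k, omega_p, c) / k
    v_gr = (omega_of_k(k + dk, omega_p, c) - omega_of_k(k - dk, omega_p, c)) / (2 * dk)
    return v_ph, v_gr


for k in (0.1, 1.0, 10.0):
    print(k, *phase_and_group_velocity(k, omega_p=1.0))
```

At $k \to 0$ (i.e. $\omega \to \omega_p$), $v_{ph}$ diverges while $v_{gr}$ tends to zero.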



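At $\omega \to \omega_p$, the wave number $k$ tends to zero. Beyond that point (at $\omega < \omega_p$), we still can use Eq. (38), but it is more instrumental to rewrite it in the mathematically equivalent form

$$k(\omega) = \frac{i}{c}\left(\omega_p^2 - \omega^2\right)^{1/2} \equiv \frac{i}{\delta}, \quad \text{where} \quad \delta \equiv \frac{c}{\left(\omega_p^2 - \omega^2\right)^{1/2}}. \tag{7.39}$$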



According to Eq. (29), this means that the electromagnetic field exponentially decreases with distance:

$$f = \mathrm{Re}\left[f_\omega e^{i(kz - \omega t)}\right] = \exp\left\{-\frac{z}{\delta}\right\}\mathrm{Re}\left[f_\omega e^{-i\omega t}\right]. \tag{7.40}$$



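Does this mean that the wave is being absorbed in the plasma? Answering this question is a good pretext for calculating the time average of the Poynting vector $\mathbf{S} = \mathbf{E}\times\mathbf{H}$ of a monochromatic electromagnetic wave in an arbitrary dispersive (but still linear!) medium. First, let us spell out the fields' time dependence:

$$E(t) = \mathrm{Re}\left[E_\omega(z)e^{-i\omega t}\right] = \frac{1}{2}\left[E_\omega e^{-i\omega t} + \text{c.c.}\right], \qquad H(t) = \mathrm{Re}\left[\frac{E_\omega(z)}{Z(\omega)}e^{-i\omega t}\right] = \frac{1}{2}\left[\frac{E_\omega}{Z(\omega)}e^{-i\omega t} + \text{c.c.}\right]. \tag{7.41}$$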



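Now, a straightforward calculation yields$^{16}$

$$\bar{S} \equiv \overline{E(t)H(t)} = \frac{E_\omega E_\omega^*}{4}\left[\frac{1}{Z(\omega)} + \frac{1}{Z^*(\omega)}\right] = \frac{E_\omega E_\omega^*}{2}\,\mathrm{Re}\frac{1}{Z(\omega)} = \frac{E_\omega E_\omega^*}{2}\,\mathrm{Re}\left[\frac{\varepsilon(\omega)}{\mu(\omega)}\right]^{1/2}. \tag{7.42}$$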
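Eq. (42) is easy to verify numerically. The Python sketch below uses arbitrary illustrative values of the complex amplitude and impedance. It averages the product $E(t)H(t)$ of Eqs. (41) over one period, sampled at equally spaced instants (which gives the exact mean for these low harmonics), and compares it with $\frac{1}{2}|E_\omega|^2\,\mathrm{Re}(1/Z)$. For a purely imaginary $Z$, both vanish.

```python
import cmath
import math


def avg_poynting_formula(e_amp, z):
    """Time-averaged Poynting vector of Eq. (42): |E_w|^2 / 2 * Re(1/Z)."""
    return abs(e_amp) ** 2 / 2 * (1 / z).real


def avg_poynting_sampled(e_amp, z, omega=1.0, samples=64):
    """Average E(t)*H(t) of Eqs. (41) over one period using equally spaced samples."""
    period = 2 * math.pi / omega
    total = 0.0
    for j in range(samples):
        phase = cmath.exp(-1j * omega * j * period / samples)
        total += (e_amp * phase).real * (e_amp / z * phase).real
    return total / samples


E_W = 3.0 - 1.5j
for z in (377.0, 100.0 - 40.0j, -250.0j):   # real, lossy, and purely imaginary impedances
    print(z, avg_poynting_formula(E_W, z), avg_poynting_sampled(E_W, z))
```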



Let us apply this important general formula to our simple model of plasma at $\omega < \omega_p$. In this case $\mu(\omega) = \mu_0$, i.e. is positive and real, while $\varepsilon(\omega)$ is real and negative, so that $1/Z(\omega) = [\varepsilon(\omega)/\mu(\omega)]^{1/2}$ is purely imaginary, and the average Poynting vector (42) vanishes. This means that energy, on the average, does not flow along axis $z$, as it would if it were absorbed in the plasma. As we will see in Sec. 4, waves with $\omega < \omega_p$ are rather reflected from the plasma's boundary, without energy loss. Note that in the limit $\omega \ll \omega_p$, Eq. (39) yields

$$\delta \to \frac{c}{\omega_p} = \left(\frac{c^2\varepsilon_0 m_e}{ne^2}\right)^{1/2} \equiv \left(\frac{m_e}{\mu_0 ne^2}\right)^{1/2}. \tag{7.43}$$



But this is just a particular case (for $q = e$ and $\mu = \mu_0$) of the expression (6.38) that we have derived for the depth of magnetic field penetration into a lossless (collision-free) conductor in the quasistationary approximation. We see again that, as was already discussed in Sec. 6.7, this approximation (in which we neglect the displacement currents) gives an adequate description of time-dependent phenomena at $\omega \ll \omega_p$, i.e. at $\delta \ll c/\omega \equiv 1/k = \lambda/2\pi$.

The two most important examples of plasmas are as follows. For the Earth's ionosphere, i.e. the upper part of the atmosphere that is almost completely ionized by the UV and X-ray components of the Sun's radiation, the maximum value of $n$, reached at about 300 km above the Earth's surface, is between $10^{10}$ and $10^{12}$ m$^{-3}$ (depending on the time of the day and the Sun's activity), so that the maximum plasma frequency (37), in cyclic units $\omega_p/2\pi$, is between 1 and 10 MHz. This is much higher than the particles' reciprocal collision time $\tau^{-1}$, so that Eq. (36) gives a very good description of the plasma's electric polarization. The effect of reflection of waves with $\omega < \omega_p$ from the ionosphere enables long-range (over-the-globe) radio communications and broadcasting at the so-called short waves, with frequencies of the order of 10 MHz.



16 For an arbitrary plane wave, the total average power flow may be calculated as an integral of Eq. (42) over all frequencies. By the way, combining this integral and the Poynting theorem (6.103), one can also prove the following interesting expression for the average electromagnetic energy density in an arbitrary dispersive (but linear and isotropic) medium:

$$\bar{u} = \int_0^{+\infty}\left[\frac{d(\omega\varepsilon)}{d\omega}\,E_\omega E_\omega^* + \frac{d(\omega\mu)}{d\omega}\,H_\omega H_\omega^*\right]d\omega .$$
Such waves may propagate in the flat channel formed by the Earth's surface and the ionosphere, being reflected repeatedly by these "walls". Unfortunately, due to the random variations of the Sun's activity, and hence of $\omega_p$, such a natural communication channel is not too reliable, and in our age of fiber-optic cables its practical importance is diminishing.
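The ionospheric numbers quoted above follow directly from Eq. (37). The minimal Python sketch below enters the physical constants by hand as CODATA 2018 values and converts $\omega_p$ to the cyclic frequency $f_p = \omega_p/2\pi$.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C (exact in SI)
M_E = 9.1093837015e-31        # electron mass, kg (CODATA 2018)
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m (CODATA 2018)


def plasma_omega(n):
    """Electron plasma frequency (n e^2 / eps0 m_e)^(1/2) of Eq. (37); n in m^-3, result in s^-1."""
    return math.sqrt(n * E_CHARGE**2 / (EPS0 * M_E))


for n in (1e10, 1e12):       # range of the peak ionospheric electron density quoted in the text
    f_p = plasma_omega(n) / (2 * math.pi)
    print(f"n = {n:.0e} m^-3  ->  f_p = omega_p/2pi = {f_p / 1e6:.2f} MHz")
```

The two densities give roughly 0.9 and 9 MHz, consistent with the "1 to 10 MHz" estimate in the text.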

Another important example of plasmas is free electrons in metals and other conductors. For a typical metal, $n$ is of the order of $10^{23}$ cm$^{-3}$ = $10^{29}$ m$^{-3}$, so that Eq. (37) yields $\omega_p \sim 10^{16}$ s$^{-1}$. Note that this value of $\omega_p$ is somewhat higher than mid-optical frequencies ($\omega \sim 3\times10^{15}$ s$^{-1}$). This explains why planar, even, clean metallic surfaces, such as aluminum and silver films used in mirrors, are so shiny: at these frequencies the permittivity is almost exactly real and negative, leading to light reflection with very little absorption. However, the considered model, which neglects electron scattering, becomes inadequate at lower frequencies, $\omega\tau \sim 1$.
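The same formula gives the metal estimate, and Eqs. (39) and (43) then give the field penetration depth. In the Python sketch below, $n = 10^{29}$ m$^{-3}$ is the order-of-magnitude value quoted above, not data for a specific metal. The constants are hard-coded CODATA 2018 values, and $\mu_0$ is derived from $c^2 = 1/\varepsilon_0\mu_0$, so that the two forms of Eq. (43) agree up to rounding.

```python
import math

C = 299_792_458.0             # speed of light, m/s
E_CHARGE = 1.602176634e-19    # elementary charge, C
M_E = 9.1093837015e-31        # electron mass, kg (CODATA 2018)
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m (CODATA 2018)
MU0 = 1.0 / (EPS0 * C**2)     # consistent with c^2 = 1/(eps0 mu0)


def plasma_omega(n):
    """Eq. (37): omega_p = (n e^2 / eps0 m_e)^(1/2), in s^-1 for n in m^-3."""
    return math.sqrt(n * E_CHARGE**2 / (EPS0 * M_E))


def decay_length(omega, n):
    """Eq. (39): delta = c / (omega_p^2 - omega^2)^(1/2); defined only for omega < omega_p."""
    wp = plasma_omega(n)
    if omega >= wp:
        raise ValueError("for omega >= omega_p the wave propagates, see Eq. (38)")
    return C / math.sqrt(wp**2 - omega**2)


n_metal = 1e29
print(f"omega_p = {plasma_omega(n_metal):.2e} s^-1")
print(f"delta(omega -> 0) = {decay_length(0.0, n_metal) * 1e9:.1f} nm")
print(f"delta at omega = 3e15 s^-1: {decay_length(3e15, n_metal) * 1e9:.1f} nm")
```

The penetration depth comes out at some tens of nanometers, much shorter than optical wavelengths. This agrees with the picture of an almost perfect mirror.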

A phenomenological way of extending the model to account for scattering is to take, in Eq. (33), the lowest eigenfrequency $\omega_j$ to be equal to zero (to describe free electrons), while keeping the damping coefficient $\delta_0$ of this mode finite, to account for their energy loss due to scattering. Then Eq. (33) is reduced to

$$\varepsilon(\omega) = \varepsilon_{opt}(\omega) + \frac{nf_0q^2}{m_0}\,\frac{1}{-\omega^2 - 2i\omega\delta_0} = \varepsilon_{opt}(\omega) + i\,\frac{nf_0q^2}{2\delta_0 m_0\,\omega}\,\frac{1}{1 - i\omega/2\delta_0}, \tag{7.44}$$

where the response $\varepsilon_{opt}(\omega)$ at high (in practice, optical) frequencies is still given by Eq. (33), but now with $j > 0$.

Result (44) allows for a simple interpretation. To show that, let us incorporate into our calculations the Ohmic conduction, generalizing Eq. (4.7) as $\mathbf{j}_\omega = \sigma(\omega)\mathbf{E}_\omega$ to account for the possible frequency dependence of the Ohmic conductivity. Plugging this relation into the Fourier image of the relevant Maxwell equation, $\nabla\times\mathbf{H}_\omega = \mathbf{j}_\omega - i\omega\mathbf{D}_\omega = \mathbf{j}_\omega - i\omega\varepsilon(\omega)\mathbf{E}_\omega$, we get

$$\nabla\times\mathbf{H}_\omega = \left[\sigma(\omega) - i\omega\varepsilon(\omega)\right]\mathbf{E}_\omega . \tag{7.45}$$

This relation shows that for a sinusoidal process, the addition of the Ohmic current density $\mathbf{j}_\omega$ to the displacement current density is equivalent to the addition of $\sigma(\omega)$ to $-i\omega\varepsilon(\omega)$, i.e. to the following change of the ac electric permittivity:$^{17}$

$$\varepsilon(\omega) \to \varepsilon_{ef}(\omega) \equiv \varepsilon_{opt}(\omega) + i\,\frac{\sigma(\omega)}{\omega}. \tag{7.46}$$

Now the comparison of Eqs. (44) and (46) shows that they coincide if we take

$$\sigma(\omega) = \frac{nf_0q^2\tau}{m_0}\,\frac{1}{1 - i\omega\tau} \equiv \sigma(0)\,\frac{1}{1 - i\omega\tau}, \tag{7.47}$$

where the dc conductivity $\sigma(0)$ is described by the Drude formula (4.13), and the phenomenologically introduced coefficient $\delta_0$ is associated with $1/2\tau$. Relation (47), which is frequently called the



17 Alternatively, according to Eq. (45), it is possible (and in infrared spectroscopy, conventional) to attribute the ac response of a medium at all frequencies to an effective complex conductivity $\sigma_{ef}(\omega) \equiv \sigma(\omega) - i\omega\varepsilon(\omega) \equiv -i\omega\varepsilon_{ef}(\omega)$.
generalized (or "ac", or "rf") Drude formula,$^{18}$ gives a very reasonable (semi-quantitative) description of the ac conductivity of many metals almost all the way up to optical frequencies.
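The equivalence of Eq. (44) and Eqs. (46)-(47) can be checked numerically. In the Python sketch below, all parameter values are arbitrary and dimensionless. $\varepsilon_{opt}$ is taken as a frequency-independent constant for simplicity (an assumption), the product $nf_0$ is lumped into one number `N_FREE`, and $\tau = 1/2\delta_0$.

```python
N_FREE, Q, M0 = 1.0, 1.0, 1.0   # free-carrier density n*f_0, charge, mass (illustrative units)
EPS_OPT = 2.0                   # assumed frequency-independent eps_opt
DELTA0 = 0.05                   # damping coefficient of the omega_j = 0 mode
TAU = 1 / (2 * DELTA0)          # the identification delta_0 <-> 1/(2 tau)


def eps_free_carriers(omega):
    """First form of Eq. (44): eps_opt + (n f_0 q^2/m_0) / (-omega^2 - 2 i omega delta_0)."""
    return EPS_OPT + N_FREE * Q**2 / M0 / (-omega**2 - 2j * omega * DELTA0)


def sigma_drude(omega):
    """Generalized Drude formula (47): sigma(0) / (1 - i omega tau), sigma(0) = n q^2 tau / m."""
    sigma0 = N_FREE * Q**2 * TAU / M0
    return sigma0 / (1 - 1j * omega * TAU)


def eps_effective(omega):
    """Eq. (46): eps_opt + i sigma(omega) / omega."""
    return EPS_OPT + 1j * sigma_drude(omega) / omega


for w in (0.01, 0.1, 1.0, 10.0):
    print(w, eps_free_carriers(w), eps_effective(w), sigma_drude(w))
```

At $\omega\tau \gg 1$ the conductivity becomes almost purely imaginary, and $\varepsilon_{ef}$ approaches the collision-free plasma form (36).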



7.3. Kramers-Kronig relations 

The results for the simple model of dispersion, discussed in the last section, imply that the frequency dependences of the real ($\varepsilon'$) and imaginary ($\varepsilon''$) parts of the permittivity are not quite independent. For example, let us have one more look at the resonance peaks in Fig. 5. Each time the real part drops with frequency, $d\varepsilon'/d\omega < 0$, its imaginary part $\varepsilon''$ has a positive peak. R. de L. Kronig in 1926 and H. A. Kramers in 1927 independently showed that this is not an occasional coincidence pertinent only to the simple oscillator model. Moreover, the full knowledge of function $\varepsilon'(\omega)$ allows one to calculate function $\varepsilon''(\omega)$, and vice versa. The reason is that both these functions are always related to a single real function $G(\theta)$ by Eqs. (28).

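To derive the Kramers-Kronig relations, let us consider Eq. (27) on the complex frequency plane, $\omega \to \omega' + i\omega''$:

$$f(\omega) \equiv \varepsilon(\omega) - \varepsilon_0 = \int_0^{\infty} G(\theta)\,e^{i\omega\theta}\,d\theta = \int_0^{\infty} G(\theta)\,e^{i\omega'\theta}\,e^{-\omega''\theta}\,d\theta . \tag{7.48}$$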



For all stable physical systems, $G(\theta)$ has to be finite for all important values of the integration variable ($\theta > 0$), and tend to zero at $\theta \to 0$ and $\theta \to \infty$. Because of that, and thanks to the factor $e^{-\omega''\theta}$, the expression under the integral tends to zero at $|\omega| \to \infty$ in all the upper half-plane ($\omega'' > 0$). As a result, we may claim that the complex-variable function $f(\omega)$ is analytical in that half-plane, which allows us to apply to it the Cauchy integral formula$^{19}$

$$f(\omega) = \frac{1}{2\pi i}\oint_C f(\Omega)\,\frac{d\Omega}{\Omega - \omega}, \tag{7.49}$$



with the integration contour of the form shown in Fig. 7, with the radius $R$ of the larger semicircle tending to infinity, and the radius $r$ of the smaller semicircle (about the singular point $\Omega = \omega$) tending to zero.

Fig. 7.7. Integration path C used in the Cauchy integral formula to derive the Kramers-Kronig dispersion relations.



18 It may also be derived from the Boltzmann kinetic equation in the so-called relaxation-time approximation (RTA); see, e.g., SM Sec. 6.2.

19 See, e.g., MA Eq. (15.2). 
Due to the exponential decay of $|f(\Omega)|$ at $|\Omega| \to \infty$, the contribution to the integral from the larger semicircle vanishes,$^{20}$ while the contribution from the small semicircle, where $\Omega = \omega + r\exp\{i\varphi\}$, with $-\pi \le \varphi \le 0$, is

$$\lim_{r\to 0}\frac{1}{2\pi i}\int_{\Omega = \omega + r\exp\{i\varphi\}} f(\Omega)\,\frac{d\Omega}{\Omega - \omega} = \frac{f(\omega)}{2\pi i}\int_{-\pi}^{0}\frac{ir\exp\{i\varphi\}\,d\varphi}{r\exp\{i\varphi\}} = \frac{f(\omega)}{2\pi}\int_{-\pi}^{0}d\varphi = \frac{1}{2}f(\omega). \tag{7.50}$$



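As a result, for our contour C, Eq. (49) yields

$$f(\omega) = \frac{1}{2\pi i}\lim_{r\to 0}\left[\int_{-\infty}^{\omega - r} f(\Omega)\,\frac{d\Omega}{\Omega - \omega} + \int_{\omega + r}^{+\infty} f(\Omega)\,\frac{d\Omega}{\Omega - \omega}\right] + \frac{1}{2}f(\omega). \tag{7.51}$$

Such an integral, excluding a symmetric infinitesimal vicinity of the pole singularity, is called the principal value of the (formally, diverging) integral from $-\infty$ to $+\infty$, and is denoted by the letter P before it.$^{21}$ Using this notation, subtracting $f(\omega)/2$ from both parts of Eq. (51), and multiplying them by 2, we get

$$f(\omega) = \frac{1}{\pi i}\,\mathrm{P}\!\int_{-\infty}^{+\infty} f(\Omega)\,\frac{d\Omega}{\Omega - \omega}. \tag{7.52}$$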



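Now plugging into this complex equality the polarization-related difference $f(\omega) = \varepsilon(\omega) - \varepsilon_0$ in the form $[\varepsilon'(\omega) - \varepsilon_0] + i\varepsilon''(\omega)$, and requiring both real and imaginary components of both parts of Eq. (52) to be equal separately, we get the famous Kramers-Kronig dispersion relations

$$\varepsilon'(\omega) = \varepsilon_0 + \frac{1}{\pi}\,\mathrm{P}\!\int_{-\infty}^{+\infty}\varepsilon''(\Omega)\,\frac{d\Omega}{\Omega - \omega}, \qquad \varepsilon''(\omega) = -\frac{1}{\pi}\,\mathrm{P}\!\int_{-\infty}^{+\infty}\left[\varepsilon'(\Omega) - \varepsilon_0\right]\frac{d\Omega}{\Omega - \omega}. \tag{7.53}$$

Now we may use the already mentioned fact that $\varepsilon'(\omega)$ is always an even, while $\varepsilon''(\omega)$ an odd, function of frequency, to rewrite these relations in the following form

$$\varepsilon'(\omega) = \varepsilon_0 + \frac{2}{\pi}\,\mathrm{P}\!\int_0^{+\infty}\varepsilon''(\Omega)\,\frac{\Omega\,d\Omega}{\Omega^2 - \omega^2}, \qquad \varepsilon''(\omega) = -\frac{2\omega}{\pi}\,\mathrm{P}\!\int_0^{+\infty}\left[\varepsilon'(\Omega) - \varepsilon_0\right]\frac{d\Omega}{\Omega^2 - \omega^2}, \tag{7.54}$$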



which is more convenient for most applications, because it involves only physical (positive) frequencies. 
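The relations (54) can be tested on the oscillator model (32), whose $f(\omega) = \varepsilon(\omega) - \varepsilon_0$ is analytic in the upper half-plane and decays as $\omega^{-2}$. The Python sketch below works in dimensionless units with $\varepsilon_0 = 1$ and arbitrary $\omega_0$, $\omega_p$, $\delta$. It reconstructs $\varepsilon'(\omega)$ from $\varepsilon''(\Omega)$ using the first of Eqs. (54).

The principal value is handled by the standard singularity-subtraction trick, an implementation choice not taken from the text. One writes $\Omega/(\Omega^2 - \omega^2) = g(\Omega)/(\Omega - \omega)$ with a smooth $g$. The integral over $[0, \Omega_{max}]$ then equals $\int[g(\Omega) - g(\omega)]\,d\Omega/(\Omega - \omega) + g(\omega)\ln[(\Omega_{max} - \omega)/\omega]$. The first, regular integral is evaluated by the midpoint rule, and the tail beyond $\Omega_{max}$ is neglected.

```python
import math

W0, WP, DELTA = 1.0, 0.5, 0.05   # illustrative oscillator parameters; eps_0 = 1


def eps_model(w):
    """Single-oscillator permittivity, Eq. (32), in units eps_0 = 1 with n q^2/(m eps_0) = WP^2."""
    return 1.0 + WP**2 / (W0**2 - w**2 - 2j * w * DELTA)


def eps2(w):
    """Imaginary part eps''(w) of the model permittivity."""
    return eps_model(w).imag


def kk_real_part(w, eps_imag, w_max=50.0, n=200_000):
    """eps'(w) from eps''(W) by the first of Eqs. (54), for 0 < w < w_max.

    P int_0^w_max eps''(W) W dW / (W^2 - w^2) is written as P int g(W) dW / (W - w)
    with g(W) = eps''(W) W / (W + w); the pole is removed by subtracting g(w).
    """
    def g(big_w):
        return eps_imag(big_w) * big_w / (big_w + w)

    g_w = g(w)
    h = w_max / n
    regular = 0.0
    for k in range(n):
        big_w = (k + 0.5) * h                  # midpoints of the grid cells
        if big_w != w:                          # removable singularity; skip an exact hit
            regular += (g(big_w) - g_w) / (big_w - w)
    principal = regular * h + g_w * math.log((w_max - w) / w)
    return 1.0 + 2.0 / math.pi * principal


for w in (0.5, 0.98, 1.5):
    print(w, kk_real_part(w, eps2), eps_model(w).real)
```

The reconstructed and directly computed values of $\varepsilon'$ agree to within the discretization error, including in the resonance region.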

Though the Kramers-Kronig relations are "global" in frequency, in certain cases they allow an approximate calculation of dispersion from experimental data for absorption, collected even in a limited frequency range. For example, if a medium has a sharp absorption peak at some frequency $\omega_j$, we may approximate it as

$$\varepsilon''(\omega) \approx c\,\delta(\omega - \omega_j) + \text{a smoother function of } \omega, \tag{7.55}$$

and the first of Eqs. (54) immediately gives



20 Strictly speaking, this also requires $|f(\Omega)|$ to decrease faster than $\Omega^{-1}$ on the real axis (at $\Omega'' = 0$), but due to the nonvanishing inertia of charged particles, this requirement is fulfilled for all realistic models of dispersion; see, e.g., Eq. (36).

21 I am typesetting this symbol in a Roman font, to exclude any possibility of its confusion with the medium's polarization.
$$\varepsilon'(\omega) \approx \varepsilon_0 + \frac{c}{\pi}\,\frac{1}{\omega_j - \omega} + \text{another smooth function of } \omega, \tag{7.56}$$

thus predicting the anomalous dispersion near such a point. This calculation shows that such behavior, observed in the classical oscillator model (Fig. 5), is by no means occasional or model-specific.

Let me emphasize again that the general, and hence very powerful, Kramers-Kronig relations hinge on the causal, linear relation (21) between the polarization $P(t)$ and the electric field $E(t')$, but not on much else. This is why such relations are also valid for similar causal relations in other fields of physics.$^{22}$



7.4. Reflection 

The most important new effect arising in nonuniform media is wave reflection. Let us start its 
discussion from the simplest case of a plane electromagnetic wave that is normally incident on an 
interface between two uniform, linear, isotropic media. 

If the interface is an ideal mirror, the description of reflection is very simple. Indeed, let us assume that one of the two media (say, located at $z > 0$; see Fig. 8) cannot sustain any electric field at all:

$$E\big|_{z>0} = 0. \tag{7.57}$$



This condition is evidently incompatible with the single traveling wave (5). However, this solution may be readily corrected using the fact that the dispersion-free 1D wave equation,

$$\left(\frac{\partial^2}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)E = 0, \tag{7.58}$$

supports waves propagating, with the same speed, in opposite directions. As a result, the following linear superposition of two such waves,

$$E\big|_{z<0} = f(z - vt) - f(-z - vt), \tag{7.59}$$



22 In this context, it is important to remember that a simple-looking relation between the Fourier amplitudes of certain variables, such as $D_\omega = \varepsilon(\omega)E_\omega$, still does not imply a causal relationship between them. This means that the Kramers-Kronig relations are not necessarily valid for either of the functions $\varepsilon(\omega)$ and $\mu(\omega)$, or their reciprocals, of an arbitrary medium. Indeed, since any Green's function describing a causal relationship has to tend to zero at small times $\theta = t - t'$ (because no system may respond to an external force instantly), its Fourier image has to tend to zero at $\omega \to \pm\infty$. This is certainly true, for example, for the function $f(\omega) = \varepsilon(\omega) - \varepsilon_0$ given by Eq. (32) describing a dilute electric medium, but not for its inverse, $1/f(\omega) \propto (\omega_0^2 - \omega^2) - 2i\delta\omega$, which diverges at large frequencies. As another example, since in a dilute linear medium the magnetic response should be due to a causal relation between the average magnetic field $\mathbf{B}$ (cause) and the magnetization $\mathbf{M}$ (effect), whose Fourier images are related as $\mathbf{M}_\omega = \chi_m(\omega)\mathbf{H}_\omega = [1/\mu_0 - 1/\mu(\omega)]\mathbf{B}_\omega$, the Kramers-Kronig relations may be expected to be valid for the function $f(\omega) = 1/\mu_0 - 1/\mu(\omega)$, but not for $\mu(\omega)$ or even $[\mu(\omega) - \mu_0]$. Unfortunately, magnetic susceptibility dispersion studies were started just recently, mostly in the context of the negative refractivity effects (see Sec. 5 below), and I am not aware of any convincing discussion of this issue even in the research literature (let alone textbooks :-).
satisfies both the equation and the boundary condition (57), for an arbitrary function $f$. The second term in Eq. (59) may be interpreted as the total reflection of the incident wave described by its first term, in this case with a change of the electric field's sign. By the way, since the vector $\mathbf{n}$ of the reflected wave is opposite to that of the incident one (see arrows in Fig. 1), Eq. (6) shows that the magnetic field of the wave does not change its sign at the reflection:

$$H\big|_{z<0} = \frac{1}{Z}\left[f(z - vt) + f(-z - vt)\right]. \tag{7.60}$$




Blue lines in Fig. 8 show the resulting pattern (59) for the simplest, sinusoidal waveform

$$E\big|_{z\le 0} = \mathrm{Re}\left[E_\omega e^{i(kz - \omega t)} - E_\omega e^{i(-kz - \omega t)}\right]. \tag{7.61a}$$



Depending on convenience in a particular context, this pattern may be legitimately interpreted either as a superposition (61a) of two traveling waves, or as a single standing wave,

$$E\big|_{z\le 0} = -2\,\mathrm{Im}\left(E_\omega e^{-i\omega t}\right)\sin kz \equiv 2\,\mathrm{Re}\left(iE_\omega e^{-i\omega t}\right)\sin kz, \tag{7.61b}$$

in which the electric and magnetic fields oscillate with phase shifts of $\pi/2$, both in time and space:

$$H\big|_{z\le 0} = \mathrm{Re}\left[\frac{E_\omega}{Z}e^{i(kz - \omega t)} + \frac{E_\omega}{Z}e^{i(-kz - \omega t)}\right] = 2\,\mathrm{Re}\left(\frac{E_\omega}{Z}e^{-i\omega t}\right)\cos kz . \tag{7.62}$$

As a result of this shift, the time average of the Poynting vector's magnitude,

$$S(z,t) = EH = \frac{1}{Z}\,\mathrm{Re}\left[iE_\omega^2 e^{-2i\omega t}\right]\sin 2kz, \tag{7.63}$$

equals zero, showing that at the total reflection there is no average power flow. (This is natural, because the perfect mirror can neither transmit the wave nor absorb it.) However, Eq. (63) shows that the standing wave provides local oscillations of energy, transferring it periodically between the concentrations of the electric and magnetic fields, separated by the distance $\Delta z = \pi/2k = \lambda/4$.

For the case of sinusoidal waves, the reflection effects may be readily explored even for the more general case of dispersive and/or lossy media, in which $\varepsilon(\omega)$ and $\mu(\omega)$, and hence the wave vector $k(\omega)$ and the wave impedance $Z(\omega)$, defined by Eqs. (28), are certain complex functions of frequency. The "only" new factors we have to account for are that in this case the reflection may not be full, and that inside the second medium we have to use the traveling-wave solution as well. Both these factors may be taken care of by looking for the solution of our boundary problem in the form

$$E\big|_{z\le 0} = \mathrm{Re}\left[E_\omega\left(e^{ik_-z} + Re^{-ik_-z}\right)e^{-i\omega t}\right], \qquad E\big|_{z\ge 0} = \mathrm{Re}\left[E_\omega Te^{ik_+z}e^{-i\omega t}\right], \tag{7.64}$$

and hence, according to Eq. (6),

$$H\big|_{z\le 0} = \mathrm{Re}\left[\frac{E_\omega}{Z_-(\omega)}\left(e^{ik_-z} - Re^{-ik_-z}\right)e^{-i\omega t}\right], \qquad H\big|_{z\ge 0} = \mathrm{Re}\left[\frac{E_\omega}{Z_+(\omega)}Te^{ik_+z}e^{-i\omega t}\right]. \tag{7.65}$$

(Indices + and − correspond to, respectively, the media at $z > 0$ and $z < 0$.) Please note the following important features of these relations:

(i) Due to the problem's linearity, we could (and did :-) take the complex amplitudes of the reflected and transmitted waves proportional to that ($E_\omega$) of the incident wave, describing them by the dimensionless coefficients $R$ and $T$. The total reflection from an ideal mirror, discussed above, corresponds to the particular case $R = -1$ and $T = 0$.

(ii) Since the incident wave we are considering arrives from one side only (from $z = -\infty$), there is no need to include a term proportional to $\exp\{-ik_+z\}$ into Eqs. (64)-(65) in our current problem. However, we would need such a term if the medium at $z > 0$ were nonuniform (e.g., had at least one more interface or any other inhomogeneity), because the wave reflected from that additional inhomogeneity would be incident on our interface (located at $z = 0$) from the right.

(iii) Solution (64)-(65) is sufficient even for the description of the cases when waves cannot propagate at $z > 0$, for example in a conductor or in a plasma with $\omega_p > \omega$. Indeed, the exponential drop of the field amplitude at $z > 0$ in such cases is automatically described by the imaginary part of the wave number $k_+$; see Eq. (29).

In order to find the coefficients $R$ and $T$, we need to use boundary conditions at $z = 0$. Since the reflection does not change the transverse character of the partial waves, at normal incidence both vectors $\mathbf{E}$ and $\mathbf{H}$ remain tangential to the interface plane (in our notation, $z = 0$). Reviewing the arguments that have led us, in statics, to boundary conditions (3.47) and (5.118) for these components, we see that they remain valid for the time-dependent situation as well,$^{23}$ so that for our current case of purely transverse waves we can write:



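$$E\big|_{z=-0} = E\big|_{z=+0}, \qquad H\big|_{z=-0} = H\big|_{z=+0}. \tag{7.66}$$

Plugging Eqs. (64)-(65) into these conditions, we get

$$1 + R = T, \qquad \frac{1}{Z_-}(1 - R) = \frac{1}{Z_+}T. \tag{7.67}$$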



23 For example, the first of conditions (66) may be obtained by integrating the full (time-dependent) Maxwell equation $\nabla\times\mathbf{E} + \partial\mathbf{B}/\partial t = 0$ over a narrow and long rectangular contour with dimensions $l$ and $d$ ($d \ll l$) stretched along the interface. In the Stokes theorem, the first term gives $\Delta E_\tau\,l$, while the contribution of the second term is proportional to the product $dl$, and vanishes as $d/l \to 0$. The proof of the second boundary condition is similar, as was already discussed in Sec. 6.2.
Solving this simple system of equations, we get$^{24}$

$$R = \frac{Z_+ - Z_-}{Z_+ + Z_-}, \qquad T = \frac{2Z_+}{Z_+ + Z_-}. \tag{7.68}$$



These formulas are very important, and much more general than one may think, because they are applicable to virtually any 1D waves, electromagnetic or not, if only the impedance $Z$ is defined in a proper way.$^{25}$ Since in the general case the wave impedances $Z_\pm$, defined by Eq. (28) with the corresponding indices, are complex functions of frequency, Eqs. (68) show that the coefficients $R$ and $T$ may have imaginary parts as well. This fact has important consequences at $z < 0$, where the reflected wave, proportional to $R$, interferes with the incident wave. Indeed, plugging $R = |R|e^{i\varphi}$ (where $\varphi = \arg R$ is a real phase shift) into the expression in parentheses in the first of Eqs. (64), we may rewrite it as

$$e^{ik_-z} + Re^{-ik_-z} = \left(1 - |R| + |R|\right)e^{ik_-z} + |R|e^{i\varphi}e^{-ik_-z} = \left(1 - |R|\right)e^{ik_-z} + 2|R|e^{i\varphi/2}\sin[k_-(z - \delta_-)], \quad \text{where} \quad \delta_- \equiv \frac{\varphi - \pi}{2k_-}. \tag{7.69}$$



This means that the field may be presented as a sum of a traveling wave and a standing wave with an amplitude proportional to $|R|$, shifted by the distance $\delta_-$ toward the interface, relative to the ideal-mirror pattern (61b). This effect is frequently used for experimental measurements of an unknown impedance $Z_+$ of some medium, provided that $Z_-$ is known (e.g., for the free space, $Z_- = Z_0$). For that, a small antenna (the probe), not disturbing the field distribution too much, is placed into the wave field, and the amplitude of the ac voltage induced in the probe by the wave is measured by some detector (e.g., a semiconductor diode with a quadratic I-V curve), as a function of $z$ (Fig. 9). From this measurement, it is straightforward to find both $|R|$ and $\delta_-$, and hence restore the complex $R$, and then use Eq. (68) to calculate both the modulus and the argument of $Z_+$.$^{26}$
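The measurement procedure just described can be mimicked in a few lines of Python. The sketch below assumes an "unknown" passive impedance (an arbitrary value, not a real material) and computes $R$ and $T$ from Eq. (68). It then generates the standing-wave amplitude $|e^{ik_-z} + Re^{-ik_-z}|$ that an ideal probe would see; a quadratic detector gives the square of this amplitude, so a square root of its reading is implied. From this pattern it recovers:

- $|R|$, from the ratio of the pattern's maxima and minima;
- the phase $\varphi$, from the position of a minimum, where $e^{2ik_-z} = -e^{i\varphi}$;
- finally $Z_+ = Z_-(1 + R)/(1 - R)$, which is Eq. (68) solved for $Z_+$.

```python
import cmath
import math


def reflection_transmission(z_minus, z_plus):
    """Eq. (68): R = (Z+ - Z-)/(Z+ + Z-), T = 2 Z+/(Z+ + Z-)."""
    s = z_plus + z_minus
    return (z_plus - z_minus) / s, 2 * z_plus / s


def probe_amplitudes(zs, k, r):
    """|exp(ikz) + R exp(-ikz)| at points z < 0: the factor in parentheses of Eq. (64)."""
    return [abs(cmath.exp(1j * k * z) + r * cmath.exp(-1j * k * z)) for z in zs]


def recover_impedance(zs, amps, k, z_minus):
    """Estimate Z+ from a measured pattern spanning at least half a wavelength."""
    a_max, a_min = max(amps), min(amps)
    r_abs = (a_max - a_min) / (a_max + a_min)      # extrema equal 1 + |R| and 1 - |R|
    z_node = zs[amps.index(a_min)]                 # at a minimum, exp(2ikz) = -exp(i phi)
    phi = 2 * k * z_node + math.pi
    r = r_abs * cmath.exp(1j * phi)
    return z_minus * (1 + r) / (1 - r)             # Eq. (68) solved for Z+


Z0 = 376.73                      # free-space impedance, Ohm (rounded)
Z_UNKNOWN = 100.0 - 40.0j        # assumed "unknown" impedance for the demonstration
K = 2 * math.pi                  # wavelength = 1 (arbitrary units)
R, T = reflection_transmission(Z0, Z_UNKNOWN)
zs = [-1.0 + j / 20_000 for j in range(20_001)]   # one wavelength in front of the interface
amps = probe_amplitudes(zs, K, R)
print("R =", R, " T =", T, " recovered Z+ =", recover_impedance(zs, amps, K, Z0))
```

The accuracy of the recovered $Z_+$ is limited mostly by the spatial step of the sampled pattern, which sets the uncertainty of the minimum's position.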




V ozE 2 (z,t) 



Fig. 7.9. Measurement of the complex 
impedance of a medium (schematically). 



Now let us discuss what these results give for waves incident from the free space ($Z_-(\omega) = Z_0 = $ const, $k_- = k_0 = \omega/c$) onto the surface of two particular media.



24 Please note that only the media impedances (rather than wave velocities) are important for the reflection in this case! Unfortunately, this fact is not clearly emphasized in some textbooks that discuss only the case $\mu_+ = \mu_0$, when $Z = (\mu_0/\varepsilon)^{1/2}$ and $v = 1/(\mu_0\varepsilon)^{1/2}$ are proportional to each other.

25 See, e.g., the discussion of elastic waves of mechanical deformations in CM Secs. 5.3, 5.4, 7.7, and 7.8.

26 Before the advent of computers, specially lined paper (called the Smith chart) was commercially available for performing this recalculation graphically; it is occasionally used even nowadays for result presentation.
(i) For a collision-free plasma (with negligible magnetization) we may use Eq. (36) with $\mu(\omega) = \mu_0$ to present the impedance in either of two equivalent forms:

$$Z_+ = \frac{Z_0\,\omega}{\left(\omega^2 - \omega_p^2\right)^{1/2}} \equiv -\frac{iZ_0\,\omega}{\left(\omega_p^2 - \omega^2\right)^{1/2}}. \tag{7.70}$$

The former expression is more convenient in the case $\omega > \omega_p$, when the wave vector $k_+$ and the wave impedance $Z_+$ of the plasma are real, so that a part of the incident wave propagates into the plasma. Plugging this expression into the latter of Eqs. (68), we see that the transmission coefficient is real:

$$T = \frac{2\omega}{\omega + \left(\omega^2 - \omega_p^2\right)^{1/2}}. \tag{7.71}$$

Note that according to this formula, somewhat counter-intuitively, $T > 1$ for any frequency (above $\omega_p$). How can the transmitted wave be more intense than the incident one that has induced it? For a better understanding of this result, let us compare the powers (rather than amplitudes) of these two waves, i.e. their average Poynting vectors (42):

$$\bar S_\text{incident} = \frac{|E_\omega|^2}{2Z_0}, \qquad \bar S_+ = \frac{|TE_\omega|^2}{2Z_+} = \frac{|E_\omega|^2}{2Z_0}\,\frac{4\omega(\omega^2-\omega_p^2)^{1/2}}{\left[\omega + (\omega^2-\omega_p^2)^{1/2}\right]^2}. \qquad (7.72)$$



It is easy to see that the ratio of these two values 27 is always below 1 (and tends to zero at $\omega \to \omega_p$), so that only a fraction of the incident wave power may be transferred. Hence the result $T > 1$ may be interpreted as follows: the interface between two media also works as an impedance transformer. It can never transfer more power than the incident wave provides, i.e. it can only decrease the product $S = EH$; however, since the ratio $Z = E/H$ changes at the interface, the amplitude of one of the fields may increase at the transfer.
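To make the argument above tangible, here is a minimal numerical sketch (Python; the function name `plasma_normal_incidence` is ours, and measuring frequencies in units of $\omega_p$ is just a convenient assumption). It evaluates Eqs. (68) and (70)-(72) above the plasma frequency and shows that $T$ exceeds 1 while the power ratio $P = \bar S_+/\bar S_\text{incident}$ stays below 1, with $|R|^2 + P = 1$ as energy conservation requires.

```python
import math

def plasma_normal_incidence(omega, omega_p):
    """Free space -> collision-free plasma at normal incidence, for omega > omega_p.

    Returns (R, T, P): amplitude coefficients of Eq. (68) and the power ratio
    P = S+/S_incident of Eqs. (72).
    """
    if omega <= omega_p:
        raise ValueError("this sketch assumes omega > omega_p (real Z+)")
    s = math.sqrt(omega**2 - omega_p**2)
    z = omega / s                 # Z+/Z0 from the first form of Eq. (70); real and > 1
    R = (z - 1) / (z + 1)         # first of Eqs. (68)
    T = 2 * z / (z + 1)           # second of Eqs. (68); equals 2*omega/(omega + s), Eq. (71)
    P = T**2 / z                  # |T|^2 * Z0/Z+, i.e. the ratio of the two Eqs. (72)
    return R, T, P

for w in (1.01, 1.5, 3.0, 10.0):  # omega in units of omega_p
    R, T, P = plasma_normal_incidence(w, 1.0)
    print(f"omega/omega_p = {w:5.2f}: T = {T:.4f}, P = {P:.4f}, R^2 + P = {R**2 + P:.12f}")
```

Only real quantities appear because the sketch is restricted to $\omega > \omega_p$, where $Z_+$ is real.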

Now let us proceed to the case $\omega < \omega_p$, when the waves cannot propagate in the plasma. In this case, the latter of expressions (70) is more convenient, because it immediately shows that $Z_+$ is purely imaginary, while $Z_- = Z_0$ is purely real. This means that $|Z_+ - Z_-| = |Z_+ + Z_-|$, i.e. according to the first of Eqs. (68), $|R| = 1$, so that the reflection is total, i.e. no incident power (on the average) is transferred into the plasma, as was already discussed in Sec. 2. However, the complex $R$ has a finite argument,

$$\varphi \equiv \arg R = 2\arg(Z_+ - Z_0) = -2\arctan\frac{\omega}{(\omega_p^2-\omega^2)^{1/2}}, \qquad (7.73)$$

and hence provides a finite spatial shift (69) of the standing wave toward the plasma surface: 

$$\delta_- = -\frac{\varphi}{2k_0} = \frac{c}{\omega}\arctan\frac{\omega}{(\omega_p^2-\omega^2)^{1/2}}. \qquad (7.74)$$

On the other hand, we already know from Eq. (40) that the solution at z > 0 is exponential, with 
the decay length $\delta$ that is described by Eq. (39). Calculating, from coefficient $T$, the exact prefactor of this exponent, it is straightforward to verify that the electric and magnetic fields are indeed



27 This ratio is sometimes also called the transmission coefficient, but in order to avoid its confusion with T, it is 
better to call it the power transmission coefficient. 
continuous at the interface, forming the pattern shown by red lines in Fig. 8. This penetration may be observed experimentally, for example, by bringing the surface of another material, transparent at frequency $\omega$, close to the interface. Even without solving this problem exactly, it is evident that if the distance between these two interfaces becomes comparable to $\delta$, a part of the exponential "tail" of the field is picked up by the second material and induces a propagating wave. This is an electromagnetic analog of the quantum-mechanical tunneling through a potential barrier. 28



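Note that at $\omega \ll \omega_p$, both $\delta_-$ and $\delta$ are reduced to the same frequency-independent value,

$$\delta_-,\ \delta \to \frac{c}{\omega_p} = \left(\frac{c^2\varepsilon_0 m_e}{ne^2}\right)^{1/2} \equiv \left(\frac{m_e}{\mu_0 ne^2}\right)^{1/2}, \qquad (7.75)$$

which is just the field penetration depth $\delta$ (6.38) calculated for a perfect conductor model (assuming $m = m_e$ and $\mu = \mu_0$) in the quasistationary limit. This is natural, because the condition $\omega \ll \omega_p$ may be recast as $\lambda_0 = 2\pi c/\omega \gg 2\pi c/\omega_p = 2\pi\delta$.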
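The limit (75) can be checked numerically with the sketch below. The electron density `n` is an arbitrary illustrative value (an assumption, not taken from the text), CODATA values are used for the constants, and $\mu_0$ is derived from $c^2 = 1/\varepsilon_0\mu_0$ so that the two forms of Eq. (75) agree to rounding. The decay length is taken as $c/(\omega_p^2-\omega^2)^{1/2}$, the inverse imaginary wave number of a collision-free plasma; the helper names are ours.

```python
import math

C = 299_792_458.0            # speed of light, m/s
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m (CODATA 2018)
MU0 = 1.0 / (EPS0 * C**2)    # vacuum permeability, from c^2 = 1/(eps0*mu0)
E_CHARGE = 1.602176634e-19   # elementary charge, C
M_E = 9.1093837015e-31       # electron mass, kg

def plasma_frequency(n):
    """omega_p = (n e^2 / (eps0 m_e))^(1/2) for electron density n in m^-3."""
    return math.sqrt(n * E_CHARGE**2 / (EPS0 * M_E))

def shift_delta_minus(omega, omega_p):
    """Standing-wave shift of Eq. (74); valid for omega < omega_p."""
    return (C / omega) * math.atan(omega / math.sqrt(omega_p**2 - omega**2))

def decay_length_delta(omega, omega_p):
    """Decay length of the field inside the plasma, c/(omega_p^2 - omega^2)^(1/2)."""
    return C / math.sqrt(omega_p**2 - omega**2)

n = 1e28                                              # illustrative density, m^-3
wp = plasma_frequency(n)
lim_75 = math.sqrt(M_E / (MU0 * n * E_CHARGE**2))     # last form of Eq. (75)
for x in (0.9, 0.5, 1e-2, 1e-4):                      # omega/omega_p
    w = x * wp
    print(f"omega/omega_p = {x:g}: delta_- = {shift_delta_minus(w, wp):.4e} m, "
          f"delta = {decay_length_delta(w, wp):.4e} m, c/omega_p = {lim_75:.4e} m")
```

Both lengths approach $c/\omega_p$ as $\omega/\omega_p \to 0$, while at any finite frequency the shift $\delta_-$ stays below the decay length $\delta$.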

(ii) Now let us consider electromagnetic wave reflection from a dissipative conductor. In the simplest low-frequency limit, when both $\omega\tau$ and $\omega/\omega_j$ are much less than 1, the conductor may be described by a frequency-independent conductivity $\sigma$. According to Eq. (47), in this case we can take 29

$$Z_+ = \left[\frac{\mu_0}{\varepsilon(0) + i\sigma/\omega}\right]^{1/2}. \qquad (7.76)$$



With this substitution, Eqs. (68) immediately give us all the results of interest. In particular, they show 
that now R is complex, and hence some fraction F of the incident wave is absorbed by the conductor. 
Using Eq. (42), we may calculate the fraction to be 



$$F \equiv \frac{\bar S_+\big|_{z=+0}}{\bar S_\text{incident}} = 1 - |R|^2. \qquad (7.77)$$



(Since the power flow $\bar S_+$ into the conductor depends on $z$, decaying on the scale of $\delta_s$, it is important to calculate it directly at the interface, to account for the absorption in the whole volume of the conductor.) Restricting ourselves, for the sake of simplicity, to the most important quasistationary limit, i.e. to $Z_+ = (\mu_0\omega/i\sigma)^{1/2}$, and using Eq. (6.26) to express the impedance via the skin depth, $Z_+ = (1-i)(\pi\delta_s/\lambda_0)Z_0$, we see that $|Z_+| \ll Z_0$, so that, according to Eq. (68), $T \approx 2Z_+/Z_0$ and

$$F \approx \frac{4\,\text{Re}\,Z_+}{Z_0} = 4\pi\frac{\delta_s}{\lambda_0} = 2\left(\frac{2\varepsilon_0\omega}{\sigma}\right)^{1/2} \ll 1. \qquad (7.78)$$



Thus the absorbed power fraction scales as the ratio of the skin depth to the free-space wavelength. This important result is widely used for the semi-qualitative evaluation of power losses in metallic waveguides and resonators, and immediately shows that in order to keep the losses low, the characteristic size of such systems (which gives the scale of the free-space wavelengths $\lambda_0$ at which they are



28 See, e.g., QM Sec. 2.3. 

29 For a typical metal with $\tau \sim 10^{-13}$ s, Eq. (76) is valid all the way up to $\omega \sim 10^{13}$ s$^{-1}$, i.e. up to the far-infrared frequencies.
used) should be much larger than $\delta_s$. A more detailed theory of these structures will be discussed later in this chapter.
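To see how good the approximation (78) is, the following sketch compares it with the "exact" $F = 1 - |R|^2$ obtained from Eqs. (68) and (76). Setting $\varepsilon(0) = \varepsilon_0$ is an assumption (it hardly matters when $\sigma/\omega \gg \varepsilon(0)$). The values $\sigma = 6\times10^8$ S/m and $\omega = 10^{15}$ s$^{-1}$ are simply the order-of-magnitude numbers of the optical-range estimate in Sec. 5. The function names are ours.

```python
import cmath
import math

C = 299_792_458.0            # speed of light, m/s
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
MU0 = 1.0 / (EPS0 * C**2)    # vacuum permeability, so that c^2 = 1/(eps0*mu0)
Z0 = MU0 * C                 # free-space wave impedance, about 376.73 ohm

def absorbed_fraction_exact(sigma, omega, eps_static=EPS0):
    """F = 1 - |R|^2, with Z+ from Eq. (76) and R from Eq. (68).

    eps_static plays the role of eps(0) in Eq. (76); eps0 is an assumed default.
    """
    z_plus = cmath.sqrt(MU0 / (eps_static + 1j * sigma / omega))  # principal root, Re Z+ > 0
    R = (z_plus - Z0) / (z_plus + Z0)
    return 1.0 - abs(R)**2

def absorbed_fraction_approx(sigma, omega):
    """Eq. (78): F ~ 4 pi delta_s / lambda_0, valid while |Z+| << Z0."""
    delta_s = math.sqrt(2.0 / (MU0 * sigma * omega))   # normal skin depth
    lambda0 = 2.0 * math.pi * C / omega                 # free-space wavelength
    return 4.0 * math.pi * delta_s / lambda0

sigma, omega = 6e8, 1e15
print(f"delta_s  = {math.sqrt(2.0 / (MU0 * sigma * omega)):.2e} m")
print(f"F exact  = {absorbed_fraction_exact(sigma, omega):.4f}")
print(f"F approx = {absorbed_fraction_approx(sigma, omega):.4f}")
```

The two values differ only by corrections of the order of $|Z_+|/Z_0$, which is why Eq. (78) is adequate for such estimates.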



7.5. Refraction 

Now let us consider the effects arising at the plane interface if the wave incidence angle $\theta$ (Fig. 10) is arbitrary, rather than equal to zero as in our previous analysis, for the simplest case of fully transparent media, with real $\varepsilon_\pm$ and $\mu_\pm$.



Fig. 7.10. Plane wave reflection, transmission, and refraction at a plane interface. The plane of drawing is selected to contain all three wave vectors $\mathbf{k}_+$, $\mathbf{k}_-$, and $\mathbf{k}'_-$.



In contrast with the case of normal incidence, here the wave vectors $\mathbf{k}_-$, $\mathbf{k}'_-$, and $\mathbf{k}_+$ of the three component waves (incident, reflected, and transmitted) may have different directions. Hence now we have to start our analysis by writing a general expression for a single plane monochromatic wave for the case when its wave vector $\mathbf{k}$ has all three Cartesian components, rather than one. An evident generalization of Eq. (11) to this case is



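$$f(\mathbf{r},t) = \text{Re}\left[f_\omega\,e^{i(k_x x + k_y y + k_z z - \omega t)}\right] \equiv \text{Re}\left[f_\omega\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\right]. \qquad (7.79)$$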



This relation enables a ready analysis of "kinematic" relations that are independent of the media impedances. Indeed, it is sufficient to notice that in order to satisfy any linear, homogeneous boundary conditions at the interface ($z = 0$), all waves should have the same temporal and spatial dependence on this plane. Hence if we select the plane $xz$ so that vector $\mathbf{k}_-$ lies in it, then $(k_-)_y = 0$, and $\mathbf{k}_+$ and $\mathbf{k}'_-$ cannot have any $y$-component either, i.e. all three vectors lie in the same plane, which is selected as the plane of drawing of Fig. 10. Moreover, for the same reason their $x$-components should be equal:



$$k_-\sin\theta = k_-\sin\theta' = k_+\sin r. \qquad (7.80)$$

From here we immediately have the well-known laws of reflection

$$\theta' = \theta, \qquad (7.81)$$

and refraction: 30



30 This relation is traditionally called the Snell law, after the 17th-century author W. Snellius, though it has been traced back to a circa 984 manuscript by Abu Saad al-Ala ibn Sahl.
$$\frac{\sin r}{\sin\theta} = \frac{k_-}{k_+}. \qquad (7.82)$$



In this form, the laws are valid for plane waves of any nature. In optics, the Snell law (82) is frequently presented in the form

$$\frac{\sin r}{\sin\theta} = \frac{n_-}{n_+}, \qquad (7.83)$$



where $n_\pm$ is the index of refraction (also called the "refractive index") of the corresponding medium, defined as its wave number normalized to that of free space (at the wave's frequency):

$$n_\pm \equiv \frac{k_\pm}{k_0} = \left(\frac{\varepsilon_\pm\mu_\pm}{\varepsilon_0\mu_0}\right)^{1/2}. \qquad (7.84)$$



Perhaps the most famous corollary of the Snell law is that if a wave propagates from a medium with a higher index of refraction to that with a lower one (i.e. if $n_- > n_+$ in Fig. 10), for example from water into air, there is always a certain critical value $\theta_c$ of the incidence angle,

$$\theta_c = \arcsin\frac{n_+}{n_-} = \arcsin\left(\frac{\varepsilon_+\mu_+}{\varepsilon_-\mu_-}\right)^{1/2}, \qquad (7.85)$$



at which angle $r$ reaches $\pi/2$. At a larger $\theta$, i.e. within the range $\theta_c < \theta < \pi/2$, the boundary conditions cannot be satisfied with a refracted wave with a real wave vector, so that the wave experiences the so-called total internal reflection. This effect is very important for practice, because it shows that dielectric surfaces may be used as mirrors, in particular in optical fibers, to be discussed in more detail in Sec. 8 below. This is very fortunate for all telecommunication technology, because light reflection from metals is rather imperfect. Indeed, according to Eq. (78), in the optical range ($\lambda_0 \sim 0.5\ \mu$m, i.e. $\omega \sim 10^{15}$ s$^{-1}$), even the best conductors (with $\sigma \sim 6\times10^{8}$ S/m and hence the normal skin depth $\delta_s \sim 1.5$ nm) provide relatively high losses $F \sim 1\%$ at each reflection.

Note, however, that even within the range $\theta_c < \theta < \pi/2$ the field at $z > 0$ is not identically equal to zero: just as it does at normal incidence ($\theta = 0$), it penetrates into the less dense medium by a distance of the order of $\lambda_0$, exponentially decaying inside it. At $\theta \neq 0$ the penetrating field still changes sinusoidally along the interface, with the wave number $k_x = k_-\sin\theta$ given by Eq. (80). Such a field, exponentially dropping in one direction but still propagating as a wave in another direction, is frequently called the evanescent wave.
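A small sketch of the kinematics discussed above (Eqs. (80), (83), and (85)), with illustrative refractive indices for water and air. The evanescent decay length uses only the fact that $k_x = k_-\sin\theta$ is fixed by Eq. (80), while $k_x^2 + k_z^2 = k_+^2$ forces $k_z$ to be imaginary above $\theta_c$. The function names are our own.

```python
import math

def refraction_angle(theta, n_minus, n_plus):
    """Snell law (83): n_- sin(theta) = n_+ sin(r); returns None beyond the critical angle."""
    s = (n_minus / n_plus) * math.sin(theta)
    if s > 1.0:
        return None          # total internal reflection: no refracted wave with real k_+
    return math.asin(s)

def critical_angle(n_minus, n_plus):
    """Eq. (85): theta_c = arcsin(n_+/n_-), which exists only for n_- > n_+."""
    if n_minus <= n_plus:
        raise ValueError("no critical angle unless n_- > n_+")
    return math.asin(n_plus / n_minus)

def evanescent_depth(theta, n_minus, n_plus, lambda0=1.0):
    """Decay length 1/kappa of the field at z > 0, for theta > theta_c.

    k_x = k_- sin(theta) is fixed by Eq. (80); k_x**2 + k_z**2 = k_+**2 then
    forces k_z = i*kappa with kappa = (k_x**2 - k_+**2)**0.5.
    """
    k0 = 2.0 * math.pi / lambda0
    kx = n_minus * k0 * math.sin(theta)
    k_plus = n_plus * k0
    return 1.0 / math.sqrt(kx**2 - k_plus**2)

n_water, n_air = 1.33, 1.0                        # illustrative indices
print(math.degrees(critical_angle(n_water, n_air)))
print(math.degrees(refraction_angle(math.radians(30), n_water, n_air)))
print(refraction_angle(math.radians(60), n_water, n_air))   # None: total internal reflection
print(evanescent_depth(math.radians(60), n_water, n_air))   # in units of lambda0
```

The decay length is a fraction of $\lambda_0$ well above $\theta_c$ and grows without bound as $\theta \to \theta_c$.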

One more remark: just as at normal incidence, the field penetration into the other medium causes a phase shift of the reflected wave; see, e.g., Eq. (69) and its discussion. A new feature of this phase shift, arising at $\theta \neq 0$, is that it also has a component parallel to the interface: the so-called Goos-Hänchen effect. In geometric optics, this effect leads to an image shift (relative to its position in a perfect mirror) with components both normal and parallel to the interface.

Now let us carry out an analysis of the "dynamic" relations that determine the amplitudes of the refracted and reflected waves. For this we need to write explicitly the boundary conditions at the interface (i.e. the plane $z = 0$). Since now the electric and/or magnetic fields may have components normal to this plane, in addition to the continuity of their tangential components, which we have repeatedly discussed,
$$E_{x,y}\big|_{z=-0} = E_{x,y}\big|_{z=+0}, \qquad H_{x,y}\big|_{z=-0} = H_{x,y}\big|_{z=+0}, \qquad (7.86)$$

we also need relations for the normal components. As follows from the homogeneous macroscopic Maxwell equations (6.94b), they are also the same as in statics ($D_n = \text{const}$, $B_n = \text{const}$), for our reference frame choice (Fig. 10) giving

$$\varepsilon_-E_z\big|_{z=-0} = \varepsilon_+E_z\big|_{z=+0}, \qquad \mu_-H_z\big|_{z=-0} = \mu_+H_z\big|_{z=+0}. \qquad (7.87)$$

The expressions of these components via the amplitudes $E_\omega$, $RE_\omega$, and $TE_\omega$ of the incident, reflected, and transmitted waves depend on the incident wave's polarization. For example, for a linearly polarized wave with the electric field vector perpendicular to the plane of incidence (Fig. 11a), i.e. parallel to the interface plane, the reflected and refracted waves are similarly polarized.




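As a result, all $E_z$ are equal to zero (so that the first of Eqs. (87) is inconsequential), while the tangential components of the electric field are just equal to their full amplitudes, just as at normal incidence, so we still can use Eqs. (64) to express these components via coefficients $R$ and $T$. However, at $\theta \neq 0$ the magnetic fields have not only tangential components

$$H_x\big|_{z=-0} = \text{Re}\left[\frac{E_\omega}{Z_-}(1-R)\cos\theta\,e^{-i\omega t}\right], \qquad H_x\big|_{z=+0} = \text{Re}\left[\frac{E_\omega}{Z_+}T\cos r\,e^{-i\omega t}\right], \qquad (7.88)$$

but also normal components (Fig. 11a):

$$H_z\big|_{z=-0} = \text{Re}\left[\frac{E_\omega}{Z_-}(1+R)\sin\theta\,e^{-i\omega t}\right], \qquad H_z\big|_{z=+0} = \text{Re}\left[\frac{E_\omega}{Z_+}T\sin r\,e^{-i\omega t}\right]. \qquad (7.89)$$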



Plugging these expressions into the boundary conditions expressed by Eqs. (86) (in this case, for $E_y$ and $H_x$ only) and the second of Eqs. (87), we get three equations for two unknown coefficients $R$ and $T$. However, two of these equations duplicate each other because of the Snell law, and we get just two independent equations,

$$1 + R = T, \qquad \frac{1}{Z_-}(1-R)\cos\theta = \frac{1}{Z_+}T\cos r, \qquad (7.90)$$



which are a very natural generalization of Eqs. (67), with the replacements $Z_- \to Z_-\cos r$, $Z_+ \to Z_+\cos\theta$. As a result, we can immediately use Eq. (68) to write the solution of system (90):

$$R = \frac{Z_+\cos\theta - Z_-\cos r}{Z_+\cos\theta + Z_-\cos r}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\cos\theta + Z_-\cos r}. \qquad (7.91a)$$



If we want to express the coefficients via the angle of incidence alone, we should use the Snell law (82) to eliminate angle $r$, getting

$$R = \frac{Z_+\cos\theta - Z_-\left[1-(k_-/k_+)^2\sin^2\theta\right]^{1/2}}{Z_+\cos\theta + Z_-\left[1-(k_-/k_+)^2\sin^2\theta\right]^{1/2}}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\cos\theta + Z_-\left[1-(k_-/k_+)^2\sin^2\theta\right]^{1/2}}. \qquad (7.91b)$$



However, my strong preference is to use the kinematic relation (82) and the dynamic relations (91a) separately, because Eq. (91b) obscures the very important physical fact that the ratio $k_-/k_+$, i.e. the ratio of the wave velocities in the two media, is only involved in the Snell law (82), while the dynamic relations essentially include only the ratio of wave impedances, just as in the case of normal incidence.
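The separation of kinematics and dynamics advocated above is easy to mirror in code. In this sketch for the polarization of Fig. 11a (Eqs. (91a)), the Snell law fixes $\cos r$ from $k_-/k_+$ alone, and the amplitudes then depend only on $Z_+\cos\theta$ and $Z_-\cos r$. As a consistency check, each wave carries the normal power flux $|E|^2\cos(\text{angle})/2Z$, so $|R|^2 + |T|^2 Z_-\cos r/(Z_+\cos\theta)$ should equal 1 for lossless media. The medium parameters (a nonmagnetic medium with $\varepsilon = 2.25\varepsilon_0$) are illustrative, in assumed units with $Z_0 = k_0 = 1$. The function names are ours.

```python
import math

def fresnel_te(theta, z_minus, z_plus, k_minus, k_plus):
    """Eqs. (91a) for E perpendicular to the plane of incidence (Fig. 11a).

    Kinematics: the Snell law (82) uses only k_-/k_+.
    Dynamics: the amplitudes use only Z_+ cos(theta) and Z_- cos(r).
    """
    sin_r = (k_minus / k_plus) * math.sin(theta)
    if sin_r > 1.0:
        raise ValueError("total internal reflection; this sketch handles real r only")
    cos_r = math.sqrt(1.0 - sin_r**2)
    a, b = z_plus * math.cos(theta), z_minus * cos_r
    R = (a - b) / (a + b)
    T = 2.0 * a / (a + b)
    return R, T, cos_r

def normal_flux_balance(theta, z_minus, z_plus, k_minus, k_plus):
    """Reflected plus transmitted z-flux, in units of the incident one (1 if lossless)."""
    R, T, cos_r = fresnel_te(theta, z_minus, z_plus, k_minus, k_plus)
    return R**2 + T**2 * (z_minus * cos_r) / (z_plus * math.cos(theta))

# free space -> nonmagnetic medium with eps = 2.25 eps0, in units Z0 = k0 = 1
Zm, km, Zp, kp = 1.0, 1.0, 1.0 / 1.5, 1.5
for deg in (0, 30, 60, 85):
    R, T, _ = fresnel_te(math.radians(deg), Zm, Zp, km, kp)
    print(deg, round(R, 4), round(T, 4),
          round(normal_flux_balance(math.radians(deg), Zm, Zp, km, kp), 12))
```

At $\theta = 0$ the result reduces to Eq. (68), and changing $k_+$ alone (at fixed impedances) leaves $R$ unchanged there, which illustrates the point made above.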

In the opposite case of the linear polarization of the electric field within the plane of incidence (Fig. 11b), it is the magnetic field that does not have a normal component, so it is now the second of Eqs. (87) that does not participate in the solution. However, now the electric fields in the two media have not only tangential components,

$$E_x\big|_{z=-0} = \text{Re}\left[E_\omega(1+R)\cos\theta\,e^{-i\omega t}\right], \qquad E_x\big|_{z=+0} = \text{Re}\left[E_\omega T\cos r\,e^{-i\omega t}\right], \qquad (7.92)$$

but also normal components (Fig. 11b):



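$$E_z\big|_{z=-0} = \text{Re}\left[E_\omega(-1+R)\sin\theta\,e^{-i\omega t}\right], \qquad E_z\big|_{z=+0} = -\text{Re}\left[E_\omega T\sin r\,e^{-i\omega t}\right]. \qquad (7.93)$$

As a result, instead of Eqs. (90), the reflection and transmission coefficients are related as

$$(1+R)\cos\theta = T\cos r, \qquad \frac{1}{Z_-}(1-R) = \frac{1}{Z_+}T. \qquad (7.94)$$

Again, the solution of this system may be immediately written using the analogy with Eq. (67):

$$R = \frac{Z_+\cos r - Z_-\cos\theta}{Z_+\cos r + Z_-\cos\theta}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\cos r + Z_-\cos\theta}, \qquad (7.95a)$$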



or, alternatively, using the Snell law:

$$R = \frac{Z_+\left[1-(k_-/k_+)^2\sin^2\theta\right]^{1/2} - Z_-\cos\theta}{Z_+\left[1-(k_-/k_+)^2\sin^2\theta\right]^{1/2} + Z_-\cos\theta}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\left[1-(k_-/k_+)^2\sin^2\theta\right]^{1/2} + Z_-\cos\theta}. \qquad (7.95b)$$
For the particular case $\mu_+ = \mu_- = \mu_0$, when $Z_+/Z_- = (\varepsilon_-/\varepsilon_+)^{1/2} = k_-/k_+ = n_-/n_+$ (which is approximately correct for traditional optical media), Eqs. (91b) and (95b) are called the Fresnel formulas. 31 Most textbooks are quick to point out that there is a major difference between these cases: for the electric field polarization within the plane of incidence (Fig. 11b), the reflected wave amplitude (proportional to coefficient $R$) turns to zero at a special value of $\theta$ (the so-called Brewster angle): 32

$$\theta_B = \arctan\frac{n_+}{n_-}, \qquad (7.96)$$

while there is no such angle in the opposite case (Fig. 11a). 33 However, this statement, as well as Eq. (96), is true only for the case $\mu_+ = \mu_-$. In the general case of different $\varepsilon$ and $\mu$, Eqs. (91) and (95) show that the reflected wave vanishes at $\theta = \theta_B$ with

$$\tan^2\theta_B = \frac{\varepsilon_-\mu_+ - \varepsilon_+\mu_-}{\varepsilon_+\mu_+ - \varepsilon_-\mu_-}\times\begin{cases} \mu_+/\mu_-, & \text{for } \mathbf{E}\perp\mathbf{n}_z \text{ (Fig. 11a)},\\ -\varepsilon_+/\varepsilon_-, & \text{for } \mathbf{H}\perp\mathbf{n}_z \text{ (Fig. 11b)}. \end{cases} \qquad (7.97)$$

Note the natural $\varepsilon \leftrightarrow \mu$ symmetry of these relations, resulting from the $\mathbf{E} \leftrightarrow \mathbf{H}$ symmetry of these two polarization cases (Fig. 11). They also show that for any set of parameters of the two media (with $\varepsilon_\pm, \mu_\pm > 0$), $\tan^2\theta_B$ is positive (and hence a real Brewster angle $\theta_B$ exists) only for one of these two polarizations. In particular, if the interface is due to the change of $\mu$ alone (i.e. $\varepsilon_+ = \varepsilon_-$), the first of Eqs. (97) is reduced to the simple form (96) again, while for the polarization shown in Fig. 11b there is no Brewster angle, i.e. the reflected wave has a nonvanishing amplitude for any $\theta$.
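Eq. (97) can be cross-checked against the zeros of $R$ given by Eqs. (91a) and (95a). The sketch below uses our own helper names and the labels "te" (Fig. 11a) and "tm" (Fig. 11b). It takes $\varepsilon$ and $\mu$ in units of $\varepsilon_0$ and $\mu_0$, assumes no total internal reflection and $n_+ \neq n_-$, and returns the real Brewster angle, if any, for each polarization.

```python
import math

def reflection_coeffs(theta, eps_m, mu_m, eps_p, mu_p):
    """R for both linear polarizations: Eq. (91a) (Fig. 11a) and Eq. (95a) (Fig. 11b).

    eps and mu are given in units of eps0 and mu0; assumes no total internal reflection.
    """
    z_m, z_p = math.sqrt(mu_m / eps_m), math.sqrt(mu_p / eps_p)   # impedances / Z0
    n_m, n_p = math.sqrt(eps_m * mu_m), math.sqrt(eps_p * mu_p)   # k/k0, Eq. (84)
    sin_r = (n_m / n_p) * math.sin(theta)                         # Snell law (83)
    cos_r = math.sqrt(1.0 - sin_r**2)
    cos_t = math.cos(theta)
    r_te = (z_p * cos_t - z_m * cos_r) / (z_p * cos_t + z_m * cos_r)
    r_tm = (z_p * cos_r - z_m * cos_t) / (z_p * cos_r + z_m * cos_t)
    return r_te, r_tm

def brewster_angles(eps_m, mu_m, eps_p, mu_p):
    """Eq. (97): real Brewster angle for each polarization, or None if tan^2 <= 0.

    Assumes eps_p*mu_p != eps_m*mu_m, i.e. n_+ != n_-.
    """
    common = (eps_m * mu_p - eps_p * mu_m) / (eps_p * mu_p - eps_m * mu_m)
    result = {}
    for key, factor in (("te", mu_p / mu_m), ("tm", -eps_p / eps_m)):
        tan2 = common * factor
        result[key] = math.atan(math.sqrt(tan2)) if tan2 > 0.0 else None
    return result

print(brewster_angles(1.0, 1.0, 1.77, 1.0))   # mu_+ = mu_-: only "tm" (Fig. 11b) is not None
print(brewster_angles(1.0, 1.0, 1.0, 2.0))    # eps_+ = eps_-: only "te" (Fig. 11a) is not None
```

Plugging the returned angle back into `reflection_coeffs` gives a vanishing $R$ for the corresponding polarization, up to rounding.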

Such an account of both media parameters on an equal footing is especially necessary to describe the so-called negative refraction effects. 34 As was shown in Sec. 2, in a medium with electric-field-driven resonances, function $\varepsilon(\omega)$ may be almost real and negative, at least within limited frequency intervals; see, in particular, Eq. (34) and Fig. 5. As has already been discussed, if at these frequencies function $\mu(\omega)$ is real and positive, then $k^2(\omega) = \omega^2\varepsilon(\omega)\mu(\omega) < 0$, and $k$ may be presented as $i/\delta$ with real $\delta$, meaning the exponential field decay into the medium. However, let us consider the case when both $\varepsilon(\omega) < 0$ and $\mu(\omega) < 0$ at a certain frequency. (This is evidently possible in a medium with both $\mathbf{E}$-driven and $\mathbf{H}$-driven resonances, at proper relations between their eigenfrequencies.) Since in this case $k^2(\omega) = \omega^2\varepsilon(\omega)\mu(\omega) > 0$, the wave vector is real, so that Eq. (79) describes a traveling wave, and one could think that there is nothing new in this case. Not quite so!






31 After A.-J. Fresnel (1788-1827), one of the pioneers of wave optics, who is credited, among many other contributions (see in particular Ch. 8), with the concept of light as a purely transverse wave.

32 A very simple interpretation of Eq. (96) is based on the fact that, together with the Snell law (82), it gives $r + \theta = \pi/2$. As a result, vector $\mathbf{E}_+$ is parallel to vector $\mathbf{k}'_-$, and hence the oscillating dipoles of the medium at $z > 0$ do not have the component which could induce the transverse electric field $\mathbf{E}'_-$ of the reflected wave.

33 This effect is used in practice to obtain linearly polarized light, with the electric field vector perpendicular to the plane of incidence, from natural light with its random polarization. An even more practical application of the effect is a partial reduction of undesirable glare from wet surfaces (for the water/air interface, $n_+/n_- \approx 1.33$, giving $\theta_B \approx 53°$) by making car light covers and sunglasses of vertically-polarizing materials.

34 Despite some important background theoretical work by A. Schuster (1904), L. Mandelstam (1945), D. Sivukhin (1957), and especially V. Veselago (1966-67), the negative refractivity effects have only recently become a subject of intensive scientific research and engineering development.
First of all, for a sinusoidal plane wave (79), operator $\nabla$ is equivalent to multiplication by $i\mathbf{k}$. As the Maxwell equations (2a) show, this means that at a fixed direction of vectors $\mathbf{E}$ and $\mathbf{k}$, the simultaneous reversal of the signs of $\varepsilon$ and $\mu$ means the reversal of the direction of vector $\mathbf{H}$. Namely, if both $\varepsilon$ and $\mu$ are positive, these equations are satisfied with mutually orthogonal vectors $\mathbf{E}$, $\mathbf{H}$, and $\mathbf{k}$ forming the usual, right-hand system (see Fig. 1 and Fig. 12a), the name stemming from the popular "right-hand rule" used to determine the vector product direction. However, if both $\varepsilon$ and $\mu$ are negative, the vectors form a left-hand system; see Fig. 12b. (Due to this fact, the media with $\varepsilon < 0$ and $\mu < 0$ are frequently called left-handed materials, LHM for short.) According to Eq. (6.97), which does not involve media parameters, this means that for a plane wave in a left-handed material, the Poynting vector $\mathbf{S} = \mathbf{E}\times\mathbf{H}$, i.e. the energy flow, is directed opposite to the wave vector $\mathbf{k}$.




Fig. 7.12. Directions of main vectors of a plane wave inside a medium with (a) positive and (b) negative $\varepsilon$ and $\mu$.



This fact may seem strange, but is in no contradiction with any fundamental principle. Let me remind you that, according to the definition of vector $\mathbf{k}$, its direction shows the direction of the phase velocity $v_\text{ph} = \omega/k$ of a sinusoidal (and hence infinitely long) wave, which cannot be used, for example, for signaling. Such signaling (by sending wave packets; see Fig. 13) is possible with the group velocity $v_\text{gr} = d\omega/dk$. This velocity in left-handed materials is always positive (directed along vector $\mathbf{S}$).



Fig. 7.13. Example of a wave packet moving along axis $z$ with a negative phase velocity, but positive group velocity. Blue lines show a packet snapshot a short time interval after the first snapshot (red lines).



Maybe the most fascinating effect possible with left-handed materials is the wave refraction at their interfaces with the usual, right-handed materials, first predicted by V. Veselago. Consider the example shown in Fig. 14a. In the incident wave, coming from the usual material, the directions of vectors $\mathbf{k}_-$ and $\mathbf{S}_-$ coincide, and so do those of vectors $\mathbf{k}'_-$ and $\mathbf{S}'_-$ characterizing the reflected wave. This means that the electric and magnetic fields in the interface plane ($z = 0$) are, at our choice of coordinates, proportional to $\exp\{ik_x x\}$, with a positive component $k_x = k_-\sin\theta$. In order to satisfy any linear boundary conditions, the refracted wave, going into the left-handed material, should match that dependence, i.e. have a positive $x$-component of its wave vector $\mathbf{k}_+$. But in this medium, this vector has to be antiparallel to vector $\mathbf{S}_+$, which, in turn, should be directed out of the interface, because it represents the power flow from the interface into the material bulk. These conditions cannot be reconciled by the refracted wave propagating along the usual Snell-law direction (shown by the dashed line in Fig. 14a), but are all satisfied at refraction in the direction given by Snell's angle with the negative sign. (Hence the term "negative refraction".) 35




Fig. 7.14. Negative refraction: (a) waves at the interface between media with positive and negative values of $\varepsilon$ and $\mu$, and (b) the hypothetical perfect lens: a parallel plate made of a material with $\varepsilon = -\varepsilon_0$ and $\mu = -\mu_0$.



In order to understand how unusual the results of the negative refraction may be, let us consider a parallel slab of thickness $d$, made of a hypothetical left-handed material with $\varepsilon = -\varepsilon_0$, $\mu = -\mu_0$ (Fig. 14b), placed in free space. For such a material, the refraction angle $r = -\theta$, so that the rays from a point source, located at a distance $a < d$ from the slab, propagate as shown in that figure, i.e. all meet again at distance $a$ inside the plate, and then continue to propagate to the second surface of the slab. Repeating our discussion for this surface, we see that a point's image is also formed beyond the plate, at distance $2a + 2b = 2a + 2(d - a) = 2d$ from the object. Superficially, this looks like the usual lens, but the well-known lens formula, which relates $a$ and $b$ with the focal length $f$, is not satisfied. (In particular, a parallel beam is not focused into a point at any finite distance.)
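The claim that every ray from the source passes through the same image point, at distance $2d$, can be verified with a few lines of ray tracing. This sketch assumes the idealized slab of Fig. 14b, where negative refraction simply flips the sign of each ray's slope ($r = -\theta$) at both faces; the function name is hypothetical.

```python
import math

def lhm_slab_image_distance(a, d, theta):
    """Trace one ray through the slab of Fig. 14b (eps = -eps0, mu = -mu0, free space outside).

    The source is on the axis (x = 0), a distance a < d in front of the slab; the ray leaves
    it at angle theta (0 < theta < pi/2) to the axis. Since r = -theta at each face, the ray's
    slope just flips sign there. Returns the axial distance from the source at which the
    outgoing ray recrosses the axis.
    """
    t = math.tan(theta)
    x1 = a * t            # height at which the ray hits the first face
    x2 = x1 - d * t       # slope -tan(theta) inside: crosses x = 0 at depth a, then reaches face 2
    dz = -x2 / t          # slope +tan(theta) again outside: distance needed to return to x = 0
    return a + d + dz

a, d = 0.3, 1.0           # arbitrary units, with a < d as in the text
print([lhm_slab_image_distance(a, d, math.radians(deg)) for deg in (5, 20, 45, 70)])
```

The result does not depend on $\theta$, which is why all rays from the point source reconverge. It does not depend on $a$ either: the image is always $2d$ away from the object, unlike what the usual lens formula would require.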

As an additional difference from the usual lens, the system shown in Fig. 14b does not reflect any part of the incident light. Indeed, it is straightforward to check that in order for all the above formulas for $R$ and $T$ to be valid, the sign of the wave impedance $Z$ in left-handed materials has to be kept positive. Thus, for our particular choice of parameters ($\varepsilon = -\varepsilon_0$, $\mu = -\mu_0$), Eqs. (91a) and (95a) are valid with $Z_+ = Z_- = Z_0$ and $\cos r = \cos\theta$, giving $R = 0$ for any linear polarization, and hence for any other wave polarization: circular, elliptic, natural, etc.

The perfect lens suggestion has triggered a wave of efforts to implement left-handed materials experimentally. (Attempts to find such materials in nature have failed so far.) Most progress in this direction has been achieved using the so-called metamaterials, which are essentially quasi-periodic arrays of specially designed electromagnetic resonators, ideally with high density $n \gg \lambda^{-3}$. For example,



35 Inspired by this fact, in some publications the left-handed materials are ascribed a negative index of refraction $n$. However, this prescription should be treated with care (for example, it complies with the first form of Eq. (84), but not its second form), and the sign of $n$, in contrast to that of wave vector $\mathbf{k}$, is a matter of convention.
Fig. 15a shows the metamaterial that was used for the first demonstration of negative refractivity in the microwave region, i.e. at few-GHz frequencies; see Fig. 15b. It combines straight strips of a metallic film, working as lumped resonators with a large electric dipole moment (hence strongly coupled to the wave's electric field $\mathbf{E}$), and several almost-closed film loops (so-called split rings), working as lumped resonators with large magnetic dipole moments, coupled to field $\mathbf{H}$. By designing the resonance frequencies close to each other, negative refractivity may be achieved; see the black line in Fig. 15b, which shows experimental data. Recently, negative refractivity was demonstrated in the optical range, albeit at relatively large absorption that spoils all potentially useful features of the left-handed materials.
Fig. 7.15. The first artificial left-handed material with experimentally demonstrated negative refraction in the microwave region. Adapted from R. Shelby et al., Science 292, 77 (2001). © AAAS.
This progress has stimulated the development of other potential uses of metamaterials (not necessarily the left-handed ones), in particular designs of nonuniform systems with engineered distributions $\varepsilon(\mathbf{r}, \omega)$ and $\mu(\mathbf{r}, \omega)$, which may provide electromagnetic wave propagation along the desired paths, e.g. around a certain region of space (Fig. 16), making it virtually invisible to an external observer; so far, this works only within a limited frequency range and for a certain wave polarization. Due to these restrictions, the practical value of this work on such invisibility cloaks is not yet clear (at least to this author); but so much attention is focused on this issue 36 that the situation should become much clearer in just a few years.





Fig. 7.16. Experimental demonstration of a prototype 2D "invisibility cloak" in the microwave region. Adapted from D. Schurig et al., Science 314, 977 (2006). © AAAS.



36 For a recent review, see, e.g., B. Wood, Comptes Rendus Physique 10, 379 (2009). 
7.6. Transmission lines: TEM waves



So far, we have analyzed plane electromagnetic waves with infinite cross-section. The cross-section may be limited, while still sustaining wave propagation, using wave transmission lines (also called waveguides): cylindrically shaped structures made of either good conductors or dielectrics. Let us first discuss the first option. In order to keep our analysis (relatively :-) simple, let us assume that:

(i) the structure is a cylinder (not necessarily with a round cross-section, see Fig. 17) filled with a usual (right-handed), uniform dielectric material with negligible losses: $\varepsilon = \varepsilon' > 0$, $\mu = \mu' > 0$, and

(ii) the wave attenuation due to the skin effect is also negligibly low. (As Eq. (78) indicates, for that the characteristic size $a$ of the waveguide's cross-section has to be much larger than the skin depth $\delta_s$ of its wall material. The effect of skin-effect losses will be analyzed in Sec. 10 below.)

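After such exclusion of attenuation, we may look for a particular solution of the Maxwell equations in the form of a monochromatic wave traveling along the waveguide:

$$\mathbf{E}(\mathbf{r},t) = \text{Re}\left[\mathbf{E}_\omega(x,y)\,e^{i(k_z z - \omega t)}\right], \qquad \mathbf{H}(\mathbf{r},t) = \text{Re}\left[\mathbf{H}_\omega(x,y)\,e^{i(k_z z - \omega t)}\right], \qquad (7.98)$$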



with real $k_z$. Note that this form allows for a substantial coordinate dependence of the electric and magnetic fields in the plane $\{x, y\}$ of the waveguide's cross-section, as well as for longitudinal components of the fields, so that solution (98) is substantially more complex than the plane waves we have discussed above. We will see in a minute that as a result of this dependence, the constant $k_z$ may be very much different from the plane-wave value (13), $k = \omega(\varepsilon\mu)^{1/2}$, in the same material.




Fig. 7.17. Decomposition of the 
electric field in a waveguide. 



In order to describe these effects explicitly, let us decompose the complex amplitudes of the fields into the longitudinal and transverse components (Fig. 17) 37

$$\mathbf{E}_\omega = E_z\mathbf{n}_z + \mathbf{E}_t, \qquad \mathbf{H}_\omega = H_z\mathbf{n}_z + \mathbf{H}_t. \qquad (7.99)$$



Plugging Eqs. (98)-(99) into the homogeneous Maxwell equations (2), and requiring the longitudinal 
and transverse components to be balanced separately, we get 



37 Note that for notation simplicity, I am dropping index $\omega$ in the complex amplitudes of the field components, and later will drop argument $\omega$ in $k_z$ and $Z$, though they may depend on the wave frequency rather substantially; see below.
$$\begin{aligned} ik_z\mathbf{n}_z\times\mathbf{E}_t - i\omega\mu\mathbf{H}_t &= -\nabla_t\times(E_z\mathbf{n}_z), & ik_z\mathbf{n}_z\times\mathbf{H}_t + i\omega\varepsilon\mathbf{E}_t &= -\nabla_t\times(H_z\mathbf{n}_z),\\ \nabla_t\times\mathbf{E}_t &= i\omega\mu H_z\mathbf{n}_z, & \nabla_t\times\mathbf{H}_t &= -i\omega\varepsilon E_z\mathbf{n}_z,\\ \nabla_t\cdot\mathbf{E}_t &= -ik_zE_z, & \nabla_t\cdot\mathbf{H}_t &= -ik_zH_z, \end{aligned} \qquad (7.100)$$

where $\nabla_t$ is the 2D del operator acting in the transverse plane $[x, y]$. These equations may look even more bulky than the original Maxwell equations, but are actually much simpler for analysis. Indeed, eliminating the transverse components from these equations (or, even simpler, just plugging Eq. (99) into Eqs. (3) and keeping just their $z$-components), we may get a pair of self-consistent equations for the longitudinal components of the fields, 38



$$(\nabla_t^2 + k_t^2)E_z = 0, \qquad (\nabla_t^2 + k_t^2)H_z = 0, \qquad (7.101)$$

where $k$ is still defined by Eq. (13), $k = (\varepsilon\mu)^{1/2}\omega$, and

$$k_t^2 \equiv k^2 - k_z^2 = \omega^2\varepsilon\mu - k_z^2. \qquad (7.102)$$



After distributions $E_z(x,y)$ and $H_z(x,y)$ have been found from these equations, they provide the right-hand sides for the rather simple, closed system of equations (100) for the transverse components of the field vectors. Moreover, as we will see below, each of the following three types of solutions:

(i) with $E_z = 0$ and $H_z = 0$ (called the transverse, or TEM waves),

(ii) with $E_z = 0$, but $H_z \neq 0$ (called either TE waves or, more frequently, H modes), and

(iii) with $E_z \neq 0$, but $H_z = 0$ (TM waves or E modes),

has its own dispersion law and hence wave propagation velocity; as a result, these modes (the term meaning the field distribution pattern) may be considered separately.

Let us start with the simplest, TEM waves with no longitudinal components of either field. For them, the top two equations of system (100) immediately give Eqs. (6) and (13), and $k_z = k$. In plain English, this means that $\mathbf{E} = \mathbf{E}_t$ and $\mathbf{H} = \mathbf{H}_t$ are proportional to each other and mutually perpendicular (just as in the plane wave) at each point of the cross-section, and that the TEM wave impedance $Z = E/H$ and dispersion law $\omega(k)$, and hence the propagation speed, are the same as in a plane wave in the material filling the waveguide. In particular, if $\varepsilon$ and $\mu$ are frequency-independent within a certain frequency range, the dispersion law is linear, $\omega = k/(\varepsilon\mu)^{1/2}$, and the wave's speed does not depend on its frequency. For practical applications to telecommunications, this is a very important advantage of TEM waves over their TM and TE counterparts, to be discussed below.
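A sketch of what Eq. (102) implies for the propagation speed. Here $k_t$ is treated as a fixed, frequency-independent parameter; this is an assumption for the illustration, in which $k_t = 0$ reproduces the TEM wave and $k_t = 1$ is a hypothetical TE/TM value, in units where $\varepsilon\mu = 1$. Then $k_z = (\omega^2\varepsilon\mu - k_t^2)^{1/2}$ is real only above $\omega = k_t/(\varepsilon\mu)^{1/2}$. For $k_t \neq 0$ the phase and group velocities differ, with $v_\text{ph}v_\text{gr} = 1/\varepsilon\mu$, while for the TEM wave both equal $1/(\varepsilon\mu)^{1/2}$ at all frequencies. The function names are ours.

```python
import math

def kz_from_eq102(omega, eps_mu, k_t):
    """k_z = (omega^2 eps mu - k_t^2)^(1/2), Eq. (102); None below cutoff (k_z imaginary)."""
    kz2 = omega**2 * eps_mu - k_t**2
    return math.sqrt(kz2) if kz2 > 0.0 else None

def velocities(omega, eps_mu, k_t, h=1e-6):
    """Phase velocity omega/k_z and group velocity d(omega)/d(k_z) via a central difference.

    Assumes omega*(1 - h) is still above cutoff.
    """
    kz = kz_from_eq102(omega, eps_mu, k_t)
    dk = kz_from_eq102(omega * (1 + h), eps_mu, k_t) - kz_from_eq102(omega * (1 - h), eps_mu, k_t)
    return omega / kz, 2.0 * omega * h / dk

# units with eps*mu = 1; k_t = 0 is the TEM wave, k_t = 1 a hypothetical TE/TM mode
for omega in (1.2, 2.0, 5.0):
    print(omega, velocities(omega, 1.0, 0.0), velocities(omega, 1.0, 1.0))
```

Only the TEM wave keeps both velocities frequency-independent, which is the advantage for telecommunications mentioned above.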

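Unfortunately, such waves cannot propagate in every waveguide. In order to show this, let us have a look at the last two lines of Eqs. (100). For the TEM waves ($E_z = 0$, $H_z = 0$, $k_z = k$), they yield

$$\begin{aligned} \nabla_t\times\mathbf{E}_t &= 0, & \nabla_t\times\mathbf{H}_t &= 0,\\ \nabla_t\cdot\mathbf{E}_t &= 0, & \nabla_t\cdot\mathbf{H}_t &= 0. \end{aligned} \qquad (7.103)$$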

In the macroscopic approximation of the boundary conditions (i. e., neglecting the screening and skin 
depths), we have to require that the wave does not penetrate the walls, so that inside them, E = H = 0. 
Close to the wall but inside the waveguide, the normal component $E_n$ of the electric field may be



38 The wave equation presented in the form (101) is called the (in our particular case, 2D) Helmholtz equation, after H. von Helmholtz (1821-1894), the mentor of H. Hertz and M. Planck, among many others.
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different from zero, because surface charges may sustain its jump (see Sec. 2.1). Similarly, the 
tangential component H T oi the magnetic field may have a finite jump at the surface due to skin currents. 
However, the tangential component of the electric field and the normal component of magnetic field 
cannot experience such jump, and in order to have them vanishing inside the walls they have to equal 
zero near the walls inside the waveguide as well: 

E r =0, H n =0. (7.104) 

But the left columns of Eqs. (103) and (104) coincide with the formulation of the 2D boundary problem of electrostatics for the electric field induced by electric charges of the conducting walls, with the only difference that in our current case the value of $\varepsilon$ should be replaced with $\varepsilon(\omega)$. Similarly, the right columns of those relations coincide with the formulation of the 2D boundary problem of magnetostatics for the magnetic field induced by currents in the walls, with $\mu$ replaced by $\mu(\omega)$, and with the additional condition that the magnetic field does not penetrate inside the conductors.

Now we immediately see that in waveguides with a singly-connected wall topology (see, e.g., the particular example shown in Fig. 17), TEM waves are impossible, because there is no way to create a finite electrostatic field inside a hollow conductor with such a cross-section: its whole inner surface is an equipotential. Fortunately, such fields (and hence TEM waves) are possible in structures with cross-sections consisting of two or more disconnected (dc-insulated) parts - see, e.g., Fig. 18. (Such structures are more frequently called transmission lines rather than waveguides, the latter term being mostly reserved for the lines with singly-connected cross-sections of the walls.)




Fig. 7.18. Example of the cross-section 
of a transmission line that may support 
the TEM wave propagation. 



Now we can readily derive some "global" relations for each conductor, independent of the exact shape of its cross-section. Indeed, consider contour C drawn very close to the conductor's surface (see, e.g., the red dashed line in Fig. 18). First, we can consider it as a cross-section of a cylindrical Gaussian volume of a certain length $dz \ll \lambda = 2\pi/k$. Using the generalized Gauss law (3.29), we get

$$\oint_C (\mathbf{E}_t)_n \, dr = \frac{\lambda_\omega}{\varepsilon}, \qquad (7.105)$$

where $\lambda_\omega$ (not to be confused with the wavelength $\lambda$) is the linear density of electric charge of the conductor. Second, the same contour C may be used in the generalized Ampère law (5.131) to write

$$\oint_C (\mathbf{H}_t)_\tau \, dr = I_\omega, \qquad (7.106)$$

where $I_\omega$ is the total current flowing along the conductor (or rather its complex amplitude). But, as was mentioned above, in the TEM wave the ratio $E_t/H_t$ of the field components participating in these two integrals is constant and equal to $Z = (\mu/\varepsilon)^{1/2}$, so that Eqs. (105)-(106) give the following simple relation between the "global" characteristics of the conductor:

$$\lambda_\omega = (\varepsilon\mu)^{1/2} I_\omega \equiv \frac{k}{\omega} I_\omega. \qquad (7.107)$$

This relation may also be obtained by different means; let me describe them, because they have an independent value. Let us consider a small segment $dz \ll \lambda = 2\pi/k$ of the conductor (limited by the red dashed line in Fig. 18) and apply the electric charge conservation law (4.1) to the instantaneous values of the linear charge density and current. The cancellation of $dz$ in both parts yields

$$\frac{\partial \lambda(z,t)}{\partial t} = -\frac{\partial I(z,t)}{\partial z}. \qquad (7.108)$$

(If we accept the sinusoidal waveform, $\exp\{i(kz - \omega t)\}$, for both these variables, we immediately recover Eq. (107) for their complex amplitudes, so that this result just expresses the charge continuity law. However, Eq. (108) is valid for any waveform.)

The global equation (108) may be made more specific in the case when the frequency dependence of $\varepsilon$ and $\mu$ is negligible, and the transmission line consists of just two isolated conductors (see, e.g., Fig. 18). In this case, in order to have the wave well localized in the space near the two conductors, we need a sufficiently fast convergence of its electric field at large distances. 39 For that, their linear charge densities for each value of z should be equal and opposite, and we can simply relate them to the potential difference V between the conductors:

$$\lambda(z,t) = C_0 V(z,t), \qquad (7.109)$$

where $C_0$ is the mutual capacitance of the conductors per unit length - that was repeatedly discussed in Chapter 2. Then Eq. (108) takes the form

$$C_0 \frac{\partial V(z,t)}{\partial t} = -\frac{\partial I(z,t)}{\partial z}. \qquad (7.110)$$

Next, let us consider the contour shown with the red dashed line in Fig. 19 (which shows a cross-section of the transmission line by a plane containing the wave propagation axis z), and apply to it the Faraday induction law (6.3). Since the electric field is zero inside the conductors (in Fig. 19, on the horizontal parts of the contour), the total e.m.f. equals the difference of voltages V at the ends of the segment dz. The only source of the magnetic flux through the area limited by the contour are the (equal and opposite) currents $\pm I$ in the conductors, so that we can use Eq. (5.70) to express this flux via the current. As a result, canceling dz in both parts of the equation, we get

$$L_0 \frac{\partial I(z,t)}{\partial t} = -\frac{\partial V(z,t)}{\partial z}, \qquad (7.111)$$



39 The alternative is to have a virtually plane wave, which propagates along the transmission line conductors, and 
whose fields are just slightly deformed in their vicinity. Such a wave cannot be "guided" by the conductors, and 
hardly deserves the name of a "wave in the waveguide". 
where $L_0$ is the mutual inductance of the conductors per unit length. The only difference between $L_0$ and the dc mutual inductances discussed in Chapter 5 is that at the high frequencies we are analyzing now, $L_0$ should be calculated neglecting the magnetic field penetration into the conductors. (In the dc case, we had the same situation for superconductor electrodes, within their crude, ideal-diamagnetic description.)



Fig. 7.19. Electric current, magnetic flux, and
voltage in a two-conductor transmission line.



The system of Eqs. (110) and (111) is frequently called the telegrapher's equations. Combined, they give, for any "global" variable f (either V, or I, or $\lambda$), a 1D wave equation,

$$\frac{\partial^2 f}{\partial z^2} - L_0 C_0 \frac{\partial^2 f}{\partial t^2} = 0, \qquad (7.112)$$

which describes the dispersion-free TEM wave propagation. Again, this equation is only valid within the frequency range where the frequency dependence of both $\varepsilon$ and $\mu$ is negligible. If it is not so, the global approach may still be used for sinusoidal waves $f = \mathrm{Re}[f_\omega \exp\{i(kz - \omega t)\}]$. Repeating the above arguments, instead of Eqs. (110)-(111) we get the algebraic equations

$$\omega C_0 V_\omega = k I_\omega, \qquad \omega L_0 I_\omega = k V_\omega, \qquad (7.113)$$

in which $L_0 \propto \mu$ and $C_0 \propto \varepsilon$ may now depend on frequency.
The two linear equations (113) are consistent only if

$$L_0 C_0 = \frac{k^2}{\omega^2} = \varepsilon\mu. \qquad (7.114)$$



Besides the fact we already knew (that the TEM wave speed is the same as that of the plane wave), Eq. (114) gives us a result that I confess I have not emphasized enough in Chapter 5: the product $L_0 C_0$ does not depend on the shape or size of the line's cross-section (provided that the magnetic field penetration into the conductors is negligible). Hence, if we have calculated the mutual capacitance $C_0$ of a system of two cylindrical conductors, the result immediately gives us their mutual inductance: $L_0 = \varepsilon\mu/C_0$. This relation stems from the fact that both the electric and magnetic fields may be expressed via the solution of a 2D Laplace equation for the system's cross-section.
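The telegrapher's equations (110)-(111) also lend themselves to a direct numerical check of the dispersion-free propagation described by Eq. (112). The sketch below is only an illustration under stated assumptions: it integrates the two equations with a standard staggered-grid ("leapfrog") scheme, using hypothetical per-unit-length values $L_0$ = 250 nH/m and $C_0$ = 100 pF/m, which give $v = (L_0C_0)^{-1/2} = 2\times10^8$ m/s and $(L_0/C_0)^{1/2}$ = 50 Ω for the V/I ratio of a wave running in one direction. The function name and all parameter values are assumptions chosen for the example.

```python
import math

def telegrapher_pulse(L0, C0, length, n_cells, n_steps, z0, width):
    """Integrate C0*dV/dt = -dI/dz and L0*dI/dt = -dV/dz (Eqs. 110-111)
    for a right-going Gaussian voltage pulse centered at z0.

    V lives on grid nodes z_i = i*dz, I on midpoints z_i + dz/2 (staggered
    leapfrog grid). The two end nodes are held fixed, so the pulse must stay
    away from them. Returns node coordinates, final voltages, elapsed time.
    """
    dz = length / n_cells
    v = 1.0 / math.sqrt(L0 * C0)      # TEM wave speed, Eq. (114)
    Zw = math.sqrt(L0 / C0)           # V/I ratio of a single running wave
    dt = dz / v                       # Courant number exactly 1

    def pulse(x):
        return math.exp(-((x - z0) / width) ** 2)

    z = [i * dz for i in range(n_cells + 1)]
    V = [pulse(x) for x in z]                      # V at t = 0
    # For a right-going wave I = V/Zw; at t = dt/2 the current at midpoint
    # z_i + dz/2 equals the t = 0 value at node z_i (shift v*dt/2 = dz/2).
    I = [pulse(x) / Zw for x in z[:-1]]

    for _ in range(n_steps):
        for i in range(1, n_cells):                # interior nodes only
            V[i] -= dt / (C0 * dz) * (I[i] - I[i - 1])
        for i in range(n_cells):
            I[i] -= dt / (L0 * dz) * (V[i + 1] - V[i])
    return z, V, n_steps * dt


# Hypothetical line parameters: v = 2e8 m/s, (L0/C0)**0.5 = 50 ohm.
z, V, t = telegrapher_pulse(L0=250e-9, C0=100e-12, length=1.0,
                            n_cells=500, n_steps=150, z0=0.2, width=0.02)
i_peak = max(range(len(V)), key=V.__getitem__)
print(f"peak moved to z = {z[i_peak]:.3f} m after t = {t:.2e} s")
```

With the time step chosen as dt = dz/v (Courant number 1), this particular scheme reproduces the exact solution $V(z,t) = V(z - vt, 0)$ on the grid, so the Gaussian pulse moves by exactly one cell per step without changing its shape. With a smaller step, the scheme itself would introduce a small numerical dispersion that has nothing to do with the physics of Eq. (112).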

With Eq. (114) satisfied, either of Eqs. (113) gives the same result for the ratio

$$Z_W \equiv \frac{V_\omega}{I_\omega} = \frac{k}{\omega C_0} = \frac{\omega L_0}{k} = \left(\frac{L_0}{C_0}\right)^{1/2}, \qquad (7.115)$$

that is called the transmission line's impedance. This parameter has the same dimensionality (in SI units, ohms) as the wave impedance (7),

$$Z \equiv \frac{E_\omega}{H_\omega} = \left(\frac{\mu}{\varepsilon}\right)^{1/2}, \qquad (7.116)$$

but these parameters should not be confused, because $Z_W$ depends on the cross-section's geometry, while Z does not. In particular, $Z_W$ is the only important parameter of a transmission line for matching with a lumped load circuit (Fig. 20) in the important case when both the cable cross-section's size and the load's linear dimensions are much smaller than the wavelength. (The ability of TEM lines to have such a small cross-section is another important advantage of theirs.) Indeed, in this case we may consider the load in the quasistationary limit and write

$$V_\omega(z_0) = Z_L(\omega)\, I_\omega(z_0), \qquad (7.117)$$

where $Z_L(\omega)$ is the (generally complex) impedance of the load. Taking V(z,t) and I(z,t) in the form similar to Eqs. (61) and (62), and writing the two Kirchhoff's laws for point $z = z_0$, we get for the reflection coefficient a result similar to Eq. (68):

$$R = \frac{Z_L(\omega) - Z_W}{Z_L(\omega) + Z_W}. \qquad (7.118)$$

This formula shows that for perfect matching (i.e. the total wave absorption in the load), the load's impedance $Z_L(\omega)$ should be real and equal to $Z_W$ - but not necessarily to Z.
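Eq. (118) is easy to explore numerically. The short sketch below (the function names are hypothetical) evaluates R for several loads on a line with an assumed $Z_W$ = 50 Ω, together with the fraction $1 - |R|^2$ of the incident wave power absorbed by the load, which follows from energy conservation on a lossless line. A purely reactive load gives $|R| = 1$, and a load equal to the free-space wave impedance $Z \approx 377$ Ω is badly mismatched, illustrating that the matching condition involves $Z_W$ rather than Z.

```python
import math

def reflection_coefficient(Z_load, Z_w):
    """Eq. (118): R = (Z_L - Z_W) / (Z_L + Z_W); Z_load may be complex."""
    return (Z_load - Z_w) / (Z_load + Z_w)

def absorbed_fraction(Z_load, Z_w):
    """Share of the incident wave power dissipated in the load, 1 - |R|**2."""
    return 1.0 - abs(reflection_coefficient(Z_load, Z_w)) ** 2

Z_W = 50.0                                                # assumed line impedance, ohm
Z_FREE = math.sqrt(4e-7 * math.pi / 8.8541878128e-12)     # about 376.7 ohm
for label, Z_L in [("matched", 50.0), ("75-ohm load", 75.0), ("short", 0.0),
                   ("reactive 30j", 30j), ("Z of free space", Z_FREE)]:
    R = reflection_coefficient(Z_L, Z_W)
    print(f"{label:>16}: |R| = {abs(R):.3f}, "
          f"absorbed {absorbed_fraction(Z_L, Z_W):.3f}")
```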




Fig. 7.20. Transmission line 
impedance matching. 



As an example, let us consider one of the simplest (and the most important) transmission lines: 
the coaxial cable (Fig. 21). 40 




Fig. 7.21. Cross-section of a coaxial cable with
arbitrary (possibly, dispersive) dielectric filling. 



For this geometry, we already know expressions for both $L_0$ and $C_0$, though they have to be modified for the dielectric constant and the magnetic field non-penetration into the conductors. After that modification,

$$C_0 = \frac{2\pi\varepsilon}{\ln(b/a)}, \qquad L_0 = \frac{\mu}{2\pi}\ln(b/a). \qquad (7.119)$$

So, the universal relation (114) is indeed valid! For the cable's impedance (115), Eqs. (119) yield

$$Z_W = \left(\frac{\mu}{\varepsilon}\right)^{1/2}\frac{\ln(b/a)}{2\pi} \equiv Z\,\frac{\ln(b/a)}{2\pi}. \qquad (7.120)$$

40 The coaxial cable was first patented by O. Heaviside in 1880.



For standard TV antenna cables (such as RG-6/U), $Z_W$ = 75 ohms, while for most computer component connections, cables with $Z_W$ = 50 ohms (such as RG-58/U, with $b/a \sim 3$ and $\varepsilon/\varepsilon_0 \sim 2.2$) are prescribed by electronic engineering standards. Such cables are broadly used for the transfer of electromagnetic waves with frequencies (limited mostly by cable attenuation; see Sec. 10 below) up to 1 GHz over distances of a few km, and up to ~20 GHz on the tabletop scale (a few meters).
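Eqs. (119)-(120) make such numbers easy to check. The sketch below (function names are hypothetical; a lossless filling with $\mu = \mu_0$ is assumed unless stated) computes $C_0$, $L_0$, and $Z_W$ of a coaxial line, confirms the product rule (114), and inverts Eq. (120) to find the ratio b/a needed for a target impedance. For example, b/a = 3 with $\varepsilon/\varepsilon_0$ = 2.2 gives $Z_W \approx$ 44 Ω, i.e. a cable of the 50-ohm class, while 75 Ω with the same dielectric would require $b/a \approx 6.4$.

```python
import math

EPS0 = 8.8541878128e-12           # F/m
MU0 = 4e-7 * math.pi              # H/m (classic SI value, adequate here)

def coax_line(b_over_a, eps_r, mu_r=1.0):
    """Return C0 [F/m], L0 [H/m] (Eq. 119) and Z_W [ohm] (Eq. 120) of a coax
    with inner radius a, outer radius b, and a lossless filling (eps, mu)."""
    eps, mu = eps_r * EPS0, mu_r * MU0
    log_ba = math.log(b_over_a)
    C0 = 2 * math.pi * eps / log_ba
    L0 = mu * log_ba / (2 * math.pi)
    Z_w = math.sqrt(L0 / C0)          # = sqrt(mu/eps) * ln(b/a) / (2*pi)
    return C0, L0, Z_w

def b_over_a_for(Z_w, eps_r, mu_r=1.0):
    """Invert Eq. (120): b/a = exp(2*pi*Z_W / Z), with Z = sqrt(mu/eps)."""
    Z = math.sqrt(mu_r * MU0 / (eps_r * EPS0))
    return math.exp(2 * math.pi * Z_w / Z)

C0, L0, Z_w = coax_line(3.0, 2.2)
print(f"C0 = {C0 * 1e12:.1f} pF/m, L0 = {L0 * 1e9:.0f} nH/m, Z_W = {Z_w:.1f} ohm")
print(f"b/a needed for 75 ohm with eps_r = 2.2: {b_over_a_for(75.0, 2.2):.2f}")
```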

Another important example of TEM transmission lines is a pair of parallel wires. In the form of a twisted pair (the twisting reduces parasitic radiation at the line's bends), it allows communications, and in particular DSL Internet connections, at frequencies up to ~100 MHz, limited mostly by the mutual interference and parasitic radiation effects.



7.7. H and E waves in metallic waveguides 

Let us now return to Eqs. (100) and explore the TE and TM waves - with, respectively, either $H_z$ or $E_z$ different from zero. At first sight, they may seem more complex. However, equations (101), which determine the distribution of these longitudinal components over the cross-section, are just 2D Helmholtz equations for scalar functions. For simple cross-section geometries, they may be solved using the methods discussed for the Laplace equation in Chapter 2, in particular the variable separation. After the solution of such an equation has been found, the transverse components of the fields may be calculated by differentiation, using the simple formulas

$$\mathbf{E}_t = \frac{i}{k_t^2}\left[k_z \nabla_t E_z - kZ\,(\mathbf{n}_z \times \nabla_t H_z)\right], \qquad \mathbf{H}_t = \frac{i}{k_t^2}\left[k_z \nabla_t H_z + \frac{k}{Z}(\mathbf{n}_z \times \nabla_t E_z)\right], \qquad (7.121)$$

which follow from the two equations in the first line of Eqs. (100). 41

In comparison with the electro- and magnetostatics problems, the only conceptually new feature of Eqs. (101), with appropriate boundary conditions, is that they form the so-called eigenproblems, with typically many solutions (eigenfunctions), each describing a specific wave mode and corresponding to a specific eigenvalue of parameter $k_t$. The good news here is that these values of $k_t$ are determined by this 2D boundary problem and hence do not depend on $k_z$. As a result, the dispersion law $\omega(k_z)$ of each mode, which follows from the last form of Eq. (102),

$$\omega = \frac{(k_z^2 + k_t^2)^{1/2}}{(\varepsilon\mu)^{1/2}}, \qquad (7.122)$$



41 For that, one of these two linear equations should first be vector-multiplied by $\mathbf{n}_z$. Note that this approach could not be used to analyze TEM waves, because for them $k_t = 0$, $E_z = 0$, $H_z = 0$, and Eqs. (121) yield an indeterminate form.
is functionally the same as that of plane waves in a plasma (see Eq. (38), Fig. 6, and their discussion), with the only differences that c is now replaced with $v = 1/(\varepsilon\mu)^{1/2}$, the speed of plane (or any TEM) waves in the medium filling the waveguide, and $\omega_p$ is replaced with the so-called cutoff frequency

$$\omega_c = v k_t, \qquad (7.123)$$

specific for each mode. (As Eq. (101) implies, and as we will see from several examples below, $k_t$ has the order of $1/a$, where a is the characteristic dimension of the waveguide's cross-section, so that the critical value of the free-space wavelength is of the order of a.) Below its cutoff frequency, a mode cannot propagate in the waveguide. 42 As a result, the modes with the lowest values of $\omega_c$ present special practical interest, because the choice of the signal frequency $\omega$ between the two lowest values of the cutoff frequency guarantees that the waves propagate in the form of only one mode, with the lowest $k_t$. Such a choice allows one to simplify the excitation of the desired mode by wave generators, and to avoid the parasitic transfer of electromagnetic wave energy to undesirable modes by (unavoidable) small inhomogeneities of the system.
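The practical meaning of the cutoff frequency (123) can be illustrated by solving Eq. (122) for $k_z$. In the sketch below (function names are hypothetical, and an air-filled guide with $v = c$ is assumed), $k_z = (\omega^2/v^2 - k_t^2)^{1/2}$ is computed as a complex number: below $\omega_c$ it is purely imaginary, so the mode decays as $\exp(-|k_z|z)$ instead of propagating. Above cutoff, the phase velocity $\omega/k_z$ exceeds v, while the group velocity $d\omega/dk_z = v^2k_z/\omega$ is below it, and their product equals $v^2$, just as for waves in a plasma. The value $k_t = \pi/a$ with a = 2.3 cm is used only as an example.

```python
import cmath
import math

C_LIGHT = 299_792_458.0   # m/s; assumes an air-filled guide, v = c

def k_z(omega, k_t, v=C_LIGHT):
    """Solve Eq. (122) for the longitudinal wave number; imaginary below cutoff."""
    return cmath.sqrt((omega / v) ** 2 - k_t ** 2)

def phase_and_group_velocity(omega, k_t, v=C_LIGHT):
    """omega/k_z and d(omega)/d(k_z) = v**2 * k_z / omega, above cutoff only."""
    if omega <= v * k_t:
        raise ValueError("mode is below cutoff and does not propagate")
    kz = k_z(omega, k_t, v).real
    return omega / kz, v ** 2 * kz / omega

k_t = math.pi / 0.023                 # example transverse wave number, 1/m
f_c = C_LIGHT * k_t / (2 * math.pi)   # Eq. (123) divided by 2*pi
for f in (0.8 * f_c, 1.5 * f_c):
    kz = k_z(2 * math.pi * f, k_t)
    print(f"f/f_c = {f / f_c:.1f}: k_z = {kz:.1f} 1/m")
```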

The boundary conditions for the Helmholtz equations (101) depend on the propagating wave type. For TM waves (i.e. E modes, with $H_z = 0$ but $E_z \neq 0$), in the macroscopic approximation the boundary condition $E_\tau = 0$ immediately gives

$$E_z\big|_C = 0, \qquad (7.124)$$

where C is the contour limiting the conducting wall's cross-section. For TE waves (the H modes, with $E_z = 0$ but $H_z \neq 0$), the boundary condition is slightly less obvious and may be obtained using, for example, the second equation of system (100), vector-multiplied by $\mathbf{n}_z$. Indeed, for the component perpendicular to the conductor surface, the equation gives

$$ik_z(\mathbf{H}_t)_n - i\frac{k}{Z}(\mathbf{n}_z \times \mathbf{E}_t)_n = \frac{\partial H_z}{\partial n}. \qquad (7.125)$$

But the first term in the left-hand part of this equation must be zero on the wall surface, because of the second of Eqs. (104), while according to the first of Eqs. (104), vector $\mathbf{E}_t$ in the second term cannot have a component tangential to the wall. As a result, the vector product in that term cannot have a normal component, so that the term should equal zero as well, and Eq. (125) is reduced to

$$\frac{\partial H_z}{\partial n}\bigg|_C = 0. \qquad (7.126)$$

Let us see what this approach gives for a simple but practically important example of a metallic-wall waveguide with a rectangular cross-section. In this case, it is natural to use the Cartesian coordinates shown in Fig. 22, so that both Eqs. (101) take the simple form

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + k_t^2\right) f = 0, \qquad f = \begin{cases} E_z, & \text{for TM waves}, \\ H_z, & \text{for TE waves}. \end{cases} \qquad (7.127)$$

42 An interesting recent twist in the ideas of electromagnetic metamaterials (mentioned in Sec. 5 above) is the so-called ε-near-zero materials, designed to have the effective product $\varepsilon\mu$ much lower than $\varepsilon_0\mu_0$ within certain frequency ranges. Since at these frequencies the speed v (4) becomes much lower than c, the cutoff frequency (123) virtually vanishes. As a result, waves may "tunnel" through very narrow sections of metallic waveguides filled with such materials - see, e.g., M. Silveirinha and N. Engheta, Phys. Rev. Lett. 97, 157403 (2006).



From Chapter 2 we know that the most effective way of solving such equations in a rectangular region is the variable separation, in which the general solution is represented as a sum of partial solutions of the type

$$f = X(x)\,Y(y). \qquad (7.128)$$

Plugging this expression into Eq. (127), and dividing each term by XY, we get the equation

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + k_t^2 = 0, \qquad (7.129)$$

that should be satisfied for all values of x and y within the waveguide's interior. This is only possible if each term of the sum equals a constant. Taking the X-term and Y-term constants in the form $(-k_x^2)$ and $(-k_y^2)$, respectively, and solving the corresponding ordinary differential equations, 43 for eigenfunction (128) we get

$$f = (c_x \cos k_x x + s_x \sin k_x x)(c_y \cos k_y y + s_y \sin k_y y), \qquad \text{with } k_x^2 + k_y^2 = k_t^2, \qquad (7.130)$$

where the constants c and s should be found from the boundary conditions. Here the difference between the H modes and E modes comes into play.




Fig. 7.22. Rectangular waveguide, and the
transverse field distribution in the basic
mode $H_{10}$ (schematically).



For the former modes (TE waves), Eq. (130) is valid for $H_z$, and we should use condition (126) on all metallic walls of the waveguide (x = 0 and a; y = 0 and b - see Fig. 22). As a result, we get very simple expressions for eigenfunctions and eigenvalues:

$$(H_z)_{nm} = H_l \cos\frac{\pi n x}{a}\cos\frac{\pi m y}{b}, \qquad (7.131)$$

$$k_x = \frac{\pi n}{a}, \qquad k_y = \frac{\pi m}{b}, \qquad (k_t)_{nm} = (k_x^2 + k_y^2)^{1/2} = \pi\left[\left(\frac{n}{a}\right)^2 + \left(\frac{m}{b}\right)^2\right]^{1/2}, \qquad (7.132)$$

43 Let me hope that the solution of equations of the type $d^2X/dx^2 + k_x^2 X = 0$ does not present a problem for the reader, due to his or her prior experience with problems such as standing waves on a guitar string, wavefunctions in a flat 1D quantum well, or (with the replacement $x \to t$) a classical harmonic oscillator.
where $H_l$ is the longitudinal field amplitude, and n and m are two arbitrary integer numbers, except that they cannot both equal zero. (Otherwise, function $H_z(x,y)$ would be constant, so that, according to Eq. (121), the transverse components of the electric and magnetic field would equal zero. As a result, as the last two lines of Eqs. (100) show, the whole field would be zero for any $k_z \neq 0$.) Assuming, for definiteness, that a > b (as shown in Fig. 22), we see that the lowest eigenvalue of $k_t$, and hence the lowest cutoff frequency (123), is achieved for the so-called $H_{10}$ mode with n = 1 and m = 0, and hence

$$(k_t)_{10} = \frac{\pi}{a} \qquad (7.133)$$

(thus confirming our prior estimate of $k_t$).



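Depending on the a/b ratio, the second lowest $k_t$ and cutoff frequency belong either to the $H_{01}$ mode with n = 0 and m = 1 (the $H_{11}$ mode, with n = m = 1, always lies higher):

$$(k_t)_{01} = \frac{\pi}{b} = \frac{a}{b}(k_t)_{10}, \qquad (k_t)_{11} = \pi\left(\frac{1}{a^2} + \frac{1}{b^2}\right)^{1/2} = \left(1 + \frac{a^2}{b^2}\right)^{1/2}(k_t)_{10}, \qquad (7.134)$$

or to the $H_{20}$ mode with n = 2 and m = 0:

$$(k_t)_{20} = \frac{2\pi}{a} = 2(k_t)_{10}. \qquad (7.135)$$

The values $(k_t)_{01}$ and $(k_t)_{20}$ become equal at a/b = 2; in practical waveguides, the a/b ratio is not too far from this value. For example, in the standard X-band waveguide WR90 with $a \approx 2.3$ cm ($f_c = \omega_c/2\pi \approx 6.5$ GHz), $b \approx 1.0$ cm.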

Now let us have a quick look at the alternative TM waves (E modes). For them, we may still use the general solution (130) with $f = E_z$, but now with boundary condition (124). This gives us the eigenfunctions

$$(E_z)_{nm} = E_l \sin\frac{\pi n x}{a}\sin\frac{\pi m y}{b}, \qquad (7.136)$$

and the same eigenvalue spectrum (132) as for the H modes. However, now neither n nor m can be equal to zero; otherwise Eq. (136) would give the trivial solution $E_z(x,y) = 0$. Hence the lowest cutoff frequency of TM waves is provided by the so-called $E_{11}$ mode with n = 1, m = 1, and its eigenvalue is the same as that of the $H_{11}$ mode, $(k_t)_{11} = \pi(a^{-2} + b^{-2})^{1/2}$.
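The mode hierarchy of a rectangular guide follows from Eqs. (123) and (132) alone. The sketch below (hypothetical names; an air-filled guide with the WR90-like dimensions a = 2.3 cm and b = 1.0 cm quoted above is assumed) lists the H and E modes in the order of increasing cutoff frequency. Note that the list includes the $H_{01}$ mode (n = 0, m = 1), whose eigenvalue $\pi/b$ lies below that of $H_{11}$ for any a > b; for a/b > 2, as here, the second lowest mode is $H_{20}$, so that single-mode operation is possible between approximately 6.5 and 13 GHz.

```python
import math

C_LIGHT = 299_792_458.0   # m/s; air-filled guide assumed

def rectangular_modes(a, b, n_max=3, m_max=3, v=C_LIGHT):
    """Sorted list of (f_c, type, n, m) for an a-by-b metallic guide.
    f_c = v*(k_t)_nm/(2*pi), Eqs. (123) and (132). H_nm needs n, m >= 0,
    not both zero (Eq. 131); E_nm needs n, m >= 1 (Eq. 136)."""
    modes = []
    for n in range(n_max + 1):
        for m in range(m_max + 1):
            if n == 0 and m == 0:
                continue
            k_t = math.pi * math.hypot(n / a, m / b)
            f_c = v * k_t / (2 * math.pi)
            modes.append((f_c, "H", n, m))
            if n > 0 and m > 0:
                modes.append((f_c, "E", n, m))
    return sorted(modes)

for f_c, kind, n, m in rectangular_modes(a=0.023, b=0.010)[:6]:
    print(f"{kind}{n}{m}: f_c = {f_c / 1e9:5.2f} GHz")
```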

Thus the basic (or "fundamental") $H_{10}$ mode is certainly the most important wave in rectangular waveguides; let us have a better look at its field distribution. Plugging the corresponding solution (131) with n = 1 and m = 0 into the general Eqs. (121), we easily get

$$(H_x)_{10} = -i\frac{k_z a}{\pi} H_l \sin\frac{\pi x}{a}, \qquad (H_y)_{10} = 0, \qquad (7.137)$$

$$(E_x)_{10} = 0, \qquad (E_y)_{10} = i\frac{ka}{\pi} Z H_l \sin\frac{\pi x}{a}. \qquad (7.138)$$

This field distribution is (schematically) shown in Fig. 22. None of the fields depends on the vertical coordinate - which is very convenient, in particular, for microwave experiments with small samples. The electric field has only one (vertical) component that vanishes at the side walls and reaches its maximum at the waveguide's center; its field lines are straight, starting and ending on wall surface charges (whose distribution propagates along the waveguide together with the wave). In contrast, the magnetic field has two nonvanishing components ($H_x$ and $H_z$), and its field lines are shaped as horizontal loops wrapped around the electric field maxima.
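Eqs. (131), (137), and (138) can be evaluated directly. The sketch below (hypothetical names; an air-filled guide with a = 2.3 cm operating at 10 GHz, with $H_l$ = 1 A/m, is assumed for illustration) shows that $E_y$ vanishes at the side walls x = 0 and x = a, that the transverse fields are shifted in phase by $\pi/2$ relative to $H_z$, and that the ratio $E_y/H_x = -kZ/k_z$ is the same at every point of the cross-section; its magnitude exceeds Z and grows without limit as the frequency approaches the cutoff.

```python
import cmath
import math

C_LIGHT = 299_792_458.0      # m/s
Z0 = 376.730313668           # ohm; wave impedance Z of an air-filled guide (assumed)

def h10_fields(x, a, freq, H_l=1.0, v=C_LIGHT, Z=Z0):
    """Complex amplitudes (H_z, H_x, E_y) of the H10 mode at 0 <= x <= a,
    per Eqs. (131), (137), (138). Requires freq above the cutoff v/(2a)."""
    k = 2 * math.pi * freq / v
    k_z = math.sqrt(k ** 2 - (math.pi / a) ** 2)   # Eq. (122) with k_t = pi/a
    s = math.sin(math.pi * x / a)
    H_z = H_l * math.cos(math.pi * x / a)
    H_x = -1j * (k_z * a / math.pi) * H_l * s
    E_y = 1j * (k * a / math.pi) * Z * H_l * s
    return H_z, H_x, E_y

a = 0.023                                   # m, as in the WR90 example
for x in (0.0, 0.25 * a, 0.5 * a):
    H_z, H_x, E_y = h10_fields(x, a, freq=10e9)
    ratio = (E_y / H_x).real if H_x != 0 else float("nan")
    print(f"x/a = {x / a:.2f}: |E_y| = {abs(E_y):7.1f} V/m, "
          f"phase(E_y) = {cmath.phase(E_y):+.3f}, E_y/H_x = {ratio:.1f} ohm")
```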

An important question is whether the $H_{10}$ wave may be usefully characterized by a unique impedance introduced similarly to $Z_W$ of the TEM modes - see Eq. (115). The answer is no, because the main value of $Z_W$ is a convenient description of the impedance matching of the transmission line with a lumped load - see Fig. 20 and Eq. (118). As was discussed above, such a simple description is possible (i.e., does not depend on the exact geometry of the connection) only if both dimensions of the line's cross-section are much less than $\lambda$. But for the $H_{10}$ wave (and more generally, any non-TEM mode) this is impossible - see, e.g., Eq. (133): its lowest frequency corresponds to the TEM wavelength $\lambda_{\max} = 2\pi/(k_t)_{\min} = 2\pi/(k_t)_{10} = 2a$. 44

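Now let us consider metallic waveguides with a round cross-section (Fig. 23a). In this singly-connected geometry, again, the TEM waves are impossible, while for the analysis of H modes and E modes the polar coordinates $\{\rho, \varphi\}$ are most natural. In these coordinates, the 2D Helmholtz equation (101) takes the form

$$\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2}{\partial\varphi^2} + k_t^2\right] f = 0, \qquad f = \begin{cases} E_z, & \text{for TM waves}, \\ H_z, & \text{for TE waves}. \end{cases} \qquad (7.139)$$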



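Separating the variables as $f = \mathcal{R}(\rho)\mathcal{F}(\varphi)$, we get

$$\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\rho^2\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} + k_t^2 = 0. \qquad (7.140)$$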




(b) 




Fig. 7.23. (a) Metallic and (b) dielectric 
waveguides with circular cross-sections. 



But this is exactly Eq. (2.127) that was studied in the context of electrostatics, just with a replacement of notation: $\gamma \to k_t$. So we already know that in order to have $2\pi$-periodic functions $\mathcal{F}(\varphi)$ and finite values $\mathcal{R}(0)$ (which are necessary for our current case - see Fig. 23a), the general solution is given by Eq. (2.136), i.e. the eigenfunctions may be expressed via integer-order Bessel functions of the first kind: 45

$$f_n = \text{const} \times J_n(k_{nm}\rho)\, e^{in\varphi}, \qquad (7.141)$$

44 The reader is encouraged to find a simple interpretation of this equality.

with eigenvalues $k_{nm}$ of the transverse wave number $k_t$ to be determined from appropriate boundary conditions.

As for the rectangular waveguide, let us start from H modes ($f = H_z$). Then the boundary condition on the wall surface ($\rho = R$) is given by Eq. (126), which, for solution (141), takes the form

$$\frac{dJ_n(\xi)}{d\xi} = 0, \qquad \xi \equiv k_t R. \qquad (7.142)$$

This means that the eigenvalues of Eq. (139) are

$$k_t = k_{nm} = \frac{\xi'_{nm}}{R}, \qquad (7.143)$$

where $\xi'_{nm}$ is the m-th root of function $dJ_n(\xi)/d\xi$. The approximate values of these roots for several lowest n and m may be read out from the plots in Fig. 2.16; their more accurate values are presented in Table 1 below.



Table 7.1. Roots $\xi'_{nm}$ of function $dJ_n(\xi)/d\xi$ for a few
values of Bessel function's index n and root's number m.

           m = 1      m = 2      m = 3
  n = 0    3.83171    7.015587   10.1735
  n = 1    1.84118    5.33144    8.53632
  n = 2    3.05424    6.70613    9.96947
  n = 3    4.20119    8.01524    11.34592



It shows, in particular, that the lowest of the roots is $\xi'_{11} \approx 1.84$. Thus, a bit counter-intuitively, the basic mode, providing the lowest cutoff frequency $\omega_c = v k_{nm}$, is $H_{11}$, corresponding to n = 1 rather than n = 0: 46

$$H_z = H_l J_1\left(\xi'_{11}\frac{\rho}{R}\right) e^{i\varphi}, \qquad (7.144)$$

with the transverse wave vector $k_t = k_{11} = \xi'_{11}/R \approx 1.84/R$, and hence the cutoff frequency corresponding to the TEM wavelength $\lambda_{\max} = 2\pi/k_{11} \approx 3.41R$. Thus the ratio of $\lambda_{\max}$ to the waveguide diameter 2R is about 1.7, i.e. is close to the ratio $\lambda_{\max}/a = 2$ for the rectangular waveguide. The origin of this proximity is clear from Fig. 24, which shows the transverse field distribution in the $H_{11}$ mode. (It may be readily calculated from Eqs. (121) with $E_z = 0$ and $H_z$ given by the real part of Eq. (144).)

45 In Chapter 2, it was natural to take the angular dependence in the sin-cos form, which is equivalent to adding a similar term with $n \to -n$ to the right-hand part of Eq. (141). However, since the functions f we are discussing now are already complex, it is easier to do calculations in the exponential form - though it is vital to restore real fields before calculating any of their nonlinear forms, e.g., the wave power.

46 The lowest root of Eq. (142) with n = 0, i.e. $\xi'_{00}$, equals 0, and would yield $k_t = 0$ and hence a constant field $H_z$, which, according to the first of Eqs. (121), would give a vanishing electric field.




Fig. 7.24. Transverse field components in the
basic $H_{11}$ mode of a metallic, circular waveguide
(schematically).



One can see that the field structure is actually very similar to that of the basic mode in the rectangular waveguide, shown in Fig. 22, despite the different nomenclature (due to the different type of coordinates used). However, note the arbitrary argument of the complex constant $H_l$ in Eq. (144), indicating that in circular waveguides the transverse field polarization is arbitrary. For some practical applications, the degeneracy of these "quasi-linearly-polarized" waves creates problems; they may be avoided by using waves with circular polarization. 47

As Table 1 shows, the next lowest H mode is $H_{21}$, for which $k_t = k_{21} = \xi'_{21}/R \approx 3.05/R$, almost twice as large as that of the basic mode, and only then comes the first mode with no angular dependence of any field, $H_{01}$, with $k_t = k_{01} = \xi'_{01}/R \approx 3.83/R$. 48

For the E modes, we may still use Eq. (141) (with $f = E_z$), but with boundary condition (124) at $\rho = R$. This gives the following equation for the problem's eigenvalues:

$$J_n(k_{nm}R) = 0, \qquad \text{i.e.} \qquad k_{nm} = \frac{\xi_{nm}}{R}, \qquad (7.145)$$

where $\xi_{nm}$ is the m-th root of function $J_n(\xi)$ - see Table 2.1. The table shows that the lowest $k_t$ equals $\xi_{01}/R \approx 2.405/R$. Hence the corresponding mode ($E_{01}$), with

$$E_z = E_l J_0\left(\xi_{01}\frac{\rho}{R}\right), \qquad (7.146)$$

has the second lowest cutoff frequency, approximately 30% higher than that of the basic mode $H_{11}$.
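The Bessel-function roots of Table 7.1 and Table 2.1, and with them the mode ladder of a circular guide, are easy to reproduce. To keep the sketch self-contained, it evaluates $J_n$ and $dJ_n/d\xi$ from Bessel's integral $J_n(\xi) = (1/2\pi)\int_0^{2\pi}\cos(n\tau - \xi\sin\tau)\,d\tau$ with the trapezoidal rule (exponentially accurate for a periodic integrand), and finds the roots by bracketing and bisection; in practice, `scipy.special.jn_zeros` and `scipy.special.jnp_zeros` do the same job. The function names and the example radius R = 1 cm of an air-filled guide are assumptions for illustration. The resulting order $H_{11}$, $E_{01}$, $H_{21}$ and the ratio of about 1.31 between the two lowest cutoff frequencies agree with the discussion above.

```python
import math

C_LIGHT = 299_792_458.0          # m/s; air-filled guide assumed
N_QUAD = 128                     # quadrature points on one period
TAUS = [2 * math.pi * j / N_QUAD for j in range(N_QUAD)]

def bessel_j(n, x):
    """J_n(x) from Bessel's integral, trapezoidal rule over a full period."""
    return sum(math.cos(n * t - x * math.sin(t)) for t in TAUS) / N_QUAD

def bessel_jp(n, x):
    """dJ_n/dx, from differentiating Bessel's integral under the integral sign."""
    return sum(math.sin(n * t - x * math.sin(t)) * math.sin(t)
               for t in TAUS) / N_QUAD

def first_roots(func, n, count, x_start=0.05, step=0.05):
    """First `count` roots of func(n, x) with x > x_start: bracket, then bisect.
    Starting slightly above 0 skips the trivial root x = 0 of dJ_0/dx."""
    roots, x0, f0 = [], x_start, func(n, x_start)
    while len(roots) < count:
        x1 = x0 + step
        f1 = func(n, x1)
        if f0 * f1 < 0:
            lo, hi, f_lo = x0, x1, f0
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                f_mid = func(n, mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
        x0, f0 = x1, f1
    return roots

def circular_modes(R, n_max=3, m_max=3, v=C_LIGHT):
    """Sorted (f_c, type, n, m): H modes from Eq. (143), E modes from Eq. (145)."""
    modes = []
    for n in range(n_max + 1):
        for m, xi in enumerate(first_roots(bessel_jp, n, m_max), start=1):
            modes.append((v * xi / (2 * math.pi * R), "H", n, m))
        for m, xi in enumerate(first_roots(bessel_j, n, m_max), start=1):
            modes.append((v * xi / (2 * math.pi * R), "E", n, m))
    return sorted(modes)

modes = circular_modes(R=0.01)
for f_c, kind, n, m in modes[:5]:
    print(f"{kind}{n}{m}: f_c = {f_c / 1e9:.2f} GHz")
print("lambda_max / 2R =", math.pi / first_roots(bessel_jp, 1, 1)[0])
```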

Finally, let us discuss one more topic of general importance - the number N of electromagnetic modes that may propagate in a waveguide within a certain range of relatively large frequencies $\omega \gg \omega_c$. This is easy to calculate for a rectangular waveguide, with its simple expressions (132) for the eigenvalues of $\{k_x, k_y\}$. Indeed, these expressions describe a rectangular mesh on the $[k_x, k_y]$ plane, so that each point corresponds to the plane area $\Delta A_k = (\pi/a)(\pi/b)$, and the number of modes in a large k-plane area $A_k \gg \Delta A_k$ is $N = A_k/\Delta A_k = abA_k/\pi^2 = AA_k/\pi^2$, where A is the waveguide's cross-section area. 49 However, it is frequently more convenient to discuss transverse wave vectors of arbitrary direction, i.e. with arbitrary signs of their components $k_x$ and $k_y$. Taking into account that the opposite values of each component actually give the same wave, the actual number of different modes of each type (E or H) is a factor of 4 lower than was calculated above. This means that the number of modes of both types is

$$N = 2\frac{AA_k}{(2\pi)^2}. \qquad (7.147)$$

47 Actually, Eq. (144) does describe a circularly polarized wave, while the real and imaginary parts of this expression describe two mutually perpendicular quasi-linearly-polarized waves.

48 Electric field lines in the $H_{01}$ mode (as well as all higher $H_{0m}$ modes) are concentric circles around the axis, while the transverse magnetic field lines are directed straight from the axis to the walls, resembling the electric field lines of TEM waves in the coaxial cable. Due to this field structure, these modes provide, at $\omega \gg \omega_c$, much lower power losses (see Sec. 10 below) than the fundamental $H_{11}$ mode, and are sometimes used in practice, despite all inconveniences of working in the multimode frequency range.

It may be convincingly argued that this mode counting rule is valid for waveguides with cross-sections of any shape, and any boundary conditions on the walls, provided that $N \gg 1$.
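The counting argument behind Eq. (147) can be tested by brute force for a rectangular guide, where the eigenvalues (132) are known exactly. The sketch below (hypothetical names; the guide dimensions and the values of $k_{\max}$ are arbitrary illustrative choices) counts all $H_{nm}$ modes (n, m ≥ 0, not both zero) and $E_{nm}$ modes (n, m ≥ 1) with $(k_t)_{nm} < k_{\max}$, and compares the result with Eq. (147) for $A_k = \pi k_{\max}^2$. The excess of H modes on the axes n = 0 or m = 0 is almost exactly compensated by the E modes missing there, which is one reason the simple estimate works well.

```python
import math

def count_rect_modes(a, b, k_max):
    """Exact number of H_nm and E_nm modes with (k_t)_nm < k_max, Eq. (132)."""
    count = 0
    for n in range(int(k_max * a / math.pi) + 1):
        for m in range(int(k_max * b / math.pi) + 1):
            if (n * math.pi / a) ** 2 + (m * math.pi / b) ** 2 >= k_max ** 2:
                continue
            if n > 0 or m > 0:
                count += 1          # H_nm: indices not both zero
            if n > 0 and m > 0:
                count += 1          # E_nm: both indices nonzero
    return count

def mode_count_estimate(area, k_max):
    """Eq. (147) with A_k = pi*k_max**2, i.e. N = A*k_max**2/(2*pi)."""
    return 2 * area * (math.pi * k_max ** 2) / (2 * math.pi) ** 2

a, b = 1.0, 0.5            # arbitrary guide dimensions (m), for illustration
for k_max in (20.0, 60.0, 200.0):
    exact = count_rect_modes(a, b, k_max)
    print(k_max, exact, round(mode_count_estimate(a * b, k_max), 1))
```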



7.8. Dielectric waveguides and optical fibers 

Now let us discuss electromagnetic wave propagation in dielectric waveguides. The simplest, step-index waveguide (Figs. 23b and 25) consists of an inner core and an outer shell (in the optical fiber technology, called cladding) with a higher wave propagation speed, i.e. a lower index of refraction:

$$v_+ > v_-, \qquad \text{i.e.} \qquad k_+ < k_-, \qquad \varepsilon_+\mu_+ < \varepsilon_-\mu_-. \qquad (7.148)$$

(In most cases the difference is achieved due to that in the dielectric constant, $\varepsilon_+ < \varepsilon_-$, while magnetically both materials are almost passive: $\mu_\pm \approx \mu_0$, and I will assume that in my narrative.) The idea of the waveguide operation may be readily understood in the case when the wavelength $\lambda$ is much smaller than the characteristic size R of the core's cross-section. In this "geometric optics" limit, at distances of the order of $\lambda$ from the core-to-cladding interface, which determine the wave reflection, we can consider the interface as a plane. As we know from Sec. 5, if the angle $\theta$ of plane wave incidence on such an interface is larger than the critical value $\theta_c$ specified by Eq. (82), the wave is totally reflected. As a result, the waves launched into the fiber core at such "grazing" angles propagate inside the core, repeatedly reflected from the cladding - see Fig. 25.

Fig. 7.25. Wave propagation in a thick optical fiber.



The most important type of dielectric waveguides are optical fibers. 50 Due to a heroic technological effort, in about three decades starting from the mid-1960s, the attenuation of glass fibers has been decreased from values of the order of 20 dB/km (typical for window glass) to the fantastically low values of about 0.2 dB/km (meaning a virtually perfect transparency of 10-km-long fiber segments!) - see Fig. 26a. It is remarkable that this ultralow power loss may be combined with an extremely low frequency dispersion, especially for near-infrared waves (Fig. 26b). In conjunction with the development of inexpensive erbium-based quantum amplifiers, this breakthrough has enabled intercontinental (undersea), broadband 51 optical cables, which are the backbone of all the modern telecommunication infrastructure. The only bad news is that these breakthroughs were achieved for just one kind of material (silica-based glasses) 52 within a very narrow range of their chemical composition. As a result, the dielectric constants $\varepsilon_\pm/\varepsilon_0$ of the cladding and core of practical optical fibers are both close to 2.2 ($n_\pm \approx 1.5$) and are very close to each other, so that the relative difference of the refraction indices,

$$\Delta \equiv \frac{n_- - n_+}{n_-} \approx \frac{\varepsilon_- - \varepsilon_+}{2\varepsilon_\pm}, \qquad (7.149)$$

is typically below 0.5%, thus limiting the fiber bandwidth - see below.

49 This formula ignores the fact that, according to our analysis, some modes (with n = 0 and m = 0 for H modes, and n = 0 or m = 0 for E modes) are forbidden. However, for $N \gg 1$, the associated corrections to the mode count are negligible.

50 For a comprehensive description of this vital technology see, e.g., A. Yariv and P. Yeh, Photonics, 6th ed., Oxford U. Press, 2007.




Fig. 7.26. (a) Attenuation and (b) dispersion of representative single-mode optical fibers.
(Adapted, respectively, from http://olson-technology.com and http://www.timbercon.com.)



Practical optical fibers come in two flavors: multi-mode and single-mode ones. Multi-mode fibers, used for transfer of high optical power (up to as much as ~10 watts), have relatively thick cores, with a diameter $2R$ of the order of 50 μm, much larger than $\lambda \sim 1$ μm. In this case, the "geometric-optics" picture of the wave propagation discussed above is quantitatively correct, and we may use it to calculate the number of quasi-plane-wave modes that may propagate in the fiber. Indeed, for the complementary angle (Fig. 25)



51 Each frequency band shown in Fig. 26a, at a typical signal-to-noise ratio $S/N > 10^5$ (50 dB), corresponds to the Shannon bandwidth $\Delta f\log_2(S/N)$ exceeding $10^{14}$ bits per second, five orders of magnitude (!) higher than that of a modern Ethernet cable. And this is per fiber only; an optical cable may have hundreds of them.

52 The silica-based fibers were suggested in 1966 by C. Kao (the 2009 Nobel Prize in physics), but the very idea 
of using optical fibers for communications may be traced back to at least the 1963 work by J. Nishizawa. 



$$\theta' \equiv \frac{\pi}{2} - \theta, \qquad (7.150)$$

Eq. (82) gives the propagation condition

$$\cos\theta' > \frac{n_+}{n_-} \approx 1 - \Delta. \qquad (7.151)$$



For the case $\Delta \ll 1$, when the incidence angles $\theta > \theta_c$ of all propagating waves are close to $\pi/2$, and hence the complementary angles are small, we can keep only the first two terms in the Taylor expansion of the left-hand side of Eq. (151), $\cos\theta' \approx 1 - \theta'^2/2$, and get

$$\theta' < \theta'_{\max} \approx (2\Delta)^{1/2}. \qquad (7.152)$$


Even for the higher-end value $\Delta = 0.005$, this critical angle is only ~0.1 radian, i.e. about 6°. Due to this smallness, we can approximate the maximum transverse component of the wave vector as

$$(k_t)_{\max} = k\sin\theta'_{\max} \approx k\theta'_{\max} \approx (2\Delta)^{1/2}k, \qquad (7.153)$$

and use Eq. (147) to calculate the number $N$ of propagating modes:

$$N \approx (kR)^2\Delta. \qquad (7.154)$$




For typical values $k = 0.73\times10^7\ \mathrm{m}^{-1}$ (corresponding to the free-space wavelength $\lambda_0 = n\lambda = 2\pi n/k \approx 1.3$ μm), $R = 25$ μm, and $\Delta = 0.005$, this formula gives $N \sim 150$.

The largest problem with using multi-mode fibers for communications is their high geometric dispersion, i.e. the difference of the mode propagation speeds, which is usually characterized by the signal delay time difference (traditionally measured in picoseconds per kilometer) between the fastest and the slowest modes. Within the geometric-optics approximation, the difference of the time delays of the fastest mode (with $k_z = k$) and the slowest mode (with $k_z = k\cos\theta'_{\max}$) at distance $l$ is



$$\Delta t = \frac{l}{v}\left(\frac{1}{\cos\theta'_{\max}} - 1\right) \approx \frac{l}{v}\,\Delta = \frac{l}{\omega}\,k\Delta. \qquad (7.155)$$



For the example considered above, the TEM wave speed $v = c/n \approx 2\times10^8$ m/s, and the geometric dispersion $\Delta t/l$ is close to 25 ps/m, i.e. 25,000 ps/km. (This means, for example, that a 1-ns pulse, being distributed between the modes, would spread to a ~25-ns pulse after passing just a 1-km fiber segment.) Such disastrous dispersion should be compared with the chromatic dispersion, which is due to the frequency dependence of $\varepsilon_\pm$, and has the steepness $(dt/d\lambda)/l$ of the order of 10 ps/(km·nm) (see the solid pink line in Fig. 26b). One can see that through the whole frequency band ($d\lambda \sim 100$ nm) the total chromatic dispersion $dt/l$ is of the order of only 1,000 ps/km.
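The orders of magnitude quoted above are easy to reproduce. The short Python sketch below evaluates Eq. (155) in the form $\Delta t/l \approx \Delta/v$ with $v = c/n$, and the total chromatic spread as the product of a constant slope $(dt/d\lambda)/l$ and the band width. The inputs ($\Delta = 0.005$, $n = 1.5$, 10 ps/(km·nm), 100 nm) are the values used in the text; the function names are only illustrative, and the constant-slope model of chromatic dispersion is a simplifying assumption.

```python
C = 2.998e8  # speed of light in vacuum, m/s


def geometric_dispersion_ps_per_km(delta: float, n: float) -> float:
    """Delay spread between the fastest and slowest modes, Eq. (7.155): dt/l ~ delta/v."""
    v = C / n                          # TEM wave speed in the core, m/s
    dt_per_m = delta / v               # seconds per meter of fiber
    return dt_per_m * 1e12 * 1e3       # convert s/m -> ps/km


def chromatic_spread_ps_per_km(slope_ps_per_km_nm: float, band_nm: float) -> float:
    """Total chromatic delay spread over a band, assuming a constant slope (dt/dlambda)/l."""
    return slope_ps_per_km_nm * band_nm


geo = geometric_dispersion_ps_per_km(delta=0.005, n=1.5)
chrom = chromatic_spread_ps_per_km(slope_ps_per_km_nm=10.0, band_nm=100.0)
print(f"geometric: {geo:,.0f} ps/km, chromatic: {chrom:,.0f} ps/km, ratio: {geo / chrom:.0f}")
```

The ratio of the two numbers, about 25, is the quantitative reason why multi-mode fibers are limited to short links.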

Due to the large geometric dispersion, multi-mode fibers are used for signal transfer over only short distances (~100 m), while long-range communications are based on single-mode fibers, with thin cores (typically with diameters $2R \sim 5$ μm, i.e. of the order of $\lambda/\Delta^{1/2}$). For such structures, Eq. (154) yields $N \sim 1$, but in this case the geometric-optics approximation is not quantitatively valid, and we should get back to the Maxwell equations. In particular, this analysis should take into explicit account the evanescent wave propagating in the cladding, because its penetration depth may be comparable with $R$. 53

Since the cross-section of an optical fiber is not uniform and lacks metallic conductors, the Maxwell equations cannot be exactly satisfied with either a TEM, or a TE, or a TM solution. Instead, the fibers can carry the so-called HE and EH modes, in which both fields have longitudinal components simultaneously. In such modes, both $E_z$ and $H_z$ inside the core ($\rho < R$) have a form similar to Eq. (141):



$$f_- = f_l\,J_n(k_t\rho)\,e^{in\varphi}, \quad \text{with } k_t^2 = k_-^2 - k_z^2 > 0, \quad k_-^2 = \omega^2\varepsilon_-\mu_-, \qquad (7.156)$$



where the amplitudes $f_l$ (i.e., $E_l$ and $H_l$) may be complex to account for a possible phase shift between these components. On the other hand, for the evanescent wave in the cladding, we may rewrite Eq. (102) as



$$(\nabla^2 - \kappa_t^2)f_+ = 0, \quad \text{with } \kappa_t^2 = k_z^2 - k_+^2 > 0, \quad k_+^2 = \omega^2\varepsilon_+\mu_+. \qquad (7.157)$$

Figure 27 illustrates the relation between $k_t$, $\kappa_t$, $k_z$, and $k_\pm$; note that the following sum,

$$k_t^2 + \kappa_t^2 = \omega^2(\varepsilon_- - \varepsilon_+)\mu_0, \qquad (7.158)$$

is fixed (at fixed frequency) and, for typical fibers, very small ($\approx 2\Delta k^2 \ll k^2$). By the way, Fig. 27 shows that neither $k_t$ nor $\kappa_t$ can be larger than $[\omega^2(\varepsilon_- - \varepsilon_+)\mu_0]^{1/2} \approx (2\Delta)^{1/2}k$. In particular, this means that the depth $\delta = 1/\kappa_t$ of wave penetration into the cladding is at least $1/[(2\Delta)^{1/2}k] = \lambda/[2\pi(2\Delta)^{1/2}] \gg \lambda/2\pi$. This is why the cladding layers in practical optical fibers are made as thick as ~50 μm, so that only a negligibly small tail of this evanescent wave field reaches their outer surfaces.

Fig. 7.27. Relation between the transverse exponents $k_t$ and $\kappa_t$ for waves in optical fibers.



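In polar coordinates, Eq. (157) becomes

$$\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2}{\partial\varphi^2} - \kappa_t^2\right]f_+ = 0, \qquad (7.159)$$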



instead of Eq. (139). From Sec. 2.5 we know that the eigenfunctions of Eq. (159) are the products of the angular factor $\exp\{in\varphi\}$ by a linear combination of the modified Bessel functions $I_n$ and $K_n$, shown in



53 I believe that the following calculation is important - both for practice, and as a good example of the application of Maxwell theory. However, its results will not be used in the following sections/chapters of the course, so that if the reader is not interested in this topic, he or she may safely skip to the beginning of Sec. 9.
Fig. 2.20, now of argument $\kappa_t\rho$. In our case, the fields should vanish at $\rho \to \infty$, so that only the latter functions (of the second kind) can participate:



$$f_+ \propto K_n(\kappa_t\rho)\,e^{in\varphi}. \qquad (7.160)$$



Now we have to reconcile Eqs. (156) and (160), using the boundary conditions at $\rho = R$ for both the longitudinal and transverse components of both fields, with the latter components first calculated using Eqs. (121). Such a conceptually simple, but somewhat bulky calculation (which I am leaving for the reader's exercise :-) yields a system of two linear, homogeneous equations for the complex amplitudes $E_l$ and $H_l$, which are compatible if



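$$\left(\frac{k_-^2}{k_t}\frac{J_n'}{J_n} + \frac{k_+^2}{\kappa_t}\frac{K_n'}{K_n}\right)\left(\frac{1}{k_t}\frac{J_n'}{J_n} + \frac{1}{\kappa_t}\frac{K_n'}{K_n}\right) = \frac{n^2k_z^2}{R^2}\left(\frac{1}{k_t^2} + \frac{1}{\kappa_t^2}\right)^2, \qquad (7.161)$$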



where the prime means the derivative of each function over its full argument: $k_tR$ for $J_n$, and $\kappa_tR$ for $K_n$.

For any given frequency $\omega$, the system of Eqs. (158) and (161) determines the values of $k_t$ and $\kappa_t$, and hence $k_z$. Actually, for any $n > 0$, this system provides two different solutions: one corresponding to the so-called HE wave, with a larger ratio $E_z/H_z$, and the other to the EH wave, with a smaller value of that ratio. For the angular-symmetric modes with $n = 0$ (for which we might naively expect the lowest cutoff frequency), the equations may be satisfied by fields having just one nonvanishing longitudinal component (either $E_z$ or $H_z$), and the HE modes are the usual E waves, while the EH modes are the H waves. For the H modes, the characteristic equation is reduced to the requirement that the second parentheses in the left-hand side of Eq. (161) equal zero. Using the facts that $J_0' = -J_1$ and $K_0' = -K_1$, this equation may be rewritten as



$$\frac{1}{k_t}\frac{J_1(k_tR)}{J_0(k_tR)} = -\frac{1}{\kappa_t}\frac{K_1(\kappa_tR)}{K_0(\kappa_tR)}. \qquad (7.162)$$



Using the simple relation between $k_t$ and $\kappa_t$ given by Eq. (158), we may plot both sides of Eq. (162) as functions of the same argument, say, $\xi \equiv k_tR$ - see Fig. 28.




Fig. 7.28. Two sides of the characteristic equation (162), plotted as functions of $k_tR$, for two values of its dimensionless parameter: $V = 8$ (blue line) and $V = 3$ (red line). Note that according to Eq. (158), the argument of the functions $K_0$ and $K_1$ is just $\kappa_tR = [V^2 - (k_tR)^2]^{1/2} = (V^2 - \xi^2)^{1/2}$.



The right-hand side of Eq. (162) depends not only on $\xi$ but also on the dimensionless parameter $V$ defined by the normalized right-hand side of Eq. (158):
$$V^2 \equiv \omega^2(\varepsilon_- - \varepsilon_+)\mu_0R^2 \approx 2\Delta k_\pm^2R^2. \qquad (7.163)$$

(According to Eq. (154), if $V \gg 1$, $V^2$ gives the doubled number of the fiber modes - the conclusion confirmed by Fig. 28, taking into account that it describes only the H modes.) Since the ratio $K_1/K_0$ is positive for all values of the argument (see, e.g., the right panel of Fig. 2.20), the right-hand side of Eq. (162) is always negative, so that the equation may have solutions only in the intervals where the ratio $J_1/J_0$ is negative, i.e. at

$$\xi_{0,1} < \xi < \xi_{1,1}, \qquad \xi_{0,2} < \xi < \xi_{1,2}, \qquad \ldots, \qquad (7.164)$$

where $\xi_{n,m}$ is the $m$-th zero of the function $J_n(\xi)$ - see Table 2.1. The right-hand side of the characteristic equation diverges at $\kappa_tR \to 0$, i.e. at $k_tR \to V$, so that no solutions are possible if $V$ is below the critical value $V_c = \xi_{0,1} \approx 2.405$. At this cutoff point, Eq. (163) yields $k_\pm \approx \xi_{0,1}/[R(2\Delta)^{1/2}]$. Hence, the cutoff frequency for the lowest H mode corresponds to the TEM wavelength

$$\lambda_{\max} = \frac{2\pi}{k_\pm} = \frac{2\pi}{\xi_{0,1}}(2\Delta)^{1/2}R \approx 3.7\,\Delta^{1/2}R. \qquad (7.165)$$

For typical parameters $\Delta = 0.005$ and $R = 2.5$ μm, this result yields $\lambda_{\max} \approx 0.65$ μm, corresponding to the free-space wavelength $\lambda_0 \approx 1$ μm. A similar analysis of the first parentheses in the left-hand side of Eq. (161) shows that at $\Delta \to 0$, the cutoff frequency for the E modes is similar.
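Eq. (162) has no closed-form solution, but it is easy to solve numerically. The Python sketch below is only an illustration, not the author's procedure, and uses only the standard library: it evaluates $J_n$ through Bessel's integral $(1/\pi)\int_0^\pi\cos(n\tau - x\sin\tau)\,d\tau$ and $K_n$ through $K_n(y) = \int_0^\infty e^{-y\cosh t}\cosh nt\,dt$ (both by the trapezoidal rule), expresses $\kappa_tR$ via Eq. (158) as $(V^2 - \xi^2)^{1/2}$, and bisects for the lowest root in $\xi_{0,1} < \xi < \min(V, \xi_{1,1})$. All helper names are hypothetical, and the value $n = 1.5$ used to convert $\lambda_{\max}$ into a free-space wavelength is the rough estimate from the text.

```python
import math

XI_01 = 2.404825557695773   # first zero of J_0
XI_11 = 3.831705970207512   # first zero of J_1


def bessel_j(n: int, x: float, steps: int = 100) -> float:
    """J_n(x) for integer n via Bessel's integral (1/pi) int_0^pi cos(n*tau - x*sin(tau)) dtau."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))   # endpoint terms at tau = 0 and tau = pi
    for i in range(1, steps):
        tau = i * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi


def bessel_k(n: int, y: float, h: float = 0.02) -> float:
    """K_n(y), y > 0, via int_0^inf exp(-y*cosh t) cosh(n t) dt, cut where the integrand is negligible."""
    t_max = max(12.0, math.log(2.0 / y) + 12.0)
    steps = int(t_max / h) + 1
    h = t_max / steps
    total = 0.5 * math.exp(-y)                    # endpoint term at t = 0
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-y * math.cosh(t)) * math.cosh(n * t)
    return total * h


def bisect(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Plain bisection; f(a) and f(b) must have opposite signs."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("root is not bracketed")
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fa * fm <= 0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


def h_mode_mismatch(xi: float, v: float) -> float:
    """R times (left side minus right side) of Eq. (7.162), with xi = k_t R."""
    y = math.sqrt(v * v - xi * xi)                # kappa_t R, from Eq. (7.158)
    return bessel_j(1, xi) / (xi * bessel_j(0, xi)) + bessel_k(1, y) / (y * bessel_k(0, y))


def lowest_h_mode(v: float):
    """Lowest root xi = k_t R of Eq. (7.162), or None if V is below the cutoff V_c = xi_01."""
    if v <= XI_01:
        return None
    a = XI_01 + 1e-9                              # J_0 changes sign here: left side -> -infinity
    b = min(v, XI_11) - 1e-9                      # kappa_t R -> 0 (or J_1 -> 0) here
    return bisect(lambda xi: h_mode_mismatch(xi, v), a, b)


xi = lowest_h_mode(3.0)
print(f"V = 3: k_t R = {xi:.4f}, kappa_t R = {math.sqrt(9.0 - xi * xi):.4f}")
print("V = 2:", lowest_h_mode(2.0))               # below cutoff, so no H mode

# Eq. (7.165) for the parameters quoted in the text (Delta = 0.005, R = 2.5 um, n ~ 1.5)
delta, R = 0.005, 2.5e-6
lam_max = 2 * math.pi / XI_01 * math.sqrt(2 * delta) * R
print(f"lambda_max = {lam_max * 1e6:.3f} um, free-space lambda_0 = {1.5 * lam_max * 1e6:.2f} um")
```

Working in the dimensionless variable $\xi = k_tR$ keeps the whole calculation in terms of $V$ alone, which is why Fig. 28 can be drawn for just two values of this parameter.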

This situation may look exactly like that in metallic waveguides, with no waves possible at frequencies below $\omega_c$, but this is not so. The basic reason for the difference is that in metallic waveguides, the approach to $\omega_c$ results in the divergence of the longitudinal wavelength $\lambda_z = 2\pi/k_z$. On the contrary, in dielectric waveguides this approach leaves $\lambda_z$ finite ($k_z \to k_+$). Due to this difference, a certain linear superposition of HE and EH modes with $n = 1$ can propagate at frequencies well below the cutoff frequency for $n = 0$, which we have just calculated. 54 This mode, in the limit $\varepsilon_+ \approx \varepsilon_-$ (i.e. $\Delta \ll 1$), allows a very interesting and simple description using the Cartesian (rather than polar) components of the fields, but still expressed as functions of the polar coordinates $\rho$ and $\varphi$. The reason is that this mode is very close to a linearly polarized TEM wave. (For this reason, this mode is referred to as LP01.)

Let us select axis $x$ parallel to the transverse component of the magnetic field vector, so that $E_x|_{\rho=0} = 0$, but $E_y|_{\rho=0} \neq 0$, and $H_x|_{\rho=0} \neq 0$, but $H_y|_{\rho=0} = 0$. The only suitable solutions of the 2D Helmholtz equation (which should be obeyed not only by the $z$-components of the fields, but also by their $x$- and $y$-components) are proportional to $J_0(k_t\rho)$, with zero coefficients for $E_x$ and $H_y$:



$$E_x = 0, \quad E_y = E_0J_0(k_t\rho), \quad H_x = H_0J_0(k_t\rho), \quad H_y = 0, \qquad \text{for } \rho < R. \qquad (7.166)$$



Now we can readily calculate the longitudinal components, using the last two equations of Eqs. (100):

$$E_z = \frac{1}{-ik_z}\frac{\partial E_y}{\partial y} = -i\frac{k_t}{k_z}E_0J_1(k_t\rho)\sin\varphi, \qquad H_z = \frac{1}{-ik_z}\frac{\partial H_x}{\partial x} = -i\frac{k_t}{k_z}H_0J_1(k_t\rho)\cos\varphi, \qquad (7.167)$$

where I have used the mathematical identities $J_0' = -J_1$, $\partial\rho/\partial x = x/\rho = \cos\varphi$, and $\partial\rho/\partial y = y/\rho = \sin\varphi$. As a sanity check, we see that the longitudinal component of each field is a (legitimate!) eigenfunction of the



54 This fact becomes less surprising if we recall that in the circular metallic waveguide, discussed in Sec. 7, the lowest mode ($H_{11}$, Fig. 23) also corresponded to $n = 1$ rather than $n = 0$.
type (141) with $n = 1$. Note also that if $k_t \ll k_z$ (this relation is always true if $\Delta \ll 1$ - see Fig. 27), the longitudinal components of the fields are much smaller than their transverse counterparts, so that the wave is indeed very close to the TEM one. Because of that, the ratio of the electric and magnetic field amplitudes is also close to that in the TEM wave: $E_0/H_0 \approx Z_- \approx Z_+$.

Now, to ensure the continuity of the fields at the core-to-cladding interface ($\rho = R$), we need to have a similar angular dependence of these components at $\rho > R$. The longitudinal components of the fields are tangential to the interface and thus should be continuous. Using solutions similar to Eq. (160) with $n = 1$, we get

$$E_z = -i\frac{k_t}{k_z}\frac{J_1(k_tR)}{K_1(\kappa_tR)}E_0K_1(\kappa_t\rho)\sin\varphi, \qquad H_z = -i\frac{k_t}{k_z}\frac{J_1(k_tR)}{K_1(\kappa_tR)}H_0K_1(\kappa_t\rho)\cos\varphi, \qquad \text{for } \rho > R. \qquad (7.168)$$

For the transverse components, we should require the continuity of the normal magnetic field $\mu H_n$, for our simple field structure equal to just $\mu H_x\cos\varphi$, of the tangential electric field $E_\tau = E_y\cos\varphi$, and of the normal component $D_n = \varepsilon E_n = \varepsilon E_y\sin\varphi$. Using the facts that $\mu_- = \mu_+ = \mu_0$ and $\varepsilon_+ \approx \varepsilon_-$, 55 we can satisfy these conditions with the following solutions:



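$$E_x = 0, \quad E_y = \frac{J_0(k_tR)}{K_0(\kappa_tR)}E_0K_0(\kappa_t\rho), \quad H_x = \frac{J_0(k_tR)}{K_0(\kappa_tR)}H_0K_0(\kappa_t\rho), \quad H_y = 0, \qquad \text{for } \rho > R. \qquad (7.169)$$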



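From here, we can calculate the longitudinal components $E_z$ and $H_z$, using the same approach as for $\rho < R$:

$$E_z = \frac{1}{-ik_z}\frac{\partial E_y}{\partial y} = -i\frac{\kappa_t}{k_z}\frac{J_0(k_tR)}{K_0(\kappa_tR)}E_0K_1(\kappa_t\rho)\sin\varphi, \qquad H_z = \frac{1}{-ik_z}\frac{\partial H_x}{\partial x} = -i\frac{\kappa_t}{k_z}\frac{J_0(k_tR)}{K_0(\kappa_tR)}H_0K_1(\kappa_t\rho)\cos\varphi, \qquad \text{for } \rho > R. \qquad (7.170)$$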



We see that these equations provide the same functional dependence of the fields as Eqs. (168), i.e. the internal and external fields are compatible, but their amplitudes coincide only if

$$k_t\frac{J_1(k_tR)}{J_0(k_tR)} = \kappa_t\frac{K_1(\kappa_tR)}{K_0(\kappa_tR)}. \qquad (7.171)$$



This characteristic equation (which may also be derived from Eq. (161) with $n = 1$ in the limit $\Delta \to 0$) looks close to Eq. (162), but functionally is very different from it - see Fig. 29. Indeed, its right-hand side is always positive, and the left-hand side tends to zero at $k_tR \to 0$. Due to this, Eq. (171) may have a solution for arbitrarily small values of the parameter $V$, defined by Eq. (163), i.e. for arbitrarily low frequencies. This is why this mode is used in practical single-mode fibers: no other modes can propagate at $\omega < \omega_c$, so that the geometric dispersion problem is avoided.

It is easy to use the Bessel function approximations given by the first term of the expansion (2.132) and also Eq. (2.157) to show that in the limit $V \to 0$, $\kappa_tR$ tends to zero much faster



55 This is the core assumption of this approximate theory, which accounts only for the most important effect of the difference between the dielectric constants $\varepsilon_+$ and $\varepsilon_-$: the opposite signs of the differences $(k_-^2 - k_z^2) = k_t^2$ and $(k_+^2 - k_z^2) = -\kappa_t^2$. For more discussion of the accuracy of this approximation and some exact results, the interested reader may be referred, for example, either to the monograph by A. Snyder and J. Love, Optical Waveguide Theory, Chapman and Hall, 1983, or to Chapter 3 and Appendix B in the monograph by Yariv and Yeh, cited above.
than $k_tR \approx V$: $\kappa_tR \to 2\exp\{-2/V^2\} \ll V$. This means that the scale $\rho_c = 1/\kappa_t$ of the radial distribution of the LP01 wave's fields in the cladding becomes very large. In this limit, this mode may be interpreted as a virtually TEM wave propagating in the cladding, just slightly deformed (and guided) by the fiber core. The drawback of this feature is that it requires a very thick cladding, in order to avoid energy losses in the outer ("buffer" and "jacket") layers that protect the silica components from the elements, but lack their low optical absorption. For this reason, the core radius is usually selected so that the parameter $V$ is just slightly less than the critical value $V_c = \xi_{0,1} \approx 2.4$ for higher modes, thus ensuring the single-mode operation and eliminating the geometric dispersion problem.
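The claims about Eq. (171) (a solution for any small $V$, with $\kappa_tR$ becoming exponentially small) can be checked numerically. The sketch below is an illustration only, not the author's method: it reuses standard-library quadratures for the Bessel functions (Bessel's integral for $J_n$ and $K_n(y) = \int_0^\infty e^{-y\cosh t}\cosh nt\,dt$, both by the trapezoidal rule), eliminates $k_tR$ through Eq. (158), and bisects in $\ln(\kappa_tR)$, because $\kappa_tR$ spans many orders of magnitude. The helper names are hypothetical, and the bracketing assumes $0.2 \le V < \xi_{0,1}$.

```python
import math

XI_01 = 2.404825557695773   # first zero of J_0: cutoff of the higher modes


def bessel_j(n: int, x: float, steps: int = 100) -> float:
    """J_n(x) for integer n via Bessel's integral (1/pi) int_0^pi cos(n*tau - x*sin(tau)) dtau."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))   # endpoint terms at tau = 0 and tau = pi
    for i in range(1, steps):
        tau = i * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi


def bessel_k(n: int, y: float, h: float = 0.02) -> float:
    """K_n(y), y > 0, via int_0^inf exp(-y*cosh t) cosh(n t) dt, cut where the integrand is negligible."""
    t_max = max(12.0, math.log(2.0 / y) + 12.0)
    steps = int(t_max / h) + 1
    h = t_max / steps
    total = 0.5 * math.exp(-y)                    # endpoint term at t = 0
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-y * math.cosh(t)) * math.cosh(n * t)
    return total * h


def lp01_mismatch(u: float, v: float) -> float:
    """R times (left side minus right side) of Eq. (7.171), as a function of u = ln(kappa_t R)."""
    y = math.exp(u)                               # kappa_t R
    xi = math.sqrt(v * v - y * y)                 # k_t R, from Eq. (7.158)
    return xi * bessel_j(1, xi) / bessel_j(0, xi) - y * bessel_k(1, y) / bessel_k(0, y)


def solve_lp01(v: float):
    """Return (k_t R, kappa_t R) of the LP01 mode by bisection in ln(kappa_t R)."""
    if not 0.2 <= v < XI_01:
        raise ValueError("this sketch assumes 0.2 <= V < xi_01")
    lo = math.log(1e-60)          # kappa_t R tiny: the mismatch is positive here
    hi = math.log(v) - 1e-12      # k_t R tiny: the mismatch is negative here
    while hi - lo > 1e-10:
        mid = 0.5 * (lo + hi)
        if lp01_mismatch(mid, v) > 0:
            lo = mid
        else:
            hi = mid
    y = math.exp(0.5 * (lo + hi))
    return math.sqrt(v * v - y * y), y


results = {v: solve_lp01(v) for v in (2.0, 1.0, 0.5)}
for v, (xi, y) in results.items():
    print(f"V = {v}: k_t R = {xi:.4f}, kappa_t R = {y:.3e}, "
          f"ln(kappa_t R) / (-2/V^2) = {math.log(y) / (-2 / v**2):.3f}")
```

At $V = 0.5$ the computed $\ln(\kappa_tR)$ should come out within several percent of $-2/V^2$, while $k_tR$ stays practically equal to $V$; the residual difference reflects the constant prefactor, which the leading-order estimate does not capture precisely.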



Fig. 7.29. Two sides of the characteristic equation (171) for the LP01 mode, plotted as functions of $k_tR$, for two values of the dimensionless parameter: $V = 8$ (blue line) and $V = 1$ (red line).



To reduce the field spread into the cladding, the step-index fibers considered above may be replaced with graded-index fibers, in which the dielectric constant $\varepsilon$ is gradually and slowly decreased from the center to the periphery. Keeping only the two main terms in the Taylor expansion of the function $\varepsilon(\rho)$ at $\rho = 0$, we may approximate such a reduction as



$$\varepsilon(\rho) \approx \varepsilon(0)\left(1 - \frac{\zeta}{2}\rho^2\right), \qquad (7.172)$$



where $\zeta \equiv -(d^2\varepsilon/d\rho^2)/\varepsilon|_{\rho=0}$ is a positive constant characterizing the fiber composition gradient. 56 Moreover, if this constant is sufficiently small ($\zeta \ll k^2$), the field distribution across the fiber's cross-section may be described by the same 2D Helmholtz equation, but with a space-dependent transverse wave vector: 57



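$$\left[\nabla^2 + k_t^2(\rho)\right]f = 0, \quad \text{where } k_t^2(\rho) \equiv k^2(\rho) - k_z^2 = \omega^2\varepsilon(\rho)\mu_0 - k_z^2 = k_t^2(0) - \frac{\zeta}{2}k^2(0)\rho^2. \qquad (7.173)$$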



Surprisingly for such an axially-symmetric problem, because of its special dependence on the radius, this equation may be most readily solved in Cartesian coordinates. Indeed, rewriting it as



56 For an axially-symmetric fiber with a smooth function $\varepsilon(\rho)$, the first derivative $d\varepsilon/d\rho$ should vanish at $\rho = 0$.

57 Such an approach is invalid at arbitrary (large) $\zeta$. Indeed, in the macroscopic Maxwell equations, $\varepsilon(\mathbf{r})$ is under the differentiation sign, and the exact Helmholtz-type equations for the fields have additional terms containing $\nabla\varepsilon$.
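$$\left[\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + k_t^2(0) - \frac{\zeta}{2}k^2(0)\left(x^2 + y^2\right)\right]f = 0, \qquad (7.174)$$

and separating variables as $f = X(x)Y(y)$, we get

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + k_t^2(0) - \frac{\zeta}{2}k^2(0)\,x^2 - \frac{\zeta}{2}k^2(0)\,y^2 = 0, \qquad (7.175)$$

so that the functions $X$ and $Y$ obey similar differential equations,

$$\frac{d^2X}{dx^2} + \left[k_x^2 - \frac{\zeta}{2}k^2(0)\,x^2\right]X = 0, \qquad \frac{d^2Y}{dy^2} + \left[k_y^2 - \frac{\zeta}{2}k^2(0)\,y^2\right]Y = 0, \qquad (7.176)$$

with the separation constants satisfying the following relation:

$$k_x^2 + k_y^2 = k_t^2(0) = \omega^2\varepsilon(0)\mu_0 - k_z^2. \qquad (7.177)$$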



Equation (176) is well known from elementary quantum mechanics, because the Schrödinger equation for perhaps the most important quantum system, a 1D harmonic oscillator, may be rewritten in this form. Its eigenvalues are described by a simple formula



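$$(k_x^2)_n = \left(\frac{\zeta}{2}\right)^{1/2}k(0)\,(2n+1), \qquad (k_y^2)_m = \left(\frac{\zeta}{2}\right)^{1/2}k(0)\,(2m+1), \qquad n, m = 0, 1, 2, \ldots, \qquad (7.178)$$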



but the eigenfunctions $X_n(x)$ and $Y_m(y)$ have to be expressed via not quite elementary functions - the Hermite polynomials. 58 For our purposes, however, the lowest eigenfunctions $X_0(x)$ and $Y_0(y)$ are sufficient, because they correspond to the lowest $k_x$ and $k_y$, and hence the lowest cutoff frequency:



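$$\omega_c^2\,\varepsilon(0)\mu_0 = (k_x^2)_0 + (k_y^2)_0 = (2\zeta)^{1/2}k(0), \quad \text{i.e. } \omega_c = \left[\frac{2\zeta}{\varepsilon(0)\mu_0}\right]^{1/2}. \qquad (7.179)$$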



(Note that at $\zeta \to 0$, the cutoff frequency tends to zero, as it should for a wave in a uniform medium.) The eigenfunctions corresponding to the lowest eigenvalues are simple:



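$$X_0(x) = \text{const}\times\exp\left\{-\left(\frac{\zeta}{8}\right)^{1/2}k(0)\,x^2\right\}, \qquad Y_0(y) = \text{const}\times\exp\left\{-\left(\frac{\zeta}{8}\right)^{1/2}k(0)\,y^2\right\}, \qquad (7.180)$$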



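so that the field distribution follows the Gaussian ("bell curve") function

$$f_0(\rho) = f_0(0)\exp\left\{-\left(\frac{\zeta}{8}\right)^{1/2}k(0)\left(x^2 + y^2\right)\right\} = f_0(0)\exp\left\{-\left(\frac{\zeta}{8}\right)^{1/2}k(0)\,\rho^2\right\}. \qquad (7.181)$$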



This is the so-called Gaussian beam, very convenient for some applications. Still, graded-index fibers have higher attenuation than their step-index counterparts, and are not used as broadly.
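A quick numerical sanity check of Eqs. (176), (178), and (180) is sketched below. It verifies by central finite differences that $X_0(x) = \exp\{-(\zeta/8)^{1/2}k(0)x^2\}$ satisfies Eq. (176) with the lowest eigenvalue $(k_x^2)_0 = (\zeta/2)^{1/2}k(0)$, and evaluates the radius at which the Gaussian field (181) drops by a factor $e$. The parameter values ($\lambda_0 = 1.3$ μm, an on-axis index of 1.5, and a 1% drop of $\varepsilon$ at $\rho = 25$ μm) are hypothetical, chosen only to land in the regime $\zeta \ll k^2$ assumed in the text.

```python
import math

# Hypothetical graded-index parameters, not taken from the text
LAMBDA_0 = 1.3e-6                  # free-space wavelength, m
N_AXIS = 1.5                       # refractive index on the axis
RHO_1, DROP = 25e-6, 0.01          # eps(rho) is assumed to fall by 1% at rho = 25 um
ZETA = 2 * DROP / RHO_1**2         # from Eq. (7.172): (zeta/2) * RHO_1**2 = DROP
K_AXIS = 2 * math.pi * N_AXIS / LAMBDA_0   # k(0)

A = math.sqrt(ZETA / 2) * K_AXIS   # lowest eigenvalue (k_x^2)_0 of Eq. (7.176), from Eq. (7.178)


def x0(x: float) -> float:
    """Eq. (7.180) with X_0(0) = 1; note 0.5 * A = (zeta/8)**0.5 * k(0)."""
    return math.exp(-0.5 * A * x * x)


def residual(x: float) -> float:
    """X_0'' + [(k_x^2)_0 - (zeta/2) k(0)^2 x^2] X_0, with X_0'' from central differences."""
    h = 1e-3 / math.sqrt(A)
    d2 = (x0(x + h) - 2 * x0(x) + x0(x - h)) / h**2
    return d2 + (A - 0.5 * ZETA * K_AXIS**2 * x * x) * x0(x)


width = math.sqrt(2 / A)           # radius at which the field (7.181) drops by a factor e
worst = max(abs(residual(s * width)) for s in (0.0, 0.3, 0.7, 1.0, 1.5, 2.0)) / A
print(f"zeta = {ZETA:.2e} m^-2, zeta / k(0)^2 = {ZETA / K_AXIS**2:.1e}")
print(f"1/e field radius = {width * 1e6:.1f} um, max relative residual = {worst:.1e}")
```

The residual is limited only by the finite-difference step, which is consistent with $X_0$ being an exact eigenfunction of Eq. (176).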



58 See, e.g., QM Sec. 2.6. 
7.9. Resonators

Resonators are distributed oscillators, i.e. structures that may sustain standing waves (in electrodynamics, oscillations of the electric and magnetic fields at each point) even without a source, until the oscillation amplitude slowly decreases in time due to unavoidable energy losses. If the resonator quality (described by the so-called Q-factor, which will be defined and discussed in the next section) is high, this decay takes many oscillation periods. Alternatively, high-Q resonators may sustain oscillating fields permanently, if fed with a relatively weak incident wave.

Conceptually the simplest resonator is the Fabry-Perot interferometer, 59 which may be obtained by placing two well-conducting planes parallel to each other. 60 Indeed, in Sec. 1 we have seen that if a plane wave is normally incident on such a "perfect mirror", located at $z = 0$, its reflection, at negligible skin depth, results in a standing wave described by Eq. (61), which may be rewritten as

$$E(z,t) = \mathrm{Re}\left[2E_\omega e^{-i\omega t + i\pi/2}\right]\sin kz. \qquad (7.182)$$

Hence the wave would not change if we suddenly put the second mirror (isolating the segment of length $l$ from the external wave source) at any position $z = l$ with $\sin kl = 0$, i.e.

$$kl = \pi p, \qquad \text{where } p = 1, 2, \ldots \qquad (7.183)$$

This condition, which also determines the eigen- (or resonance) frequency spectrum of the resonator of fixed length $l$,

$$\omega_p = vk_p = \frac{\pi v}{l}\,p, \qquad v = \frac{1}{(\varepsilon\mu)^{1/2}}, \qquad (7.184)$$



has a simple physical sense: the resonator length $l$ equals exactly $p$ half-waves of frequency $\omega_p$. Though this is all very simple, please note a considerable change of philosophy from what we have been doing in the previous sections: the main task in resonator analysis is finding its eigenfrequencies $\omega_p$, which are now determined by the system geometry rather than by an external wave source.
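As a minimal illustration of Eq. (184), the sketch below lists the first few cyclic eigenfrequencies $f_p = \omega_p/2\pi = pv/2l$ of a Fabry-Perot resonator and checks that the standing wave (182) indeed vanishes on the second mirror. The 10-cm vacuum gap is a hypothetical example, and the function name is only illustrative.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def fabry_perot_frequencies(length: float, count: int, v: float = C) -> list:
    """Cyclic eigenfrequencies f_p = omega_p / 2pi = p v / (2 l), p = 1..count, from Eq. (7.184)."""
    return [p * v / (2.0 * length) for p in range(1, count + 1)]


length = 0.10                               # hypothetical mirror separation, m (vacuum, so v = c)
freqs = fabry_perot_frequencies(length, 4)
for p, f in enumerate(freqs, start=1):
    k = 2 * math.pi * f / C                 # k_p = omega_p / v
    # the standing wave (7.182) is proportional to sin(kz), so it must vanish at z = l
    print(f"p = {p}: f = {f / 1e9:.4f} GHz, sin(k l) = {math.sin(k * length):.1e}")
```

The spectrum is equidistant, with the spacing $v/2l$ (about 1.5 GHz here) equal to the lowest eigenfrequency.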

Before we move to more complex resonators, let us use Eq. (62) to present the magnetic field in the Fabry-Perot interferometer:

$$H(z,t) = \mathrm{Re}\left[\frac{2E_\omega}{Z}e^{-i\omega t}\right]\cos kz. \qquad (7.185)$$



Expressions (182) and (185) show that, in contrast to traveling waves, each field of the standing wave changes simultaneously (proportionately) at all points of the Fabry-Perot resonator, turning to zero everywhere twice a period. At those instants the electric field energy of the resonator vanishes, but the total energy stays constant, because the magnetic field oscillates (also simultaneously at all points) with the phase shift $\pi/2$. Such behavior is typical for all electromagnetic resonators.

Another, more technical remark is that we can readily get the same results (182)-(185) by solving the Maxwell equations from scratch. For example, we already know that in the absence of dispersion, losses, and sources, they are reduced to wave equations (3) for any field components. For the



59 The device is named after its inventors, M. Fabry and A. Perot; and is also called the Fabry-Perot etalon 
(meaning "gauge"), because of its initial usage for the light wavelength measurement. 

60 The resonators formed by well conducting (usually, metallic) walls are frequently called the resonant cavities. 
Fabry-Perot resonator's analysis, we can use their 1D form, say, for the transverse component of the electric field:



$$\left(\frac{\partial^2}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)E = 0, \qquad (7.186)$$



and solve it as a part of an eigenvalue problem with the corresponding boundary conditions. Indeed, separating the time and space variables as $E(z,t) = Z(z)T(t)$, we get

$$\frac{1}{Z}\frac{d^2Z}{dz^2} - \frac{1}{v^2}\frac{1}{T}\frac{d^2T}{dt^2} = 0. \qquad (7.187)$$

Calling the separation constant $k^2$, we get two similar ordinary differential equations,

$$\frac{d^2Z}{dz^2} + k^2Z = 0, \qquad (7.188)$$

$$\frac{d^2T}{dt^2} + k^2v^2T = 0, \qquad (7.189)$$



both with sinusoidal solutions, so that their product is a standing wave with wave vector $k$ and frequency $\omega = kv$, which may be presented by Eq. (182). 61 Now using the boundary conditions $E(0,t) = E(l,t) = 0$, 62 we get the eigenvalue spectrum for $k_p$ and hence for $\omega_p = vk_p$, given by Eqs. (183) and (184).

Lessons from this simple case study may be readily generalized for an arbitrary resonator: there 
are (at least :-) two methods of finding the eigenfrequency spectrum: 

(i) We may look at a traveling wave solution and find where reflecting mirrors may be inserted 
without affecting the wave's structure. Unfortunately, this method is limited to simple geometries. 



(ii) We may solve the general 3D wave equation,

$$\left(\nabla^2 - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)f(\mathbf{r},t) = 0, \qquad (7.190)$$

for field components, as an eigenvalue problem with appropriate boundary conditions. If the system parameters (and hence coefficient $v$) do not change in time, the spatial and temporal variables of Eq. (190) may always be separated by taking

$$f(\mathbf{r},t) = F(\mathbf{r})\,T(t), \qquad (7.191)$$

where the function $T(t)$ always obeys the same equation (189), having the sinusoidal solution of frequency $\omega = vk$. Plugging this solution back into Eq. (190), for the spatial distribution of the field we get the 3D Helmholtz equation,



61 In this form, the equations are valid even in the presence of dispersion, but with the frequency-dependent wave speed: $v^2 = 1/\varepsilon(\omega)\mu(\omega)$.

62 This is of course the expression of the first of the general boundary conditions (104). The second of these conditions (for the magnetic field) is satisfied automatically for the transverse waves we are considering.
$$(\nabla^2 + k^2)F(\mathbf{r}) = 0, \qquad (7.192)$$



whose solution (for non-symmetric geometries) may be much more complex. 



Let us use these methods to find the eigenfrequency spectrum of a few simple, but practically important resonators. The first method is completely sufficient for the analysis of any resonator formed as a fragment of a uniform TEM transmission line (e.g., a coaxial cable) between two conducting lids perpendicular to the line direction. Indeed, since in such lines $k_z = k = \omega/v$, and the electric field is perpendicular to the propagation axis, i.e. parallel to the lid surface, the boundary conditions are exactly the same as in the Fabry-Perot resonator, and we again arrive at the eigenfrequency spectrum (184).

Now let us analyze a slightly more complex system: a rectangular metallic-wall cavity of volume $a\times b\times l$ - see Fig. 30. In order to use the first method, let us consider the resonator as a finite-length ($\Delta z = l$) section of the rectangular waveguide stretched along axis $z$, which was analyzed in detail in Sec. 7. As a reminder, for $a > b$, in the basic $H_{10}$ traveling wave mode, both $\mathbf{E}$ and $\mathbf{H}$ do not depend on $y$, with vector $\mathbf{E}$ having only the $y$-component. On the contrary, vector $\mathbf{H}$ has both components $H_x$ and $H_z$, with the phase shift $\pi/2$ between them, with component $H_x$ having the same phase as $E_y$ - see Eqs. (131), (137), and (138). Hence, if a plane perpendicular to axis $z$ is placed so that the electric field vanishes on it, $H_x$ also vanishes, so that all the boundary conditions (104) pertinent to a perfect metallic wall are fulfilled simultaneously.




Fig. 7.30. Rectangular metallic resonator as a finite section of a waveguide with the cross-section shown in Fig. 25.



As a result, the $H_{10}$ wave would not be perturbed by two metallic walls separated by an integer number of half-wavelengths $\lambda_z/2$ corresponding to the wave number given by Eqs. (102) and (133):



$$k_z = (k^2 - k_t^2)^{1/2} = \left(\frac{\omega^2}{v^2} - \frac{\pi^2}{a^2}\right)^{1/2}. \qquad (7.193)$$



Using this expression, we see that the smallest of these distances, $l = \lambda_z/2 = \pi/k_z$, gives the resonance frequency 63

$$\omega_{101} = v\left[\left(\frac{\pi}{a}\right)^2 + \left(\frac{\pi}{l}\right)^2\right]^{1/2}, \qquad (7.194)$$



63 In most electrical engineering handbooks, the index corresponding to the shortest side of the resonator is listed last, so that the fundamental mode is nominated as $H_{110}$ and its eigenfrequency as $\omega_{110}$.
with the indices showing the number of half-waves along each dimension of the system. This is the lowest (fundamental) eigenfrequency of the resonator (if $b < a, l$).

The field distribution in this mode is close to that in the corresponding waveguide mode $H_{10}$ (Fig. 22), with the important difference that the magnetic and electric fields are shifted by $\pi/2$ both in space and time, just as in the Fabry-Perot resonator - see Eqs. (182) and (185). Such a time shift allows for a very simple interpretation of the $H_{101}$ mode that is especially adequate for very flat resonators, with $b \ll a, l$. At the instant when the electric field reaches its maximum (Fig. 31a), i.e. the magnetic field vanishes in the whole volume, the surface electric charge of the walls (with density $\sigma = \varepsilon E_n$) is largest, being localized mostly in the middle of the broadest (in Fig. 31, horizontal) faces of the resonator. At later times, the walls start to recharge via surface currents whose density $J$ is largest in the side walls, and reaches its maximal value in a quarter of the oscillation period of frequency $\omega_{101}$ - see Fig. 31b. The currents generate the vortex magnetic field, with looped field lines in the plane of the broadest face. The surface currents continue to flow in this direction until (in one more quarter period) the broader walls of the resonator are fully recharged in the polarity opposite to that shown in Fig. 31a. After that, the surface currents start to flow in the direction opposite to that shown in Fig. 31b. This process, which repeats again and again, is conceptually similar to the well-known oscillations in a lumped LC circuit, with the role of (now, distributed) capacitance played mostly by the broadest faces of the resonator, and that of distributed inductance, mostly by its narrower walls.

Fig. 7.31. Fields, charges, and currents in the basic $H_{101}$ mode of a rectangular metallic resonator, at two instants, (a) and (b), separated by $\Delta t = \pi/2\omega_{101}$ - schematically.




To generalize result (194) to higher oscillation modes, the second method discussed above is more prudent. Separating variables as $F(\mathbf{r}) = X(x)Y(y)Z(z)$ in the Helmholtz equation (192), we see that $X$, $Y$, and $Z$ have to be sinusoidal functions of their arguments, with the wave vector components satisfying the characteristic equation

$$k_x^2 + k_y^2 + k_z^2 = k^2 = \frac{\omega^2}{v^2}. \qquad (7.195)$$

In contrast to the wave propagation problem, now we are dealing with standing waves along all three dimensions, and have to satisfy the boundary conditions on all sets of parallel walls. It is straightforward to check that the macroscopic boundary conditions ($E_\tau = 0$, $H_n = 0$) are fulfilled by the following field component distribution:

$$\begin{aligned}
E_x &= E_1\cos k_xx\,\sin k_yy\,\sin k_zz, & H_x &= H_1\sin k_xx\,\cos k_yy\,\cos k_zz,\\
E_y &= E_2\sin k_xx\,\cos k_yy\,\sin k_zz, & H_y &= H_2\cos k_xx\,\sin k_yy\,\cos k_zz,\\
E_z &= E_3\sin k_xx\,\sin k_yy\,\cos k_zz, & H_z &= H_3\cos k_xx\,\cos k_yy\,\sin k_zz,
\end{aligned} \qquad (7.196)$$

with each of the wave vector components having the equidistant spectrum similar to the one given by 
Eq. (193): 
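$$k_x = \frac{\pi n}{a}, \qquad k_y = \frac{\pi m}{b}, \qquad k_z = \frac{\pi p}{l}, \qquad (7.197)$$

so that the full spectrum of eigenfrequencies is given by the following formula,

$$\omega_{nmp} = vk_{nmp} = v\left[\left(\frac{\pi n}{a}\right)^2 + \left(\frac{\pi m}{b}\right)^2 + \left(\frac{\pi p}{l}\right)^2\right]^{1/2}, \qquad (7.198)$$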



which is a natural generalization of Eq. (194). Note, however, that of the three integers $n$, $m$, and $p$, at least two have to be different from zero, in order to keep the fields (196) nonvanishing.
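To see Eq. (198) and the "at least two nonzero indices" rule at work, the sketch below enumerates the lowest eigenfrequencies $f_{nmp} = \omega_{nmp}/2\pi$ of an air-filled rectangular cavity. The dimensions are hypothetical, chosen with $b < a, l$ so that $H_{101}$ should come out as the fundamental mode; the function name and the truncation `n_max` are illustrative choices, not part of the text.

```python
import math
from itertools import product

C = 299_792_458.0  # speed of light in vacuum, m/s


def cavity_modes(a: float, b: float, length: float, n_max: int = 4, v: float = C):
    """Sorted (f, (n, m, p)) pairs, with f = omega_nmp / 2pi from Eq. (7.198).

    Index triples with fewer than two nonzero entries are skipped, because the
    fields (7.196) vanish identically for them."""
    modes = []
    for n, m, p in product(range(n_max + 1), repeat=3):
        if (n > 0) + (m > 0) + (p > 0) < 2:
            continue
        k = math.pi * math.sqrt((n / a) ** 2 + (m / b) ** 2 + (p / length) ** 2)
        modes.append((v * k / (2 * math.pi), (n, m, p)))
    return sorted(modes)


# Hypothetical air-filled cavity with b < a, length
a, b, length = 22.86e-3, 10.16e-3, 30.0e-3
modes = cavity_modes(a, b, length)
for f, (n, m, p) in modes[:4]:
    print(f"(n, m, p) = ({n}, {m}, {p}): f = {f / 1e9:.3f} GHz")
```

For these dimensions the lowest mode is $(1, 0, 1)$, i.e. $H_{101}$, at about 8.24 GHz, in agreement with Eq. (194).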

Let us use Eq. (197) to evaluate the number of different modes in a relatively small region $d^3k \ll k^3$ of the wave vector space (which is still much larger than the reciprocal volume, $1/V = 1/abl$, of the resonator). Taking into account that each eigenfrequency (198), with $nmp \neq 0$, corresponds to two field modes with different polarizations, 64 an argument absolutely similar to the one used at the end of Sec. 7 for the 2D case yields




$$dN = 2\,\frac{V}{(2\pi)^3}\,d^3k. \qquad (7.199)$$



This property, valid for resonators of arbitrary shape, is broadly used in classical and quantum statistical physics, 65 in the following form. If some electromagnetic mode property, $f(\mathbf{k})$, is a smooth function of the wave vector, and the volume $V$ is large enough, then Eq. (199) may be used to approximate the sum over the modes by an integral:



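$$\sum_{\mathbf{k}} f(\mathbf{k}) \approx \int f(\mathbf{k})\,dN = \int f(\mathbf{k})\,\frac{dN}{d^3k}\,d^3k = 2\,\frac{V}{(2\pi)^3}\int f(\mathbf{k})\,d^3k. \qquad (7.200)$$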
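Eq. (199) can be checked by brute force. The Python sketch below (function names are hypothetical) counts the modes of a rectangular resonator with |k| ≤ k_max, using Eq. (197) for the wave vector components, two polarizations for each triple with all of n, m, p nonzero, and one mode for each triple with exactly one zero index (as one may check from Eqs. (196), only one of the two field sets survives in that case). It then compares the count with Eq. (200) for f(k) = 1 integrated over the ball |k| ≤ k_max, i.e. N ≈ 2V/(2π)³ × (4/3)πk_max³.

```python
import math

def count_modes(a, b, l, k_max):
    """Number of modes of an a x b x l resonator with |k| <= k_max.

    Wave vector components follow Eq. (197): k_x = pi n/a, etc.
    """
    total = 0
    for n in range(int(k_max * a / math.pi) + 1):
        for m in range(int(k_max * b / math.pi) + 1):
            for p in range(int(k_max * l / math.pi) + 1):
                k2 = (math.pi * n / a) ** 2 + (math.pi * m / b) ** 2 + (math.pi * p / l) ** 2
                if k2 > k_max ** 2:
                    continue
                zeros = (n == 0) + (m == 0) + (p == 0)
                if zeros == 0:
                    total += 2      # two independent polarizations
                elif zeros == 1:
                    total += 1      # only one field set in Eqs. (196) survives
    return total

def continuum_estimate(a, b, l, k_max):
    """Eq. (200) with f = 1 over the ball |k| <= k_max."""
    volume = a * b * l
    return 2 * volume / (2 * math.pi) ** 3 * (4 / 3) * math.pi * k_max ** 3

exact = count_modes(1.0, 1.0, 1.0, 100.0)
approx = continuum_estimate(1.0, 1.0, 1.0, 100.0)
print(exact, round(approx), exact / approx)
```

The ratio in the last printed column should be close to 1 once k_max ≫ π/min(a, b, l), in line with Eq. (199); the residual difference comes from the discreteness of the spectrum near the edges of the k-space octant.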



Finally, note that low-loss resonators may be also formed by finite-length sections of not only
metallic waveguides with different cross-sections, but also of dielectric waveguides. Moreover, even
a simple slab of a dielectric material with a μ/ε ratio substantially different from that of its
environment (say, the free space) may be used as a high-Q Fabry-Perot interferometer, due to an
effective wave reflection from its surfaces at normal and especially inclined incidence - see,
respectively, Eq. (68) and Eqs. (91) and (95). Actually, such a dielectric Fabry-Perot interferometer is
frequently more convenient for practical purposes than a metallic resonator, due to its finite coupling to
the environment, which enables a natural way of wave insertion and extraction - see Fig. 32. The flip side of
this coupling to the environment is that it provides an additional mechanism of power losses,
limiting the resonance quality - see the next section.



64 This fact becomes evident from plugging Eq. (196) into the Maxwell equation ∇·E = 0. The resulting equation,
k_x E_1 + k_y E_2 + k_z E_3 = 0, with the discrete, equidistant spectrum (197) for each wave vector component, may be
satisfied by two linearly independent sets of constants E_{1,2,3}.

65 See, e.g., QM Sec. 1.1 and SM Sec. 2.6. 
Fig. 7.32. Dielectric Fabry-Perot interferometer.



7.10. Energy loss effects 

Inevitable energy losses ("power dissipation") in passive media lead, in two different situations,
to two different effects. In a long transmission line fed by a constant wave source at one end, the losses
lead to a gradual attenuation of the wave, i.e. to the decrease of its amplitude, and hence of its power 𝒫, with
the distance z along the line. In linear materials, the losses are proportional to the wave amplitude
squared, i.e. to the time average of the power itself, so that the energy balance on a small segment dz
takes the form



$$d\mathcal{P} = -\frac{d\mathcal{P}_{\rm loss}}{dz}\,dz \equiv -\alpha\mathcal{P}\,dz. \qquad (7.201)$$

Coefficient α, participating in the last form of Eq. (201) and defined by the relation

$$\alpha \equiv \frac{d\mathcal{P}_{\rm loss}/dz}{\mathcal{P}}, \qquad (7.202)$$

is called the attenuation constant.66 Comparing the evident solution of Eq. (201),

$$\mathcal{P}(z) = \mathcal{P}(0)\,e^{-\alpha z}, \qquad (7.203)$$

with Eq. (29), where k is replaced with k_z, we see that α may be expressed as

$$\alpha = 2\,{\rm Im}\,k_z, \qquad (7.204)$$



where k_z is the component of the wave vector along the transmission line. In the most important limit
when the losses are low in the sense α ≪ |k_z| ≈ Re k_z, their effects on the field distributions across the
line's cross-section are negligible, making the calculation of α rather straightforward. In particular, in
this limit the contributions to attenuation from two major sources, energy losses in the filling dielectric
and the skin effect in conducting walls, are independent and additive.
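A short numerical illustration of Eqs. (203)-(204) and of the decibel conversion in footnote 66. The complex propagation constant used below is a hypothetical low-loss value chosen only for illustration; the function names are likewise illustrative.

```python
import math

def attenuation_constant(k_z: complex) -> float:
    """Eq. (204): alpha = 2 Im k_z, in the units of 1/length used for k_z."""
    return 2.0 * k_z.imag

def power_after(p0: float, alpha: float, z: float) -> float:
    """Eq. (203): P(z) = P(0) exp(-alpha z)."""
    return p0 * math.exp(-alpha * z)

def to_db_per_meter(alpha_per_meter: float) -> float:
    """10 log10[P(0)/P(1 m)] = (10/ln 10) alpha, i.e. about 4.34 alpha."""
    return 10.0 / math.log(10.0) * alpha_per_meter

# Hypothetical low-loss line: Re k_z = 3.1 1/m, Im k_z = 0.008 1/m.
alpha = attenuation_constant(3.1 + 0.008j)
print(alpha, power_after(1.0, alpha, 60.0), to_db_per_meter(alpha))
```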

The dielectric losses are especially simple to describe. Indeed, a review of our calculations in
Secs. 6-8 shows that all of them remain valid if either ε(ω), or μ(ω), or both, and hence k(ω), have small
imaginary parts:

$$k'' = \omega\,{\rm Im}\,[\varepsilon(\omega)\mu(\omega)]^{1/2} \ll k'. \qquad (7.205)$$

In TEM transmission lines, k = k_z, and hence Eq. (205) yields



66 In engineering, attenuation is frequently measured in decibels per meter (abbreviated as dB/m):

$$\alpha\,[{\rm dB/m}] \equiv 10\log_{10}\frac{\mathcal{P}(z=0)}{\mathcal{P}(z=1\,{\rm m})} = \frac{10}{\ln 10}\,\alpha\,[{\rm m}^{-1}] \approx 4.34\,\alpha\,[{\rm m}^{-1}].$$
$$\alpha = 2k'' = 2\omega\,{\rm Im}\,[\varepsilon(\omega)\mu(\omega)]^{1/2}. \qquad (7.206)$$

For dielectric waveguides, in particular optical fibers, these losses are the main attenuation mechanism.
As we already know from Sec. 8, in practical optical fibers κ_t R ≪ 1, i.e. most of the field propagates (as
the evanescent wave) in the cladding, and the wave mode is very close to TEM. This is why it is
sufficient to use Eq. (206) for the cladding material alone.
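Eq. (206) is easy to evaluate with complex arithmetic. In the Python sketch below, the relative permittivity 2.2 + 0.001i is an illustrative assumption, not a value from the text, and μ = μ0 is assumed. `cmath.sqrt` returns the principal root, whose imaginary part is positive here, as needed for a wave decaying along its propagation direction.

```python
import cmath
import math

EPS0 = 8.8541878128e-12   # F/m
MU0 = 1.25663706212e-6    # H/m

def dielectric_alpha(omega: float, eps_rel: complex, mu_rel: complex = 1.0):
    """Return (alpha, k'), where alpha = 2 k'' = 2 omega Im[(eps mu)^(1/2)], Eq. (206)."""
    k = omega * cmath.sqrt(eps_rel * EPS0 * mu_rel * MU0)
    return 2.0 * k.imag, k.real

omega = 2 * math.pi * 100e6          # 100 MHz
alpha, k_real = dielectric_alpha(omega, 2.2 + 0.001j)
print(f"k' = {k_real:.3f} 1/m, alpha = {alpha:.2e} 1/m")
```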

In waveguides with non-TEM waves, we can readily use the relations between k_z and k derived
above to re-calculate k'' into Im k_z. (Note that at such a re-calculation, the values of k_t stay real, because they
are just the eigenvalues of the Helmholtz equation (101), which does not include k.)

In waveguides and transmission lines with metallic conductors, much higher energy losses may
come from the skin effect. Let us calculate them, assuming that we know the field distribution in the
wave, in particular, the tangential component H of the magnetic field at the conductor surface. Then, if the
wavelength λ is much larger than δ_s, as it usually is,67 we may use the results of the quasistationary
approximation derived in Sec. 6.2, in particular Eqs. (6.27)-(6.28) for the relation between the complex
amplitudes of the current density in the conductor and the tangential magnetic field:

$$|\mathbf{j}(\xi)| = |\kappa|\,|H(\xi)|, \qquad \kappa = (-i\mu\omega\sigma)^{1/2} = \frac{1-i}{\delta_s}, \qquad (7.207)$$

where ξ is the distance from the surface into the conductor. The power loss density (per unit volume) may be now calculated by time averaging of Eq. (4.39):



$$\langle p_{\rm loss}(\xi)\rangle_t = \frac{|\mathbf{j}(\xi)|^2}{2\sigma} = \frac{|\kappa|^2\,|H(\xi)|^2}{2\sigma} = \frac{|H(\xi)|^2}{\delta_s^2\,\sigma}, \qquad (7.208)$$

and its integration along the normal to the surface (through the whole skin depth), using the exponential law
(6.26). This (elementary) integration yields the following power loss per unit area:68




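$$\frac{d\mathcal{P}_{\rm loss}}{dA} = \int_0^\infty\langle p_{\rm loss}(\xi)\rangle_t\,d\xi = \frac{\mu\omega\delta_s}{4}\,|H|^2. \qquad (7.209)$$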
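The "elementary" integration leading to Eq. (209) can be double-checked numerically: integrate the loss density (208), with |H(ξ)|² = |H|² exp(-2ξ/δ_s), over depth and compare with μωδ_s|H|²/4. The copper conductivity σ ≈ 6.0×10⁷ S/m is the value quoted later in this section; the frequency, the unit field amplitude, and the integration cutoff at 20δ_s are illustrative choices of this sketch.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (classic value; adequate here)

def skin_depth(sigma, omega, mu=MU0):
    """delta_s = (2 / (mu sigma omega))^(1/2), as in Sec. 6.2."""
    return math.sqrt(2.0 / (mu * sigma * omega))

def loss_per_area_numeric(h_surface, sigma, omega, steps=20000):
    """Midpoint-rule integral of Eq. (208) over 0 <= xi <= 20 delta_s,
    with |H(xi)|^2 = |H|^2 exp(-2 xi / delta_s)."""
    delta = skin_depth(sigma, omega)
    dxi = 20 * delta / steps
    total = 0.0
    for i in range(steps):
        xi = (i + 0.5) * dxi
        h2 = h_surface ** 2 * math.exp(-2 * xi / delta)
        total += h2 / (delta ** 2 * sigma) * dxi
    return total

def loss_per_area_formula(h_surface, sigma, omega, mu=MU0):
    """Eq. (209): dP_loss/dA = mu omega delta_s |H|^2 / 4."""
    return mu * omega * skin_depth(sigma, omega, mu) * h_surface ** 2 / 4

sigma_cu, omega = 6.0e7, 2 * math.pi * 100e6
print(skin_depth(sigma_cu, omega))   # about 6.5e-6 m
print(loss_per_area_numeric(1.0, sigma_cu, omega) / loss_per_area_formula(1.0, sigma_cu, omega))
```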

The total power loss d𝒫_loss/dz per unit length of a waveguide, i.e. the right-hand side of Eq. (201), now
may be calculated by the integration of the ratio d𝒫_loss/dA along the contour(s) limiting the cross-section of
all conductors of the line. Since our calculation is only valid for low losses, we may ignore their effect
on the field distribution, so that the unperturbed distribution may be used both in Eq. (209), i.e. the
numerator of Eq. (202), and also for the calculation of the average propagating power, i.e. the
denominator of Eq. (202), as the integral of the Poynting vector over the cross-section of the waveguide.

Let us see how this approach works for the TEM mode in one of the simplest TEM transmission
lines, the coaxial cable (Fig. 19). As we already know from Sec. 6, in the absence of losses, the
distribution of TEM mode fields is the same as in statics, namely:

$$H_z = 0, \qquad H_\rho = 0, \qquad H_\varphi(\rho) = H_0\,\frac{a}{\rho}, \qquad (7.210)$$



67 As follows from Eq. (78), which may be used for estimates even in cases of arbitrary incidence, this condition
is necessary for low attenuation: α ≪ k only if F ≪ 1.

68 For a normally-incident plane wave, this formula would bring us back to Eq. (78). 
where H_0 is the field's amplitude on the surface of the inner conductor, and

$$E_z = 0, \qquad E_\rho(\rho) = ZH_\varphi(\rho) = ZH_0\,\frac{a}{\rho}, \qquad E_\varphi = 0, \qquad Z = \left(\frac{\mu}{\varepsilon}\right)^{1/2}. \qquad (7.211)$$



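Now we can, still neglecting losses, use Eq. (42) to calculate the time-averaged Poynting vector

$$\bar{S} = \frac{Z|H(\rho)|^2}{2} = \frac{Z|H_0|^2}{2}\left(\frac{a}{\rho}\right)^2, \qquad (7.212)$$

and from it, the total power propagating through the cross-section,

$$\mathcal{P} = \int\bar{S}\,d^2r = \frac{Z|H_0|^2}{2}\,a^2\,2\pi\int_a^b\frac{\rho\,d\rho}{\rho^2} = \pi Z|H_0|^2a^2\ln\frac{b}{a}. \qquad (7.213)$$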



For the particular case of the coaxial cable (Fig. 19), the contours limiting the wall cross-sections
are circles of radii ρ = a (where the surface field amplitude H_φ(ρ) equals, in our notation, H_0), and ρ = b
(where, according to Eq. (210), the field is a factor of b/a lower). As a result, for the power loss per unit
length, Eq. (209) yields

$$\frac{d\mathcal{P}_{\rm loss}}{dz} = \frac{\mu\omega\delta_s}{4}\left[2\pi a|H_0|^2 + 2\pi b\left|H_0\frac{a}{b}\right|^2\right] = \frac{\pi}{2}\,a\left(1+\frac{a}{b}\right)\mu\omega\delta_s|H_0|^2. \qquad (7.214)$$



Note that at a ≪ b, the losses in the inner conductor dominate, despite its smaller surface, because of
the higher surface field. Now we may plug Eqs. (213)-(214) into the definition (202) of α, to calculate
the part of the attenuation constant associated with the skin effect:

$$\alpha_{\rm skin} = \frac{1}{2\ln(b/a)}\left(\frac{1}{a}+\frac{1}{b}\right)\frac{\mu\omega\delta_s}{Z} \approx \frac{k\delta_s}{2\ln(b/a)}\left(\frac{1}{a}+\frac{1}{b}\right). \qquad (7.215)$$



We see that the relative (dimensionless) attenuation, α/k, scales approximately as the ratio δ_s/min[a, b].
This result should be compared with Eq. (78) for the normal incidence of plane waves on a conducting
surface.

Let us evaluate α for the standard TV cable RG-6/U (with copper conductors of diameters 2a =
1 mm, 2b = 4.7 mm, and ε ≈ 2.2ε0, μ ≈ μ0). According to Eq. (6.27a), for f = 100 MHz (ω ≈ 6.3×10⁸ s⁻¹)
the skin depth of pure copper at room temperature (with σ ≈ 6.0×10⁷ S/m) is close to 6.5×10⁻⁶ m, while
k = ω(εμ)^{1/2} = (ε/ε0)^{1/2}(ω/c) ≈ 3.1 m⁻¹. As a result, the attenuation is rather low: α_skin ≈ 0.016 m⁻¹, so that
the attenuation length scale l = 1/α is about 60 m. Hence the attenuation in a cable connecting a roof
TV antenna to a TV set in the same house is not a big problem, though using a worse conductor, e.g.,
steel, would make the losses rather noticeable. (Hence the current worldwide shortage of copper.)
However, an attempt to use the same cable in the X-band (f ~ 10 GHz) is more problematic. Indeed,
though the skin depth δ_s ∝ ω^{-1/2} decreases with frequency, the wavelength drops, i.e. k increases, even
faster (k ∝ ω), so that the attenuation α_skin ∝ ω^{1/2} becomes close to 0.16 m⁻¹, and l drops to ~6 m. This is why
at such frequencies, it is more customary to use rectangular waveguides, with their larger internal
dimensions a, b ~ 1/k, and hence lower attenuation. Let me leave the calculation of this attenuation,
using Eq. (209) and the results derived in Sec. 9, for the reader's exercise.
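The RG-6/U numbers above can be reproduced directly from Eq. (215). This Python sketch uses the cable parameters quoted in the text and assumes μ = μ0 in both the conductors and the dielectric (so that μω/Z = k); the function name is illustrative.

```python
import math

C = 299_792_458.0          # m/s
MU0 = 4e-7 * math.pi       # H/m

def coax_alpha_skin(a, b, eps_rel, sigma, f):
    """Eq. (215), assuming mu = mu0 everywhere:
    alpha_skin = k delta_s (1/a + 1/b) / (2 ln(b/a)), in 1/m."""
    omega = 2 * math.pi * f
    delta_s = math.sqrt(2.0 / (MU0 * sigma * omega))
    k = math.sqrt(eps_rel) * omega / C
    return k * delta_s * (1 / a + 1 / b) / (2 * math.log(b / a))

# RG-6/U parameters from the text: 2a = 1 mm, 2b = 4.7 mm, eps ~ 2.2 eps0, copper walls.
for f in (100e6, 10e9):
    alpha = coax_alpha_skin(0.5e-3, 2.35e-3, 2.2, 6.0e7, f)
    print(f"f = {f:.0e} Hz: alpha = {alpha:.3f} 1/m, l = 1/alpha = {1 / alpha:.0f} m")
```

Because α_skin ∝ ω^{1/2}, the hundredfold frequency increase raises α exactly tenfold in this model.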
The power loss effect on free oscillations in resonators is different: there it leads to a gradual
decay of the oscillation energy ℰ in time. A useful measure of this decay, called the Q factor, may be
introduced by writing the temporal analog of Eq. (201):

$$d\mathcal{E} = -\mathcal{P}_{\rm loss}\,dt \equiv -\frac{\omega}{Q}\,\mathcal{E}\,dt, \qquad (7.216)$$



where ω is the eigenfrequency in the loss-free limit, and the dimensionless Q factor is defined by a
relation parallel to Eq. (202):69

$$Q \equiv \frac{\omega\mathcal{E}}{\mathcal{P}_{\rm loss}}. \qquad (7.217)$$

The solution of Eq. (216),

$$\mathcal{E}(t) = \mathcal{E}(0)\,e^{-t/\tau}, \qquad \tau = \frac{Q}{\omega} = \frac{Q}{2\pi}\,T, \qquad (7.218)$$



which is an evident temporal analog of Eq. (203), shows the physical meaning of the Q factor: the
characteristic time τ of the oscillation energy decay is (Q/2π) times longer than the oscillation period T
= 2π/ω. (Another interpretation of Q comes from the relation70

$$Q = \frac{\omega}{\Delta\omega}, \qquad (7.219)$$

where Δω is the so-called FWHM71 bandwidth of the resonance, namely the difference between the two
values of the external signal frequency, one above and one below ω, at which the energy of forced
oscillations induced in the resonator by an input signal is half of its resonant value.)
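To illustrate how Eq. (219) relates to the decay rate ω/Q of Eq. (216), one may assume (this is an assumption of the sketch, appropriate for Q ≫ 1) that near resonance the energy of forced oscillations follows the standard Lorentzian profile of a weakly damped oscillator, ℰ(ω) ∝ 1/[(ω - ω0)² + (ω0/2Q)²]. The Python code below locates the two half-maximum frequencies by bisection and recovers Q = ω0/Δω; all names and the value Q = 200 are illustrative.

```python
def lorentzian_energy(omega, omega0, q):
    """Assumed near-resonance energy profile for Q >> 1, normalized to 1 at the peak."""
    gamma = omega0 / (2 * q)   # amplitude decay rate, half of the energy decay rate omega0/Q
    return gamma ** 2 / ((omega - omega0) ** 2 + gamma ** 2)

def half_max_point(f, lo, hi, tol=1e-12):
    """Bisection for f(x) = 1/2, assuming f - 1/2 changes sign on [lo, hi]."""
    g_lo = f(lo) - 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - 0.5
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
        if hi - lo < tol * abs(mid):
            break
    return 0.5 * (lo + hi)

omega0, q_true = 1.0, 200.0

def energy(w):
    return lorentzian_energy(w, omega0, q_true)

upper = half_max_point(energy, omega0, 2 * omega0)
lower = half_max_point(energy, 0.0, omega0)
q_from_fwhm = omega0 / (upper - lower)
print(q_from_fwhm)
```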

In the important particular case of resonators formed by insertion of metallic walls into a TEM
transmission line of small cross-section (with the linear size scale a much less than the wavelength λ),
there is no need to calculate the Q factor directly if the line attenuation coefficient α is already known.
In fact, as was discussed in Sec. 9 above, the standing waves in such a resonator, of the length given by
Eq. (183): l = p(λ/2) with p = 1, 2, ..., may be understood as an overlap of two TEM waves running in
opposite directions, or in other words, a traveling wave and its reflection from one of the ends, the
whole round trip taking time Δt = 2l/v = pλ/v = 2πp/ω = pT. According to Eq. (201), at this distance the
wave's power should drop by a factor of exp{-2αl} = exp{-pαλ}. On the other hand, the same decay may be
viewed as happening in time, and according to Eq. (216), it results in the drop by a factor of exp{-Δt/τ} =
exp{-(pT)/(Q/ω)} = exp{-2πp/Q}. Comparing these two exponents, we get

$$Q = \frac{2\pi}{\alpha\lambda} = \frac{k}{\alpha}. \qquad (7.220)$$



This simple relation neglects the losses at wave reflection from the walls limiting the resonator
length. Such an approximation is indeed legitimate at a ≪ λ; if this relation is violated, or if we are dealing



69 As losses grow, the oscillation waveform deviates from a sinusoidal one, and the very notion of "oscillation
frequency" becomes vague. As a result, the parameter Q is well defined only if it is much higher than 1.

70 See, e.g., CM Sec. 4.1. 

71 This is the acronym of "Full Width at Half-Maximum". 
with more complex resonator modes (such as those based on the reflection of E or H waves), the Q
factor may be smaller than that given by Eq. (220), and needs to be calculated directly. A substantial
relief for such a direct calculation is that, just as in the calculation of small attenuation in waveguides, in
the low-loss limit (Q ≫ 1), both the numerator and denominator of the right-hand side of Eq. (217) may
be calculated neglecting the effects of the power loss on the field distribution in the resonator. I am
leaving such a calculation, for the simplest (rectangular and circular) resonators, for the reader's exercise.
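For a resonator made of a section of a TEM line, Eq. (220) combined with Eq. (215) gives a quick Q estimate. The Python sketch below applies it to the RG-6/U cable parameters quoted earlier in this section, at 100 MHz; it inherits the neglect of end-wall losses noted after Eq. (220) and assumes μ = μ0 everywhere. Note that k cancels in the ratio k/α, so in this approximation Q = 2 ln(b/a)/[δ_s(1/a + 1/b)] does not depend on the dielectric filling.

```python
import math

C = 299_792_458.0          # m/s
MU0 = 4e-7 * math.pi       # H/m

def coax_q_factor(a, b, eps_rel, sigma, f):
    """Q = k / alpha, Eq. (220), with alpha from Eq. (215); end-wall losses ignored."""
    omega = 2 * math.pi * f
    delta_s = math.sqrt(2.0 / (MU0 * sigma * omega))
    k = math.sqrt(eps_rel) * omega / C
    alpha = k * delta_s * (1 / a + 1 / b) / (2 * math.log(b / a))
    return k / alpha

print(round(coax_q_factor(0.5e-3, 2.35e-3, 2.2, 6.0e7, 100e6)))   # about 196
```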

To conclude this chapter, one last remark: in some resonators (including certain dielectric
resonators and metallic resonators with holes in their walls), additional losses due to wave radiation into
the environment are also possible. In some simple cases (say, the Fabry-Perot interferometer shown in
Fig. 32) the calculation of these radiative losses is straightforward, but sometimes it requires more
elaborate approaches, which will be discussed in the next chapter.

7.11. Exercise problems 

7.1. Find the temporal Green's function of a medium whose complex dielectric constant obeys
Eq. (32), using:

(i) the Fourier transform, and 

(ii) the direct solution of Eq. (30), which describes the corresponding model of the medium. 
Hint: For the Fourier transform, you may like to use the Cauchy integral. 72 

7.2. A monochromatic, plane electromagnetic wave is normally incident from free space on a
uniform slab of a material with electric permittivity ε and magnetic permeability μ, with the slab
thickness d comparable with the wavelength.

(i) Calculate the power transmission coefficient 𝒯, i.e. the fraction of the incident power that is
transmitted through the slab.

(ii) Assuming that ε and μ are frequency-independent and positive, analyze in detail the
frequency dependence of 𝒯. In particular, how does the function 𝒯(ω) depend on the film thickness d and
the wave impedance Z = (μ/ε)^{1/2} of its material?

7.3. Calculate, sketch and discuss the dispersion relation for electromagnetic waves propagating
in an oscillator medium described by Eq. (32), for the case of negligible damping. 

7.4. Analyze the possibility of propagation of surface electromagnetic waves along a plane
boundary between plasma and free space. In particular, calculate and analyze the dispersion relation of 
the waves. 

Hint: Assume that the magnetic field of the wave is parallel to the boundary and perpendicular to 
the wave propagation direction. (After solving the problem, justify this mode choice.) 



72 See, e.g., MA Eq. (15.2). 
7.5. Calculate the characteristic impedance Z_W of the long, straight TEM transmission lines
formed by metallic electrodes with the cross-sections shown in the figure below:

(i) two round, parallel wires, separated by distance d ≫ R,

(ii) microstrip line of width w ≫ d,

(iii) stripline with w ≫ d_1 ~ d_2,

in all cases using the macroscopic boundary conditions on metallic surfaces. Assume that the conductors
are embedded into a linear dielectric with constant ε and μ.




7.6. Modify the results of Problem 5(ii) for a superconductor microstrip line, taking into account the
magnetic field penetration into both the strip and the ground plane.



7.7. What lumped ac circuit would be equivalent to the system shown in Fig. 20, with the incident
wave's power 𝒫_i? Assume that the wave reflected from the load circuit does not return to it.



7.8. Find the lumped ac circuit equivalent to a loss-free TEM transmission line of length l ~ λ, with a
small cross-section area A ≪ λ², as "seen" (measured) from one end, if the line's conductors are
galvanically connected ("shorted") at the other end - see Fig. on the right. Discuss the result's
dependence on the signal frequency.



7.9. Represent the fundamental H_{10} wave in a rectangular waveguide (Fig. 22) as a sum of two
plane waves, and discuss why such a representation is possible.



7.10. For a metallic coaxial cable with the circular cross-section (Fig. 21), find the lowest non-TEM
mode and calculate its cutoff frequency.



7.11. Use the recipe outlined in Sec. 7 to derive the characteristic equation (161) for the HE and
EH modes in a round, step-index optical fiber. 
7.12. Find the lowest eigenfrequencies, and the corresponding oscillation modes, of a round cylindrical
resonator (see Fig. on the right) with perfectly conducting walls.




7.13. Calculate the skin-effect contribution to the attenuation coefficient α, defined by Eq.
(202), for the basic (H_{10}) mode propagating in a waveguide with the rectangular cross-section - see Fig.
22. Use the results to evaluate α and L for the standard X-band waveguide WR-90 (with copper walls,
a = 23 mm, b = 10 mm, and no dielectric filling) that carries a 10 GHz wave, at room temperature.
Compare the estimate with that for a standard coaxial cable, at the same frequency, using the calculations
carried out in Sec. 10.



7.14. Calculate the skin-effect contribution to the attenuation coefficient α of

(i) the basic (H_{11}) mode, and

(ii) the H_{01} mode

in a metallic waveguide with the circular cross-section (Fig. 23a), and analyze the low-frequency
(ω → ω_c) and high-frequency (ω ≫ ω_c) behaviors of α for each of these modes.



7.15. For a rectangular resonator with dimensions a×b×l (b < a, l), calculate the Q factor in the
fundamental (lowest) oscillation mode, due to the skin-effect losses in metallic walls. Evaluate the factor
(and the lowest eigenfrequency) for a 23×23×10 mm resonator with copper walls, at room temperature.



7.16. Calculate the lowest eigenfrequency and Q factor (due to the
skin-effect losses) of the toroidal (axially-symmetric) resonator with
metallic walls and interior's cross-section shown in Fig. on the right,
in the limit d ≪ r, R.




7.17. For the dielectric Fabry-Perot resonator (shown in Fig. 32) with the normal wave
incidence, find the Q factor due to radiation losses in the limit of strong impedance mismatch (Z ≫ Z_0),
using two methods:

(i) from the energy balance, using Eq. (217), and

(ii) from the frequency dependence of the power transmission coefficient, using Eq. (219).
Compare the results.
Chapter 8. Radiation, Scattering, Interference, and Diffraction

This chapter continues the discussion of electromagnetic wave propagation, now focusing on the
results of wave incidence on a passive object. Depending on the object's shape, the result of this
interaction is called either scattering, or diffraction, or interference. However, as we will see below, the
boundary between these effects is blurry, and their mathematical description may be conveniently based
on a single key calculation - the electric dipole radiation of a spherical wave by a small source.
Naturally, I will start the chapter with this calculation, deriving it from an even more general result -
the "retarded potentials" solution of the Maxwell equations.



8.1. Retarded potentials 

Let us start from the general solution of the Maxwell equations in a dispersion-free, linear,
uniform, isotropic medium, characterized by frequency-independent, real ε and μ - for example, free
space.1 The easiest way to perform this calculation is to use the scalar (φ) and vector (A) potentials of
the electromagnetic field, which are related to the electric and magnetic fields by Eqs. (6.106):

$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}. \qquad (8.1)$$

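As was discussed in Chapter 6, imposing upon the potentials the Lorenz gauge condition (6.108),

$$\nabla\cdot\mathbf{A} + \frac{1}{v^2}\frac{\partial\phi}{\partial t} = 0, \qquad v^2 \equiv \frac{1}{\varepsilon\mu}, \qquad (8.2)$$

(which does not affect the fields E and B), the macroscopic Maxwell equations for the fields may be recast
into a pair of very similar, simple equations (6.109) for the potentials:

$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = -\frac{\rho}{\varepsilon}, \qquad (8.3a)$$

$$\nabla^2\mathbf{A} - \frac{1}{v^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = -\mu\mathbf{j}. \qquad (8.3b)$$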

Let us calculate the fields induced by the stand-alone electric charge and current densities ρ(r, t)
and j(r, t), thinking of them as known functions.2 The idea of how this may be done may be borrowed from
electro- and magnetostatics. Indeed, for the stationary case (∂/∂t = 0), the solutions of Eqs. (8.3) are
given by the evident generalizations of, respectively, Eq. (1.38) and Eq. (5.28) to the uniform, linear
medium:

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon}\int\frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r', \qquad (8.4a)$$



1 When necessary (e.g., in the discussion of the Cherenkov radiation in Sec. 10.4), it will not be too hard to
generalize these results to dispersive media.

2 Such thinking would not prevent the results from being valid for the case when ρ(r, t) and j(r, t) should be
calculated self-consistently.
$$\mathbf{A}(\mathbf{r}) = \frac{\mu}{4\pi}\int\frac{\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'. \qquad (8.4b)$$



As we know, these expressions may be derived by, first, calculating the potential of a point source, and 
then using the linear superposition principle for a system of such sources. 

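Let us do the same for the time-dependent case, starting from the field induced by a time-dependent point charge at the origin:3

$$\rho(\mathbf{r},t) = q(t)\,\delta(\mathbf{r}). \qquad (8.5)$$

In this case Eq. (3a) is homogeneous everywhere but the origin:

$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = 0, \qquad \text{at } r \neq 0. \qquad (8.6)$$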



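Due to the spherical symmetry of the problem, it is natural to look for a spherically-symmetric solution
to this equation.4 Thus, we may simplify the Laplace operator5 correspondingly, and reduce Eq. (6) to

$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\phi}{\partial r}\right) - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = 0, \qquad \text{at } r \neq 0. \qquad (8.7)$$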



If we now introduce a new variable χ ≡ rφ, Eq. (7) is reduced to the 1D wave equation

$$\left(\frac{\partial^2}{\partial r^2} - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)\chi = 0, \qquad \text{at } r \neq 0. \qquad (8.8)$$



From the discussion in Chapter 7,6 we know that its general solution may be presented as

$$\chi(r,t) = \chi_{\rm out}\left(t - \frac{r}{v}\right) + \chi_{\rm in}\left(t + \frac{r}{v}\right), \qquad (8.9)$$



where χ_in and χ_out are (so far) arbitrary functions of one variable. The physical sense of φ_out = χ_out/r is a
spherical wave propagating from our source (at r = 0) to outer space, i.e. exactly the solution we are
looking for. On the other hand, φ_in = χ_in/r describes a spherical wave that could be created by some
distant spherically-symmetric source and converges on our charge located at the origin - evidently not
the effect we want to consider here. Discarding this term, and returning to φ = χ/r, we can write the
solution of Eq. (7) as

$$\phi(r,t) = \frac{1}{r}\,\chi_{\rm out}\left(t - \frac{r}{v}\right). \qquad (8.10)$$



3 Admittedly, this expression does not satisfy the continuity equation (4.5), but we will correct this deficiency 
imminently, at the linear superposition stage - see Eq. (17) below. 

4 Let me emphasize that this is not the general solution to Eq. (6). For example, it does not describe the fields
created by other sources that pass by the considered charge q(t). However, such fields are irrelevant for our
current task: to calculate the field created by the charge q(t) itself.

5 See, e.g., MA Eq. (10.9).

6 See also CM Sec. 5.3. 
In order to find the function χ_out, let us consider distances r so small that the time derivative in Eq.
(3a), with the right-hand side (5),

$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = -\frac{q(t)}{\varepsilon}\,\delta(\mathbf{r}), \qquad (8.11)$$

is much smaller than the spatial derivative (which diverges at r → 0). Then Eq. (11) is reduced to the
electrostatic equation whose solution (4a), for the source (5), is

$$\phi(r\rightarrow 0, t) = \frac{q(t)}{4\pi\varepsilon r}. \qquad (8.12)$$

Now requiring the two solutions, (10) and (12), to coincide at r ≪ vt, we get χ_out(t) = q(t)/4πε, so that
Eq. (10) becomes

$$\phi(r,t) = \frac{1}{4\pi\varepsilon}\,\frac{1}{r}\,q\left(t - \frac{r}{v}\right). \qquad (8.13)$$



Just as had been done in statics, this result may be readily generalized to an arbitrary position
r' of the point charge:

$$\rho(\mathbf{r},t) = q(t)\,\delta(\mathbf{r}-\mathbf{r}') \equiv q(t)\,\delta(\mathbf{R}), \qquad (8.14)$$

where R is the distance between the field observation point r and the source position point r', i.e. the
length of the vector

$$\mathbf{R} \equiv \mathbf{r}-\mathbf{r}', \qquad (8.15)$$

connecting these points - see Fig. 1.




Obviously, Eq. (13) becomes

$$\phi(\mathbf{r},t) = \frac{1}{4\pi\varepsilon}\,\frac{1}{R}\,q\left(t - \frac{R}{v}\right). \qquad (8.16)$$
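As a quick numerical sanity check of Eqs. (13) and (16), the Python sketch below evaluates the left-hand side of Eq. (7) by central finite differences, away from the origin. The Gaussian pulse q(t), the units with v = 1, the omitted prefactor 1/4πε, and the step h are all arbitrary illustrative choices. The retarded form q(t - r/v)/r leaves a residual at the level of the discretization error, while the "instantaneous" guess q(t)/r does not satisfy the equation, because nothing compensates its time-derivative term.

```python
import math

def q(t):
    """Hypothetical time-dependent point charge (arbitrary units): a Gaussian pulse."""
    return math.exp(-t * t)

def phi_retarded(r, t, v=1.0):
    """Eq. (13) without the 1/(4 pi eps) prefactor: q(t - r/v) / r."""
    return q(t - r / v) / r

def phi_instant(r, t):
    """Non-retarded guess q(t) / r, for comparison."""
    return q(t) / r

def wave_residual(phi, r, t, v=1.0, h=1e-3):
    """Left-hand side of Eq. (7) by central differences (valid for r != 0):
    (1/r^2) d/dr (r^2 dphi/dr) - (1/v^2) d^2phi/dt^2 = phi_rr + 2 phi_r / r - phi_tt / v^2."""
    d2r = (phi(r + h, t) - 2 * phi(r, t) + phi(r - h, t)) / h ** 2
    d1r = (phi(r + h, t) - phi(r - h, t)) / (2 * h)
    d2t = (phi(r, t + h) - 2 * phi(r, t) + phi(r, t - h)) / h ** 2
    return d2r + 2 * d1r / r - d2t / v ** 2

print(wave_residual(phi_retarded, 2.0, 0.3))   # close to zero
print(wave_residual(phi_instant, 2.0, 0.3))    # clearly nonzero
```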



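Now we can use the linear superposition principle to write, for an arbitrary charge distribution ρ(r, t),

$$\phi(\mathbf{r},t) = \frac{1}{4\pi\varepsilon}\int\rho\left(\mathbf{r}', t - \frac{R}{v}\right)\frac{d^3r'}{R}, \qquad (8.17a)$$

where the integration is extended over all charges of the system under analysis. Acting in exactly the same way,
for the vector potential we get

$$\mathbf{A}(\mathbf{r},t) = \frac{\mu}{4\pi}\int\mathbf{j}\left(\mathbf{r}', t - \frac{R}{v}\right)\frac{d^3r'}{R}. \qquad (8.17b)$$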



(Now nothing prevents the functions ρ(r, t) and j(r, t) from satisfying the continuity relation.)

Solutions (17) are called the retarded potentials, the name signifying that the observed fields are
"retarded" (delayed) in time by Δt = R/v relative to the source variations, due to the finite speed v of the
electromagnetic wave propagation. These solutions are so important that they deserve at least a couple
of general remarks.

First, remarkably, these simple expressions are exact solutions of the Maxwell equations (6.93) in
a uniform medium for an arbitrary distribution of stand-alone charges and currents. They also may be
considered as the general solutions of these equations, provided that the integration is extended over all
field sources in the Universe - or at least in its part that affects our observations.

Second, if the functions ρ(r, t) and j(r, t) include the microscopic (bound) charges and currents as
well, the macroscopic Maxwell equations (6.93) are valid with the replacement ε → ε0 and μ → μ0, so
that the retarded potential solutions (17) are also valid - with the same replacement.

Finally, Eqs. (17) may be plugged into Eqs. (1), giving (after an explicit differentiation) the so-called
Jefimenko equations for the fields E and B - similar in structure to Eqs. (17), but more cumbersome.
Conceptually, the existence of such equations is good news, because they are free from the gauge
ambiguity pertinent to the potentials φ and A. However, the practical value of these explicit expressions for
the fields is not too high: for all applications I am aware of, it is easier to use Eqs. (17) to calculate the
particular expressions for the potentials first, and only then calculate the fields from Eqs. (1). Let me
present the (apparently most important) example of this approach.



8.2. Electric dipole radiation 

Consider again the problem that was discussed in electrostatics (Sec. 3.1), namely the field of a
localized source with linear dimensions a ≪ r (Fig. 1), but now with time-dependent charge and/or
current distributions. Using the arguments of that discussion, in particular the condition expressed by Eq.
(3.1), r' ≪ r, we may apply the Taylor expansion (3.3),

$$f(\mathbf{R}) = f(\mathbf{r}) - \mathbf{r}'\cdot\nabla f(\mathbf{r}) + \ldots, \qquad (8.18)$$

to the function f(R) = R (for which ∇f(r) = ∇r = n, where n ≡ r/r is the unit vector directed toward the
observation point, see Fig. 1) to approximate the distance R as

$$R \approx r - \mathbf{r}'\cdot\mathbf{n}. \qquad (8.19)$$
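The accuracy of Eq. (19) is easy to probe numerically. In the Python sketch below, the source point r' and the observation direction are arbitrary illustrative choices. A second-order Taylor expansion (a standard result, not spelled out in the text) gives the neglected term [r'² - (r'·n)²]/2r, so the error should fall as 1/r, with err × r approaching 0.125 for the values chosen here.

```python
import math

def exact_distance(r_vec, rp_vec):
    """R = |r - r'|, Eq. (15)."""
    return math.dist(r_vec, rp_vec)

def approx_distance(r_vec, rp_vec):
    """Eq. (19): R ~ r - r'.n, with n = r/r."""
    r = math.hypot(*r_vec)
    n = [x / r for x in r_vec]
    return r - sum(p * c for p, c in zip(rp_vec, n))

source = (0.3, -0.2, 0.4)          # hypothetical source point r', |r'| ~ 0.54
results = []
for r in (10.0, 100.0, 1000.0):
    obs = (0.0, 0.6 * r, 0.8 * r)  # observation point at distance r along n = (0, 0.6, 0.8)
    err = exact_distance(obs, source) - approx_distance(obs, source)
    results.append((r, err, err * r))
    print(r, err, err * r)         # err * r stays roughly constant
```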

In each of the retarded potential formulas (17), R participates in two places: in the denominator
and in the source time argument. If ρ and j change in time on a scale ~1/ω, where ω is some characteristic
frequency, then any change of the argument (t - R/v) on that time scale, for example due to a change of R on
the spatial scale ~v/ω = 1/k, may substantially change these functions. Thus, expansion (18) may be
applied to R in the argument (t - R/v) only if ka ≪ 1, i.e. if the system size a is much smaller than the
radiation wavelength λ = 2π/k. On the other hand, the function 1/R changes relatively slowly, and for it even
the first term of expansion (19) gives a good approximation as soon as a ≪ r, R. In this approach, Eq.
(17a) yields
$$\phi(\mathbf{r},t) \approx \frac{1}{4\pi\varepsilon r}\int\rho\left(\mathbf{r}', t - \frac{R}{v}\right)d^3r' = \frac{1}{4\pi\varepsilon r}\,Q\left(t - \frac{r}{v}\right), \qquad (8.20)$$



where Q(t) is the net electric charge of the localized system. Due to the charge conservation, this charge
cannot change with time, so that the approximation (20) describes just a static Coulomb field of
our localized source, rather than a radiated wave.



Let us, however, apply a similar approximation to the vector potential (17b):

$$\mathbf{A}(\mathbf{r},t) \approx \frac{\mu}{4\pi r}\int\mathbf{j}\left(\mathbf{r}', t - \frac{R}{v}\right)d^3r'. \qquad (8.21)$$



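According to Eq. (5.87), in statics the right-hand side of this expression would vanish, but in dynamics
this is no longer true. For example, if the current is due to a nonrelativistic motion7 of a system of
charges q_k, we can write

$$\int\mathbf{j}(\mathbf{r}',t)\,d^3r' = \sum_k q_k\dot{\mathbf{r}}_k = \frac{d}{dt}\sum_k q_k\mathbf{r}_k \equiv \dot{\mathbf{p}}(t), \qquad (8.22)$$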



where p(t) is the dipole moment of the localized system, defined by Eq. (3.6). Now, after the integration,
we may keep only the first term of approximation (19) in the argument (t - R/v) as well, getting

$$\mathbf{A}(\mathbf{r},t) \approx \frac{\mu}{4\pi r}\,\dot{\mathbf{p}}\left(t - \frac{r}{v}\right). \qquad (8.23)$$



Let us analyze what exactly this result, valid in the limit ka ≪ 1, describes. The second of
Eqs. (1) allows us to calculate the magnetic field by the spatial differentiation of A. At large distances r
≫ λ (i.e. in the so-called far field zone), where Eq. (23) describes a virtually plane wave, the main
contribution to this derivative is given by the dipole moment factor:

$$\mathbf{B} = \nabla\times\mathbf{A} \approx -\frac{\mu}{4\pi rv}\,\mathbf{n}\times\ddot{\mathbf{p}}\left(t - \frac{r}{v}\right). \qquad (8.24)$$



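This expression means that the magnetic field at the observation point is perpendicular to the vectors n and
(the retarded value of) p̈, and its magnitude is

$$B = \frac{\mu}{4\pi rv}\left|\ddot{\mathbf{p}}\left(t - \frac{r}{v}\right)\right|\sin\theta, \qquad \text{i.e.} \qquad H = \frac{1}{4\pi rv}\left|\ddot{\mathbf{p}}\left(t - \frac{r}{v}\right)\right|\sin\theta, \qquad (8.25)$$

where θ is the angle between those two vectors8 - see Fig. 2.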



7 For relativistic particles, moving with velocities of the order of the speed of light, one has to be more careful. As a
result, I will postpone the discussion of their radiation until Chapter 10, i.e. until after the discussion of special
relativity in Chapter 9.

8 From the first of Eqs. (1), for the electric field, in the first approximation (23), we would get -∂A/∂t = -(μ/4πr)
p̈(t - r/v) = -(Z/4πrv) p̈(t - r/v). The transverse component of this vector (see Fig. 2) is the proper wave field E =
ZH × n, while its longitudinal component is exactly compensated by (-∇φ) in the next term of the expansion of Eq.
(17a) with respect to the small parameter r'/λ ≪ 1.
The most important feature of this result is that the time-dependent field decreases very slowly
(only as 1/r) with the distance from the source, so that the radial component of the corresponding
Poynting vector (7.7), S_r = ZH², drops as 1/r², i.e. the full power 𝒫 of the emitted spherical wave, which
scales as r²S_r, does not depend on the distance from the source - as it should for radiation. Equation (25)
allows us to be more quantitative; for the instantaneous radiation intensity we may plug it into Eq. (7.9)
to get

$$S_r = ZH^2 = \frac{Z}{(4\pi v)^2}\,\frac{\ddot{p}^2}{r^2}\,\sin^2\theta. \qquad (8.26)$$




Fig. 8.2. Far zone fields of a localized source,
contributing to its electric dipole radiation.



This is the famous formula for the electric dipole radiation; it describes the dominant component of
radiation by a localized system of charges - unless p̈ = 0. Please notice its angular dependence: the
radiation vanishes along the axis of the retarded vector p̈ (where θ = 0), and reaches its maximum in the
plane perpendicular to that axis. Integration of S_r over all directions, i.e. over the whole sphere of radius
r, gives the total instant power of the dipole radiation:9

$$\mathcal{P} = \oint_{r={\rm const}} S_r\,d^2A = \frac{Z}{(4\pi v)^2}\,\ddot{p}^2\,2\pi\int_0^\pi\sin^3\theta\,d\theta = \frac{Z}{6\pi v^2}\,\ddot{p}^2. \qquad (8.27)$$


In order to find the average power, this expression has to be averaged over a sufficiently long time. In particular, if the source is monochromatic, $\mathbf p(t) = \mathrm{Re}[\mathbf p_\omega\exp\{-i\omega t\}]$, with a time-independent vector $\mathbf p_\omega$, such averaging may be carried out just over one period, giving an extra factor 2 in the denominator:

$$\bar{\mathcal P} = \frac{Z\omega^4}{12\pi v^2}\,|\mathbf p_\omega|^2. \tag{8.28}$$



The simplest application of this formula is to a point charge oscillating, with frequency $\omega$, along a straight line (that we may take for axis $z$), with amplitude $a$. In this case, $\mathbf p = q\mathbf n_z z(t) = qa\,\mathbf n_z\,\mathrm{Re}[\exp\{-i\omega t\}]$, and if the charge velocity amplitude, $a\omega$, is much less than the wave speed $v$, we may use Eq. (28) with $|\mathbf p_\omega| = qa$, giving

$$\bar{\mathcal P} = \frac{Zq^2a^2\omega^4}{12\pi v^2}. \tag{8.29}$$

9 In the Gaussian units, for free space ($v = c$), this important formula reads $\mathcal P = (2/3c^3)\,\ddot{\mathbf p}^2$. It was first derived in 1897 by J. Larmor for the particular case of a single point charge $q$ moving with acceleration $\ddot{\mathbf r}$, when $\ddot{\mathbf p} = q\ddot{\mathbf r}$ and hence $\mathcal P = (2q^2/3c^3)\,\ddot{\mathbf r}^2$. As a result, Eq. (27) is sometimes referred to as the Larmor formula.



Applied to an electron ($q = -e \approx -1.6\times10^{-19}$ C) rotating about a nucleus at an atomic distance $a \sim 10^{-10}$ m, the Larmor formula shows¹⁰ that the energy loss due to the dipole radiation is so large that it would cause the electron's collapse onto the atom's nucleus in just $\sim10^{-10}$ s. In the beginning of the 1900s, this classical result was one of the main arguments for the development of quantum mechanics, which prevents such a collapse of electrons in their lowest-energy (ground) state.
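The order-of-magnitude claim above may be checked with a short calculation. The sketch below is not the author's treatment; it assumes a circular orbit that shrinks slowly (so that each turn is nearly circular), the free-space Larmor power $\mathcal P = q^2\ddot r^2/6\pi\varepsilon_0c^3$ (Eq. (27) with $\ddot{\mathbf p} = q\ddot{\mathbf r}$), and the Coulomb orbital energy $-q^2/8\pi\varepsilon_0 r$. Under these assumptions the energy balance gives $dr/dt = -(4/3)r_c^2c/r^2$, where $r_c \equiv q^2/4\pi\varepsilon_0mc^2$ is the classical radius discussed in Sec. 8.3, so that the collapse time is $t = a^3/4r_c^2c$. The function names are illustrative only.

```python
import math

# CODATA SI constants
Q_E = 1.602176634e-19     # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
M_E = 9.1093837015e-31    # electron mass, kg
C = 299792458.0           # speed of light, m/s


def classical_radius(q=Q_E, m=M_E):
    """r_c = q^2 / (4 pi eps0 m c^2)."""
    return q**2 / (4 * math.pi * EPS0 * m * C**2)


def collapse_time(a0, q=Q_E, m=M_E):
    """Classical in-spiral time from orbit radius a0 down to r = 0.

    Energy balance d/dt(-q^2 / (8 pi eps0 r)) = -P_Larmor, with
    P_Larmor = (2/3) (q^2 / 4 pi eps0) acc^2 / c^3 and acc = q^2 / (4 pi eps0 m r^2),
    gives dr/dt = -(4/3) r_c^2 c / r^2; integrating yields a0^3 / (4 r_c^2 c).
    """
    rc = classical_radius(q, m)
    return a0**3 / (4 * rc**2 * C)


print(f"{collapse_time(1e-10):.2e} s")   # prints 1.05e-10 s
```

The strong $a^3$ dependence means that the exact numerical coefficient matters little for the conclusion: any atomic-scale orbit would collapse within a fraction of a nanosecond.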

Another example of a very useful application of Eq. (28) is the radio wave radiation by a short, 
straight, symmetric antenna which is fed, for example, by a TEM transmission line such as a coaxial 
cable - see Fig. 3. 



Fig. 8.3. Dipole antenna (axis $z$; ends at $z = \pm l/2$, feed point at $z = 0$).



The exact solution of this problem is rather complex, even in the limit $l \ll \lambda$, because the law $I_\omega(z)$ of the current variation along the antenna's length should be calculated self-consistently with the distribution of the electromagnetic field induced by the current in the space around the antenna. However, one may argue that the current should be largest at the feeding point (in Fig. 3, taken for $z = 0$), vanish at the antenna's ends ($z = \pm l/2$), and that the only possible scale of the current variation in an antenna of length $l \ll \lambda$ is $l$ itself, so that probably the linear function,

$$I_\omega(z) = I_\omega(0)\left(1 - \frac{2|z|}{l}\right), \tag{8.30}$$



gives a reasonable approximation - as it indeed does. Now we can use the continuity equation $dQ/dt = I$, i.e. $-i\omega Q_\omega = I_\omega$, to calculate the complex amplitude $Q_\omega(z) = iI_\omega(z)\,\mathrm{sgn}(z)/\omega$ of the electric charge $Q(z, t) = \mathrm{Re}[Q_\omega\exp\{-i\omega t\}]$ of the wire beyond point $z$, and from it, the amplitude of the linear density of charge

$$\lambda_\omega(z) \equiv \frac{dQ_\omega(z)}{d|z|} = -i\,\frac{2I_\omega(0)}{\omega l}\,\mathrm{sgn}\,z. \tag{8.31}$$

From here, the dipole moment's amplitude is

$$p_\omega = 2\int_0^{l/2}\lambda_\omega(z)\,z\,dz = -i\,\frac{I_\omega(0)\,l}{2\omega}, \tag{8.32}$$



10 Actually, the formula needs a numerical coefficient adjustment to account for the electron's orbital (rather than linear) motion - a task left for the reader's exercise. However, this adjustment does not affect the order-of-magnitude estimate given above.
so that Eq. (28) yields

$$\bar{\mathcal P} = \frac{Z\omega^4}{12\pi v^2}\,\frac{|I_\omega(0)|^2l^2}{4\omega^2} = \frac{Z(kl)^2}{24\pi}\,\frac{|I_\omega(0)|^2}{2}, \tag{8.33}$$

where $k = \omega/v$. The analogy between this result and the dissipation power, $\bar{\mathcal P} = \mathrm{Re}\,Z\,|I_\omega|^2/2$, in a lumped linear circuit element, allows the interpretation of the first fraction in the last form of Eq. (33) as the real part of the antenna's impedance,

$$\mathrm{Re}\,Z_A = Z\,\frac{(kl)^2}{24\pi}, \tag{8.34}$$

as felt by the transmission line. (Indeed, according to Eq. (7.118), the wave traveling along the line toward the antenna is fully radiated, i.e. not reflected back, only if $Z_A$ equals $Z_W$ of the line.) As we know from Chapter 7, for typical TEM lines, $Z_W \sim Z_0$, while Eq. (34), which is only valid in the limit $kl \ll 1$, shows that for radiation into free space ($Z = Z_0$), $\mathrm{Re}\,Z_A$ is much less than $Z_0$.
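To put numbers into Eq. (34), here is a minimal sketch evaluating $\mathrm{Re}\,Z_A$ for radiation into free space; the 2-cm length and 1-GHz frequency are hypothetical values chosen only for illustration, and the formula should be trusted only for $kl \ll 1$. Since $k = 2\pi/\lambda$ and $Z_0 \approx 120\pi$ Ω, Eq. (34) may also be written as $\mathrm{Re}\,Z_A \approx 20\pi^2(l/\lambda)^2$ Ω, which the code uses as a cross-check.

```python
import math

Z0 = 376.730313668     # free-space wave impedance, ohm
C = 299792458.0        # speed of light, m/s


def re_z_antenna(length, freq, z=Z0, v=C):
    """Eq. (8.34): Re Z_A = Z (k l)^2 / (24 pi); meaningful only for k l << 1."""
    k = 2 * math.pi * freq / v
    return z * (k * length) ** 2 / (24 * math.pi)


l, f = 0.02, 1e9                               # hypothetical 2-cm antenna at 1 GHz
r = re_z_antenna(l, f)
approx = 20 * math.pi**2 * (l / (C / f)) ** 2  # same formula with Z0 ~ 120 pi ohm
print(f"Re Z_A = {r:.2f} ohm, 20 pi^2 (l/lambda)^2 = {approx:.2f} ohm")
```

Both forms give a fraction of an ohm, far below $Z_W \sim Z_0$, which is the mismatch discussed next; doubling $l$ quadruples the result.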

Hence, in order to reach the impedance matching condition $Z_W = Z_A$, the antenna's length should be increased - as a more involved theory shows, to $l \sim \lambda/2$. However, in many cases, practical considerations make short antennas necessary. The most frequently met example nowadays is cell phone antennas, which use frequencies close to 1 or 2 GHz, with free-space wavelengths $\lambda$ between 15 and 30 cm, i.e. much larger than the phone size. The quadratic dependence of the antenna's efficiency on $l$, following from Eq. (34), explains why every millimeter counts in the design of such antennas, and why the designs are carefully optimized using software packages for (virtually exact) numerical solution of the time-dependent Maxwell equations for the specific shape of the antenna and other phone parts.¹¹

To conclude this section, let me note that if the wave source is not monochromatic, so that $\mathbf p(t)$ should be presented as a Fourier series,

$$\mathbf p(t) = \mathrm{Re}\sum_\omega \mathbf p_\omega e^{-i\omega t}, \tag{8.35}$$

the terms corresponding to the interference of spectral components with different frequencies $\omega$ are averaged out at the time averaging of the Poynting vector, so that the average radiated power is just the sum of contributions (28) from all substantial frequency components.
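The statement that cross-frequency terms average out can be checked numerically. In the sketch below (an illustration, not part of the original derivation), the source has two harmonics with hypothetical real amplitudes and integer frequencies, so that a single common period $2\pi$ suffices for averaging. By Eq. (27), the time average of $\ddot p^2$ is proportional to the average power, so it should equal the sum of the per-component values $\omega^4|p_\omega|^2/2$ used in Eq. (28).

```python
import math


def mean_pddot_squared(components, n_steps=20000):
    """Average of p''(t)^2 over one common period 2 pi, for
    p(t) = sum_w p_w cos(w t), i.e. Eq. (8.35) with real amplitudes p_w
    and integer frequencies w."""
    total = 0.0
    for step in range(n_steps):
        t = 2 * math.pi * step / n_steps
        pdd = -sum(w**2 * p * math.cos(w * t) for w, p in components)
        total += pdd**2
    return total / n_steps


components = [(1, 1.0), (3, 0.5)]                       # hypothetical (omega, p_omega) pairs
averaged = mean_pddot_squared(components)
summed = sum(w**4 * p**2 / 2 for w, p in components)    # per-component Eq. (8.28)
print(averaged, summed)                                 # both 10.625, up to rounding
```

The cross term $\cos t\cos 3t$ contributes nothing to the average, so the two numbers coincide.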



8.3. Wave scattering 

The formalism described above may be immediately used in the theory of scattering - the phenomenon illustrated by Fig. 4. Generally, scattering is a complex problem. However, in many cases it allows the so-called Born approximation,¹² in which the scattered wave is assumed to be much weaker than the incident one, so that its field is neglected when calculating the response of the scatterer.



11 A partial list of popular software packages of this kind includes both publicly available codes such as NEC-2 (whose various versions are available online, e.g., at http://alioth.debian.org/projects/necpp/ and http://www.qsl.net/4nec2/), and proprietary packages, such as Momentum from Agilent Technologies (originally a part of Hewlett-Packard), FEKO from EM Software & Systems, and XFdtd from Remcom.

12 Named after M. Born (1882-1970), one of the founding fathers of quantum mechanics, credited especially for 
its probability interpretation and matrix formulation. 
Fig. 8.4. Scattering (schematically).



As the first example of this approach, let us consider the scattering of a plane wave, propagating in free space ($Z = Z_0$, $v = c$), by a free¹³ charged particle whose motion may be described by nonrelativistic classical mechanics. (This requires, in particular, the incident wave to be of a modest intensity, so that the speed of the induced charge motion is much less than the speed of light.) In this case the magnetic component of the Lorentz force (5.8),

$$\mathbf F_m = q\dot{\mathbf r}\times\mathbf B, \tag{8.36}$$

exerted on the charge by the magnetic field of a plane wave, is much smaller than the force $\mathbf F_e = q\mathbf E$ exerted by its electric field. Indeed, according to Eq. (7.8), $H = E/Z = E/(\mu/\varepsilon)^{1/2}$ and $B = \mu H = E/v$, so that the ratio $F_m/F_e$ does not exceed the ratio of the particle's speed, $|\dot{\mathbf r}|$, to the wave's speed $v = c$.

Thus, assuming that the incident wave is linearly polarized along axis $x$, the equation of the particle's motion in the Born approximation is just $m\ddot x = qE(t)$, so that for the $x$-component $p_x = qx$ of its dipole moment we can write

$$\ddot p = q\ddot x = \frac{q^2}{m}\,E(t). \tag{8.37}$$

As we already know from Sec. 2, oscillations of the dipole moment lead to the radiation of a wave with a wide angular distribution of intensity; in our case, this is the scattered wave - see Fig. 4. Its full power may be found by plugging Eq. (37) into Eq. (27):

$$\mathcal P = \frac{Z_0}{6\pi c^2}\,\ddot p^2 = \frac{Z_0}{6\pi c^2}\left(\frac{q^2}{m}\right)^2E^2(t), \quad\text{i.e.}\quad \bar{\mathcal P} = \frac{Z_0}{12\pi c^2}\,\frac{q^4}{m^2}\,|E_\omega|^2. \tag{8.38}$$

Since the power is proportional to the incident wave's intensity $S$, it is customary to characterize the scattering ability of the object by the ratio

$$\sigma \equiv \frac{\bar{\mathcal P}}{\bar S_\text{incident}} = \frac{\bar{\mathcal P}}{|E_\omega|^2/2Z_0}, \tag{8.39}$$

which evidently has the dimension of area and is called the full cross-section of scattering. For this measure, Eq. (38) yields the famous result




13 As Eq. (7.30) shows, this calculation is also valid for an oscillator with eigenfrequency $\omega_0 \ll \omega$.
$$\sigma = \frac{Z_0^2q^4}{6\pi c^2m^2} = \frac{q^4}{6\pi\varepsilon_0^2c^4m^2}, \tag{8.40}$$

which is called the Thomson scattering formula,¹⁴ especially when applied to an electron. This relation is most frequently presented in the form¹⁵

$$\sigma = \frac{8\pi}{3}\,r_c^2, \quad\text{where}\quad r_c \equiv \frac{q^2}{4\pi\varepsilon_0mc^2}. \tag{8.41}$$

The constant $r_c$ is called the classical radius of the particle (or sometimes the "Thomson scattering length"); for the electron ($q = -e$, $m = m_e$) it is close to $2.82\times10^{-15}$ m. Its possible interpretation is evident from Eq. (41) for $r_c$: at that distance between two similar particles, the potential energy $q^2/4\pi\varepsilon_0r$ of their electrostatic interaction is equal to the particle's rest-mass energy $mc^2$.¹⁶

Now we have to go back and establish the conditions at which the Born approximation, in which the field of the scattered wave is neglected, is indeed valid for scattering by a point object. Since the scattered wave's intensity, described by Eq. (26), diverges as $1/r^2$ at small $r$, according to the definition (39) of the cross-section, it may become comparable to $\bar S_\text{incident}$ at $r^2 \sim \sigma$. However, Eq. (38) itself is only valid if $r \gg \lambda$, so that the Born approximation does not lead to any contradiction if

$$\sigma \ll \lambda^2. \tag{8.42}$$

For Thomson scattering by an electron, this condition means $\lambda \gg r_c \sim 3\times10^{-15}$ m, and is fulfilled for all frequencies up to very hard $\gamma$ rays with energies $\sim100$ MeV.
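The numbers quoted above can be reproduced with a few lines. The sketch below (SI units, CODATA constants) evaluates Eq. (41) for the electron, checks that it coincides with Eq. (40), and estimates the photon energy $hc/r_c$ at which the wavelength would shrink to $r_c$. Treating $\lambda \sim r_c$ as the point where condition (42) breaks down is an order-of-magnitude assumption.

```python
import math

Q_E = 1.602176634e-19     # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
M_E = 9.1093837015e-31    # electron mass, kg
C = 299792458.0           # speed of light, m/s
H = 6.62607015e-34        # Planck constant, J s

r_c = Q_E**2 / (4 * math.pi * EPS0 * M_E * C**2)             # Eq. (8.41)
sigma_41 = 8 * math.pi / 3 * r_c**2                           # Eq. (8.41)
sigma_40 = Q_E**4 / (6 * math.pi * EPS0**2 * C**4 * M_E**2)   # Eq. (8.40)
e_break_mev = H * C / r_c / Q_E / 1e6    # photon energy with lambda = r_c, in MeV

print(f"r_c = {r_c:.3e} m, sigma = {sigma_41:.3e} m^2")
print(f"lambda = r_c at about {e_break_mev:.0f} MeV")
```

Since condition (42) requires $\lambda \gg r_c$, photon energies well below these few hundred MeV are safe, consistent with the estimate in the text.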

Possibly the most notable feature of result (40) is its independence of the wave frequency. As follows from its derivation, particularly from Eq. (37), this independence is intimately related to the unbound character of the charge motion. For bound charges, say for electrons in a gas molecule, this result is only valid if the wave frequency $\omega$ is much higher than all eigenfrequencies $\omega_j$ of molecular resonances. In the opposite limit, $\omega \ll \omega_j$, the result is dramatically different. Indeed, in this limit we can approximate the molecule's dipole moment by its static value (3.39):

$$\mathbf p = 4\pi\varepsilon_0\alpha_\text{mol}\mathbf E. \tag{8.43}$$

In the Born approximation, and in the absence of the molecular field effects discussed in Sec. 3.5, $\mathbf E$ in this expression is just the incident wave's field, and we can use Eq. (28) to calculate the power of the wave scattered by a single molecule:



14 Named after Sir J. J. Thomson (1856-1940), the discoverer of the electron - and isotopes as well! He is not to 
be confused with his son, G. P. Thomson, who discovered (simultaneously with C. Davisson and L. Germer) 
quantum-mechanical wave properties of the same electron. 

15 In the Gaussian units, this formula looks like $r_c = q^2/mc^2$ (giving, of course, the same numerical value: for the electron, $r_c \approx 2.82\times10^{-13}$ cm). This classical quantity should not be confused with the particle's Compton wavelength $\lambda_C = h/mc$ (for the electron, close to $2.43\times10^{-10}$ cm) that naturally arises in quantum electrodynamics - see the next chapter.

16 It is fascinating how smartly the relativistic expression $mc^2$ has sneaked into the result (40), which was obtained using a nonrelativistic equation of particle motion. This was possible because the calculation engaged electromagnetic waves that propagate with the speed of light, and whose quanta (photons), as a result, may frequently be treated as relativistic (moreover, ultrarelativistic) particles - see the next chapter.
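$$\bar{\mathcal P} = \frac{Z_0\omega^4}{12\pi c^2}\,(4\pi\varepsilon_0\alpha_\text{mol})^2|E_\omega|^2 = \frac{4\pi}{3}\,\frac{\omega^4}{c^4}\,\alpha_\text{mol}^2\,\frac{|E_\omega|^2}{Z_0}. \tag{8.44}$$

Now, using the last form of definition (39) of the cross-section, we get a very simple result,

$$\sigma = \frac{8\pi}{3}\,\frac{\omega^4}{c^4}\,\alpha_\text{mol}^2 = \frac{8\pi}{3}\,k^4\alpha_\text{mol}^2, \tag{8.45}$$

showing that, in contrast to Eq. (40), at low frequencies $\sigma$ grows as fast as $\omega^4$.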

Now let us explore the effect of such Rayleigh scattering¹⁷ on wave propagation in a gas with a relatively low density $n$. We can expect (and will prove in the next section) that due to the randomness of molecule positions, the waves scattered by different molecules may be treated as incoherent, so that the total scattering power may be calculated just as the sum of those scattered by each molecule. We can use this additivity to write the balance of the incident wave's power $\mathcal P$ in a small volume $dV$ of length $dz$ (along the incident wave's direction) and area $A$ across it. Since such a segment includes $n\,dV = nA\,dz$ molecules, and, according to definition (39), each of them scatters power $\sigma\bar S = \sigma\mathcal P/A$, the total scattered power is $n\sigma\mathcal P\,dz$; hence the incident power's change is

$$d\mathcal P = -n\sigma\mathcal P\,dz. \tag{8.46}$$

Comparing this equation with the general definition (7.202) of the attenuation constant, we see that scattering gives the following contribution to attenuation: $\alpha = n\sigma$. From here, using Eq. (3.41) to write $\alpha_\text{mol} = (\varepsilon_r - 1)/4\pi n$, and Eq. (45), we get

$$\alpha = \frac{k^4}{6\pi n}\,(\varepsilon_r - 1)^2. \tag{8.47}$$

This is the famous Rayleigh scattering formula, which, in particular, explains the blue color of the sky and the red color of sunsets. Indeed, through the visible light spectrum, $\omega$ changes almost two-fold; as a result, the scattering of the blue components of sunlight is an order of magnitude stronger than that of its red components. More quantitatively, for air near the Earth's surface, $\varepsilon_r - 1 \approx 6\times10^{-4}$ and $n \approx 2.5\times10^{25}$ m$^{-3}$ - see Sec. 3.3. Plugging these numbers into Eq. (47), we see that the characteristic length $l = 1/\alpha$ of scattering is $\sim30$ km for blue light and $\sim200$ km for red light.¹⁸ The Earth's atmosphere is thinner ($h \sim 10$ km), so that the Sun looks just a bit yellowish during most of the day. However, elementary geometry shows that at sunset, the light has to pass a length $l \sim (R_Eh)^{1/2} \approx 300$ km to reach an Earth-surface observer; as a result, the blue components of the Sun's light spectrum are almost completely scattered out, and even the red components are weakened considerably.
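The scattering lengths quoted above follow from Eq. (47) directly. The sketch below uses the near-surface air parameters given in the text; the wavelengths 450 nm and 700 nm, taken as representative of "blue" and "red" light, are assumptions made only for this illustration.

```python
import math


def rayleigh_length_km(wavelength, eps_r_minus_1=6e-4, n=2.5e25):
    """Scattering length l = 1/alpha, in km, from Eq. (8.47):
    alpha = k^4 (eps_r - 1)^2 / (6 pi n), with k = 2 pi / wavelength (SI units).
    Defaults are the near-surface air values quoted in the text."""
    k = 2 * math.pi / wavelength
    alpha = k**4 * eps_r_minus_1**2 / (6 * math.pi * n)
    return 1e-3 / alpha


blue = rayleigh_length_km(450e-9)   # assumed representative "blue" wavelength
red = rayleigh_length_km(700e-9)    # assumed representative "red" wavelength
print(f"blue: ~{blue:.0f} km, red: ~{red:.0f} km, ratio {red / blue:.1f}")
```

The result, about 34 km versus about 200 km, matches the estimates in the text; their ratio is simply $(700/450)^4$, the $\omega^4$ law of Eq. (45).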

To conclude the discussion of Eq. (47), let me note that its comparison with the condition of the direct applicability of the Born approximation for a distributed object of size $a$,

$$\alpha a \ll 1, \tag{8.48}$$




17 Named after Lord Rayleigh (born J. W. Strutt, 1842-1919), whose numerous contributions to science include the discovery of argon. He also pioneered (for the special case we are considering now) the basic idea of what we now call the Born approximation.

18 These values are approximate because both $n$ and $(\varepsilon_r - 1)$ vary through the atmosphere.
implies, in particular, that if the electric polarizability of the material is small, $\varepsilon_r \to 1$, we may be able to use the approximation for the analysis of scattering by even relatively large objects, with size of the order of, or even larger than, $\lambda$. However, for such extended objects, the phase difference factors (neglected above) step in, leading in particular to the important effects of interference and diffraction, to whose discussion we now proceed.



8.4. Interference and diffraction 

These effects show up not as much in the total power of scattered radiation as in its angular distribution. It is traditional to characterize this distribution by the differential cross-section defined as

$$\frac{d\sigma}{d\Omega} \equiv \frac{\bar S_r\,r^2}{\bar S_\text{incident}}, \tag{8.49}$$



where $r$ is the distance from the scatterer at which the scattered wave is observed. Both the definition and notation become more clear if we notice that according to Eq. (26), at large distances ($r \gg a$), the numerator in the right-hand part of Eq. (49), and hence the differential cross-section as a whole, does not depend on $r$, and that its integral over the total solid angle $\Omega = 4\pi$ coincides with the total cross-section defined by Eq. (39):

$$\oint_{4\pi}\frac{d\sigma}{d\Omega}\,d\Omega = \frac{r^2}{\bar S_\text{incident}}\oint_{4\pi}\bar S_r\,d\Omega = \frac{1}{\bar S_\text{incident}}\oint_{r=\text{const}}\bar S_r\,d^2r = \frac{\bar{\mathcal P}}{\bar S_\text{incident}} \equiv \sigma. \tag{8.50}$$

For example, according to Eq. (26), the angular distribution of radiation scattered by a point linear dipole, in the Born approximation, is rather broad; in particular, in the low-frequency limit (43),

$$\frac{d\sigma}{d\Omega} = k^4\alpha_\text{mol}^2\sin^2\theta. \tag{8.51}$$
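As a consistency check of Eq. (50), integrating the differential cross-section (51) over the full solid angle should return the total cross-section (45), $\sigma = (8\pi/3)k^4\alpha_\text{mol}^2$. A minimal midpoint-rule sketch, with arbitrary (hypothetical) values of $k$ and $\alpha_\text{mol}$:

```python
import math


def total_from_differential(k, alpha_mol, n_theta=2000):
    """Integrate Eq. (8.51) over the sphere, using dOmega = 2 pi sin(theta) dtheta,
    since dsigma/dOmega does not depend on the azimuthal angle."""
    dth = math.pi / n_theta
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        dsigma_domega = k**4 * alpha_mol**2 * math.sin(th) ** 2
        total += dsigma_domega * 2 * math.pi * math.sin(th) * dth
    return total


k, alpha_mol = 2.0, 0.3                           # arbitrary illustrative values
numeric = total_from_differential(k, alpha_mol)
closed = 8 * math.pi / 3 * k**4 * alpha_mol**2    # Eq. (8.45)
print(numeric, closed)                            # agree closely
```

The factor $8\pi/3$ is just $2\pi\int_0^\pi\sin^3\theta\,d\theta$, the same angular integral that appeared in Eq. (27).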

If the wave is scattered by a small dielectric body, with a characteristic size $a \ll \lambda$ (i.e., $ka \ll 1$), then all its parts re-radiate the incident wave coherently. Hence, we can calculate the scattering in a similar way, just replacing the molecular dipole moment (43) with the total dipole moment of the object - see Eq. (3.37):

$$\mathbf p = \mathbf PV = (\varepsilon_r - 1)\varepsilon_0\mathbf EV, \tag{8.52}$$

where $V \sim a^3$ is the body's volume. As a result, the differential cross-section may be obtained from Eq. (51) with the replacement $\alpha_\text{mol} \to V(\varepsilon_r - 1)/4\pi$:

$$\frac{d\sigma}{d\Omega} = \frac{k^4V^2}{(4\pi)^2}\,(\varepsilon_r - 1)^2\sin^2\theta, \tag{8.53}$$

i.e. follows the same $\sin^2\theta$ law. The situation for extended objects, with at least one dimension of the order of, or larger than, the wavelength, is different: here we have to take into account that the phase shifts introduced by various parts of the body are different. Let us analyze this issue for an arbitrary collection of similar point scatterers located at points $\mathbf r_j$.

If the wave vector of the incident plane wave is $\mathbf k_0$, the field of the wave has the phase factor $\exp\{i\mathbf k_0\cdot\mathbf r\}$ - see Eq. (7.79). At the location of the $j$-th scattering center, this factor equals $\exp\{i\mathbf k_0\cdot\mathbf r_j\}$, so that the local polarization vector $\mathbf p_j$, and the scattered wave it creates, are proportional to this factor. On its way to the observation point $\mathbf r$, the scattered wave, with wave vector $\mathbf k$ (of the same magnitude, $k = k_0$), acquires an additional phase factor $\exp\{i\mathbf k\cdot(\mathbf r - \mathbf r_j)\}$, so that the scattered wave field is proportional to

$$\exp\{i\mathbf k_0\cdot\mathbf r_j + i\mathbf k\cdot(\mathbf r - \mathbf r_j)\} = \exp\{i(\mathbf k_0 - \mathbf k)\cdot\mathbf r_j + i\mathbf k\cdot\mathbf r\} = e^{i\mathbf k\cdot\mathbf r}\exp\{-i(\mathbf k - \mathbf k_0)\cdot\mathbf r_j\}. \tag{8.54}$$



Since the first factor in the last expression does not depend on $\mathbf r_j$, in order to calculate the total scattered wave it is sufficient to sum up the elementary phase factors $\exp\{-i\mathbf q\cdot\mathbf r_j\}$, where the vector

$$\mathbf q \equiv \mathbf k - \mathbf k_0 \tag{8.55}$$

has the physical sense of the wave vector change at scattering.¹⁹ It may look like the phase factor depends on the choice of the origin. However, according to Eq. (7.42), the average intensity of the scattered wave is proportional to $E_\omega E_\omega^*$, i.e. to the following real scalar function of vector $\mathbf q$:





$$F(\mathbf q) \equiv \left(\sum_j\exp\{-i\mathbf q\cdot\mathbf r_j\}\right)^*\left(\sum_{j'}\exp\{-i\mathbf q\cdot\mathbf r_{j'}\}\right) = \sum_{j,j'}\exp\{i\mathbf q\cdot(\mathbf r_j - \mathbf r_{j'})\} \equiv |I(\mathbf q)|^2, \tag{8.56}$$

where the complex function

$$I(\mathbf q) \equiv \sum_j\exp\{-i\mathbf q\cdot\mathbf r_j\}, \tag{8.57}$$

called the phase sum, may be calculated in any reference frame without affecting the final result (56). The double-sum form of Eq. (56) makes it easy to notice that for a system of many ($N \gg 1$) similar but randomly located scatterers, only the terms with $j = j'$ accumulate at summation, so that $F(\mathbf q)$ scales as $N$, rather than $N^2$ - thus justifying the above treatment of the Rayleigh scattering problem.
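The $N$-versus-$N^2$ scaling is easy to see numerically. The sketch below (an illustration with hypothetical parameters, not taken from the text) places $N$ point scatterers uniformly at random in a cube much larger than $1/q$, evaluates the phase sum (57), and averages $F = |I|^2$ over many random configurations; at $\mathbf q = 0$ all phases coincide and $F = N^2$ exactly.

```python
import cmath
import random


def phase_sum(q, positions):
    """Eq. (8.57): I(q) = sum_j exp(-i q . r_j)."""
    return sum(cmath.exp(-1j * (q[0] * x + q[1] * y + q[2] * z))
               for x, y, z in positions)


def mean_F(q, n, box, trials, rng):
    """Average of F(q) = |I(q)|^2, Eq. (8.56), over random uniform placements
    of n scatterers inside a cube of side `box`."""
    acc = 0.0
    for _ in range(trials):
        pos = [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
               for _ in range(n)]
        acc += abs(phase_sum(q, pos)) ** 2
    return acc / trials


N = 200
rng = random.Random(1)                                     # fixed seed for reproducibility
print(mean_F((1.0, 0.0, 0.0), N, 100.0, 200, rng) / N)    # near 1: F ~ N
print(mean_F((0.0, 0.0, 0.0), N, 100.0, 1, rng) / N**2)   # 1.0: F = N^2 at q = 0
```

For a single random configuration $F$ fluctuates strongly (by an amount of order $N$ itself); only its average is close to $N$, which is all that incoherent addition of intensities requires.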

Let us start using Eq. (56) by applying it to the simplest problem of just two similar small scatterers separated by a fixed distance $a$:

$$F(\mathbf q) = \sum_{j,j'=1}^{2}\exp\{i\mathbf q\cdot(\mathbf r_j - \mathbf r_{j'})\} = 2 + \exp\{-iq_aa\} + \exp\{iq_aa\} = 2(1 + \cos q_aa) = 4\cos^2\frac{q_aa}{2}, \tag{8.58}$$

where $q_a = \mathbf q\cdot\mathbf a/a$ is the component of vector $\mathbf q$ along the vector $\mathbf a$ connecting the scatterers. The apparent simplicity of this result may be a bit misleading, because the mutual plane of vectors $\mathbf k$ and $\mathbf k_0$ (and hence of vector $\mathbf q$) does not necessarily coincide with the mutual plane of vectors $\mathbf k_0$ and $\mathbf E_\omega$, so that the scattering angle $\alpha$ between vectors $\mathbf k$ and $\mathbf k_0$ is generally different from $(\pi/2 - \theta)$ - see Fig. 5.




Fig. 8.5. Angles important for the general 
scattering problem. 



19 In quantum electrodynamics, $\hbar\mathbf q$ has the sense of the momentum transferred from the scattering object to the
scattered photon, and this terminology sometimes creeps even into the classical electrodynamic texts, 
Moreover, vectors $\mathbf q$ and $\mathbf a$ may have another common plane, and the angle between them is one more parameter that may be considered independent of both $\alpha$ and $\theta$. As a result, the angular dependence of the scattered wave's intensity (and hence of $d\sigma/d\Omega$), which depends on all three angles, may be rather complex.

This is why I will consider only the simple case when vectors $\mathbf k$, $\mathbf k_0$, and $\mathbf a$ are all in the same plane (Fig. 6a), with $\mathbf k_0$ perpendicular to $\mathbf a$ (leaving the general analysis for the reader's exercise). Then, with our choice of coordinates, $q_a = q_x = k\sin\alpha$, and Eq. (58) is reduced to

$$F(\mathbf q) = 4\cos^2\frac{ka\sin\alpha}{2}. \tag{8.59}$$



This function always has two maxima, at $\alpha = 0$ and $\alpha = \pi$, and possibly (if the product $ka$ is large enough) other maxima at special angles $\alpha_n$ that satisfy the famous Bragg condition²⁰

$$ka\sin\alpha_n = 2\pi n, \quad\text{i.e.}\quad a\sin\alpha_n = n\lambda. \tag{8.60}$$




As evident from Fig. 6a, this condition may be readily understood as the in-phase addition (frequently called constructive interference) of two coherent waves scattered by the two points, when the difference between their paths toward the observer, $a\sin\alpha$, equals an integer number of wavelengths. At each such maximum, $F = 4$, due to the doubling of the wave amplitude and hence the quadrupling of its power.
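For a concrete picture of Eqs. (59) and (60), the sketch below lists the Bragg angles $\alpha_n$ in $[0, \pi/2]$ for a given ratio $a/\lambda$ (the value $a = 2.5\lambda$ is hypothetical) and confirms that the two-point pattern reaches $F = 4$ at each of them; the angles $\pi - \alpha_n$ are maxima as well.

```python
import math


def bragg_angles_deg(a_over_lambda):
    """Angles alpha_n in [0, 90] degrees with a sin(alpha_n) = n lambda, Eq. (8.60)."""
    n_max = math.floor(a_over_lambda)
    return [math.degrees(math.asin(n / a_over_lambda)) for n in range(n_max + 1)]


def two_point_F(a_over_lambda, alpha):
    """Eq. (8.59): F = 4 cos^2(k a sin(alpha) / 2), with k a = 2 pi a / lambda."""
    ka = 2 * math.pi * a_over_lambda
    return 4 * math.cos(ka * math.sin(alpha) / 2) ** 2


angles = bragg_angles_deg(2.5)                  # hypothetical spacing a = 2.5 lambda
print([round(x, 2) for x in angles])            # [0.0, 23.58, 53.13]
print([round(two_point_F(2.5, math.radians(x)), 6) for x in angles])  # [4.0, 4.0, 4.0]
```

The number of maxima in $[0, \pi/2]$ is $\lfloor a/\lambda\rfloor + 1$, so only the zeroth one survives when $a < \lambda$.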

If the distance between the point scatterers is large ($ka \gg 1$), the first Bragg maxima correspond to small angles, $\alpha \ll 1$. In this region, Eq. (59) is reduced to a simple sinusoidal dependence of function $F$ on angle $\alpha$. Moreover, within the range of small $\alpha$, the polarization factor $\sin^2\theta$ is virtually constant, so that the scattered wave intensity, and hence the differential cross-section, are

$$\frac{d\sigma}{d\Omega} \propto F(\mathbf q) \approx 4\cos^2\frac{ka\alpha}{2}. \tag{8.61}$$



This is, of course, the interference pattern well known from Young's two-slit experiment.²¹ (As will be discussed in the next section, the theoretical description of the two-slit experiment



20 Named after Sir William Bragg and his son, Sir William Lawrence Bragg, who in 1912 demonstrated X-ray 
diffraction by atoms in crystals. The Braggs' experiments have made the existence of atoms (before that, a 
hypothetical notion ignored by many physicists) indisputable. 
is more complex than that of Born scattering, but such an experiment is preferable in practice, because at scattering, the wave of intensity (61) has to be observed on the backdrop of a stronger incident wave that propagates in almost the same direction, $\alpha \approx 0$.)

Now let us consider Born scattering by a distributed object, say an extended dielectric body with a constant value of $\varepsilon_r$. Transferring Eq. (56) from the sum to an integral, for the differential cross-section we get

$$\frac{d\sigma}{d\Omega} = \frac{k^4}{(4\pi)^2}\,(\varepsilon_r - 1)^2\sin^2\theta\,|I(\mathbf q)|^2, \tag{8.62}$$

where $I(\mathbf q)$ now becomes the phase integral,²²

$$I(\mathbf q) \equiv \int_V\exp\{-i\mathbf q\cdot\mathbf r'\}\,d^3r', \tag{8.63}$$

with the dimensionality of volume.

As the simplest example of application of this formula, let us consider scattering by a thin dielectric rod (with both dimensions of the cross-section's area much smaller than $\lambda$, but an arbitrary length $a$), otherwise keeping the same simple geometry as for two point scatterers - see Fig. 6b. In this case the phase integral is just

$$I(\mathbf q) = A\int_{-a/2}^{+a/2}\exp\{-iq_xx'\}\,dx' = A\,\frac{\exp\{-iq_xa/2\} - \exp\{iq_xa/2\}}{-iq_x} = V\,\frac{\sin\xi}{\xi}, \tag{8.64}$$

where $V = Aa$ is the volume of the rod, and $\xi$ is a dimensionless parameter defined as

$$\xi \equiv \frac{q_xa}{2} = \frac{ka\sin\alpha}{2}. \tag{8.65}$$



The fraction participating in Eq. (64) is met in physics so frequently that it has deserved the special name of the sinc (not "sync", please!) function:

$$\mathrm{sinc}\,\xi \equiv \frac{\sin\xi}{\xi}. \tag{8.66}$$

Obviously, this function, plotted in Fig. 7, vanishes at all points $\xi_n = \pi n$ with integer $n$, except the point $n = 0$: $\mathrm{sinc}\,\xi_0 = \mathrm{sinc}\,0 = 1$.
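Equation (64) can be checked against a direct numerical evaluation of the phase integral (63) for the rod. In the sketch below, the rod length, cross-section area, and the set of $q_x$ values are hypothetical; a midpoint-rule sum stands in for the exact integral.

```python
import cmath
import math


def sinc(xi):
    """Eq. (8.66): sin(xi)/xi, with the limiting value 1 at xi = 0."""
    return 1.0 if xi == 0 else math.sin(xi) / xi


def rod_phase_integral(qx, a, area, n=4000):
    """Midpoint-rule evaluation of the rod's phase integral, Eq. (8.64):
    I = A * (integral over x' in [-a/2, a/2] of exp(-i qx x') dx')."""
    dx = a / n
    s = sum(cmath.exp(-1j * qx * (-a / 2 + (j + 0.5) * dx)) for j in range(n))
    return area * s * dx


a, area = 1.0, 1e-3                  # hypothetical rod length and cross-section area
for qx in (0.0, 3.0, 2 * math.pi, 10.0):
    numeric = rod_phase_integral(qx, a, area)
    exact = area * a * sinc(qx * a / 2)          # V sinc(xi), xi = qx a / 2
    print(f"qx = {qx:6.3f}: numeric = {numeric.real:+.6e}, V sinc = {exact:+.6e}")
```

The imaginary part of the numerical result vanishes by symmetry of the rod about its center, and at $q_x = 2\pi/a$ (i.e. $\xi = \pi$) both values are zero.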




21 This experiment was described as early as 1803 by T. Young, one more universal genius of science, who also introduced the Young modulus in elasticity theory (see, e.g., CM Chapter 7), besides numerous other achievements, including pioneering work on deciphering Egyptian hieroglyphs! The two-slit experiment firmly established the wave picture of light, to be replaced by the dualistic photon-vs-wave picture, formalized by quantum electrodynamics, only 100+ years later.

22 Since the observation point's position r does not participate in this formula explicitly, the prime sign in r' could 
be dropped, but I keep it as a reminder that the integral is taken over points r' of the scattering object. 
Fig. 8.7. The sinc function.



The function $F(\mathbf q) = V^2\,\mathrm{sinc}^2\xi$ resulting from Eq. (64) is plotted with the red line in Fig. 8, and is called the Fraunhofer diffraction pattern.




Fig. 8.8. The Fraunhofer diffraction pattern (solid red line) and its envelope (dashed line). For comparison, the blue line shows the standard interference pattern $\cos^2\xi$ - cf. Eq. (59).



Note that it oscillates with the same period, $\Delta(\sin\alpha) = 2\pi/ka \ll 1$, as the interference pattern (59) from two point scatterers (shown with the blue line in Fig. 8). However, at interference, the scattered wave intensity vanishes at angles $\alpha'_n$ that satisfy the condition

$$\frac{ka\sin\alpha'_n}{2\pi} = n + \frac{1}{2}, \tag{8.67}$$



when the optical path difference $a\sin\alpha$ equals a half-integer number of wavelengths (i.e. an odd multiple of $\lambda/2 = \pi/k$), and hence the two waves from the scatterers arrive at the observer in anti-phase (the so-called destructive interference). On the other hand, for the diffraction from a continuous rod, the minima occur at a different set of angles,

$$\frac{ka\sin\alpha_n}{2\pi} = n, \quad n \neq 0, \tag{8.68}$$


i.e. exactly where the two-point interference pattern has its maxima. The reason for this relation is that the wave diffraction on the rod may be considered as a simultaneous interference of waves from all its fragments, and exactly at the observation angles at which the rod's edges give waves with phases shifted by $2\pi n$, the interior points of the rod give waves with all possible phases, with their algebraic sum equal to zero. Even more visibly in Fig. 8, at diffraction the intensity oscillations are limited by a rapidly decreasing envelope function $1/\xi^2$. The reason for this fast decrease is that with each Fraunhofer diffraction period, a smaller and smaller fraction of the rod gives an unbalanced contribution to the scattered wave.
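The relation between Eqs. (59) and (68), and the $1/\xi^2$ envelope, can be illustrated numerically. The sketch below takes a hypothetical $a = 10\lambda$ (so that $ka = 20\pi$), evaluates both normalized patterns at the rod's minima (68), and checks the envelope bound $\mathrm{sinc}^2\xi \le 1/\xi^2$ on a grid of $\xi$ values.

```python
import math


def sinc(xi):
    """Eq. (8.66), with sinc(0) = 1."""
    return 1.0 if xi == 0 else math.sin(xi) / xi


def patterns_at_rod_minima(ka, orders):
    """For each n, take sin(alpha_n) = 2 pi n / (k a), Eq. (8.68), and return
    (n, sinc^2(xi), F_two_point), with xi = k a sin(alpha) / 2, Eq. (8.65),
    and F_two_point = 4 cos^2(xi), Eq. (8.59)."""
    rows = []
    for n in orders:
        xi = ka * (2 * math.pi * n / ka) / 2
        rows.append((n, sinc(xi) ** 2, 4 * math.cos(xi) ** 2))
    return rows


for n, rod, pair in patterns_at_rod_minima(20 * math.pi, (1, 2, 3)):
    print(n, f"rod: {rod:.1e}", f"two points: {pair:.3f}")

# Envelope: sinc^2(xi) never exceeds 1/xi^2 (tiny tolerance for rounding)
grid = [0.01 * j for j in range(1, 5000)]
print(all(sinc(x) ** 2 <= (1 + 1e-12) / x**2 for x in grid))   # True
```

At each rod minimum the rod's pattern is zero to rounding accuracy, while the two-point pattern sits at its maximum value 4.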

If the rod's length is small ($ka \ll 1$, i.e. $a \ll \lambda$), then the sinc's argument $\xi$ is small at all scattering angles $\alpha$, so that $I(\mathbf q) \approx V$, and Eq. (62) is reduced to Eq. (53). In the opposite limit, $a \gg \lambda$, the first zeros of function $I(\mathbf q)$ correspond to very small angles $\alpha$, for which $\sin^2\theta \approx 1$, so that the differential cross-section is

$$\frac{d\sigma}{d\Omega} \approx \frac{k^4V^2}{(4\pi)^2}\,(\varepsilon_r - 1)^2\,\mathrm{sinc}^2\frac{ka\alpha}{2}, \tag{8.69}$$

i.e. Fig. 8 shows the scattering intensity as a function of the diffraction direction, if the pattern is observed within the plane containing the rod.



8.5. The Huygens principle 

The Born approximation allows tracing the basic features of (and the difference between) the phenomena of interference and diffraction. Unfortunately, this approximation, based on the relative weakness of the scattered wave, cannot be used for the more popular experimental implementations of these phenomena, for example, Young's two-slit experiment, or diffraction on a single slit or orifice - see, e.g., Fig. 9. Indeed, in such experiments, the orifice size $a$ is typically much larger than the light's wavelength, and as a result, no clear decomposition of the fields into the incident and "scattered" waves is possible.



Fig. 8.9. Typical geometry for the Huygens principle application (an orifice in an opaque screen).



However, for such experiments, another approximation, called the Huygens (or "Huygens-Fresnel") principle,²³ is very instrumental: the passed wave may be presented as a linear superposition of spherical waves of the type (17), as if they were emitted by every point of the orifice (or, more physically, by every point of the incident wave's front that has arrived at the orifice). This approximation is valid if the following strong conditions are satisfied:



23 Named after C. Huygens (1629-1695), who conjectured the wave theory of light (which remained controversial for more than a century, until T. Young's experiments), and A.-J. Fresnel (1788-1827), who developed the mathematical theory of diffraction.
$$\lambda \ll a \ll r, \tag{8.70}$$

where $r$ is the distance of the observation point from the orifice. In addition, as we have seen in the last section, at small $\lambda/a$ the diffraction phenomena are confined to angles $\alpha \sim 1/ka \sim \lambda/a \ll 1$. For observation at such small angles, the mathematical expression of the Huygens principle, for the complex amplitude $f_\omega(\mathbf r)$ of a monochromatic wave $f(\mathbf r, t) = \mathrm{Re}[f_\omega e^{-i\omega t}]$, is given by the following simple formula:

$$f_\omega(\mathbf r) = C\int_\text{orifice} f_\omega(\mathbf r')\,\frac{e^{ikR}}{R}\,d^2r'. \tag{8.71}$$

Here $f$ is any transverse component of any of the wave's fields (either $\mathbf E$ or $\mathbf H$),²⁴ $R$ is the distance between point $\mathbf r'$ at the orifice and the observation point $\mathbf r$ (i.e. the magnitude of vector $\mathbf R = \mathbf r - \mathbf r'$), and $C$ is a complex constant.

Before describing the proof of Eq. (71), let me carry out its sanity check, which will also give us the constant $C$. Let us see what happens if the field under the integral is the usual plane wave $f_\omega(z)$ propagating along axis $z$ (i.e. there is no opaque screen at all), so we should take the whole $x$-$y$ plane, say with $z' = 0$, as the integration area (Fig. 10).



Fig. 8.10. The Huygens principle applied to a plane wave (source point $\mathbf r'$, observation point $\mathbf r$, and the distance $R$ between them).



Then, for the observation point with coordinates $x = 0$, $y = 0$, and $z \gg \lambda$, Eq. (71) yields

$$f_\omega(z) = Cf_\omega(0)\int dx'\int dy'\,\frac{\exp\{ik(x'^2 + y'^2 + z^2)^{1/2}\}}{(x'^2 + y'^2 + z^2)^{1/2}}. \tag{8.72}$$



Before specifying the integration limits, let us consider the range $|x'|, |y'| \ll z$. In this range the square root, met in Eq. (72) twice, may be approximated as

$$(x'^2 + y'^2 + z^2)^{1/2} = z\left(1 + \frac{x'^2 + y'^2}{z^2}\right)^{1/2} \approx z\left(1 + \frac{x'^2 + y'^2}{2z^2}\right) = z + \frac{x'^2 + y'^2}{2z}. \tag{8.73}$$

The denominator of Eq. (72) is a much slower function of $x'$ and $y'$ than the exponent, and in it (as we will check a posteriori) it is sufficient to keep just the main, first term of expansion (73). With that, Eq. (72) becomes



24 The fact that the Huygens principle is valid for any field component should not be too surprising. Due to the condition $a \gg \lambda$, the real boundary conditions at the orifice edges are not important; the only important thing is that the screen limiting the orifice is opaque. Because of this, the Huygens principle's expression (71) is a part of the so-called scalar theory of diffraction. (In this course I will not have time to go beyond this approximation.)
$$f_\omega(z) = Cf_\omega(0)\,\frac{e^{ikz}}{z}\int dx'\int dy'\,\exp\left\{\frac{ik(x'^2 + y'^2)}{2z}\right\} = Cf_\omega(0)\,\frac{e^{ikz}}{z}\,I_xI_y, \tag{8.74}$$

where $I_x$ and $I_y$ are two similar integrals; for example,

$$I_x = \int\exp\left\{\frac{ikx'^2}{2z}\right\}dx' = \left(\frac{2z}{k}\right)^{1/2}\int\exp\{i\xi^2\}\,d\xi = \left(\frac{2z}{k}\right)^{1/2}\left[\int\cos(\xi^2)\,d\xi + i\int\sin(\xi^2)\,d\xi\right], \tag{8.75}$$

where $\xi \equiv (k/2z)^{1/2}x'$. These are the so-called Fresnel integrals. I will discuss them in more detail in the next section; right now, only one property of these integrals is important for us: if taken in symmetric limits $[-\xi_0, +\xi_0]$, both of them rapidly converge to the same value, $(\pi/2)^{1/2}$, as soon as $\xi_0$ becomes much larger than 1.²⁵ This means that even if we do not impose any exact limits on the integration area in Eq. (72), this integral converges to the value



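$$f_\omega(z) = Cf_\omega(0)\left(\frac{2z}{k}\right)\left[\left(\frac{\pi}{2}\right)^{1/2}+i\left(\frac{\pi}{2}\right)^{1/2}\right]^2\frac{e^{ikz}}{z} = C\,\frac{2\pi i}{k}\,f_\omega(0)\,e^{ikz}, \qquad (8.76)$$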



due to contributions from the central area with linear size of the order of Δξ ~ 1, i.e.

$$\Delta x \sim \Delta y \sim \left(\frac{z}{k}\right)^{1/2}, \qquad (8.77)$$



so that the contribution by front points r' well beyond the range (77) is negligible. 26 (Within our assumptions (70), which in particular require λ to be much less than z, the diffraction angle Δx/z ~ Δy/z ~ (λ/z)^{1/2}, corresponding to the important area of the front, is small.) In order to sustain the plane wave propagation, f_ω(z) = f_ω(0)e^{ikz}, constant C in Eq. (76) has to be taken equal to k/2πi. Thus, the Huygens principle's prediction (71), in its final form, reads

$$f_\omega(\mathbf r) = \frac{k}{2\pi i}\int_{\text{orifice}} f_\omega(\mathbf r')\,\frac{e^{ikR}}{R}\,d^2r', \qquad (8.78)$$

and describes, in particular, the straight propagation of the plane wave (in a uniform medium).
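The convergence property used above, and the resulting normalization C = k/2πi, may be checked numerically. The sketch below (plain Python, standard library only) integrates exp{iξ²} over symmetric limits [-ξ₀, +ξ₀] with a composite Simpson rule; the helper names, the grid size, and the illustrative wavenumber are assumptions of the sketch, not part of the derivation. The printed values approach (π/2)^{1/2} ≈ 1.2533, with a residual oscillating error that decreases roughly as 1/ξ₀.

```python
import cmath
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi]; works for complex-valued f. n must be even."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def symmetric_fresnel(xi0, n=20000):
    """Integral of exp(i*xi^2) over [-xi0, +xi0]: real part = cosine integral, imag part = sine integral."""
    return simpson(lambda xi: cmath.exp(1j * xi * xi), -xi0, xi0, n)


limit = math.sqrt(math.pi / 2)  # expected common value of both integrals in infinite symmetric limits
for xi0 in (1.0, 3.0, 10.0, 30.0):
    val = symmetric_fresnel(xi0)
    print(f"xi0 = {xi0:5.1f}:  cos part = {val.real:.4f},  sin part = {val.imag:.4f}")

# Normalization check of Eq. (76): with infinite limits, I_x * I_y = (2z/k) * [(pi/2)^(1/2) (1 + i)]^2,
# so f(z) = C * (2/k) * [...]^2 * f(0) * exp(ikz); the choice C = k / (2 pi i) makes the prefactor 1.
k = 2 * math.pi                             # illustrative wavenumber (wavelength = 1, arbitrary units)
ix_iy_over_2z_k = (limit * (1 + 1j)) ** 2   # equals pi * i
C = k / (2j * math.pi)
print("prefactor =", C * (2 / k) * ix_iy_over_2z_k)  # should be 1 up to rounding
```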



Let me pause to emphasize how nontrivial this result is. It would be a natural corollary of Eq. (25) (and the linear superposition principle) if all points of the orifice were filled with point scatterers that re-emit all the incident waves into spherical waves. However, as follows from the above proof, the Huygens principle is also valid if there is nothing in the orifice but free space!

This is why a more careful proof of the principle, 27 based on Green's theorem (2.207), is important. Let us apply this theorem to function f = f_ω, where f_ω is the complex amplitude of a scalar component of one of the wave's fields, which satisfies the Helmholtz equation (7.192),



25 See, e.g., MA Eq. (6.10).

26 This result is very natural, because exp{ikR} oscillates fast with the change of r', so that the contributions from various front points are averaged out. Indeed, the only reason why the central part of plane [x', y'] gives a nonvanishing contribution (76) to f_ω(z) is that the phase exponent stops oscillating at (x'² + y'²) below ~z/k - see Eq. (73).

27 This proof was given in 1882 by G. Kirchhoff.
$$(\nabla^2+k^2)f_\omega(\mathbf r) = 0, \qquad (8.79)$$

and function g = g_ω, which is the time Fourier image of the corresponding Green's function. It may be defined, as usual, as the solution to the same equation with an added delta-functional right-hand part with an arbitrary coefficient, for example,

$$(\nabla^2+k^2)g_\omega(\mathbf r,\mathbf r') = -4\pi\delta(\mathbf r-\mathbf r'). \qquad (8.80)$$

With Eqs. (79) and (80) used to express the Laplace operators of functions f_ω and g_ω, Eq. (2.207) becomes

$$\int_V\Big\{f_\omega\big[-k^2g_\omega(\mathbf r,\mathbf r')-4\pi\delta(\mathbf r-\mathbf r')\big]-g_\omega(\mathbf r,\mathbf r')\big[-k^2f_\omega\big]\Big\}\,d^3r = \oint_S\left[f_\omega\frac{\partial g_\omega(\mathbf r,\mathbf r')}{\partial n}-g_\omega(\mathbf r,\mathbf r')\frac{\partial f_\omega}{\partial n}\right]d^2r, \qquad (8.81)$$

where n is the outward normal to the surface S limiting volume V. Two terms in the left-hand side of this relation cancel, so that after swapping r and r' we get

$$-4\pi f_\omega(\mathbf r) = \oint_S\left[f_\omega(\mathbf r')\frac{\partial g_\omega(\mathbf r,\mathbf r')}{\partial n'}-g_\omega(\mathbf r,\mathbf r')\frac{\partial f_\omega(\mathbf r')}{\partial n'}\right]d^2r'. \qquad (8.82)$$

This relation is only correct if the selected volume V includes point r (otherwise we would not get its left-hand part from the integration of the delta-function), but does not include the genuine source of the wave (otherwise Eq. (79) would have a nonvanishing right-hand part). Let r be the field observation point, and V all the source-free half-space (for example, the half-space to the right of the screen in Fig. 9), so that S is the surface of the screen, including the orifice. Then the right-hand part of Eq. (82) describes the field in the observation point r induced by the wave passing through the orifice points r'. Since no waves are emitted by the opaque parts of the screen, we can limit the integration to the orifice area. 28 Assuming also that the opaque parts of the screen do not re-emit waves "radiated" by the orifice, we can take the solution of Eq. (80) to be the retarded potential for the free space: 29

$$g_\omega(\mathbf r,\mathbf r') = \frac{e^{ikR}}{R}. \qquad (8.83)$$

Plugging this expression into Eq. (82), we get

$$-4\pi f_\omega(\mathbf r) = \int_{\text{orifice}}\left[f_\omega(\mathbf r')\frac{\partial}{\partial n'}\left(\frac{e^{ikR}}{R}\right)-\frac{e^{ikR}}{R}\frac{\partial f_\omega(\mathbf r')}{\partial n'}\right]d^2r'. \qquad (8.84)$$

This is the so-called Kirchhoff (or "Fresnel-Kirchhoff") integral. 30 Now, let us make two additional approximations. The first of them stems from Eq. (70): at ka ≫ 1, the wave's spatial dependence in the orifice area may be presented as
28 Actually, this is a somewhat nontrivial point of the proof. Indeed, it may be shown that the solution of Eq. (79) identically equals zero if f(r') and ∂f(r')/∂n' vanish together at any part of the boundary. As a result, building the solution with the account of exact boundary conditions (which is the task of the vector theory of diffraction) is possible but cumbersome. Here we base our solution on physical intuition.

29 It follows, e.g., from Eq. (16) with a monochromatic source q(t) = q_ω exp{-iωt}, at the value q_ω = 4πε that fits the right-hand part of Eq. (80).

30 With the integration extended over all boundaries of volume V, this would be an exact result.
$$f_\omega(\mathbf r') = (\text{a slow function of } \mathbf r')\times\exp\{i\mathbf k_0\cdot\mathbf r'\}, \qquad (8.85)$$

where "slow" means a function that changes on the scale of a rather than λ. If, also, kR ≫ 1, then the differentiation in Eq. (84) may be, in both instances, limited to the rapidly changing exponents, giving

$$-4\pi f_\omega(\mathbf r) = -i\int_{\text{orifice}}(\mathbf k+\mathbf k_0)\cdot\mathbf n'\,\frac{e^{ikR}}{R}\,f_\omega(\mathbf r')\,d^2r', \qquad (8.86)$$

where k is the wave vector directed from point r' toward the observation point r. Second, if all observation angles are small, we can take k·n' ≈ k₀·n' ≈ -k. With that, Eq. (86) is reduced to Eq. (78) expressing the Huygens principle.
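For the reader's convenience, this last step may be written out explicitly; it is only a rearrangement of Eq. (86) under the small-angle substitution (k + k₀)·n' ≈ -2k stated above:

```latex
-4\pi f_\omega(\mathbf r) \approx -i\,(-2k)\int_{\text{orifice}}\frac{e^{ikR}}{R}\,f_\omega(\mathbf r')\,d^2r'
\quad\Longrightarrow\quad
f_\omega(\mathbf r) \approx -\frac{ik}{2\pi}\int_{\text{orifice}} f_\omega(\mathbf r')\,\frac{e^{ikR}}{R}\,d^2r'
= \frac{k}{2\pi i}\int_{\text{orifice}} f_\omega(\mathbf r')\,\frac{e^{ikR}}{R}\,d^2r' ,
```

since -i = 1/i; the coefficient coincides with the value C = k/2πi obtained earlier from the requirement of undisturbed plane-wave propagation.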

It is clear that the principle immediately gives a very simple description of the interference of 
waves passing through two small holes in the screen. Indeed, if the hole size is negligible in comparison 
with distance a between them (though still much larger than the wavelength!), Eq. (78) yields 

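$$f_\omega(\mathbf r) = c_1e^{ikR_1}+c_2e^{ikR_2}, \quad\text{with } c_{1,2} = \frac{k\,f_\omega(\mathbf r_{1,2})\,A_{1,2}}{2\pi iR_{1,2}}, \qquad (8.87)$$

where R₁ and R₂ are the distances between the holes (located at points r₁ and r₂) and the observation point, and A₁ and A₂ are the hole areas. For the interference wave intensity, Eq. (87) yields

$$S \propto |f_\omega|^2 = |c_1|^2+|c_2|^2+2|c_1||c_2|\cos[k(R_1-R_2)+\varphi], \quad \varphi \equiv \arg c_1-\arg c_2. \qquad (8.88)$$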

The first two terms in this result clearly represent the intensities of the partial waves passed through each hole, while the last one is the result of their interference. The interference pattern's contrast ratio

$$\mathcal R \equiv \frac{S_{\max}}{S_{\min}} = \left(\frac{|c_1|+|c_2|}{|c_1|-|c_2|}\right)^2 \qquad (8.89)$$

is largest (infinite) when both waves have equal amplitudes.
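A short numerical sketch of Eqs. (88)-(89), assuming arbitrary illustrative complex amplitudes c₁, c₂ and phases kR₁, kR₂ (none of these numbers come from the text): it checks that the three-term formula (88) reproduces the direct squared modulus of sum (87), and evaluates the contrast ratio (89).

```python
import cmath
import math


def two_hole_intensity(c1, c2, k_r1, k_r2):
    """Eq. (88): |c1 e^{ikR1} + c2 e^{ikR2}|^2 via the interference term with phi = arg c1 - arg c2."""
    phi = cmath.phase(c1) - cmath.phase(c2)
    return abs(c1) ** 2 + abs(c2) ** 2 + 2 * abs(c1) * abs(c2) * math.cos(k_r1 - k_r2 + phi)


def contrast_ratio(c1, c2):
    """Eq. (89): S_max / S_min; infinite when the two partial waves have equal amplitudes."""
    s_max = (abs(c1) + abs(c2)) ** 2
    s_min = (abs(c1) - abs(c2)) ** 2
    return math.inf if s_min == 0 else s_max / s_min


# Illustrative (hypothetical) hole amplitudes and phases k*R_{1,2}.
c1, c2 = 1.0 * cmath.exp(0.3j), 0.5 * cmath.exp(-0.4j)
k_r1, k_r2 = 12.7, 10.1

direct = abs(c1 * cmath.exp(1j * k_r1) + c2 * cmath.exp(1j * k_r2)) ** 2  # squared modulus of Eq. (87)
print(direct, two_hole_intensity(c1, c2, k_r1, k_r2))  # the two numbers coincide up to rounding
print(contrast_ratio(c1, c2))     # approximately (1.5 / 0.5)^2 = 9
print(contrast_ratio(1.0, 1.0))   # equal amplitudes: infinite contrast
```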

The analysis of the interference pattern is simple if the line connecting the holes is perpendicular to wave vector k ≈ k₀ - see Fig. 6a. Selecting the coordinate axes as shown in that figure, and using for distances R₁ and R₂ the same expansion as in Eq. (73), for the interference term in Eq. (88) we get

$$\cos[k(R_1-R_2)+\varphi] \approx \cos\left(\frac{kxa}{z}+\varphi\right). \qquad (8.90)$$

This means that the intensity does not depend on y, i.e. the interference pattern in the plane of constant z presents straight, parallel strips, perpendicular to vector a, with the period given by Eq. (60), i.e. by the Bragg law. 31 Note that this (somewhat counter-intuitive) result is strictly valid only at (x² + y²) ≪ z²; it is straightforward to use the next term in the Taylor expansion (73) to show that farther from the interference pattern center the strips start to diverge.
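The statement about the strips may be illustrated by comparing the exact path difference R₁ - R₂ with its leading paraxial form xa/z used in Eq. (90). In the sketch below the holes are placed at x' = ∓a/2 in the screen plane z = 0, and all numbers are arbitrary illustrative values. The relative deviation grows roughly as (x² + y²)/2z², and the exact path difference at a fixed x decreases with |y|, so that the same phase k(R₁ - R₂) is reached at larger |x| farther from the axis, i.e. the strips diverge.

```python
import math


def path_difference_exact(x, y, z, a):
    """Exact R1 - R2 for hole 1 at x' = -a/2 and hole 2 at x' = +a/2 (screen plane z = 0)."""
    r1 = math.sqrt((x + a / 2) ** 2 + y ** 2 + z ** 2)
    r2 = math.sqrt((x - a / 2) ** 2 + y ** 2 + z ** 2)
    return r1 - r2


def path_difference_paraxial(x, z, a):
    """Leading-order result behind Eq. (90): R1 - R2 ~ x*a/z, independent of y."""
    return x * a / z


z, a = 1.0, 1e-3  # illustrative geometry, arbitrary length units
for x, y in [(0.01, 0.0), (0.01, 0.1), (0.01, 0.3)]:
    exact = path_difference_exact(x, y, z, a)
    approx = path_difference_paraxial(x, z, a)
    print(f"x = {x}, y = {y}: relative deviation = {abs(exact - approx) / approx:.2e}")
```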



31 The phase shift φ vanishes at the normal incidence of a plane wave on the holes. Note, however, that the spatial shift of the interference pattern following from Eq. (90), Δx = -(z/ka)φ, is extremely convenient for the experimental measurement of the phase shift between two waves, especially if it is induced by some factor (such as insertion of a transparent object into one of the interferometer's arms, etc.) that may be turned on/off at will.
8.6. Diffraction on a slit

Now let us use the Huygens principle to analyze a more complex problem: a plane wave's diffraction on a long straight slit of constant width a (Fig. 11).



Fig. 8.11. Diffraction on a slit (labels: incident wave; screen with a slit, edges at x' = ±a/2; diffracted wave; observation plane).



According to Eq. (70), in order to use the Huygens principle for the problem analysis we need to have λ ≪ a ≪ z. Moreover, the simple formulation (78) of the principle is only valid for small observation angles, |x| ≪ z. Note, however, that the relation between the two small dimensionless numbers, a/z and λ/a, is so far arbitrary; as we will see in a minute, this relation will determine the type of the observed diffraction pattern.

Let us apply Eq. (78) to our current problem (Fig. 11), for the sake of simplicity assuming normal wave incidence, and taking z = 0 at the screen plane:

$$f_\omega(x,z) = f_0\frac{k}{2\pi i}\int_{-a/2}^{+a/2}dx'\int_{-\infty}^{+\infty}dy'\,\frac{\exp\{ik[(x-x')^2+y'^2+z^2]^{1/2}\}}{[(x-x')^2+y'^2+z^2]^{1/2}}, \qquad (8.91)$$
where f₀ = f_ω(x', 0) = const is the incident wave's amplitude. This is the same integral as in Eq. (72), except for the finite limits for x', and may be simplified similarly, using the small-angle condition (x - x')² + y'² ≪ z²:

$$f_\omega(x,z) \approx f_0\frac{k\,e^{ikz}}{2\pi iz}\int_{-a/2}^{+a/2}dx'\int_{-\infty}^{+\infty}dy'\,\exp\left\{\frac{ik[(x-x')^2+y'^2]}{2z}\right\} \equiv f_0\frac{k\,e^{ikz}}{2\pi iz}\,I_xI_y. \qquad (8.92)$$



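The integral over y is the same as in the last section:

$$I_y \equiv \int_{-\infty}^{+\infty}\exp\left\{\frac{iky'^2}{2z}\right\}dy' = \left(\frac{2\pi iz}{k}\right)^{1/2}, \qquad (8.93)$$

but the integral over x is more complicated, because of its finite limits:

$$I_x \equiv \int_{-a/2}^{+a/2}\exp\left\{\frac{ik(x-x')^2}{2z}\right\}dx'. \qquad (8.94)$$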

It may be simplified in the following two (opposite) limits. 
(i) Fraunhofer diffraction takes place when z/a ≫ a/λ - the relation which may be rewritten either as a ≪ (zλ)^{1/2}, or as ka² ≪ z. In this limit the ratio kx'²/z is negligibly small for all values of x' under the integral, and we can approximate it as

$$I_x = \int_{-a/2}^{+a/2}\exp\left\{\frac{ik(x^2-2xx'+x'^2)}{2z}\right\}dx' \approx \exp\left\{\frac{ikx^2}{2z}\right\}\int_{-a/2}^{+a/2}\exp\left\{-\frac{ikxx'}{z}\right\}dx' = \frac{2z}{kx}\exp\left\{\frac{ikx^2}{2z}\right\}\sin\frac{kxa}{2z}, \qquad (8.95)$$



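so that Eq. (92) yields

$$f_\omega(x,z) = f_0\frac{k\,e^{ikz}}{2\pi iz}\left(\frac{2\pi iz}{k}\right)^{1/2}\frac{2z}{kx}\exp\left\{\frac{ikx^2}{2z}\right\}\sin\frac{kxa}{2z}, \qquad (8.96)$$

and hence the relative wave intensity is

$$\frac{S(x,z)}{S_0} = \left|\frac{f_\omega(x,z)}{f_0}\right|^2 = \frac{2z}{\pi kx^2}\sin^2\frac{kxa}{2z} \equiv \frac{2}{\pi kz\alpha^2}\sin^2\frac{ka\alpha}{2}, \qquad (8.97)$$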



where S₀ is the (average) intensity of the incident wave, and α ≡ x/z ≪ 1 is the scattering angle. Comparing this expression with Eq. (69), we see that the diffraction pattern is exactly the same as that of a similar (uniform, 1D) object in the Born approximation - see the red line in Fig. 8. Note again that the angular width δα of the Fraunhofer pattern is of the order of 1/ka, so that its linear width δx = zδα ~ z/ka ~ zλ/a. 32 Hence the condition of the Fraunhofer approximation validity may be also presented as a ≪ δx.
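The Fraunhofer result (95) may be checked against a direct numerical evaluation of the slit integral (94). The sketch below is a minimal illustration under arbitrary assumed parameters (λ = 1, a = 1, z = 100, so that ka²/z ≈ 0.06 ≪ 1): it compares |I_x|² with the Fraunhofer form (2z/kx)² sin²(kxa/2z), and then shows that the agreement is lost in the Fresnel regime (z = 0.5, i.e. ka²/z ≈ 12.6). The helper names are hypothetical.

```python
import cmath
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] (n even); f may be complex-valued."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def slit_integral(x, z, a, k):
    """Eq. (94): I_x = integral over |x'| < a/2 of exp{ik(x - x')^2 / 2z} dx'."""
    return simpson(lambda xp: cmath.exp(1j * k * (x - xp) ** 2 / (2 * z)), -a / 2, a / 2)


def fraunhofer_abs2(x, z, a, k):
    """|I_x|^2 from Eq. (95): (2z/kx)^2 sin^2(kxa/2z) = a^2 sinc^2(u), with u = kxa/2z."""
    u = k * x * a / (2 * z)
    return a * a if u == 0 else (a * math.sin(u) / u) ** 2


k, a = 2 * math.pi, 1.0  # illustrative: wavelength 1, slit width 1

# Fraunhofer regime: ka^2 / z << 1
z = 100.0
for x in (0.0, 10.0, 25.0, 40.0):
    numeric = abs(slit_integral(x, z, a, k)) ** 2
    print(f"z = {z}, x = {x}: numeric {numeric:.5f} vs Eq. (95) {fraunhofer_abs2(x, z, a, k):.5f}")

# Fresnel regime: ka^2 / z >> 1, where Eq. (95) no longer applies
z = 0.5
numeric = abs(slit_integral(0.0, z, a, k)) ** 2
print(f"z = {z}, x = 0: numeric {numeric:.5f} vs Eq. (95) {fraunhofer_abs2(0.0, z, a, k):.5f}")
```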

(ii) Fresnel diffraction. In the opposite limit of a relatively wide slit, with a ≫ δx = zδα ~ z/ka ~ zλ/a, i.e. ka² ≫ z, the diffraction patterns at the two slit edges are well separated. Hence, near each edge (for example, near x' = -a/2) we may simplify Eq. (94) as

$$I_x(x) \approx \int_{-a/2}^{+\infty}\exp\left\{\frac{ik(x-x')^2}{2z}\right\}dx' = \left(\frac{2z}{k}\right)^{1/2}\int_{-(k/2z)^{1/2}(x+a/2)}^{+\infty}\exp\{i\xi^2\}\,d\xi, \qquad (8.98)$$

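and express it via the special functions called the Fresnel integrals: 33

$$C(\xi) \equiv \left(\frac{2}{\pi}\right)^{1/2}\int_0^{\xi}\cos(\zeta^2)\,d\zeta, \qquad S(\xi) \equiv \left(\frac{2}{\pi}\right)^{1/2}\int_0^{\xi}\sin(\zeta^2)\,d\zeta, \qquad (8.99)$$

whose plots are shown in Fig. 12. As was mentioned above, at large values of their argument ξ, both functions tend to 1/2.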



32 Note also that since in this limit ka² ≪ z, Eq. (97) shows that even the maximum value S(0, z) of the diffracted wave intensity is much less than intensity S₀ of the incident wave. This is natural, because the incident power S₀a per unit length of the slit is now distributed over a much larger width δx ≫ a, so that S(0, z) ~ S₀(a/δx) ≪ S₀.

33 Slightly different definitions of these functions, mostly affecting constant factors, may also be met in the literature.
Fig. 8.12. Fresnel integrals.



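Plugging this expression into Eqs. (92) and (98), for the diffracted wave intensity in the Fresnel limit (i.e. at |x + a/2| ≪ a), we get

$$\frac{S(x,z)}{S_0} = \frac{1}{2}\left\{\left[C(\xi)+\frac{1}{2}\right]^2+\left[S(\xi)+\frac{1}{2}\right]^2\right\}, \quad\text{with } \xi \equiv \left(\frac{k}{2z}\right)^{1/2}\left(x+\frac{a}{2}\right). \qquad (8.100)$$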



A plot of this function (Fig. 13) shows that the diffraction pattern is very peculiar: while in the "shade" 
region x < -a/2 the wave intensity fades monotonically, the transition to the "light" region within the gap 
(x > -a/2) is accompanied by intensity oscillations, just as at the Fraunhofer diffraction - cf. Fig. 8. 




Fig. 8.13. Fresnel diffraction pattern (horizontal axis: (k/2z)^{1/2}(x + a/2)).



This behavior, which is described by the following asymptotes,

$$\frac{S}{S_0} \to \begin{cases}\dfrac{1}{4\pi\xi^2}, & \text{for } \xi\to-\infty,\\[2mm] 1+\dfrac{1}{\sqrt{\pi}}\,\dfrac{\sin(\xi^2-\pi/4)}{\xi}, & \text{for } \xi\to+\infty,\end{cases} \qquad (8.101)$$



is essentially an artifact of observing just the wave intensity (i.e. its real amplitude) rather than its phase as well. Indeed, as may be seen even more clearly from the parametric presentation of the Fresnel integrals (Fig. 14), these functions oscillate similarly at large positive and negative values of their argument. Physically, this means that the wave diffraction by the slit edge leads to similar oscillations of its phase at x < -a/2 and x > -a/2; however, in the latter region (i.e. inside the slit) the diffracted wave overlaps the incident wave passing through the slit directly, and their interference reveals the phase oscillations, making them visible in the measured intensity as well.
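The edge pattern (100) and its asymptotes (101) may be reproduced with a few lines of code. The sketch below computes C(ξ) and S(ξ) directly from definition (99) by Simpson integration (an illustrative choice; library implementations of Fresnel integrals may use a different normalization, cf. footnote 33) and compares the intensity with Eq. (101). Note the exact value S/S₀ = 1/4 at the geometric shadow edge ξ = 0, where C(0) = S(0) = 0.

```python
import math


def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] (n even); hi < lo is allowed and flips the sign."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


NORM = math.sqrt(2 / math.pi)  # normalization factor of Eq. (99)


def fresnel_c(xi):
    return NORM * simpson(lambda t: math.cos(t * t), 0.0, xi)


def fresnel_s(xi):
    return NORM * simpson(lambda t: math.sin(t * t), 0.0, xi)


def edge_intensity(xi):
    """Eq. (100): S/S0 = {[C(xi) + 1/2]^2 + [S(xi) + 1/2]^2} / 2, with xi = (k/2z)^(1/2) (x + a/2)."""
    return 0.5 * ((fresnel_c(xi) + 0.5) ** 2 + (fresnel_s(xi) + 0.5) ** 2)


def edge_asymptote(xi):
    """Eq. (101): leading asymptotic forms deep in the shade (xi < 0) and in the light (xi > 0)."""
    if xi < 0:
        return 1 / (4 * math.pi * xi * xi)
    return 1 + math.sin(xi * xi - math.pi / 4) / (math.sqrt(math.pi) * xi)


for xi in (-5.0, -2.0, 0.0, 2.0, 5.0):
    approx = edge_asymptote(xi) if xi != 0 else float("nan")
    print(f"xi = {xi:+.1f}: S/S0 = {edge_intensity(xi):.4f}, asymptote = {approx:.4f}")
```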



Fig. 8.14. Parametric representation of the Fresnel integrals (axes: C(ξ) and S(ξ)). This pattern is called either the Euler spiral or Cornu spiral.

Note that according to Eq. (100), the linear scale of the Fresnel diffraction pattern is (2z/k)^{1/2}, i.e. complies with estimate (77). If the slit is gradually narrowed, so that width a becomes comparable to that scale, 34 the Fresnel interference patterns from both edges start to "collide" (interfere). The resulting wave, fully described by Eq. (94), is just a sum of two contributions of the type (98) from both edges of the slit. The resulting interference pattern is somewhat complicated, and only at a ≪ δx is it reduced to the simple Fraunhofer pattern (97). Of course, this crossover from the Fresnel to Fraunhofer diffraction may be also observed, at fixed wavelength λ and slit width a, by increasing z, i.e. by measuring the diffraction pattern farther and farther from the slit.



Note that the Fraunhofer limit is always valid if the diffraction is measured as a function of the diffraction angle α alone, i.e. effectively at infinity, z → ∞. This may be done, for example, by collecting the diffracted wave with a "positive" (converging) lens, and observing the diffraction pattern in its focal plane.



8.7. Geometrical optics placeholder 

Behind all these details, I would not like the reader to miss the main feature of diffraction, which has an overwhelming practical significance. Namely, besides narrow diffraction "cones" (actually, parabolic-shaped regions) with transversal scale Δx ~ (λz)^{1/2}, the wave far behind a slit of width a ≫ λ repeats the field just behind the slit, i.e. reproduces the unperturbed incident wave inside the slit, and has negligible intensity in the shade regions outside it. An evident generalization of this fact is that when a plane wave (in particular an electromagnetic wave) passes any opaque object of large size a ≫ λ, it propagates around it, by distances z up to ~a(a/λ), along straight lines, with virtually negligible



34 Note that this condition may be also rewritten as a ~ δx, i.e. z/a ~ a/λ.
diffraction effects. This fact gives the strict foundation for the very notion of the wave ray (or beam), as the line perpendicular to the local front of a quasi-plane wave. In a uniform medium such a ray is a straight line, but its direction changes in accordance with the Snell law at the interface of two media with different wave speeds v, i.e. different values of the refraction index. The notion of rays enables the whole field of geometric optics, devoted mostly to ray tracing in various (sometimes very complex) systems.

This is why, at this point, an E&M course that followed the scientific logic more faithfully than this one would give an extended discussion of the geometric and quasi-geometric optics, including (as a minimum 35) such vital topics as

- the so-called lensmaker's equation expressing the focal length f of a lens via the curvature radii of its spherical surfaces and the refraction index of the lens material,

- the thin lens formula relating the image distance from the lens with f and the source distance,

- the concepts of basic optical instruments such as telescopes and microscopes,

- the concepts of the spherical, angular, and chromatic aberrations (image distortions);

- wave effects in optical instruments, including the so-called Abbe limit 36 on the focal spot size.

However, since I have made a (possibly, wrong) decision to follow the common tradition in 
selecting the main topics for this course, I do not have time for such discussion. Still, I am placing this 
"placeholder" pseudo-section to relay my conviction that any educated physicist has to know the 
geometric optics basics. If the reader has not had an exposure to this subject during his or her 
undergraduate studies, I highly recommend at least browsing one of available textbooks. 37 



8.8. Fraunhofer diffraction from more complex scatterers 



So far, our discussion of diffraction has been limited to a very simple geometry - a single slit in an otherwise opaque screen (Fig. 11). However, in the most important Fraunhofer limit, z ≫ ka², it is easy to get a very simple expression for the plane wave diffraction/interference by a plane orifice (with linear size ~a) of an arbitrary shape. Indeed, the evident 2D generalization of approximation (95) is

$$\exp\left\{\frac{ik[(x-x')^2+(y-y')^2]}{2z}\right\} \approx \exp\left\{\frac{ik(x^2+y^2)}{2z}\right\}\exp\left\{-\frac{ik(xx'+yy')}{z}\right\}, \qquad (8.102)$$

so that besides the inconsequential total phase factor, Eq. (92) is reduced to

$$f(\boldsymbol\rho) \propto f_0\int_{\text{orifice}}\exp\{-i\boldsymbol\kappa\cdot\boldsymbol\rho'\}\,d^2\rho' = f_0\int T(\boldsymbol\rho')\exp\{-i\boldsymbol\kappa\cdot\boldsymbol\rho'\}\,d^2\rho', \qquad (8.103)$$



35 Admittedly, even this list leaves aside several spectacular effects due to crystal anisotropy, including such a beauty as conical refraction in biaxial crystals - see, e.g., Chapter 15 of the classical textbook by M. Born and E. Wolf, cited at the end of Sec. 7.1.

36 Reportedly, due to not only E. Abbe (1873), but also to H. von Helmholtz (1874). 

37 My top recommendation for that purpose would be Chapters 3-6 and Sec. 8.6 in Born and Wolf. A simpler alternative is Chapter 10 in G. R. Fowles, Introduction to Modern Optics, 2nd ed., Dover, 1989.
where the 2D vector κ (not to be confused with wave vector k that is virtually perpendicular to κ!) is defined as

$$\boldsymbol\kappa \equiv k\,\frac{\boldsymbol\rho}{z} \approx \mathbf q \equiv \mathbf k-\mathbf k_0, \qquad (8.104)$$

ρ = {x, y} and ρ' = {x', y'} are 2D radius-vectors in, respectively, the observation and screen planes (both nearly normal to vectors k and k₀), function T(ρ') describes the screen's transparency at point ρ', and the last integral in Eq. (103) is over the whole screen plane z = 0. (Though the strict equivalence of the two forms of Eq. (103) is only valid if T(ρ') equals either 1 or 0, its last form may be readily obtained from Eq. (78) with f(r') = T(ρ')f₀ for any transparency profile, provided that T(ρ') changes only at distances much larger than λ = 2π/k.)

From the mathematical point of view, the last form of Eq. (103) is the 2D spatial Fourier transform of function T(ρ'), with the reciprocal variable κ revealed by the observation point position: ρ = (z/k)κ = (zλ/2π)κ. This interpretation is useful because of the experience we all have with the Fourier transform, mostly in the context of its time/frequency applications. For example, if the orifice is a single small hole, T(ρ') may be approximated by a delta-function, so that Eq. (103) yields f(ρ) ≈ const. This corresponds (at least for the small diffraction angles α = ρ/z, for which the Huygens approximation is valid) to a spherical wave spreading from the point-like orifice. Next, for two small holes, Eq. (103) immediately gives the Young interference pattern (90). Let me now use Eq. (103) to analyze the simplest (and most important) 1D transparency profiles, leaving 2D cases for the reader's exercise.

(i) A single slit of width a (Fig. 11) may be described by transparency

$$T(\boldsymbol\rho') = \begin{cases}1, & \text{for } |x'| < a/2,\\ 0, & \text{otherwise.}\end{cases} \qquad (8.105)$$



Its substitution into Eq. (103) yields

$$f(\boldsymbol\rho) \propto f_0\int_{-a/2}^{+a/2}\exp\{-i\kappa_xx'\}\,dx' = f_0\,\frac{\exp\{-i\kappa_xa/2\}-\exp\{i\kappa_xa/2\}}{-i\kappa_x} \propto \operatorname{sinc}\frac{\kappa_xa}{2} = \operatorname{sinc}\frac{kxa}{2z}, \qquad (8.106)$$

naturally returning us to Eqs. (64) and (97), and hence to the red lines in Fig. 8 for the wave intensity. (Please note again that Eq. (103) describes only the Fraunhofer, but not the Fresnel diffraction!)

(ii) Two narrow, similar, parallel slits with a much larger distance a between them may be described by taking

$$T(\boldsymbol\rho') \propto \delta(x'-a/2)+\delta(x'+a/2), \qquad (8.107)$$

so that Eq. (103) yields the generic interference pattern,

$$f(\boldsymbol\rho) \propto f_0\left[\exp\left\{-\frac{i\kappa_xa}{2}\right\}+\exp\left\{\frac{i\kappa_xa}{2}\right\}\right] \propto \cos\frac{\kappa_xa}{2} = \cos\frac{kxa}{2z}, \qquad (8.108)$$

whose intensity is shown with the blue line in Fig. 8.

(iii) In a more realistic Young-type two-slit experiment, each slit has a width (say, w) which is much larger than the light wavelength λ, but still much smaller than slit spacing a. This situation may be described by the following transparency function

$$T(\boldsymbol\rho') = \begin{cases}1, & \text{for } |x'\pm a/2| < w/2,\\ 0, & \text{otherwise,}\end{cases} \qquad (8.109)$$

for which Eq. (103) yields a natural combination of results (106) (with a replaced with w) and (108):

$$f(\boldsymbol\rho) \propto \operatorname{sinc}\frac{kxw}{2z}\,\cos\frac{kxa}{2z}. \qquad (8.110)$$

This is the usual interference pattern modulated by a Fraunhofer-diffraction envelope (shown with the dashed blue line in Fig. 15). Since function sinc²ξ decreases very fast beyond its first zeros at ξ = ±π, the practical number of observable interference fringes is close to 2a/w.
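The fringe-count estimate may be checked by a direct scan of the intensity [sinc(kxw/2z) cos(kxa/2z)]², written via the shorthand u ≡ kx/2z (a notation of this sketch only). The code counts local maxima inside the central envelope lobe |uw| < π; the 1% relative threshold, used to discard the tiny bumps just inside the envelope zeros, is an arbitrary illustrative choice, as are the values of a and w. For a/w = 5 it finds 9 fringes, close to the estimate 2a/w = 10.

```python
import math


def sinc(t):
    return 1.0 if t == 0 else math.sin(t) / t


def double_slit_intensity(u, a, w):
    """|f|^2 from Eq. (110) with u = kx/2z: [sinc(u w) cos(u a)]^2."""
    return (sinc(u * w) * math.cos(u * a)) ** 2


def count_fringes(a, w, points=20001, rel_threshold=0.01):
    """Count local intensity maxima with |u| < pi/w that exceed rel_threshold of the central peak."""
    u_max = math.pi / w
    us = [-u_max + 2 * u_max * i / (points - 1) for i in range(points)]
    vals = [double_slit_intensity(u, a, w) for u in us]
    peak = max(vals)
    count = 0
    for i in range(1, points - 1):
        if vals[i] > vals[i - 1] and vals[i] >= vals[i + 1] and vals[i] > rel_threshold * peak:
            count += 1
    return count


a, w = 5.0, 1.0  # illustrative slit spacing and width, a/w = 5
print(count_fringes(a, w), "fringes; estimate 2a/w =", 2 * a / w)
```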
Fig. 8.15. Young's double-slit interference pattern for a finite slit width.



(iv) A structure very useful for experimental and engineering practice is a set of many parallel slits, called the diffraction grating. 38 Indeed, if the slit width is much less than the grating period d, then the transparency function may be approximated as

$$T(\boldsymbol\rho') \propto \sum_{n=-\infty}^{+\infty}\delta(x'-nd), \qquad (8.111)$$

and Eq. (103) yields

$$f(\boldsymbol\rho) \propto \sum_{n=-\infty}^{+\infty}\exp\{-i\kappa_xnd\} = \sum_{n=-\infty}^{+\infty}\exp\left\{-i\,\frac{kxnd}{z}\right\}. \qquad (8.112)$$

This sum vanishes for all values of κ_x d that are not multiples of 2π, so that the result describes sharp intensity peaks at diffraction angles

$$\alpha_m \equiv \frac{x_m}{z} = \frac{2\pi m}{kd} = m\,\frac{\lambda}{d}. \qquad (8.113)$$



Taking into account that this result is only valid for small angles |α_m| ≪ 1, it may be interpreted exactly as Eq. (59) - see Fig. 6a. However, in contrast with the interference (108) from two slits, the destructive interference from many slits kills the net wave as soon as the angle is even slightly different


38 The rudimentary diffraction grating effect, produced by parallel fibers of bird feathers, was discovered as early as 1673 by J. Gregory - who also invented the reflecting ("Gregorian") telescope.
from each Bragg angle (60). This is very convenient for spectroscopic purposes, because the diffraction lines produced by multi-frequency waves do not overlap even if the frequencies of their adjacent components are very close.

Two features of practical diffraction gratings make their properties different from this simple picture. First, the finite number N of slits, which may be described by limiting the grating sum (112) to interval n = [-N/2, +N/2], results in a finite spread, δα/α ~ 1/N, of each diffraction peak, and hence in a reduction of the grating's spectral resolution. (Unintentional variations of the inter-slit distance d have a similar effect, so that before the advent of high-resolution photolithography, special high-precision mechanical tools were used for grating fabrication.)
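The finite-N peak width may be seen directly from the many-slit sum of Eq. (103) truncated to N terms. In the sketch below, φ ≡ κ_x d = kαd is a shorthand of the sketch, N = 50 is an arbitrary choice, and the sum runs over n = 0, ..., N-1 instead of [-N/2, +N/2], which changes only an overall phase factor. The intensity equals N² at the Bragg peaks φ = 2πm and first vanishes at φ = 2πm ± 2π/N, i.e. at the angular distance δα = λ/Nd from a peak, giving δα/α₁ = 1/N for the first order.

```python
import cmath
import math


def grating_intensity(n_slits, phi):
    """|sum_{n=0}^{N-1} exp(-i n phi)|^2: the grating sum truncated to N slits."""
    amplitude = sum(cmath.exp(-1j * n * phi) for n in range(n_slits))
    return abs(amplitude) ** 2


N = 50
print(grating_intensity(N, 2 * math.pi))                    # Bragg peak m = 1: N^2 = 2500 up to rounding
print(grating_intensity(N, 2 * math.pi + 2 * math.pi / N))  # first zero next to the peak: ~0
print(grating_intensity(N, 2 * math.pi + math.pi / N))      # halfway to the zero: ~40% of the peak
```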

Second, the finite slit width w leads to the diffraction peak pattern modulation by a sinc(kwα/2) envelope, similarly to the pattern shown in Fig. 15. Actually, for spectroscopic purposes such modulation is a plus, because only one diffraction peak (say, with m = ±1) is practically used, and if the frequency spectrum of the analyzed wave is very broad (covers more than one octave), the higher peaks produce undesirable hindrance. For this reason, w is frequently selected to be exactly d/2, thus suppressing every other diffraction maximum. Moreover, sometimes semi-transparent films are used to make the transparency function T(ρ') continuous and close to the sinusoidal one:

$$T(\boldsymbol\rho') \approx T_0+T_1\cos\frac{2\pi x'}{d} = T_0+\frac{T_1}{2}\left[\exp\left\{\frac{2\pi ix'}{d}\right\}+\exp\left\{-\frac{2\pi ix'}{d}\right\}\right]. \qquad (8.114)$$

Plugging the last expression into Eq. (103) and integrating, we see that the output wave consists of just 3 components: the direct-passing wave (proportional to T₀) and two diffracted waves (proportional to T₁) propagating in the directions of the two lowest Bragg angles, α_{±1} = ±λ/d.
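The three-wave result may be checked with a discrete stand-in for Eq. (103): averaging T(x') exp{-iκ_x x'} over an integer number of periods at κ_x = 2πm/d gives the weight of the m-th diffraction order. In the sketch below the values T₀ = 0.5, T₁ = 0.3, the number of periods, and the sampling density are arbitrary assumptions; only the orders m = 0 and m = ±1 come out nonzero, with weights T₀ and T₁/2.

```python
import cmath
import math


def order_weights(t0, t1, d=1.0, periods=4, samples_per_period=64, orders=range(-3, 4)):
    """Average of T(x') exp(-i kappa x') over `periods` grating periods, for kappa = 2 pi m / d."""
    n = periods * samples_per_period
    h = periods * d / n
    xs = [i * h for i in range(n)]
    transparency = [t0 + t1 * math.cos(2 * math.pi * x / d) for x in xs]  # Eq. (114)
    weights = {}
    for m in orders:
        kappa = 2 * math.pi * m / d
        total = sum(t * cmath.exp(-1j * kappa * x) for t, x in zip(transparency, xs))
        weights[m] = total / n  # rectangle rule divided by the window length, i.e. a simple average
    return weights


for m, weight in order_weights(0.5, 0.3).items():
    print(f"m = {m:+d}: |weight| = {abs(weight):.3f}")
```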

Relation (103) may be also readily used to obtain one more general (and rather curious) result called the Babinet principle. Consider two experiments with diffraction of similar plane waves on two "complementary" screens that together would cover the whole plane, without a hole or an overlap. (Think, for example, about an opaque disk of radius R and a large opaque screen with a round orifice of the same radius.) Then, according to the Babinet principle, the diffracted wave patterns produced by these two screens in all directions with α ≠ 0 are identical. The proof of this principle is straightforward: since the transparency functions produced by the screens are complementary in the following sense:

$$T_1(\boldsymbol\rho')+T_2(\boldsymbol\rho') = T_0(\boldsymbol\rho') \equiv 1, \qquad (8.115)$$

and (in the Fraunhofer approximation (103) only!) the diffracted wave is a linear Fourier transform of T(ρ'), we get

$$f_1(\boldsymbol\rho)+f_2(\boldsymbol\rho) = f_0(\boldsymbol\rho), \qquad (8.116)$$

where f₀ is the wave "scattered" by the composite screen with T₀(ρ') ≡ 1, i.e. the unperturbed initial wave propagating in the initial direction (α = 0). In all other directions, f₁ = -f₂, i.e. the diffracted waves are indeed similar besides the difference in sign - which is equivalent to a phase shift by ±π. However, it is important to remember that, the Babinet principle notwithstanding, in real experiments the diffracted waves may interfere with the unperturbed plane wave f₀(ρ), leading to different diffraction patterns in cases 1 and 2 - see, e.g., Fig. 13 and its discussion.
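The Babinet relation (116) may be illustrated with a 1D toy model in which the "whole plane" is replaced by a finite window of length L (an assumption of this sketch, not of the text, as are all numerical values). Screen 1 is a slit of width a; screen 2 is its complement within the window. At κ = 2πn/L with n ≠ 0 the Fourier transform of the full window vanishes, so the two Fraunhofer amplitudes, computed independently by a midpoint rule, come out equal in magnitude and opposite in sign.

```python
import cmath
import math

L, a, cells = 10.0, 2.0, 1000  # illustrative window length, slit width, and grid size
h = L / cells
xs = [-L / 2 + (j + 0.5) * h for j in range(cells)]  # midpoints of the grid cells
t1 = [1.0 if abs(x) < a / 2 else 0.0 for x in xs]    # screen 1: slit of width a
t2 = [1.0 - t for t in t1]                           # screen 2: its complement within the window


def fraunhofer_amplitude(transparency, kappa):
    """Midpoint-rule version of the last form of Eq. (103) in 1D."""
    return sum(t * cmath.exp(-1j * kappa * x) for t, x in zip(transparency, xs)) * h


kappa = 2 * math.pi * 3 / L  # a direction with alpha != 0, where the full-window transform vanishes
f1 = fraunhofer_amplitude(t1, kappa)
f2 = fraunhofer_amplitude(t2, kappa)
print(f1, f2, f1 + f2)  # f1 + f2 ~ 0, i.e. f1 = -f2 and |f1|^2 = |f2|^2
```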
8.9. Magnetic dipole and electric quadrupole radiation

Throughout this chapter, we have seen how many important results may be obtained from Eq. (26) for the electric dipole radiation by a small-size source (Fig. 1). Only in rare cases when such radiation is absent, for example if the dipole moment p of the source equals zero (or does not change in time - either at all, or at the frequency of our interest), higher-order effects may be important. I will discuss the main two of them, the electric quadrupole and magnetic dipole radiation - mostly for reference purposes, because we will not have much time to discuss their applications.

In Sec. 2 above, the electric dipole radiation was calculated by plugging the first, leading term of 
expansion (19) into the exact formula (17b) for the retarded vector-potential A(r, t). Let us make a more 
exact calculation, by keeping the second term of that expansion as well: 



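$$\mathbf j\left(\mathbf r',t-\frac{R}{v}\right) \approx \mathbf j\left(\mathbf r',t-\frac{r}{v}+\frac{\mathbf r'\cdot\mathbf n}{v}\right) = \mathbf j\left(\mathbf r',t'+\frac{\mathbf r'\cdot\mathbf n}{v}\right), \quad\text{where } t' \equiv t-\frac{r}{v}. \qquad (8.117)$$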



Since the expansion is only valid if the last term in the second argument is relatively small, in the Taylor 
expansion of j with respect to that argument we may keep just the first two leading terms: 



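$$\mathbf j\left(\mathbf r',t'+\frac{\mathbf r'\cdot\mathbf n}{v}\right) \approx \mathbf j(\mathbf r',t')+\frac{1}{v}\frac{\partial}{\partial t'}\mathbf j(\mathbf r',t')\,(\mathbf r'\cdot\mathbf n), \qquad (8.118)$$

so that Eq. (17b) yields A = A_e + A', where A_e is the electric dipole contribution as given by Eq. (23), and A' is the new term of the next order in small parameter r' ≪ r:

$$\mathbf A'(\mathbf r,t) = \frac{\mu}{4\pi rv}\frac{\partial}{\partial t'}\int\mathbf j(\mathbf r',t')\,(\mathbf r'\cdot\mathbf n)\,d^3r'. \qquad (8.119)$$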

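Just as was done in Sec. 2, let us evaluate this term for a system of nonrelativistic particles with electric charges q_k and radius-vectors r_k(t):

$$\mathbf A'(\mathbf r,t) = \frac{\mu}{4\pi rv}\frac{d}{dt}\left[\sum_k q_k\dot{\mathbf r}_k(\mathbf r_k\cdot\mathbf n)\right]_{t=t'}. \qquad (8.120)$$

Using the "bac minus cab" identity of the vector algebra again, 39 the summand in Eq. (120) may be rewritten as

$$\dot{\mathbf r}_k(\mathbf r_k\cdot\mathbf n) = \frac12\dot{\mathbf r}_k(\mathbf r_k\cdot\mathbf n)-\frac12\mathbf r_k(\mathbf n\cdot\dot{\mathbf r}_k)+\frac12\mathbf r_k(\mathbf n\cdot\dot{\mathbf r}_k)+\frac12\dot{\mathbf r}_k(\mathbf n\cdot\mathbf r_k) = \frac12(\mathbf r_k\times\dot{\mathbf r}_k)\times\mathbf n+\frac12\frac{d}{dt}\left[\mathbf r_k(\mathbf n\cdot\mathbf r_k)\right], \qquad (8.121)$$

so that the right-hand part of Eq. (120) may be presented as a sum of two terms, A' = A_m + A_q, where

$$\mathbf A_m(\mathbf r,t) = \frac{\mu}{4\pi rv}\dot{\mathbf m}(t')\times\mathbf n \equiv \frac{\mu}{4\pi rv}\dot{\mathbf m}\left(t-\frac{r}{v}\right)\times\mathbf n, \quad\text{with } \mathbf m(t) \equiv \sum_k\frac12\mathbf r_k(t)\times q_k\dot{\mathbf r}_k(t), \qquad (8.122)$$

$$\mathbf A_q(\mathbf r,t) = \frac{\mu}{8\pi rv}\frac{d^2}{dt^2}\left[\sum_k q_k\mathbf r_k(\mathbf n\cdot\mathbf r_k)\right]_{t=t'}. \qquad (8.123)$$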



39 If you need, see, e.g., MA Eq. (7.5). 
Comparing the second of Eqs. (122) with Eq. (5.91), we see that m is just the magnetic moment of the source. On the other hand, the first of Eqs. (122) is absolutely similar in structure to Eq. (23), with p replaced by (m × n)/v, so that for the corresponding component of the magnetic field it gives (in the same approximation r ≫ λ) a result similar to Eq. (24):

$$\mathbf B_m(\mathbf r,t) = -\frac{\mu}{4\pi rv^2}\,\mathbf n\times\left[\ddot{\mathbf m}\left(t-\frac{r}{v}\right)\times\mathbf n\right]. \qquad (8.124)$$



According to this expression, just as at the electric dipole radiation, vector B is perpendicular to vector n, and its magnitude is also proportional to sin θ, where θ is now the angle between the direction toward the observation point and the second time derivative of vector m rather than p:

$$B_m = \frac{\mu}{4\pi rv^2}\left|\ddot{\mathbf m}\left(t-\frac{r}{v}\right)\right|\sin\theta. \qquad (8.125)$$



As a result, the intensity of this magnetic dipole radiation has a similar angular distribution:

$$\frac{d\mathcal P_m}{d\Omega} = \frac{Z}{(4\pi v^2)^2}\left[\ddot{\mathbf m}\left(t-\frac{r}{v}\right)\right]^2\sin^2\theta \qquad (8.126)$$



- cf. Eq. (26). Note, however, that this radiation is usually much weaker than its electric counterpart. For example, for a nonrelativistic particle with electric charge q, moving on a trajectory of size ~a, the electric dipole moment is of the order of qa, while its magnetic moment scales as qa²ω, where ω is the motion frequency. As a result, the ratio of the magnetic and electric dipole radiation intensities is of the order of (aω/v)², i.e. of the squared ratio of the particle's speed to the speed of emitted waves - which has to be much smaller than 1 for our nonrelativistic estimate to be valid.

The angular distribution of the electric quadrupole radiation, described by Eq. (123), is more complicated. In order to show this, we may add to A_q a vector parallel to n (i.e. along the wave propagation), getting

$$\mathbf A_q(\mathbf r,t) \to \frac{\mu}{24\pi rv}\ddot{\mathbf Q}(t'), \quad\text{where } \mathbf Q \equiv \sum_k q_k\left[3\mathbf r_k(\mathbf n\cdot\mathbf r_k)-\mathbf n\,r_k^2\right], \qquad (8.127)$$



because this addition does not give any contribution to the transversal components of the electric and magnetic fields, i.e. to the radiated wave. According to the above definition of vector Q, its Cartesian components may be presented as

$$Q_j = \sum_{j'=1}^{3}Q_{jj'}n_{j'}, \qquad (8.128)$$

where Q_{jj'} are elements of the so-called electric quadrupole tensor of the system: 40

$$Q_{jj'} = \sum_k q_k\left(3r_jr_{j'}-r^2\delta_{jj'}\right)_k. \qquad (8.129)$$



40 In electrostatics, this symmetric, zero-trace tensor determines the next term in the potential expansion (3.5):
$$\phi_q(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\,\frac{1}{2r^3}\sum_{j,j'=1}^{3}Q_{jj'}\,n_j n_{j'}.$$
(As a math reminder, a tensor is a matrix that describes physical reality independently of the reference frame
choice, so that the Cartesian elements of the tensor have to change according to certain geometric rules
if the reference frame is changed - e.g., rotated. This notion is very similar to that of a physical vector, which may
be described by an ordered set of its Cartesian components that change according to certain rules as a
result of a reference frame change. We may be confident that a matrix represents a tensor if it
provides a linear relation between components of two physical vectors - such as Q and n in Eq. (128).)



Differentiating the first of Eqs. (127) at r >> λ, we get

$$\mathbf{B}_q(\mathbf{r},t) = -\frac{\mu}{24\pi r v^2}\,\mathbf{n}\times\dddot{\mathbf{Q}}\!\left(t-\frac{r}{v}\right). \qquad (8.130)$$



Superficially, this expression is similar to Eqs. (24) or (124), but according to Eqs. (127) and (129),
components of vector Q depend on the direction of vector n, leading to a different angular dependence
of S_r. As the simplest example, consider a system of two equal point electric charges moving at equal
distances a(t) << λ from a stationary center (Fig. 16).




Fig. 8.16. The simplest system emitting electric
quadrupole radiation.



Due to the symmetry of the system, its dipole moments p and m (and hence its electric and
magnetic dipole radiation) vanish, but the quadrupole tensor (129) still has nonvanishing components.
With the coordinate choice shown in Fig. 16, the tensor is diagonal:

$$Q_{xx} = Q_{yy} = -2qa^2, \qquad Q_{zz} = 4qa^2. \qquad (8.131)$$



With axis x in the plane containing axis z and the direction n toward the observation point (Fig. 16), so that
n_x = sin θ, n_y = 0, n_z = cos θ, Eq. (128) yields

$$Q_x = -2qa^2\sin\theta, \qquad Q_y = 0, \qquad Q_z = 4qa^2\cos\theta, \qquad (8.132)$$

so that the vector product in Eq. (130) has only one nonvanishing component:

$$\left(\mathbf{n}\times\dddot{\mathbf{Q}}\right)_y = n_z\dddot{Q}_x - n_x\dddot{Q}_z = -6q\sin\theta\cos\theta\,\frac{d^3}{dt^3}\left[a^2(t)\right]. \qquad (8.133)$$

As a result, the radiation intensity is proportional to sin²θ cos²θ, i.e. vanishes not only along the
symmetry axis (as the dipole radiation does), but also in all directions perpendicular to this axis,
reaching its maximum at θ = π/4 (and at θ = 3π/4).
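The tensor (129) and the angular factor of Eq. (133) for the system of Fig. 16 may be cross-checked numerically. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the two charges are frozen at z = ±a (so it checks the angular structure of Eqs. (131)-(133), not the time derivatives), and units with q = a = 1 are used.

```python
import math

def quadrupole_tensor(charges):
    """Eq. (8.129): Q_jj' = sum_k q_k (3 r_j r_j' - r^2 delta_jj'),
    for point charges given as (q, (x, y, z)) pairs."""
    Q = [[0.0] * 3 for _ in range(3)]
    for q, r in charges:
        r2 = sum(c * c for c in r)
        for j in range(3):
            for jp in range(3):
                delta = 1.0 if j == jp else 0.0
                Q[j][jp] += q * (3.0 * r[j] * r[jp] - r2 * delta)
    return Q

def q_vector(Q, n):
    """Eq. (8.128): Q_j = sum_j' Q_jj' n_j'."""
    return [sum(Q[j][jp] * n[jp] for jp in range(3)) for j in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# assumed Fig. 16 geometry: equal charges q at z = +a and z = -a
q, a = 1.0, 1.0
Q = quadrupole_tensor([(q, (0.0, 0.0, a)), (q, (0.0, 0.0, -a))])

def angular_factor(theta):
    """(n x Q)_y for n = (sin theta, 0, cos theta); compare Eq. (8.133)."""
    n = [math.sin(theta), 0.0, math.cos(theta)]
    return cross(n, q_vector(Q, n))[1]

# intensity ~ (n x Q)_y^2 ~ sin^2(theta) cos^2(theta); find its maximum on [0, pi/2]
thetas = [i * (math.pi / 2) / 1000 for i in range(1001)]
theta_max = max(thetas, key=lambda t: angular_factor(t) ** 2)
print([Q[0][0], Q[1][1], Q[2][2]])   # -> [-2.0, -2.0, 4.0]
print(theta_max / math.pi)            # close to 0.25
```

The search is restricted to [0, π/2]; by the symmetry of the system, the second maximum sits at θ = 3π/4.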

For more complex systems, the angular distribution of the electric quadrupole radiation may be
different, but its total power may always be presented in a simple form

$$P = \frac{\mu}{720\pi v^3}\sum_{j,j'=1}^{3}\left(\dddot{Q}_{jj'}\right)^2. \qquad (8.134)$$
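The numerical constant of Eq. (134) comes from the angular integration of the quadrupole field (130). For the radiated wave S_r = vB²/μ, so Eq. (130) gives dP/dΩ = r²S_r = (μ/576π²v³)|n × Q⃛|², and the simple form of the total power follows if ∫|n × Q⃛|²dΩ = (4π/5)Σ(Q⃛_jj')² for any symmetric, zero-trace tensor, because (μ/576π²v³)(4π/5) = μ/720πv³. A minimal numerical check of that angular integral is sketched below in Python; the test tensor is arbitrary (made-up values, not from the text), and the function name is hypothetical.

```python
import math

def angular_integral(Q, n_u=400, n_phi=64):
    """Integrate |n x Q(n)|^2 over all directions n, with Q_j(n) = sum_j' Q_jj' n_j'
    as in Eq. (8.128); midpoint rule in u = cos(theta) and in phi."""
    du, dphi = 2.0 / n_u, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_u):
        u = -1.0 + (i + 0.5) * du
        s = math.sqrt(1.0 - u * u)
        for k in range(n_phi):
            phi = (k + 0.5) * dphi
            n = (s * math.cos(phi), s * math.sin(phi), u)
            qn = [sum(Q[j][jp] * n[jp] for jp in range(3)) for j in range(3)]
            # for a unit vector n, |n x qn|^2 = |qn|^2 - (n . qn)^2
            qn2 = sum(c * c for c in qn)
            nqn = sum(n[j] * qn[j] for j in range(3))
            total += (qn2 - nqn * nqn) * du * dphi
    return total

# hypothetical symmetric, zero-trace tensor standing in for the third time derivative of Q_jj'
Q = [[1.0, 0.3, -0.2],
     [0.3, -0.4, 0.5],
     [-0.2, 0.5, -0.6]]
sum_sq = sum(Q[j][jp] ** 2 for j in range(3) for jp in range(3))
ratio = angular_integral(Q) / sum_sq
print(ratio, 4 * math.pi / 5)   # the two numbers agree closely
```

Changing the tensor elements (keeping the tensor symmetric and traceless) should leave the ratio unchanged, which is what makes the total power depend only on the sum of squares.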






Let me finish this section by giving, without proof, one more fact important for applications: due 
to their different spatial structure, the magnetic dipole and electric quadrupole radiation fields do not 
interfere, i.e. the total power of radiation (neglecting higher multipole terms) may be found as the sum 
of these components, calculated independently. 



8.10. Exercise problems 

8.1. In the electric dipole approximation, calculate the angular distribution and total power of
electromagnetic radiation by the following classical model of the hydrogen atom: an electron rotating, at
a constant distance R, about a much heavier proton. Use the latter result to evaluate the classical lifetime
of the atom, borrowing the initial value of R from quantum mechanics: R(0) = r_B ≈ 0.53×10^-10 m.



8.2. Use the Born approximation to calculate the differential cross-section of plane wave
scattering by a dielectric sphere with ε ≈ ε_0, of an arbitrary radius R. In the limits kR << 1 and kR >> 1
(where k is the wave number), analyze the angular dependence of the differential cross-section, and
calculate the full cross-section.



8.3. Use the Born approximation to calculate the differential cross-section of plane wave
scattering by a right circular cylinder of length L and radius R, for arbitrary incidence.



8.4. Use the Huygens principle to analyze the Fraunhofer diffraction of a plane wave on a
square-shaped hole, of size a×a, in an opaque screen, at normal incidence. Sketch the diffraction
pattern you would observe at a sufficiently large distance, and quantify the meaning of the term
"sufficiently large" for this case.



8.5. Within the Fraunhofer approximation, analyze the pattern produced by a 1D diffraction
grating with the periodic transparency profile shown below, for the normal incidence of a plane,
monochromatic wave.



[Figure: periodic transparency profile T(x) of the grating; labels 1, w, d]






Chapter 9. Special Relativity 

This chapter starts with a brief review of the basics of special relativity. This background is used, later in
the chapter, for the analysis of the relation between electromagnetic field values measured in different
reference frames moving relative to each other, for a discussion of relativistic particle dynamics in
electric and magnetic fields, and for the analytical mechanics of electromagnetism.



9.1. Einstein postulates and the Lorentz transform 

As was emphasized at the derivation of expressions for the dipole and quadrupole radiation in
the last chapter, they are only valid for systems of nonrelativistic particles. Thus, these results cannot be
used for the description of such important phenomena as the Cherenkov radiation or synchrotron radiation,
in which relativistic effects are essential. Moreover, the analysis of the motion of charged relativistic particles
in electric and magnetic fields is also a natural part of electrodynamics. This is why I will follow the
tradition of using this course for a (by necessity, brief) introduction to the special relativity theory. This
theory is based on the idea that measurements of all physical variables (including spatial and even
temporal intervals between two events) may give different results in different reference frames, in
particular in two frames moving relative to each other translationally (i.e. without rotation), with a certain
constant velocity v (Fig. 1).



Fig. 9.1. Translational, uniform motion of two reference frames, 0 and 0', with relative velocity v;
the same point has position vectors r = {x, y, z} and r' = {x', y', z'} in the two frames.



In non-relativistic (Newtonian) mechanics, the problem of transfer between such reference
frames has a simple solution, at least in the limit v << c, because the basic equation of particle dynamics
(the 2nd Newton law),

$$m_k\ddot{\mathbf{r}}_k = -\nabla_k U(\mathbf{r}_k - \mathbf{r}_{k'}), \qquad (9.1)$$

where U is the potential energy of inter-particle interactions, is invariant with respect to the so-called
Galilean transform (or "transformation"). 2 Choosing the coordinate axes of both frames so that axes x
and x' are parallel to vector v (Fig. 1), the transform may be presented as



$$x = x' + vt', \qquad y = y', \qquad z = z', \qquad t = t', \qquad (9.2a)$$



1 Let me hope that the reader does not need a reminder that in order for Eq. (1) to be valid, the reference frames 0 
and 0' have to be inertial - see, e.g., CM Sec. 1.3. 

2 It was first formulated by G. Galilei in 1638. 
and the invariance of Eq. (1) with respect to this transform means that, plugging Eq. (2a) into it, we get
absolutely the same equation of motion in the "moving" reference frame 0'. Since the reciprocal
transform,



$$x' = x - vt, \qquad y' = y, \qquad z' = z, \qquad t' = t, \qquad (9.2b)$$



is similar to the direct one, with the replacement of (+v) with (-v), we may say that the Galilean
invariance means that there is no "master" (absolute) spatial reference frame in classical mechanics,
although the spatial and temporal intervals between different events are absolute (reference-frame-invariant).



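However, it is straightforward to use Eq. (2) to check that the wave equation

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right) f = 0, \qquad (9.3)$$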



describing in particular the electromagnetic wave propagation in free space, 3 is not Galilean-invariant. 4
For the "usual" (say, elastic) waves, which obey a similar equation albeit with a different speed, 5 this
lack of Galilean invariance is natural and is compatible with the invariance of Eq. (1) from which the
wave equation originates. This is because the elastic waves are essentially oscillations of interacting
particles of a certain medium (e.g., an elastic solid), which makes the reference frame connected to this
medium special. So, if the electromagnetic waves were oscillations of a certain special medium (that was
first called the "luminiferous aether" 6 and later just the ether), similar arguments might be applicable to
reconcile Eqs. (2) and (3).

The detection of such a medium was the goal of the Michelson-Morley measurements (carried
out between 1881 and 1887 with better and better precision), which are sometimes called "the most
famous failed experiment in physics". Figure 2 shows a crude scheme of their experiments.
Fig. 9.2. The Michelson-Morley experiment.



3 Discussions in this chapter and most of the next chapter will be restricted to the free-space (and hence
dispersion-free) case; some media effects on radiation by relativistic particles will be discussed in Sec. 10.4.

4 It is interesting that the Schrödinger equation, whose fundamental solution for a free particle is a similar
monochromatic wave (albeit with a different dispersion law), is Galilean-invariant, with a certain addition to the
wavefunction's phase.

5 See, e.g., CM Secs. 5.5 and 7.7.

6 In ancient Greek mythology, aether is the clear upper air breathed by the gods residing on Mount Olympus.
A nearly-monochromatic wave is split into two parts (nominally, of equal intensity), using a semi-transparent
mirror tilted by 45° to the incident wave direction. These two partial waves are reflected
back by two genuine mirrors, and arrive at the same semi-transparent mirror again. Here, half of each
wave is returned to the light source area (where it vanishes without affecting the source), while the other
half passes toward the detector, forming, with its counterpart, an interference pattern similar to that in
the Young experiment. Thus each of the interfering waves has traveled twice (back and forth) along one of the
two mutually perpendicular "arms" of the interferometer. Assuming that the ether, in which light
propagates with speed c, moves with speed v < c along one of the arms, of length l∥, it is straightforward
(and hence left for the reader's exercise :-) to get the following expression for the difference between the light
roundtrip times:



$$\Delta t = \frac{2l_\perp}{c\,(1-v^2/c^2)^{1/2}} - \frac{2l_\parallel}{c\,(1-v^2/c^2)} \approx -\frac{l}{c}\,\frac{v^2}{c^2}, \qquad (9.4)$$



where l⊥ is the length of the second arm of the interferometer (perpendicular to v), and the last,
approximate expression is valid at l⊥ = l∥ ≡ l and v << c.
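The exact and approximate forms of Eq. (4) are easy to compare numerically. This is a minimal Python sketch under explicit assumptions: the equal arm length l = 11 m and the wavelength λ = 550 nm are illustrative values chosen here (they are not given in the text), v = 30 km/s is the Earth's orbital speed, and the function name is hypothetical. Rotating the instrument by π/2 swaps the arms, changing Δt by 2|Δt|; dividing by the light period λ/c expresses this change as a fraction of the fringe period.

```python
import math

C = 299_792_458.0  # speed of light in free space, m/s

def round_trip_difference(l_par, l_perp, v):
    """Exact form of Eq. (9.4): transverse-arm minus longitudinal-arm
    round-trip time, for an ether wind of speed v along the arm l_par."""
    b2 = (v / C) ** 2
    t_par = 2.0 * l_par / (C * (1.0 - b2))
    t_perp = 2.0 * l_perp / (C * math.sqrt(1.0 - b2))
    return t_perp - t_par

l = 11.0             # assumed (illustrative) arm length, m
v = 3.0e4            # Earth's orbital speed, m/s
wavelength = 550e-9  # assumed (illustrative) wavelength, m

dt_exact = round_trip_difference(l, l, v)
dt_approx = -(l / C) * (v / C) ** 2                  # last form of Eq. (9.4)
fringe_shift = 2 * abs(dt_exact) * C / wavelength    # rotation by pi/2 swaps the arms
print(dt_exact, dt_approx)   # both about -3.7e-16 s
print(fringe_shift)          # about 0.4 of the fringe period
```

With these assumed inputs, the expected shift comes out of the same order as the 0.4-fringe figure quoted for the 1887 experiment.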

Since Earth moves around the Sun with speed v_E ≈ 30 km/s ≈ 10^-4 c, the arm positions relative to
this motion alternate, due to Earth's rotation about its axis, every 6 hours - see the right panel of Fig. 2.
Hence, if we assume that the ether rests in the Sun's reference frame, Δt (and the corresponding shift of
interference fringes) has to alternate with this half-period as well. The same alternation may be
achieved, on a shorter time scale, by a deliberate rotation of the instrument by π/2. In the most precise
version of the Michelson-Morley experiment (1887), this shift was expected to be close to 0.4 of the
fringe pattern period. The result was negative, with an error bar of about 0.01 of the fringe period. 7

The most prominent immediate explanation of this zero result 8 was suggested in 1889 by G. 
FitzGerald and (independently and more qualitatively) by H. Lorentz in 1892: as evident from Eq. (4), if 
the longitudinal arm of the interferometer itself experiences the so-called length contraction, 



$$l_\parallel(v) = l_\parallel(0)\left(1-\frac{v^2}{c^2}\right)^{1/2}, \qquad (9.5)$$



while the transversal arm's length is not affected by the motion relative to the ether, this cancels Δt.
This extremely radical idea received strong support from the proof, in 1887-1905, that the Maxwell
equations, and hence the wave equation (3), are invariant under the so-called Lorentz transform. 9 For the
choice of coordinates shown in Fig. 1, the transform reads



7 Throughout the 20th century, the Michelson-Morley-type experiments were repeated using more and more refined
experimental techniques, always with the zero result for the apparent ether motion speed. For example, recent
experiments, using cryogenically cooled optical resonators, have reduced the upper limit for such speed to just
3×10^-15 c - see H. Müller et al., Phys. Rev. Lett. 91, 020401 (2003).

8 The zero result of a slightly later experiment, namely precise measurements of the torque which should be 
exerted by the moving ether on a charged capacitor, carried out in 1903 by F. Trouton and H. Noble (following G. 
FitzGerald's suggestion), seconded Michelson and Morley's conclusions.

9 The theoretical work toward this goal (which I do not have time to review in detail) included important 
contributions by W. Voigt (in 1887), H. Lorentz (1892 - 1904), J. Larmor (1897 and 1900), and H. Poincaré
(1900 and 1905). 
$$x = \frac{x' + vt'}{(1-v^2/c^2)^{1/2}}, \qquad y = y', \qquad z = z', \qquad t = \frac{t' + (v/c^2)x'}{(1-v^2/c^2)^{1/2}}. \qquad (9.6a)$$

It is elementary to solve these equations for the primed coordinates to get the reciprocal transform

$$x' = \frac{x - vt}{(1-v^2/c^2)^{1/2}}, \qquad y' = y, \qquad z' = z, \qquad t' = \frac{t - (v/c^2)x}{(1-v^2/c^2)^{1/2}}. \qquad (9.6b)$$

(I will soon present Eqs. (6) in a more elegant form.) 

The Lorentz transform relations (6) are evidently reduced to the Galilean transform formulas (2)
at v² << c². As will be proved in the next section, Eqs. (6) also yield the length contraction (5). However,
all attempts to give a reasonable interpretation of these equations while keeping the notion of the ether
have failed, in particular because of the restrictions imposed by the results of earlier experiments carried out
in 1851 and 1853 by H. Fizeau, which were repeated with higher accuracy by the same Michelson and
Morley in 1886. These experiments have shown that if one sticks to the ether concept, this hypothetical
medium should be partially "dragged" by any moving dielectric medium with a speed proportional to
(ε_r - 1). Careful reasoning shows that such local drag is irreconcilable with the assumed continuity of the
ether.

In his famous 1905 paper, Albert Einstein made a bold step, essentially removing the concept
of the ether altogether. Moreover, he argued that the Lorentz transform is a general property of time
and space, rather than of the electromagnetic field alone. He started with two postulates, the first one
essentially repeating the principle of relativity, formulated earlier (1904) by H. Poincaré in the following
form:

". . .the laws of physical phenomena should be the same, whether for an observer fixed, or for an 
observer carried along in a uniform movement of translation; so that we have not and could not have 
any means of discerning whether or not we are carried along in such a motion." 10 

Einstein's second postulate was that the speed of light c in free space should be the same in
all reference frames. (This is essentially a denial of the ether's existence.)

Then, Einstein showed how naturally the Lorentz transform relations (6) follow from his
postulates, with a few (very natural) additional assumptions. Let a point source emit a short flash of
light, at the moment t = t' = 0 when the origins of the reference frames shown in Fig. 1 coincide. Then,
according to the second of Einstein's postulates, in each of the frames the spherical wave propagates
with the same speed c, i.e. the coordinates of points of its front, measured in the two frames, have to obey
the equations

$$(ct)^2 - (x^2 + y^2 + z^2) = 0, \qquad (ct')^2 - (x'^2 + y'^2 + z'^2) = 0. \qquad (9.7)$$

What may be the general relation between the combinations in the left-hand sides of these equations - not
only for this particular pair of events, the light flash and its detection, but in general? A very natural
(essentially, the only justifiable) choice is



10 Note that though the relativity principle excludes the notion of the special ("absolute") spatial reference frame, 
its verbal formulation still leaves the possibility of the Galilean "absolute time" open. The quantitative relativity 
theory kills this option - see Eqs. (6) and their discussion below. 
$$(ct)^2 - (x^2 + y^2 + z^2) = f(v^2)\left[(ct')^2 - (x'^2 + y'^2 + z'^2)\right]. \qquad (9.8)$$



Now, according to the first postulate, the same relation should be valid if we swap the reference frames
(x ↔ x', etc.) 11 and replace v with (-v). This is only possible if f² = 1, so that, excluding the option f = -1
(which is incompatible with the Galilean transform in the limit v/c → 0), we get



$$(ct)^2 - (x^2 + y^2 + z^2) = (ct')^2 - (x'^2 + y'^2 + z'^2). \qquad (9.9)$$

For the line y = y' = 0, z = z' = 0, Eq. (9) is reduced to

$$(ct)^2 - x^2 = (ct')^2 - x'^2. \qquad (9.10)$$



It is very illuminating to interpret this relation as the one resulting from a mutual rotation of the
reference frames (which now have to include clocks to measure time) on the plane of coordinate x and the
so-called Euclidean time τ ≡ ict - see Fig. 3.




Fig. 9.3. The Lorentz transform as a mutual
rotation of reference frames on the [x, τ] plane.



Indeed, rewriting Eq. (10) as

$$\tau^2 + x^2 = \tau'^2 + x'^2, \qquad (9.11)$$



we may consider it as the invariance of the squared radius at the rotation that is shown in Fig. 3 and
described by the evident geometric relations

$$x = x'\cos\psi - \tau'\sin\psi, \qquad \tau = x'\sin\psi + \tau'\cos\psi, \qquad (9.12a)$$

with the reciprocal relations

$$x' = x\cos\psi + \tau\sin\psi, \qquad \tau' = -x\sin\psi + \tau\cos\psi. \qquad (9.12b)$$



So far, angle ψ (frequently called rapidity) has been arbitrary. In the spirit of Eq. (8), a natural
choice is ψ = ψ(v), with the requirement ψ(0) = 0. To find this function, let us write the
definition of velocity v of frame 0', as measured in reference frame 0: for x' = 0, x = vt. In variables x
and τ, this means

$$\left.\frac{x}{\tau}\right|_{x'=0} = \frac{vt}{ict} = \frac{v}{ic}. \qquad (9.13)$$

On the other hand, for the same point x' = 0, Eqs. (12a) yield



11 Strictly speaking, at this swap we should also replace v with (-v), but this change does not affect Eq. (8). 
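$$\left.\frac{x}{\tau}\right|_{x'=0} = -\tan\psi. \qquad (9.14)$$

These two expressions are compatible only if

$$\tan\psi = i\,\frac{v}{c}; \qquad (9.15)$$

from here

$$\sin\psi = \frac{\tan\psi}{(1+\tan^2\psi)^{1/2}} = \frac{iv/c}{(1-v^2/c^2)^{1/2}} \equiv i\beta\gamma, \qquad \cos\psi = \frac{1}{(1+\tan^2\psi)^{1/2}} = \frac{1}{(1-v^2/c^2)^{1/2}} \equiv \gamma, \qquad (9.16)$$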

where β and γ are two very convenient and commonly used dimensionless parameters, defined as

$$\boldsymbol{\beta} \equiv \frac{\mathbf{v}}{c}, \qquad \gamma \equiv \frac{1}{(1-v^2/c^2)^{1/2}} = \frac{1}{(1-\beta^2)^{1/2}}. \qquad (9.17)$$

(Vector β is called the normalized velocity, while scalar γ is called the Lorentz factor.)

Using the relations for ψ, Eqs. (12) become

$$x = \gamma(x' - i\beta\tau'), \qquad \tau = \gamma(i\beta x' + \tau'), \qquad (9.18a)$$

$$x' = \gamma(x + i\beta\tau), \qquad \tau' = \gamma(-i\beta x + \tau). \qquad (9.18b)$$
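The rotation picture may be checked with complex arithmetic: taking ψ from Eq. (15) and rotating the [x, τ] plane by this imaginary angle, per Eqs. (12a), should return real-valued coordinates that coincide with the compact real form of the transform. A minimal Python sketch, assuming units with c = 1 and using the standard-library cmath module; the function name is hypothetical.

```python
import cmath

def rotate_event(x_p, t_p, beta, c=1.0):
    """Eqs. (9.12a) for the complex angle psi with tan(psi) = i*v/c, Eq. (9.15);
    the time axis is tau = i*c*t, as in Fig. 9.3."""
    psi = cmath.atan(1j * beta)       # purely imaginary for |beta| < 1
    tau_p = 1j * c * t_p
    x = x_p * cmath.cos(psi) - tau_p * cmath.sin(psi)
    tau = x_p * cmath.sin(psi) + tau_p * cmath.cos(psi)
    return x, tau / (1j * c)          # convert tau back to t

beta = 0.6
gamma = 1.0 / (1.0 - beta ** 2) ** 0.5
psi = cmath.atan(1j * beta)
print(cmath.cos(psi).real, (cmath.sin(psi) / 1j).real)   # about 1.25 and 0.75: gamma and beta*gamma, Eq. (9.16)
x, t = rotate_event(2.0, 3.0, beta)
print(x.real, t.real)   # about 4.75 and 5.25, i.e. gamma*(x' + beta*c*t') and gamma*(t' + beta*x'/c)
```

The imaginary parts of the returned x and t vanish up to rounding, which is the numerical counterpart of "returning to the real variables".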

Now returning to the real variables [x, ct], we get the Lorentz transform relations (6) in a more compact
form: 12

$$x = \gamma(x' + \beta ct'), \qquad y = y', \qquad z = z', \qquad ct = \gamma(ct' + \beta x'), \qquad (9.19a)$$

$$x' = \gamma(x - \beta ct), \qquad y' = y, \qquad z' = z, \qquad ct' = \gamma(ct - \beta x). \qquad (9.19b)$$

An immediate corollary of Eqs. (6) is that for γ to stay real, we need v² < c², i.e. the speed of
any physical body (to which we could connect a reference frame) cannot exceed the speed of light, as
measured in any physically meaningful reference frame. 13
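Eqs. (19) translate directly into code. The Python sketch below (function names hypothetical, event coordinates arbitrary) checks three properties discussed above: the reciprocal transform is Eq. (19b) with β → -β, the combination (ct)² - x² of Eq. (10) is the same in both frames, and at v << c the result reduces to the Galilean x' = x - vt. The explicit error for |β| ≥ 1 is an addition here that mirrors the requirement v² < c².

```python
import math

def lorentz(x, ct, beta):
    """Eq. (9.19b): (x', ct') of an event in frame 0', which moves with
    normalized velocity beta along axis x, given its (x, ct) in frame 0."""
    if abs(beta) >= 1.0:
        raise ValueError("gamma is real only for |beta| < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return gamma * (x - beta * ct), gamma * (ct - beta * x)

def inverse_lorentz(xp, ctp, beta):
    """Eq. (9.19a): the reciprocal transform is Eq. (9.19b) with beta -> -beta."""
    return lorentz(xp, ctp, -beta)

x, ct, beta = 5.0, 3.0, 0.8
xp, ctp = lorentz(x, ct, beta)
print(xp, ctp)                               # about 4.333 and -1.667
print(ct ** 2 - x ** 2, ctp ** 2 - xp ** 2)  # both about -16: the interval (9.10) is invariant

# Galilean limit: v = 30 m/s, t = 1 s, so x' is very close to x - v t
C = 299_792_458.0
gx, gct = lorentz(1000.0, C * 1.0, 30.0 / C)
print(gx)                                    # about 970.0
```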

9.2. Relativistic kinematic effects 

In order to discuss other corollaries of Eqs. (19), we need to spend a few minutes discussing what
these relations actually mean. Evidently, they are trying to tell us that the spatial and temporal
intervals are not absolute (as they are in the Newtonian space), but depend on the reference frame
they are measured in. So, we have to understand very clearly what exactly may be measured - and thus
may be discussed in a physics theory. Recognizing this necessity, A. Einstein introduced the notion
of numerous imaginary observers that may be distributed all over each reference frame. Each observer
has a clock and may use it to measure the instants of local events. He also conjectured that:



12 Still, in some cases below, it will be more convenient to use Eqs. (6) rather than Eqs. (19). 

13 All attempts to rationally conjecture particles moving with v > c, called tachyons, have failed (so far, at least :-).
Possibly the strongest objection against their existence is the observation that tachyons could be used to communicate
back in time, thus violating the causality principle - see, e.g., G. Benford et al., Phys. Rev. D 2, 263 (1970). 
(i) all observers within the same reference frame may agree on a common length measure ("a
scale"), i.e. on their relative positions in that frame, and synchronize their clocks, 14 and

(ii) observers belonging to different reference frames may agree on the nomenclature of world 
events (e.g., short flashes of light) to which their respective measurements belong. 

Actually, these additional postulates have already been implied in our "derivation" of the
Lorentz transform in Sec. 1. For example, by {x, y, z, t} we mean the results of space and time
measurements of a certain world event, on which all observers belonging to frame 0 agree. Similarly,
all observers of frame 0' have to agree about results {x', y', z', t'}. Finally, when the origin of frame 0'
passes by some sequential points x_k of frame 0, observers in that frame may measure its passage times t_k
without a fundamental error, and know that all these times belong to x' = 0.

Now we can analyze the major corollaries of the Lorentz transform, which are rather striking 
from the point of view of our everyday (rather non-relativistic :-) experience. 

(i) Length contraction. Let us consider a rigid rod, stretched along axis x, with length l = x_2 - x_1,
where x_{1,2} are the coordinates of the rod's ends, as measured in its rest frame 0, at any instant t (Fig. 4).
What would be the rod's length l' measured by the Einstein observers in the moving frame 0'?




Fig. 9.4. Relativistic length contraction. 



At a time instant t' agreed upon in advance, the observers who find themselves exactly at the
rod's ends may register that fact, and then subtract their coordinates x'_{1,2} to calculate the apparent rod
length l' = x'_2 - x'_1 in the moving frame. According to Eq. (19a), l may be expressed via l' as

$$l = x_2 - x_1 = \gamma(x'_2 + \beta ct') - \gamma(x'_1 + \beta ct') = \gamma(x'_2 - x'_1) = \gamma l' > l'. \qquad (9.20a)$$

Hence, the rod's length, as measured in the moving reference frame, is

$$l' = \frac{l}{\gamma} = l\left(1-\frac{v^2}{c^2}\right)^{1/2} < l, \qquad (9.20b)$$



in accordance with the FitzGerald-Lorentz hypothesis (5). This is the relativistic length contraction
effect: an object is always the longest (has the so-called proper length l) if measured in its rest frame.
Note that according to Eq. (19), the length contraction takes place only in the direction of the relative
motion of the two reference frames. As has been noted in Sec. 1, this result immediately explains the zero



14 A posteriori, the Lorentz transform may be used to show that consensus-creating procedures (such as clock
synchronization) are indeed possible. The basic idea of the proof is that since at v << c the relativistic corrections to
space and time intervals are of the order of (v/c)², they have negligible effects on clocks being brought together
into the same point for synchronization very slowly, with velocity v << c. The reader interested in a detailed
discussion of this and other fine points of special relativity may be referred to, e.g., either H. Arzelies, Relativistic
Kinematics, Pergamon, 1966, or W. Rindler, Introduction to Special Relativity, 2nd ed., Oxford U. Press, 1991.
result of the Michelson-Morley-type experiments, so that they give convincing evidence (if not an
irrefutable proof) of Eq. (20).

(ii) Time dilation. Now let us use Eqs. (19a) to find the time interval Δt, as measured in frame 0,
between two world events - say, two ticks of a clock moving with frame 0' (Fig. 5), i.e. having constant
values of x', y', and z'.




Fig. 9.5. Relativistic time dilation.



Let the time interval between these two events, measured in the clock's rest frame 0', be Δt' = t'_2 - t'_1.
At these two moments, the clock flies by two Einstein's observers at rest in frame 0, so that
they can record the corresponding moments t_{1,2} shown by their clocks, and then calculate Δt as their
difference. According to the second of Eqs. (19a),

$$\Delta t = t_2 - t_1 = \frac{\gamma}{c}\left[(ct'_2 + \beta x') - (ct'_1 + \beta x')\right] = \gamma\,\Delta t', \qquad (9.21a)$$

so that, finally,

$$\Delta t' = \frac{\Delta t}{\gamma} = \Delta t\left(1-\frac{v^2}{c^2}\right)^{1/2} < \Delta t. \qquad (9.21b)$$



This is the famous relativistic time dilation (or "dilatation") effect: a time interval is longer if measured
in a frame (in our case, frame 0) moving relative to the clock, while the interval measured in the clock's rest frame is the
shortest - the so-called proper time interval.

This counter-intuitive effect is an everyday reality in experiments with high-energy elementary
particles. For example, in a typical (by no means record-breaking) experiment carried out at Fermilab, a
beam of charged 200 GeV pions with γ ≈ 1,400 passed a distance l = 300 m with the measured
loss of only 3% of the initial beam intensity due to the pion decay (mostly, into muon-neutrino pairs)
with proper lifetime t_0 ≈ 2.56×10^-8 s. Without the time dilation, only an exp{-l/ct_0} ~ 10^-17 part of the
initial pions would survive, while the relativity-corrected number exp{-l/ct} = exp{-l/cγt_0} ≈ 0.97 was in
full accordance with experimental measurements. As another example, the global positioning system
(GPS) is designed to account for the time dilation due to the velocity of its satellites (and also for some
gravity-induced, i.e. general-relativity, corrections that I do not have time to discuss), and would give
large errors without such corrections. So, there is no doubt that time dilation (21) is a reality, though the
precision of its experimental tests I am aware of has been limited to a few percent, because of the almost
unavoidable involvement of gravity effects. 15
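The pion survival estimates above are quick to reproduce. This Python sketch uses only the numbers quoted in the text (with c rounded to 2.998×10^8 m/s and the pion speed taken equal to c); the variable names are illustrative. It also computes the same survival fraction from the pion's own rest frame, where the 300 m path is contracted to l/γ by Eq. (20b) while the lifetime stays t_0, to show that the two descriptions agree.

```python
import math

C = 2.998e8      # speed of light, m/s (rounded)
t0 = 2.56e-8     # pion proper lifetime, s (value quoted in the text)
gamma = 1400.0   # Lorentz factor of the 200 GeV pions
L = 300.0        # flight distance in the lab frame, m

# lab frame: the flight time L/c is compared with the lifetime
survival_no_dilation = math.exp(-L / (C * t0))         # lifetime taken as t0
survival_lab = math.exp(-L / (C * gamma * t0))         # dilated lifetime gamma*t0
# pion rest frame: the lab path is contracted to L/gamma (Eq. 9.20b), lifetime is t0
survival_pion_frame = math.exp(-(L / gamma) / (C * t0))
print(survival_no_dilation)   # of order 1e-17
print(survival_lab)           # about 0.97, i.e. a loss of about 3%
```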

Before the first reliable observation of the time dilation (by B. Rossi and D. Hall in 1940), there
had been serious doubts about the reality of this effect, the most famous being the twin paradox first posed


15 See, e.g., J. Hafele and R. Keating, Science 177, 166 (1972).
(together with an immediate suggestion of its resolution) by P. Langevin in 1911. Let us send one of two
twins on a long star journey with a speed v approaching c. Upon his return to Earth, which of the twins
would be older? The naïve approach is to say that due to the relativity principle, neither can be (and
hence there is no time dilation), because each twin could claim that his counterpart, rather than himself,
was moving, with the same speed v, just in the opposite direction. The resolution of the paradox in the
general theory of relativity (which can handle gravity and acceleration effects) is that one of the twins
had to be accelerated to be brought back, and hence the reference frames have to be dissimilar: only one
of them may stay inertial all the time. Because of that, the twin who had been accelerated ("actually
traveling") would be younger than his sibling when they meet.

(iii) Velocity transformation. Now let us calculate velocity u of a particle, as observed in 
reference frame 0, provided that its velocity, as measured in frame 0', is u' (Fig. 6). 



Fig. 9.6. Relativistic velocity addition.



Keeping the usual definition of velocity, but with due attention to the relativity of not only 
spatial but also temporal intervals, we may write 



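$$\mathbf{u} = \frac{d\mathbf{r}}{dt}, \qquad \mathbf{u}' = \frac{d\mathbf{r}'}{dt'}. \qquad (9.21)$$

Plugging in the differentials of the Lorentz transform relations (6a), we get

$$u_x = \frac{dx}{dt} = \frac{dx' + v\,dt'}{dt' + v\,dx'/c^2} = \frac{u'_x + v}{1 + u'_x v/c^2}, \qquad u_y = \frac{dy}{dt} = \frac{dy'}{\gamma\,(dt' + v\,dx'/c^2)} = \frac{u'_y}{\gamma\,(1 + u'_x v/c^2)}, \qquad (9.22)$$

and a similar formula for u_z. In the classical limit v/c → 0, these relations are reduced to

$$u_x = u'_x + v, \qquad u_y = u'_y, \qquad u_z = u'_z, \qquad (9.23)$$

and may be merged into the familiar Galilean vector form

$$\mathbf{u} = \mathbf{u}' + \mathbf{v}, \qquad \text{for } v \ll c. \qquad (9.24)$$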



In order to see how strange the full relativistic rules (22) are, let us first consider a purely
longitudinal motion, u_y = u_z = 0; then 16

$$u = \frac{u' + v}{1 + u'v/c^2}, \qquad (9.25)$$



16 With an account of the well-known trigonometric identity tan(a + b) = (tan a + tan b)/(1 - tan a tan b) and Eq.
(15), Eq. (25) shows that rapidities ψ add up exactly as longitudinal velocities do at non-relativistic motion,
making that notion very convenient for the analysis of transfers between several frames.
where u ≡ u_x and u' ≡ u'_x. Figure 7 shows u as a function of u', given by this formula, for several
values of the reference frames' relative velocity v.



Fig. 9.7. Longitudinal velocity addition. [Plot: u/c versus u'/c, both from -1 to +1, for several values of v/c, including v/c = 0.9 and -0.9.]



The first sanity check is that if v = 0, i.e. the reference frames are at rest relative to each other,
then u = u', as it should be - see the diagonal straight line. Next, if the magnitudes of u' and v are both
below c, so is the magnitude of u. (Also good, because otherwise ordinary particles in one frame would
be tachyons in the other one, and the theory would be in big trouble.) Now strange things start: even as
u' and v both approach c, u approaches c as well, but does not exceed it. As an example, if we
fired ahead a bullet with speed 0.9c from a spaceship moving away from the Earth also at 0.9c, Eq. (25)
predicts the speed of the bullet relative to Earth to be just [(0.9 + 0.9)/(1 + 0.9×0.9)]c ≈ 0.994c < c,
rather than (0.9 + 0.9)c = 1.8c > c as in the Galilean kinematics. We certainly should accept this
strangeness of relativity, because it is necessary to fulfill Einstein's 2nd postulate: the independence
of the speed of light of the reference frame. Indeed, for u' = ±c, Eq. (25) yields u = ±c, regardless of v.
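The longitudinal addition rule (25) and the rapidity remark of footnote 16 can be checked in a few lines. A minimal Python sketch, assuming units with c = 1 (the function name is hypothetical): it reproduces the bullet example and shows that, since tan ψ = iu/c means ψ = i·artanh(u/c), the real numbers artanh(u/c) simply add up under Eq. (25).

```python
import math

def add_longitudinal(u_p, v, c=1.0):
    """Eq. (9.25): speed u in frame 0 of a particle moving with speed u'
    in frame 0', for frames with relative speed v, all along axis x."""
    return (u_p + v) / (1.0 + u_p * v / c ** 2)

u_bullet = add_longitudinal(0.9, 0.9)     # in units of c
print(round(u_bullet, 4))                 # 0.9945
# footnote 16: psi = i*artanh(u/c), so the real numbers artanh(u/c) add up
u_via_rapidity = math.tanh(math.atanh(0.9) + math.atanh(0.9))
print(add_longitudinal(1.0, 0.5), add_longitudinal(-1.0, 0.5))   # 1.0 -1.0: light speed is frame-independent
```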

In the opposite case of transversal motion, when a particle moves across the direction of the relative motion of
the frames (for example, with our choice of coordinates, u'_x = u'_z = 0), Eqs. (22) yield a less spectacular
result

$$u_y = \frac{u'_y}{\gamma} < u'_y. \qquad (9.26)$$

This effect comes purely from the time dilation, because the transversal coordinates are Lorentz-invariant.
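For an arbitrary direction of u', Eqs. (22) may be coded as one function. The Python sketch below (hypothetical name, units with c = 1) checks the transversal result (26) and also verifies that a light ray moving along y' in frame 0' still has speed c in frame 0, even though its direction changes.

```python
import math

def add_velocity(u_p, v, c=1.0):
    """Eqs. (9.22): velocity u = (u_x, u_y, u_z) in frame 0 of a particle with
    velocity u' = (u'_x, u'_y, u'_z) in frame 0', which moves with speed v along x."""
    ux_p, uy_p, uz_p = u_p
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    denom = 1.0 + ux_p * v / c ** 2
    return ((ux_p + v) / denom, uy_p / (gamma * denom), uz_p / (gamma * denom))

# transversal motion, Eq. (9.26): u'_x = u'_z = 0, so u_y = u'_y / gamma
u = add_velocity((0.0, 0.5, 0.0), 0.6)
print(u)                              # approximately (0.6, 0.4, 0.0)
# a light ray moving along y' in frame 0' still has speed c in frame 0
ray = add_velocity((0.0, 1.0, 0.0), 0.6)
print(math.hypot(ray[0], ray[1]))     # 1.0 up to rounding
```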

In the case when both u'_x and u'_y are substantial (but u'_z is still zero), we may divide expressions
(22) by each other to relate the angles θ of particle propagation, as observed in the two reference frames:

$$\tan\theta = \frac{u_y}{u_x} = \frac{u'_y}{\gamma\,(u'_x + v)} = \frac{\sin\theta'}{\gamma\,(\cos\theta' + v/u')}. \qquad (9.27)$$



This expression describes, in particular, the so-called stellar aberration effect, the dependence of the
observed direction θ toward a star on the speed v of the telescope motion relative to the star - see Fig. 8.
(The effect is readily observable experimentally as the annual aberration due to the periodic change
of speed v by 2v_E ≈ 60 km/s because of Earth's orbital motion around the Sun. Since the aberration's main part
is of the first order in v_E/c ~ 10^-4, the effect is very significant and has been known since the early
1700s.)




Fig. 9.8. Stellar aberration. 



For the analysis of this effect, it is sufficient to take, in Eq. (27), u' = c, i.e. v/u' = β, and interpret θ' as the "proper" direction to the star, which would be measured at v = 0. 17 At β ≪ 1, both Eq. (27) and the Galilean result (which the reader is invited to derive directly from Fig. 8),

$$\tan\theta = \frac{\sin\theta'}{\cos\theta' + \beta}, \qquad (9.28)$$

may be well approximated by the first-order term

$$\Delta\theta \equiv \theta - \theta' \approx -\beta\sin\theta. \qquad (9.29)$$

Unfortunately, it is not easy to use the difference between Eqs. (27) and (28), which is of the second order in β, for the confirmation of special relativity. The reason is that other components of the Earth's motion, such as its rotation, nutation and torque-induced precession, 18 give masking first-order contributions to the aberration.
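The sketch below shows the orders of magnitude involved. It evaluates Eqs. (27)-(29) for β = 10^-4 (roughly v_E/c) and for an arbitrarily chosen proper direction θ' = 60°. Both values are assumptions made only for illustration. `math.atan2` is used so that the angle stays in the correct quadrant.

```python
import math

def theta_relativistic(theta_p, beta):
    """Eq. (27) with u' = c: tan(theta) = sin(theta') / [gamma (cos(theta') + beta)]."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return math.atan2(math.sin(theta_p), gamma * (math.cos(theta_p) + beta))

def theta_galilean(theta_p, beta):
    """Eq. (28): tan(theta) = sin(theta') / (cos(theta') + beta)."""
    return math.atan2(math.sin(theta_p), math.cos(theta_p) + beta)

beta = 1e-4                       # roughly v_E/c (assumed illustrative value)
theta_p = math.radians(60.0)      # arbitrary "proper" direction to the star
ARCSEC = math.pi / (180 * 3600)   # one arcsecond, in radians

d_rel = theta_relativistic(theta_p, beta) - theta_p
d_gal = theta_galilean(theta_p, beta) - theta_p
d_first = -beta * math.sin(theta_p)      # first-order term, Eq. (29)

print(f"relativistic shift: {d_rel / ARCSEC:.4f} arcsec")
print(f"Galilean shift:     {d_gal / ARCSEC:.4f} arcsec")
print(f"first-order term:   {d_first / ARCSEC:.4f} arcsec")
print(f"relativistic minus Galilean: {(d_rel - d_gal) / ARCSEC:.1e} arcsec")
```

The aberration itself is about 18 arcseconds here. The relativistic correction, however, is only of the order of 10^-3 arcseconds, which is why it is so easily masked by other first-order effects.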

Finally, at a completely arbitrary direction of vector u', Eqs. (22) may be readily used to calculate the velocity magnitude. The most popular form of the resulting expression is for the square of the relative velocity (or rather the relative reduced velocity β) of two particles,

$$\beta_r^2 = \frac{(\boldsymbol{\beta}_1 - \boldsymbol{\beta}_2)^2 - |\boldsymbol{\beta}_1 \times \boldsymbol{\beta}_2|^2}{(1 - \boldsymbol{\beta}_1\cdot\boldsymbol{\beta}_2)^2}, \qquad (9.30)$$

where β_{1,2} = u_{1,2}/c are their normalized velocities as measured in the same reference frame.
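Eq. (30) is easy to sanity-check in two special cases. For collinear velocities it must reproduce the addition rule (25). For mutually perpendicular velocities it gives 1 − β_r² = (1 − β_1²)(1 − β_2²), i.e. γ_r = γ_1γ_2. Below is a minimal sketch; the helper names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def relative_beta(b1, b2):
    """Relative reduced velocity beta_r from Eq. (30); b1, b2 are 3-tuples of u/c."""
    diff = tuple(x - y for x, y in zip(b1, b2))
    c12 = cross(b1, b2)
    numerator = dot(diff, diff) - dot(c12, c12)
    return math.sqrt(numerator) / (1.0 - dot(b1, b2))  # 1 - b1.b2 > 0 for |b1|, |b2| < 1

def gamma(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)

# Head-on motion along x: should match Eq. (25), (0.9 + 0.9)/(1 + 0.81)
print(relative_beta((0.9, 0, 0), (-0.9, 0, 0)), 1.8 / 1.81)
# Perpendicular motion: gamma_r should equal gamma_1 * gamma_2
br = relative_beta((0.6, 0, 0), (0, 0.8, 0))
print(gamma(br), gamma(0.6) * gamma(0.8))
```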

(iv) The Doppler effect. Now let us consider a plane, monochromatic wave moving along axis x:

$$f = \mathrm{Re}\left[f_\omega \exp\{i(kx - \omega t)\}\right] = |f_\omega|\cos(kx - \omega t + \arg f_\omega). \qquad (9.31)$$



17 Strictly speaking, in order to reconcile the geometries shown in Fig. 1 (for which all our formulas, including Eq. (27), are valid) and Fig. 8 (giving the traditional scheme of the aberration), it is necessary to invert the signs of u (and hence of sin θ' and cos θ') and of v. However, as evident from Eq. (27), all the minus signs cancel, and the formula is valid as is.

18 See, e.g., CM Secs. 6.4-6.5.
Its total phase, Ψ ≡ kx − ωt + arg f_ω (in contrast to the amplitude |f_ω|!), cannot depend on the observer's reference frame. The reason is that all fields of a traveling wave vanish simultaneously at Ψ = π(n + 1/2) (for all integer n), and such "world events" should take place in all reference frames. The only way to keep Ψ = Ψ' at all times is to have 19

$$kx - \omega t = k'x' - \omega' t'. \qquad (9.32)$$

First, let us consider the Doppler effect for the usual nonrelativistic waves, e.g., oscillations of particles of a certain medium. Using the Galilean transform (2), we may rewrite Eq. (32) as

$$k(x' + vt) - \omega t = k'x' - \omega' t. \qquad (9.33)$$

Since this transform leaves all space intervals (including the wavelength λ = 2π/k) intact, we can take k = k', so that Eq. (33) yields

$$\omega' = \omega - kv. \qquad (9.34)$$

For a dispersion-free medium, the wave number k is the ratio of the wave frequency, as measured in the reference frame bound to the medium, to the wave velocity v_w. In particular, if the wave source rests in the medium, we can bind frame 0 to the medium as well, and frame 0' to the wave's receiver (so that v = v_r). Then

$$k = \frac{\omega}{v_w}, \qquad (9.35)$$

and for the frequency perceived by the receiver, Eq. (34) yields

$$\omega' = \omega\,\frac{v_w - v_r}{v_w}. \qquad (9.36)$$

On the other hand, if the receiver and the medium are at rest in reference frame 0', while the wave source is bound to frame 0 (so that v = −v_s), Eq. (35) should be replaced with

$$k = k' = \frac{\omega'}{v_w}, \qquad (9.37)$$

and Eq. (34) yields a different result:

$$\omega' = \omega\,\frac{v_w}{v_w - v_s}. \qquad (9.38)$$

Finally, if both the source and the detector are moving, it is straightforward to combine these two results to get the general relation

$$\omega' = \omega\,\frac{v_w - v_r}{v_w - v_s}. \qquad (9.39)$$



At low speeds of both the source and the receiver, this result simplifies,

$$\omega' \approx \omega\left(1 - \frac{v_r - v_s}{v_w}\right), \qquad (9.40)$$

but at speeds comparable to v_w we have to use the more general Eq. (39). Thus, the usual Doppler effect is affected not only by the relative speed (v_r − v_s) of the wave's source and detector, but also by their speeds relative to the medium in which the waves propagate.

19 Strictly speaking, Eq. (32) is valid up to an additive constant. For notation simplicity, however, this constant may always be made equal to zero by selecting the reference frame origins and/or clock turn-on times so that at t = 0 and x = 0, t' = 0 and x' = 0 as well (as has already been done in all relations of Sec. 1).
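The asymmetry between Eqs. (36) and (38) is easy to see numerically. The sketch below assumes sound in air with v_w = 343 m/s, an illustrative value. It compares a receiver approaching a resting source with a source approaching a resting receiver, at the same relative speed. As in the derivation above, velocities are taken relative to the medium and counted as positive along the direction of wave propagation.

```python
def doppler_medium(omega, v_w, v_r=0.0, v_s=0.0):
    """Received frequency for waves in a medium, Eq. (39): omega (v_w - v_r)/(v_w - v_s).

    v_r, v_s: receiver and source velocities relative to the medium,
    positive along the direction of wave propagation (from source to receiver).
    """
    return omega * (v_w - v_r) / (v_w - v_s)

def doppler_low_speed(omega, v_w, v_r=0.0, v_s=0.0):
    """First-order approximation, Eq. (40)."""
    return omega * (1.0 - (v_r - v_s) / v_w)

v_w = 343.0          # assumed speed of sound in air, m/s
dv = 0.1 * v_w       # relative approach speed

receiver_moves = doppler_medium(1.0, v_w, v_r=-dv)   # receiver moves toward the source
source_moves = doppler_medium(1.0, v_w, v_s=+dv)     # source moves toward the receiver
print(receiver_moves, source_moves)                  # about 1.100 vs. about 1.111
print(doppler_low_speed(1.0, v_w, v_r=-dv), doppler_low_speed(1.0, v_w, v_s=+dv))  # equal to first order
```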

Somewhat counter-intuitively, for electromagnetic waves the calculations are simpler. For them, the propagation medium (ether) does not exist, and the wave velocity equals ±c in any reference frame. Hence there are no two separate cases: we can always take k = ±ω/c, k' = ±ω'/c. Plugging these relations, together with the Lorentz transform (19a), into the phase-invariance equation (32), we get

$$\pm\frac{\omega}{c}\,\gamma\,(x' + \beta ct') - \omega\gamma\,\frac{ct' + \beta x'}{c} = \pm\frac{\omega'}{c}\,x' - \omega' t'. \qquad (9.41)$$



This relation has to hold for any x' and t', so we may require the net coefficients of these variables (after moving all terms to one side) to vanish. These two requirements yield the same condition:

$$\omega' = \omega\gamma\,(1 \mp \beta). \qquad (9.42)$$

This result is already quite simple, but may be transformed further to be even more illuminating:

$$\omega' = \omega\,\frac{1 \mp \beta}{(1 - \beta^2)^{1/2}} = \omega\,\frac{1 \mp \beta}{[(1 + \beta)(1 - \beta)]^{1/2}}. \qquad (9.43)$$

At any sign before β, one pair of parentheses cancels, so that

$$\omega' = \omega\left(\frac{1 \mp \beta}{1 \pm \beta}\right)^{1/2}. \qquad (9.44)$$



(It may look like the reciprocal expression of ω via ω' is different, violating the relativity principle. However, in this case we have to change the sign of β, because the relative velocity of the frames is opposite, so we come down to Eq. (44) again.)

Thus the Doppler effect for electromagnetic waves depends only on the relative velocity v = βc between the wave source and detector - as it should be, given the absence of the ether. At velocities much below c, Eq. (44) may be crudely approximated as

$$\omega' \approx \omega\,\frac{1 \mp \beta/2}{1 \pm \beta/2} \approx \omega\,(1 \mp \beta), \qquad (9.45)$$

i.e. in the first approximation in β = v/c it coincides with the corresponding limit (40) of the usual Doppler effect. However, even at v ≪ c there is still a difference of the order of (v/c)² between the Galilean and Lorentzian relations.
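The second-order difference mentioned above can be made concrete. The sketch below takes the upper signs in Eq. (44), i.e. a receiver receding from the source at v = βc. It compares the result with the two Galilean results (36) and (38), both taken at v_w = c and for the same relative velocity. The function names are illustrative.

```python
import math

def doppler_em(beta):
    """omega'/omega from Eq. (44), upper signs: receiver recedes from the source at v = beta c."""
    return math.sqrt((1.0 - beta) / (1.0 + beta))

def galilean_receiver_moves(beta):
    """Eq. (36) with v_w = c and v_r = beta c (source at rest in the 'medium')."""
    return 1.0 - beta

def galilean_source_moves(beta):
    """Eq. (38) with v_w = c and v_s = -beta c (receiver at rest in the 'medium')."""
    return 1.0 / (1.0 + beta)

for beta in (1e-3, 1e-2, 1e-1):
    rel = doppler_em(beta)
    print(f"beta = {beta:g}: relativistic - Galilean = "
          f"{rel - galilean_receiver_moves(beta):+.3e} (receiver moves), "
          f"{rel - galilean_source_moves(beta):+.3e} (source moves); beta^2/2 = {beta**2 / 2:.3e}")
```

The relativistic result (44) is exactly the geometric mean of the two Galilean expressions, (1 − β) and 1/(1 + β). It therefore falls between them, differing from each by approximately β²/2.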

If the wave vector k is tilted by angle θ to vector v (as measured in frame 0), then we have to repeat the calculations, with k replaced by k_x, and with the components k_y and k_z left intact by the Lorentz transform. As a result, Eq. (42) is generalized as

$$\omega' = \omega\gamma\,(1 - \beta\cos\theta). \qquad (9.46)$$
For the cases cos θ = ±1, Eq. (46) reduces to our previous result (42). However, at θ = π/2 (i.e. cos θ = 0), the relation is rather different:

$$\omega' = \gamma\omega. \qquad (9.47)$$



This is the transverse Doppler effect, which is completely absent in nonrelativistic physics. Its first experimental evidence was obtained by H. Ives and G. Stilwell in 1938 and 1941, using fast ion beams (as suggested in 1906 by J. Stark). Similar experiments were later repeated several times, but the first unambiguous measurement was performed only in 1979 by D. Hasselkamp et al., who confirmed Eq. (47) with a relative accuracy of about 10%. This precision may not look too spectacular. However, besides the special tests discussed above, the Lorentz transform formulas have also been confirmed, less directly, by a huge body of other experimental data, especially in high energy physics. These data agree with calculations that incorporate the transform as one of their parts. This is why, with all due respect to the spirit of challenging authority, I should warn the reader: if you decide to challenge the relativity theory (which is called a "theory" by tradition only), you would also need to explain all these data. 20 Best of luck with that!
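Eq. (46) contains both the longitudinal and the transverse cases. The short sketch below uses β = 0.1, an arbitrary illustrative value. It checks that Eq. (46) reproduces Eq. (44) at θ = 0 and θ = π, and gives the factor γ of Eq. (47) at θ = π/2.

```python
import math

def doppler_em(beta, theta):
    """omega'/omega from Eq. (46); theta is the angle between k and v, measured in frame 0."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (1.0 - beta * math.cos(theta))

beta = 0.1
print(doppler_em(beta, 0.0), math.sqrt((1 - beta) / (1 + beta)))      # Eq. (44), upper signs
print(doppler_em(beta, math.pi), math.sqrt((1 + beta) / (1 - beta)))  # Eq. (44), lower signs
print(doppler_em(beta, math.pi / 2), 1 / math.sqrt(1 - beta**2))      # Eq. (47): a pure second-order effect
```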




9.3. 4-vectors, momentum, mass, and energy 



Before proceeding to relativistic dynamics, let us discuss a mathematical language that makes all 
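the calculations more compact - and more beautiful. We have already seen that the spatial coordinates {x, y, z} and the product ct are Lorentz-transformed similarly - see Eqs. (19). So it is natural to consider them as components of a 4-component vector (or, for short, 4-vector),

$$\{x_0, x_1, x_2, x_3\} \equiv \{ct, \mathbf{r}\}, \qquad (9.48)$$

with components

$$x_0 \equiv ct, \quad x_1 \equiv x, \quad x_2 \equiv y, \quad x_3 \equiv z. \qquad (9.49)$$

According to Eqs. (19), its components are Lorentz-transformed as

$$x_j = \sum_{j'=0}^{3} L_{jj'}\,x'_{j'}, \qquad (9.50)$$

where L_{jj'} are the elements of the 4×4 Lorentz transform matrix

$$L = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (9.51)$$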

Since 4-vectors are a new notion for our course, and are used for many more purposes than just the space-time transform, we need to discuss the mathematical rules they obey. Indeed, as was

20 The same fact, ignored by crackpots, is also valid for other favorite targets of their attacks, including the expansion of the Universe and quantum mechanics in physics, and the theory of evolution in biology.
mentioned in Sec. 8.9, the usual (3D) vector is not just any ordered set (string) of three scalars {A_x, A_y, A_z}. If we want it to represent a reference-frame-independent physical variable, the vector components have to obey certain rules at the transfer from one reference frame to another. In particular, the vector's 3D norm (magnitude squared),

$$A^2 = A_x^2 + A_y^2 + A_z^2, \qquad (9.52)$$



should be an invariant at the Galilean transform (2). However, a naive extension of this formula to 4- 
vectors would not work, because, according to the calculations of Sec. 1, the Lorentz transform keeps 
intact combinations of the type (7), with one sign negative, rather than the sum of all components 
squared. Hence for the 4-vector all the rules of the game have to be reviewed and adjusted - or rather 
redefined from the very beginning. 

An arbitrary 4-vector is a string of 4 scalars,

$$\{A_0, A_1, A_2, A_3\}, \qquad (9.53)$$

defined in the 4D Minkowski space, 21 whose components A_j, as measured in the systems 0 and 0' shown in Fig. 1, obey the Lorentz transform similar to Eq. (50):

$$A_j = \sum_{j'=0}^{3} L_{jj'}\,A'_{j'}. \qquad (9.54)$$



As we have already seen on the example of the space-time 4-vector (48), this means in particular that

$$A_0^2 - \sum_{j=1}^{3} A_j^2 = A_0'^2 - \sum_{j=1}^{3} A_j'^2. \qquad (9.55)$$

This is the so-called Lorentz invariance condition of the norm of the 4-vector. (The difference between this relation and Eq. (52), pertaining to the Euclidean geometry, is the reason why the Minkowski space is called pseudo-Euclidean.) It is also straightforward to use Eqs. (51) and (54) to check that an evident generalization of the norm, the scalar product of two arbitrary 4-vectors,

$$A_0B_0 - \sum_{j=1}^{3} A_jB_j, \qquad (9.56)$$

is also Lorentz-invariant.
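Before moving on, it may help to verify Eqs. (55)-(56) numerically. The sketch below builds the matrix (51) for β = 0.8, an arbitrary choice, and transforms two random 4-vectors with Eq. (54). It then compares the pseudo-Euclidean product (56) with the naive Euclidean sum of component products. Only the former should survive the transform.

```python
import math
import random

def lorentz_matrix(beta):
    """The 4x4 matrix L of Eq. (51), for frame 0' moving along x at v = beta*c."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, beta * g, 0.0, 0.0],
            [beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(L, a_prime):
    """Eq. (54): A_j = sum over j' of L_jj' A'_j'."""
    return [sum(L[j][k] * a_prime[k] for k in range(4)) for j in range(4)]

def minkowski_product(a, b):
    """Scalar product (56): A_0 B_0 - sum over j = 1..3 of A_j B_j."""
    return a[0] * b[0] - sum(a[j] * b[j] for j in range(1, 4))

def euclidean_product(a, b):
    """Naive 4D extension of the 3D dot product, for comparison."""
    return sum(x * y for x, y in zip(a, b))

random.seed(1)
L = lorentz_matrix(0.8)
a_p = [random.uniform(-1.0, 1.0) for _ in range(4)]   # components A'_j in frame 0'
b_p = [random.uniform(-1.0, 1.0) for _ in range(4)]   # components B'_j in frame 0'
a, b = transform(L, a_p), transform(L, b_p)

print(minkowski_product(a, b) - minkowski_product(a_p, b_p))   # zero up to rounding
print(euclidean_product(a, b) - euclidean_product(a_p, b_p))   # generally nonzero
```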

Now consider the 4-vector corresponding to an infinitesimal interval between two close world events:

$$\{dx_0, dx_1, dx_2, dx_3\} = \{c\,dt, d\mathbf{r}\}; \qquad (9.57)$$

its norm,

$$(ds)^2 = dx_0^2 - \sum_{j=1}^{3} dx_j^2 = c^2(dt)^2 - (d\mathbf{r})^2, \qquad (9.58)$$
21 After H. Minkowski, who was the first to recast (in 1907) the special relativity relations in a form in which space coordinates and time (or rather ct) are treated on an equal footing.



Chapter 9 



Page 15 of 52 



Essential Graduate Physics 



EM: Classical Electrodynamics 



is of course also Lorentz-invariant. Since the speed of any particle (or signal) cannot be larger than c, for any pair of world events that are in a causal relation with each other, dr cannot be larger than c dt, i.e. such a time-like interval (ds)² cannot be negative. The 4D surface separating such intervals from the space-like intervals, with (ds)² < 0, is called the light cone (Fig. 9).



[Figure labels: time-like interval, ds² > 0 (causal relation possible); space-like interval, ds² < 0 (causal relation impossible).]

Fig. 9.9. 2+1 dimensional image of the light cone (which is actually 3+1 dimensional).



Now let us assume that these two close world events happen with the same particle that moves with velocity u. Then in the frame moving with the particle (v = u), the last term in the right-hand part of Eq. (58) equals zero, so that

$$ds = c\,d\tau, \qquad (9.59)$$

where dτ is the proper time interval. But according to Eq. (21), this means that we can write

$$d\tau = \frac{dt}{\gamma}, \qquad (9.60)$$

where dt is the time interval in an arbitrary (though inertial) reference frame, while

$$\beta = \frac{u}{c} \quad\text{and}\quad \gamma = \frac{1}{(1 - \beta^2)^{1/2}} = \frac{1}{(1 - u^2/c^2)^{1/2}} \qquad (9.61)$$

are the parameters (17) corresponding to the particle's velocity (u) in that frame, 22 so that ds = c dt/γ.



Now, let us explore whether a 4-vector can be formed using the spatial components of the particle's velocity

$$\left\{\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right\}. \qquad (9.62)$$

Here we have a slight problem: as Eqs. (22) show, these components do not obey the Lorentz transform. However, let us use dτ = dt/γ, the proper time interval of the particle, to form the following string:



22 I have opted against using special indices (e.g., β_u, γ_u) to distinguish Eqs. (17) and (61) here and below, in the hope that the suitable velocity (of a reference frame or of a particle) will always be clear from the context.
$$\left\{\frac{dx_0}{d\tau}, \frac{dx_1}{d\tau}, \frac{dx_2}{d\tau}, \frac{dx_3}{d\tau}\right\} = \gamma\left\{c, \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right\} \equiv \gamma\{c, \mathbf{u}\}. \qquad (9.63)$$

Compare the first form of this expression with Eq. (48): the time-space vector obeys the Lorentz transform, and τ is Lorentz-invariant. Hence string (63) is a legitimate 4-vector. It is called the 4-velocity of the particle.
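The sketch below illustrates why the 4-velocity is useful. It uses units with c = 1 and arbitrary example velocities. It forms γ{c, u'} for a particle in frame 0', transforms it with the matrix (51), and reads off the 3-velocity in frame 0 as u_j = cU_j/U_0. The result should coincide with the velocity transform (22). The norm of the 4-velocity should equal c² in both frames.

```python
import math

C = 1.0  # units with c = 1 (an assumption for brevity)

def four_velocity(u):
    """Eq. (63): gamma {c, u} for a 3-velocity u given as a 3-tuple with |u| < c."""
    g = 1.0 / math.sqrt(1.0 - sum(x * x for x in u) / C**2)
    return [g * C] + [g * x for x in u]

def boost_x(a_prime, beta):
    """Eq. (54) with matrix (51): frame-0 components from frame-0' components."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    a0, a1, a2, a3 = a_prime
    return [g * (a0 + beta * a1), g * (beta * a0 + a1), a2, a3]

def norm(a):
    """4-vector norm, Eq. (55): A_0^2 - A_1^2 - A_2^2 - A_3^2."""
    return a[0]**2 - a[1]**2 - a[2]**2 - a[3]**2

u_prime = (0.6, 0.5, 0.0)   # particle velocity in frame 0' (example values)
v = 0.7                     # velocity of frame 0' relative to frame 0, along x

U = boost_x(four_velocity(u_prime), v / C)
u = [C * U[j] / U[0] for j in (1, 2, 3)]    # 3-velocity in frame 0

print(norm(four_velocity(u_prime)), norm(U))                                       # both equal c^2
print(u[0], (u_prime[0] + v) / (1 + u_prime[0] * v / C**2))                        # x-component, Eq. (22)
print(u[1], u_prime[1] * math.sqrt(1 - (v / C)**2) / (1 + u_prime[0] * v / C**2))  # y-component, Eq. (22)
```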

Now we are properly equipped to proceed to dynamics. Let us start with such basic notions as the momentum p and the energy ℰ - so far, for a free particle. 23 Perhaps the most elegant way to "derive" (or rather guess 24) expressions for p and ℰ as functions of the particle's velocity u is based on analytical mechanics. Due to the conservation of u, the trajectory of a free particle in the 4D Minkowski space is always a straight line. Hence, from the Hamilton principle of minimum action, 25 we may expect its action S, between points 1 and 2, to be a linear function of the space-time interval (59):

$$S = \alpha\int_1^2 ds, \qquad (9.64)$$

where α is some constant. On the other hand, in analytical mechanics the action is defined as

$$S = \int_{t_1}^{t_2} \mathcal{L}\,dt, \qquad (9.65)$$

where ℒ is the particle's Lagrangian function. 26 Comparing these two expressions, we get

$$\mathcal{L} = \frac{\alpha c}{\gamma} = \alpha c\left(1 - \frac{u^2}{c^2}\right)^{1/2}. \qquad (9.66)$$

In the nonrelativistic limit (u ≪ c), this function tends to

$$\mathcal{L} \approx \alpha c\left(1 - \frac{u^2}{2c^2}\right) = \alpha c - \frac{\alpha u^2}{2c}. \qquad (9.67)$$

In order to correspond to the Newtonian mechanics, the last (velocity-dependent) term should equal mu²/2. From here we find α = −mc, so that, finally,

$$\mathcal{L} = -mc^2\left(1 - \frac{u^2}{c^2}\right)^{1/2}. \qquad (9.68)$$



23 I am sorry for using, as in Sec. 6.3, the same traditional notation (p) for the particle's momentum as had been used for the electric dipole moment. However, since the latter notion will be virtually unused in the balance of the notes, this is unlikely to lead to confusion.

24 Indeed, such a derivation uses additional assumptions, however natural (such as the Lorentz-invariance of S). It can therefore hardly be considered a real proof of the final results, which require experimental confirmation. Fortunately, such confirmations have been numerous - see below.

25 See, e.g., CM Sec. 10.3. 

26 See, e.g., CM Sec. 2.1. 
Now we can find the Cartesian components p_j of the particle's momentum, as the generalized momenta corresponding to the components r_j (j = 1, 2, 3) of the 3D radius-vector r: 27

$$p_j = \frac{\partial\mathcal{L}}{\partial\dot r_j} = \frac{\partial\mathcal{L}}{\partial u_j} = -mc^2\,\frac{\partial}{\partial u_j}\left(1 - \frac{u_1^2 + u_2^2 + u_3^2}{c^2}\right)^{1/2} = \frac{mu_j}{(1 - u^2/c^2)^{1/2}} \equiv m\gamma u_j. \qquad (9.69)$$



Thus for the 3D vector of momentum, we can write the result in the same form as in nonrelativistic mechanics,

$$\mathbf{p} = m\gamma\mathbf{u} \equiv M\mathbf{u}, \qquad (9.70)$$

if we introduce the reference-frame-dependent scalar M (called the relativistic mass) defined as

$$M \equiv m\gamma = \frac{m}{(1 - u^2/c^2)^{1/2}}, \qquad (9.71)$$

m being the non-relativistic mass of the particle. (It is also called the rest mass, because in the reference frame in which the particle rests, Eq. (71) yields M = m.)
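The differentiation in Eq. (69) can be double-checked numerically. The sketch below treats one-dimensional motion and uses units with m = c = 1, an illustrative assumption. It differentiates the Lagrangian (68) by central finite differences and compares the result with Eq. (70) and with the Newtonian momentum mu.

```python
import math

m, c = 1.0, 1.0   # illustrative units (assumption)

def lagrangian(u):
    """Free-particle Lagrangian, Eq. (68), for motion along one axis."""
    return -m * c**2 * math.sqrt(1.0 - (u / c) ** 2)

def momentum_numeric(u, h=1e-6):
    """Generalized momentum p = dL/du, Eq. (69), by a central finite difference."""
    return (lagrangian(u + h) - lagrangian(u - h)) / (2 * h)

def momentum_exact(u):
    """Eq. (70): p = m gamma u."""
    return m * u / math.sqrt(1.0 - (u / c) ** 2)

for u in (0.01, 0.5, 0.9):
    # columns: speed, numerical dL/du, Eq. (70), Newtonian m*u
    print(u, momentum_numeric(u), momentum_exact(u), m * u)
```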

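Next, let us return to analytical mechanics to calculate the particle's energy ℰ (which for a free particle coincides with the Hamiltonian function H):

$$\mathcal{E} = H \equiv \sum_{j=1}^{3} p_j u_j - \mathcal{L} = \mathbf{p}\cdot\mathbf{u} - \mathcal{L} = \frac{mu^2}{(1 - u^2/c^2)^{1/2}} + mc^2\left(1 - \frac{u^2}{c^2}\right)^{1/2} \equiv \frac{mc^2}{(1 - u^2/c^2)^{1/2}}. \qquad (9.72)$$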



Thus, we have arrived at the most famous of Einstein's formulas (and probably the most famous formula of physics as a whole),

$$\mathcal{E} = m\gamma c^2 \equiv Mc^2, \qquad (9.73)$$

which expresses the relation between the particle's mass and energy. 28 In the nonrelativistic limit, it reduces to

$$\mathcal{E} = \frac{mc^2}{(1 - u^2/c^2)^{1/2}} \approx mc^2\left(1 + \frac{u^2}{2c^2}\right) = mc^2 + \frac{mu^2}{2}, \qquad (9.74)$$

the first term, mc², being called the rest energy of a particle.
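The Legendre transform (72) and its nonrelativistic limit (74) can be verified numerically in the same spirit. Again, m = c = 1 is an illustrative choice of units, and the speeds are arbitrary.

```python
import math

m, c = 1.0, 1.0   # illustrative units (assumption)

def gamma(u):
    return 1.0 / math.sqrt(1.0 - (u / c) ** 2)

def energy_via_hamiltonian(u):
    """Eq. (72): E = p u - L, with p = m gamma u (Eq. 70) and L = -m c^2 / gamma (Eq. 68)."""
    p = m * gamma(u) * u
    lagrangian = -m * c**2 / gamma(u)
    return p * u - lagrangian

for u in (1e-3, 0.1, 0.5, 0.9):
    E = energy_via_hamiltonian(u)
    # columns: speed, Eq. (72), Eq. (73), and the nonrelativistic approximation (74)
    print(u, E, m * gamma(u) * c**2, m * c**2 + m * u**2 / 2)
```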
Now let us consider the following string of 4 scalars:

$$\left\{\frac{\mathcal{E}}{c}, p_1, p_2, p_3\right\} \equiv \left\{\frac{\mathcal{E}}{c}, \mathbf{p}\right\}. \qquad (9.75)$$

Using Eqs. (70) and (73) to present this expression as

$$\left\{\frac{\mathcal{E}}{c}, \mathbf{p}\right\} = m\gamma\,\{c, \mathbf{u}\}, \qquad (9.76)$$

and comparing the result with Eq. (63), we immediately see that, since m is Lorentz-invariant, this string is a legitimate 4-vector of energy-momentum. As a result, its norm,

$$\left(\frac{\mathcal{E}}{c}\right)^2 - p^2, \qquad (9.77)$$

is Lorentz-invariant, and in particular has to be equal to the norm in the particle's rest frame. But in that frame, p = 0, and ℰ = mc², so that in an arbitrary frame

$$\left(\frac{\mathcal{E}}{c}\right)^2 - p^2 = (mc)^2. \qquad (9.78a)$$

This very important relation 29 between the relativistic energy and momentum (valid for free particles only!) is usually presented in the form 30

$$\mathcal{E}^2 = (mc^2)^2 + (pc)^2. \qquad (9.78b)$$

27 See, e.g., CM Sec. 2.3.

28 Let me hope that the reader understands that all the layman talk about the "mass to energy conversion" is only valid in a very limited sense of the word. The Einstein relation (73) does allow the conversion of "massive" particles (with m ≠ 0) into massless particles such as photons. However, each of the latter particles also has a nonvanishing relativistic mass M, and simultaneously the energy related to M by Eq. (73).



According to Eq. (70), in the ultrarelativistic limit u → c, p tends to infinity while mc stays constant, so that pc ≫ mc². As follows from Eq. (78), in this limit ℰ ≈ pc. Though the above discussion was for particles with finite m, the 4-vector formalism allows us to consider particles with zero rest mass as ultrarelativistic particles. For them, the above energy-to-momentum relation,

$$\mathcal{E} = pc, \qquad \text{for } m = 0, \qquad (9.79)$$

is exact. Quantum electrodynamics 31 tells us that under certain conditions, electromagnetic field quanta (photons) may also be considered as such massless particles, with momentum p = ħk. Plugging (the modulus of) the last relation into Eq. (78), for the photon's energy we get ℰ = pc = ħkc = ħω. Please note that according to Eq. (73), the relativistic mass of a photon is nonzero: M = ℰ/c² = ħω/c². The term "massless particle" therefore has a limited meaning: m = 0. For example, the relativistic mass M of an optical photon is of the order of 10^-36 kg. This is not much, but still a noticeable fraction (a few millionths) of the rest mass m_e of an electron.
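The sketch below first checks Eq. (78b) against Eqs. (70) and (73) for an electron moving at 0.6c. It then estimates the relativistic mass M = ħω/c² of an optical photon. The constants are CODATA 2018 values. The 550 nm wavelength is an arbitrary choice within the visible range.

```python
import math

# CODATA 2018 values, SI units
c = 2.99792458e8          # speed of light, m/s
hbar = 1.054571817e-34    # reduced Planck constant, J s
m_e = 9.1093837015e-31    # electron rest mass, kg

def energy_from_momentum(m, p):
    """Eq. (78b): E = sqrt((m c^2)^2 + (p c)^2)."""
    return math.hypot(m * c**2, p * c)

# Consistency with Eqs. (70) and (73) for an electron moving at u = 0.6c
u = 0.6 * c
gamma = 1 / math.sqrt(1 - (u / c) ** 2)
p = m_e * gamma * u
print(energy_from_momentum(m_e, p) / (m_e * gamma * c**2))   # ratio equals 1 up to rounding

# Relativistic mass of an optical photon (550 nm is an assumed example wavelength)
wavelength = 550e-9
omega = 2 * math.pi * c / wavelength
M_photon = hbar * omega / c**2
print(f"M = {M_photon:.2e} kg, M/m_e = {M_photon / m_e:.1e}")
```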

The fundamental relations (70) and (73) have been repeatedly verified in numerous particle collision experiments, in which the total energy and momentum of a system of particles are conserved - under the same conditions as in the non-relativistic dynamics. (For momentum, this is the absence of external forces, and for energy, the elasticity of particle interactions - in other words, the absence of alternative channels of energy escape.) Of course, generally, the total energy of the system is conserved, including the potential energy of particle interactions. However, at typical particle collisions, the



29 Please note one more useful relation following from Eqs. (70) and (73): p = (ℰ/c²)u.

30 It may be tempting to interpret this relation as the perpendicular-vector-like addition of the rest energy mc² and the "kinetic energy" pc. However, from the point of view of the total energy conservation (see below), a better definition of the kinetic energy is T(u) ≡ ℰ(u) − ℰ(0).

31 Briefly reviewed in QM Chapter 9. 
potential energy vanishes so rapidly with the distance between the particles that we can apply the momentum and energy conservation laws, with the particle energies given by Eq. (73).

As an example, let us calculate the minimum energy ℰ_min of a proton (p_a) necessary for the well-known high-energy reaction that generates a new proton-antiproton pair, p_a + p_b → p + p + p + p̄, provided that before the collision, proton p_b has been at rest in the lab frame. This minimum evidently corresponds to the vanishing relative velocity of the reaction products, i.e. to their motion with virtually the same velocity (u_fin), as seen from the lab frame - see Fig. 10.




[Figure panels: lab frame; c.o.m. frame.]

Fig. 9.10. High-energy proton reaction at ℰ ≈ ℰ_min - schematically.



Due to the momentum conservation, this velocity should have the same direction as the initial velocity (u_min) of proton p_a. This is why two scalar equations are sufficient to find both u_min and u_fin. The first expresses the energy conservation,

$$\frac{mc^2}{(1 - u_{\min}^2/c^2)^{1/2}} + mc^2 = \frac{4mc^2}{(1 - u_{\rm fin}^2/c^2)^{1/2}}, \qquad (9.80a)$$

and the second the momentum conservation,

$$\frac{mu_{\min}}{(1 - u_{\min}^2/c^2)^{1/2}} + 0 = \frac{4mu_{\rm fin}}{(1 - u_{\rm fin}^2/c^2)^{1/2}}. \qquad (9.80b)$$

After a conceptually simple but rather tedious solution of this system of two nonlinear equations, we get

$$u_{\min} = \frac{4\sqrt{3}}{7}\,c, \qquad u_{\rm fin} = \frac{\sqrt{3}}{2}\,c. \qquad (9.81)$$

Finally, we can use Eq. (73) to calculate the required energy: ℰ_min = 7mc². (Note that of the acceleration energy 6mc², only 2mc² goes into the "useful" proton-antiproton pair production.) The proton's rest mass, m_p ≈ 1.67×10^-27 kg, corresponds to the rest energy mc² ≈ 1.50×10^-10 J ≈ 0.938 GeV, so that ℰ_min ≈ 6.57 GeV.

The second, more intelligent way to solve the same problem is to use the center-of-mass (c.o.m.) reference frame. In relativity, this frame is defined as the one in which the total momentum of the system vanishes. 32 In this frame, at ℰ = ℰ_min, the velocities and momenta of all reaction products are equal to zero, while the velocities of protons p_a and p_b before the collision are equal and opposite, with some magnitude u'. Hence the energy conservation law becomes

$$\frac{2mc^2}{(1 - u'^2/c^2)^{1/2}} = 4mc^2, \qquad (9.82)$$

readily giving u' = (√3/2)c. This is of course the same result as Eq. (81) gives for u_fin. Now we can use the fact that the velocity of proton p_b in the c.o.m. frame is (−u'), and hence the velocity of proton p_a is (+u'). Hence we may find the lab-frame speed of proton p_a using the velocity transform formula (25):

$$u_{\min} = \frac{2u'}{1 + u'^2/c^2}. \qquad (9.83)$$

This relation gives the same result as the first method, u_min = (4√3/7)c, but in a much simpler way.

32 Note that according to this definition, the c.o.m.'s radius-vector is R = Σ_k M_k r_k / Σ_k M_k = Σ_k γ_k m_k r_k / Σ_k γ_k m_k. This is generally different from the well-known expression Σ_k m_k r_k / Σ_k m_k of the nonrelativistic mechanics.
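Both routes to ℰ_min are short enough to replay numerically. The sketch below measures speeds in units of c and energies in units of mc². It follows the c.o.m. method, Eqs. (82)-(83), and then plugs the results back into the lab-frame equations (80a)-(80b) as a consistency check. The 0.938 GeV proton rest energy is the value quoted above.

```python
import math

def gamma(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)

# c.o.m. method, Eq. (82): 2 gamma(u'/c) m c^2 = 4 m c^2, i.e. gamma' = 2
beta_com = math.sqrt(1.0 - 1.0 / 2.0**2)          # u'/c = sqrt(3)/2
beta_min = 2 * beta_com / (1 + beta_com**2)       # Eq. (83)
E_min = gamma(beta_min)                           # Eq. (73), in units of m c^2

# Consistency with the lab-frame equations (80a)-(80b), taking u_fin = u'
energy_residual = (gamma(beta_min) + 1) - 4 * gamma(beta_com)
momentum_residual = gamma(beta_min) * beta_min - 4 * gamma(beta_com) * beta_com

print(beta_min, 4 * math.sqrt(3) / 7)      # Eq. (81)
print(E_min)                               # close to 7
print(energy_residual, momentum_residual)  # both close to 0
print(E_min * 0.938, "GeV")                # about 6.57 GeV, with mc^2 = 0.938 GeV
```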



9.4. More on 4-vectors and 4-tensors 



This is a good moment to describe a formalism that will allow us, in particular, to solve the same 
proton collision problem in one more (and arguably, the most elegant) way. More importantly, this 
formalism will be virtually necessary for the description of the Lorentz transform of the electromagnetic 
field, and its interaction with relativistic particles - otherwise the formulas would be too cumbersome. 
Let us call the 4-vectors we have used before,

$$A^\alpha \equiv \{A_0, \mathbf{A}\}, \qquad (9.84)$$

contravariant, and denote them with the top index. Let us also introduce covariant vectors,

$$A_\alpha \equiv \{A_0, -\mathbf{A}\}, \qquad (9.85)$$

marked by the lower index. Now if we form a scalar product of these vectors using the standard (3D-like) rule, just as a sum of the products of the corresponding components, we immediately get

$$A^\alpha A_\alpha = A_\alpha A^\alpha = A_0^2 - A^2. \qquad (9.86)$$

Here and below the sign of summation over the four components of the product has been dropped. 33



The scalar product (86) is just the norm of the 4-vector in our former definition, and as we already know, is Lorentz-invariant. Moreover, the scalar product of two different vectors (also a Lorentz invariant) may be written in either of two similar forms: 34

$$A\cdot B \equiv A_\alpha B^\alpha = A^\alpha B_\alpha = A_0B_0 - \mathbf{A}\cdot\mathbf{B}; \qquad (9.87)$$

again, the only caveat is to take one vector in the covariant, and the other in the contravariant form.

Now let us return to our sample problem (Fig. 10). Since all components (ℰ/c and p) of the total 4-momentum of our system are conserved at the collision, its norm is conserved as well:

$$(p_a + p_b)^\alpha\,(p_a + p_b)_\alpha = (4p)^\alpha\,(4p)_\alpha. \qquad (9.88)$$

Since the scalar product is now the usual math construct, we know that the parentheses in the left-hand part of this equation may be multiplied out as usual. We may also swap the operands and move constant factors around as convenient. As a result, we get



33 This compact notation may take some time to get accustomed to, but can hardly lead to any confusion, due to the following rule: the summation is implied always (and only) when an index is repeated twice, once on the top and once at the bottom. In these notes, this shorthand notation will be used only for 4-vectors, but not for the usual (spatial) vectors.

34 Note also that, by definition, for any two 4-vectors, A^α B_α = B_α A^α.
$$(p_a)^\alpha (p_a)_\alpha + (p_b)^\alpha (p_b)_\alpha + 2\,(p_a)^\alpha (p_b)_\alpha = 16\,p^\alpha p_\alpha. \qquad (9.89)$$

Thanks to the Lorentz-invariance of each of the terms, we can calculate each of them in the reference frame we like. For the first two terms in the left-hand part, as well as for the right-hand part term, it is beneficial to use the frames in which that particular proton is at rest. As a result, each of the first two left-hand part terms equals (mc)², while the right-hand part equals 16(mc)². On the contrary, the last term of the left-hand part is better evaluated in the lab frame. In that frame the three spatial components of the 4-momentum p_b vanish, so the scalar product is just the product of the scalars ℰ/c for protons a and b. For the latter proton this is just mc, so that we get a simple equation,

$$(mc)^2 + (mc)^2 + 2\,\frac{\mathcal{E}_{\min}}{c}\,mc = 16(mc)^2, \qquad (9.90)$$

immediately giving the final result, ℰ_min = 7mc², which we have already had.
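The invariant-norm argument is compact enough to replay in a few lines. The sketch below uses units with m = c = 1, an illustrative assumption. It solves Eq. (90) for ℰ_min and builds the lab-frame 4-momenta of the two protons, using Eq. (78b) for the momentum of proton a. It then checks that the norm of their sum matches the right-hand side of Eq. (88). Index lowering follows Eq. (85): only the spatial components flip sign.

```python
import math

m = c = 1.0   # illustrative units

def lower(a_up):
    """Covariant components, Eq. (85): flip the sign of the spatial part."""
    return (a_up[0],) + tuple(-x for x in a_up[1:])

def product(a_up, b_up):
    """Invariant scalar product A^alpha B_alpha, Eq. (87)."""
    return sum(x * y for x, y in zip(a_up, lower(b_up)))

# Eq. (90): (mc)^2 + (mc)^2 + 2 (E/c) mc = 16 (mc)^2, solved for E
E_min = c * (16 - 2) * (m * c) ** 2 / (2 * m * c)

# Lab-frame 4-momenta {E/c, p}: proton a moves along x, proton b is at rest
p_abs = math.sqrt(E_min**2 - (m * c**2) ** 2) / c        # from Eq. (78b)
p_a = (E_min / c, p_abs, 0.0, 0.0)
p_b = (m * c, 0.0, 0.0, 0.0)
total = tuple(x + y for x, y in zip(p_a, p_b))

print(E_min / (m * c**2))                          # 7.0
print(product(total, total), 16 * (m * c) ** 2)   # Eq. (88): the two sides agree
```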



Let me hope that this example was a convincing demonstration of the convenience of presenting 4-vectors in the contravariant (84) and covariant (85) forms, 35 with Lorentz-invariant norms (86). To be useful for more complex tasks, the formalism should be developed a little bit further. In particular, it is crucial to know how the 4-vectors change under the Lorentz transform. For contravariant vectors, we already know the answer (54), but let us rewrite it in the new notation:



$$A^\alpha = L^\alpha_{\ \beta}\,A'^\beta, \qquad (9.91)$$

where L^α_β is the mixed Lorentz tensor (51): 36

$$L^\alpha_{\ \beta} = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (9.92)$$



Note that though the position of indices α and β in the Lorentz tensor notation is not crucial, because the tensor is symmetric, it is convenient to place them using the general index balance rule. By this rule, the difference of the numbers of the upper and lower indices should be the same in both parts of any 4-vector/tensor equality, with a top index in the denominator of a fraction counted as a bottom index in the numerator, and vice versa. (Check yourself that all our formulas above do satisfy this rule.)

In order to rewrite Eq. (91) in a more general form that would not depend on the particular 
orientation of the coordinate axes (Fig. 1), let us use the contravariant and covariant forms of the 4- 
vector of the time-space interval (57), 



35 These forms are 4-vector extensions of the notions of contravariance and covariance, introduced in the 1850s by J. Sylvester for the description of 3D vector change at the transfer between different reference frames - e.g., axes rotation - cf. Fig. 3. For that application, the contravariance or covariance of a vector is determined by its nature. If the Cartesian components of a vector (such as the nonrelativistic velocity v = dr/dt) are transformed similarly to the radius-vector r, it is called contravariant. Other vectors (such as df/dr ≡ ∇f), which require the reciprocal transform, are called covariant. In the Minkowski space, both forms are used for each 4-vector.

36 Just as 4-vectors, 4-tensors with two top indices are called contravariant, and those with two bottom indices, covariant. Tensors with one top and one bottom index are called mixed.
$$dx^\alpha = \{c\,dt, d\mathbf{r}\}, \qquad dx_\alpha = \{c\,dt, -d\mathbf{r}\}; \qquad (9.93)$$

then its norm (58) may be presented as 37

$$(ds)^2 = (c\,dt)^2 - (d\mathbf{r})^2 = dx^\alpha dx_\alpha = dx_\alpha dx^\alpha. \qquad (9.94)$$

Applying Eq. (91) to the contravariant form of vector (93), we get

$$dx^\alpha = L^\alpha_{\ \beta}\,dx'^\beta. \qquad (9.95)$$



But, with our new shorthand notation, we can also write the usual rule of differentiation of each component x^α, considering it as a (in our case, linear) function of 4 arguments x'^β, as follows:

$$dx^\alpha = \frac{\partial x^\alpha}{\partial x'^\beta}\,dx'^\beta. \qquad (9.96)$$

Comparing Eqs. (95) and (96), we can rewrite the general Lorentz transform rule (91) in the new form,

$$A^\alpha = \frac{\partial x^\alpha}{\partial x'^\beta}\,A'^\beta, \qquad (9.97a)$$

which evidently does not depend on the coordinate axes orientation.

It is straightforward to verify that the reciprocal transform may be presented as

$$A'^\alpha = \frac{\partial x'^\alpha}{\partial x^\beta}\,A^\beta. \qquad (9.97b)$$

The reciprocal transform differs from the direct one only by the sign of the relative velocity of the frames. Hence it is given by the inverse matrix ∂x'^α/∂x^β, which for the coordinate choice shown in Fig. 1 is

$$\frac{\partial x'^\alpha}{\partial x^\beta} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (9.98)$$



37 Another way to write this relation is (ds)² = g_{αβ} dx^α dx^β = g^{αβ} dx_α dx_β. Here double summation over the indices α and β is implied, and g is the so-called metric tensor,

$$g \equiv g_{\alpha\beta} = g^{\alpha\beta} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix},$$

which may be used, in particular, to transfer a covariant vector into the corresponding contravariant one and back: A_α = g_{αβ} A^β, A^α = g^{αβ} A_β. The metric tensor plays a key role in general relativity, in which it is affected by gravity - "curved" by particle masses.
Since, according to Eqs. (84)-(85), covariant 4-vectors differ from the contravariant ones by the sign of the spatial components, their direct transform is given by matrix (98). Hence their direct and reciprocal transforms may be represented, respectively, as

$$A_\alpha = \frac{\partial x'^\beta}{\partial x^\alpha}\,A'_\beta, \qquad A'_\alpha = \frac{\partial x^\beta}{\partial x'^\alpha}\,A_\beta, \qquad (9.99)$$


evidently satisfying the index balance rule. (Note that primed quantities are now multiplied, rather than divided as in the contravariant case.) As a sanity check, let us apply this formalism to the scalar product A_α A^α. As Eq. (96) shows, the implicit summation notation allows us to multiply and divide any equality by the same partial differential of a coordinate, so that we can write:



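$$A_\alpha A^\alpha = \frac{\partial x'^\beta}{\partial x^\alpha}A'_\beta\,\frac{\partial x^\alpha}{\partial x'^\gamma}A'^\gamma = \frac{\partial x'^\beta}{\partial x'^\gamma}A'_\beta A'^\gamma = \delta^\beta_\gamma A'_\beta A'^\gamma = A'_\gamma A'^\gamma, \qquad (9.100)$$

i.e. the scalar product A_α A^α (as well as A^α A_α) is Lorentz-invariant, as it should be.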
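The index gymnastics of Eqs. (97a), (99), and (100) can be mirrored with explicit 4×4 matrices. In the sketch below, β = 0.6 and the components of A'^α are arbitrary example values. The contravariant components are transformed with the matrix (92). The covariant ones are transformed with the matrix (98), summing over its first index as Eq. (99) prescribes. The two results should be related by the sign flip (85), and the product A_αA^α should be the same in both frames.

```python
import math

beta = 0.6
g = 1.0 / math.sqrt(1.0 - beta**2)
DX_DXP = [[g, beta * g, 0, 0], [beta * g, g, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]    # [a][b] = dx^a/dx'^b, Eq. (92)
DXP_DX = [[g, -beta * g, 0, 0], [-beta * g, g, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # [a][b] = dx'^a/dx^b, Eq. (98)

def lower(a_up):
    """Covariant components from contravariant ones, Eqs. (84)-(85)."""
    return [a_up[0]] + [-x for x in a_up[1:]]

A_up_p = [2.0, 0.3, -1.0, 0.5]      # A'^alpha (arbitrary example)
A_dn_p = lower(A_up_p)              # A'_alpha

# Eq. (97a): A^alpha = (dx^alpha/dx'^beta) A'^beta
A_up = [sum(DX_DXP[a][b] * A_up_p[b] for b in range(4)) for a in range(4)]
# Eq. (99), first form: A_alpha = (dx'^beta/dx^alpha) A'_beta; note the summation over the first index
A_dn = [sum(DXP_DX[b][a] * A_dn_p[b] for b in range(4)) for a in range(4)]

print(A_dn)
print(lower(A_up))                  # should coincide with the line above
print(sum(x * y for x, y in zip(A_dn, A_up)), sum(x * y for x, y in zip(A_dn_p, A_up_p)))  # Eq. (100)
```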

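Now, let us consider the 4-vectors of derivatives. Here we should be very careful. Consider, for example, the following vector operator

$$\frac{\partial}{\partial x^\alpha} \equiv \left\{\frac{\partial}{\partial(ct)}, \nabla\right\}. \qquad (9.101)$$

As was discussed above, the operator is not changed by its multiplication and division by another differential, e.g., ∂x'^β (with the corresponding implied summation over β), so that

$$\frac{\partial}{\partial x^\alpha} = \frac{\partial x'^\beta}{\partial x^\alpha}\,\frac{\partial}{\partial x'^\beta}. \qquad (9.102)$$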



But, according to the first of Eqs. (99), this is exactly how the covariant vectors are Lorentz-transformed! Hence, we have to consider the derivative over a contravariant space-time interval as a covariant 4-vector, and vice versa.38 (This result might also be expected from the index balance rule.) In particular, this means that the scalar product

$$\frac{\partial}{\partial x^\alpha}A^\alpha = \frac{\partial A^0}{\partial(ct)} + \nabla\cdot\mathbf{A} \tag{9.103}$$

should be Lorentz-invariant for any legitimate 4-vector. A convenient shorthand for the covariant derivative, which complies with the index balance rule, is

$$\partial_\alpha \equiv \frac{\partial}{\partial x^\alpha} = \left\{\frac{\partial}{\partial(ct)}, \nabla\right\}, \tag{9.104}$$

so that the invariant scalar product may be written just as $\partial_\alpha A^\alpha$. A similar definition of the contravariant derivative,

$$\partial^\alpha \equiv \frac{\partial}{\partial x_\alpha} = \left\{\frac{\partial}{\partial(ct)}, -\nabla\right\}, \tag{9.105}$$




38 As was mentioned above, this is also a property of the "usual" transform of 3D vectors.
allows us to write the Lorentz-invariant scalar product (103) in either of two forms:

$$\frac{\partial A^0}{\partial(ct)} + \nabla\cdot\mathbf{A} = \partial_\alpha A^\alpha = \partial^\alpha A_\alpha. \tag{9.106}$$



Finally, let us see how the general Lorentz transform changes 4-tensors. A second-rank 4×4 matrix is a legitimate 4-tensor if both 4-vectors it relates obey the Lorentz transform. For example, if two legitimate 4-vectors are related as

$$A^\alpha = T^{\alpha\beta}B_\beta, \tag{9.107}$$

we should require that

$$A'^\alpha = T'^{\alpha\beta}B'_\beta, \tag{9.108}$$

where $A^\alpha$ and $A'^\alpha$ are related by Eqs. (97), while $B_\beta$ and $B'_\beta$ by Eqs. (99). This requirement immediately yields

$$T^{\alpha\beta} = \frac{\partial x^\alpha}{\partial x'^\gamma}\frac{\partial x^\beta}{\partial x'^\delta}T'^{\gamma\delta}, \qquad T'^{\alpha\beta} = \frac{\partial x'^\alpha}{\partial x^\gamma}\frac{\partial x'^\beta}{\partial x^\delta}T^{\gamma\delta}, \tag{9.109}$$

with the implied summation over two indices, $\gamma$ and $\delta$. The rules for covariant and mixed tensors are similar.39



9.5. Maxwell equations in the 4-form 

This 4-vector formalism background is already sufficient to analyze the Lorentz transform of the electromagnetic field. Just to warm up, let us consider the continuity equation (4.5),

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0, \tag{9.110}$$

which expresses the electric charge conservation and, as we already know, is compatible with the Maxwell equations. If we now define the contravariant and covariant 4-vectors of electric current as

$$j^\alpha \equiv \{\rho c, \mathbf{j}\}, \qquad j_\alpha \equiv \{\rho c, -\mathbf{j}\}, \tag{9.111}$$

then Eq. (110) may be presented in the form

$$\partial_\alpha j^\alpha = \partial^\alpha j_\alpha = 0, \tag{9.112}$$

showing that the continuity equation is invariant with respect to the Lorentz transform.40

Of course, the invariance of the equation does not mean that all component values of the 4-vectors participating in it are the same in both frames! For example, let us have some static charge density $\rho$ in frame 0; then Eq. (97b), applied to the contravariant form of 4-vector (111), reads



39 It is straightforward to check that transfer between the contravariant and covariant forms of the same tensor may be readily achieved using the same metric tensor $g$: $T_{\alpha\beta} = g_{\alpha\gamma}T^{\gamma\delta}g_{\delta\beta}$, $T^{\alpha\beta} = g^{\alpha\gamma}T_{\gamma\delta}g^{\delta\beta}$.

40 In some older texts, the equations preserving their form at the Lorentz transform are called "covariant", creating 
a possibility for confusion. 
$$j'^\alpha = \frac{\partial x'^\alpha}{\partial x^\beta}j^\beta, \qquad j^\beta = \{\rho c, 0, 0, 0\}. \tag{9.113}$$

Using the explicit form (98) of the reciprocal Lorentz matrix for the coordinate choice shown in Fig. 1, we see that this relation yields

$$\rho' = \gamma\rho, \qquad j'_x = -\gamma\beta\rho c = -\gamma v\rho, \qquad j'_y = j'_z = 0. \tag{9.114}$$

Since the charge velocity, as observed from frame 0', is $(-v)$, the non-relativistic result would be $j'_x = -v\rho$. The additional $\gamma$ factor in the relativistic results for both charge density and current is caused by the length contraction: $dx' = dx/\gamma$, so that in order to keep the total charge $dQ = \rho\,d^3r = \rho\,dx\,dy\,dz$ inside the elementary volume $d^3r = dx\,dy\,dz$ intact, $\rho$ (and hence $j_x$) should increase proportionally.
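As a minimal numerical illustration of Eqs. (113)-(114), the sketch below applies matrix (98) to the 4-current of a static charge density in frame 0. The function name and the sample numbers are hypothetical; SI units are assumed.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def transform_current(rho, j, beta):
    """Primed 4-current from Eq. (9.113), j'^a = (dx'^a/dx^b) j^b, with matrix (9.98).

    rho  : charge density in frame 0, C/m^3
    j    : (j_x, j_y, j_z) in frame 0, A/m^2
    beta : v/c of frame 0' moving along x
    Returns (rho', j'_x, j'_y, j'_z).
    """
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    j0, (jx, jy, jz) = rho * C, j
    j0_p = g * j0 - g * beta * jx          # row 0 of matrix (9.98)
    jx_p = -g * beta * j0 + g * jx         # row 1 of matrix (9.98)
    return j0_p / C, jx_p, jy, jz


rho = 1.0e-6                               # static charge density in frame 0
rho_p, jx_p, jy_p, jz_p = transform_current(rho, (0.0, 0.0, 0.0), 0.6)
# With beta = 0.6, gamma = 1.25: expect rho' = 1.25e-6 and j'_x = -gamma*v*rho.
print(rho_p, jx_p)
```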

Next, at the end of Chapter 6 we have seen that Maxwell equations for potentials $\phi$ and $\mathbf{A}$ may be presented in a similar form (6.109), under the Lorenz (again, not "Lorentz" please!) gauge condition (6.108). For the free space, this condition takes the form

$$\frac{1}{c^2}\frac{\partial\phi}{\partial t} + \nabla\cdot\mathbf{A} = 0. \tag{9.115}$$

This expression gives us a hint on how to form the 4-vector of potentials:41

$$A^\alpha \equiv \left\{\frac{\phi}{c}, \mathbf{A}\right\}, \qquad A_\alpha \equiv \left\{\frac{\phi}{c}, -\mathbf{A}\right\}; \tag{9.116}$$

indeed, these vectors satisfy Eq. (115) in its 4-vector form:

$$\partial_\alpha A^\alpha = \partial^\alpha A_\alpha = 0. \tag{9.117}$$

Since this scalar product is Lorentz-invariant, and derivatives (104)-(105) are legitimate 4-vectors, this implies that 4-vector (116) is also legitimate, i.e. obeys the Lorentz transform formulas (97), (99). More convincing evidence of this fact may be obtained from Maxwell equations (6.109) for the potentials. In free space, they may be rewritten as

$$\left[\frac{\partial^2}{\partial(ct)^2} - \nabla^2\right]\frac{\phi}{c} = \frac{\rho}{c\varepsilon_0} = \mu_0 c\rho, \qquad \left[\frac{\partial^2}{\partial(ct)^2} - \nabla^2\right]\mathbf{A} = \mu_0\mathbf{j}. \tag{9.118}$$

Using definition (116), these equations may be merged into one:42

$$\Box A^\alpha = \mu_0 j^\alpha, \tag{9.119}$$

where $\Box$ is the d'Alembert operator43 that may be presented as either of two scalar products,

$$\Box \equiv \frac{\partial^2}{\partial(ct)^2} - \nabla^2 = \partial^\alpha\partial_\alpha = \partial_\alpha\partial^\alpha, \tag{9.120}$$



41 In the Gaussian units, the scalar potential should not be divided by $c$.

42 In the Gaussian units, coefficient $\mu_0$ in the right-hand part of Eq. (119) should be replaced, as usual, with $4\pi/c$.

43 Named after J.-B. d'Alembert (1717-1783). Note that in older textbooks, notation $\Box^2$ may be met for this operator.
and is, of course, Lorentz-invariant. Because of that, and the fact that the Lorentz transform changes both 4-vectors $A^\alpha$ and $j^\alpha$ in a similar way, Eq. (119) does not depend on the reference frame choice. Thus we have arrived at a key point of this chapter: Maxwell equations are indeed invariant with respect to the Lorentz transform. As a by-product, the 4-vector form (119) of these equations (for potentials) is extremely simple, and beautiful.

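However, as we have seen in Chapter 7, for many applications the Maxwell equations for field vectors are more convenient; so let us present them in the 4-form. For that, we may express the Cartesian components of the usual (3D) field vectors

$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}, \tag{9.121}$$

via those of the potential 4-vector $A^\alpha$. For example,

$$E_x = -\frac{\partial\phi}{\partial x} - \frac{\partial A_x}{\partial t} = -c\left[\frac{\partial}{\partial x}\frac{\phi}{c} + \frac{\partial A_x}{\partial(ct)}\right] = -c\left(\partial^0A^1 - \partial^1A^0\right), \tag{9.122}$$

$$B_x = \frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z} = -\left(\partial^2A^3 - \partial^3A^2\right). \tag{9.123}$$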



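Completing similar calculations for other field components, we find that the following antisymmetric, contravariant field-strength tensor,

$$F^{\alpha\beta} \equiv \partial^\alpha A^\beta - \partial^\beta A^\alpha, \tag{9.124}$$

may be expressed via the field components as follows:44

$$F^{\alpha\beta} = \begin{pmatrix} 0 & -E_x/c & -E_y/c & -E_z/c \\ E_x/c & 0 & -B_z & B_y \\ E_y/c & B_z & 0 & -B_x \\ E_z/c & -B_y & B_x & 0 \end{pmatrix}, \tag{9.125a}$$

so that the covariant form of the tensor is

$$F_{\alpha\beta} = \begin{pmatrix} 0 & E_x/c & E_y/c & E_z/c \\ -E_x/c & 0 & -B_z & B_y \\ -E_y/c & B_z & 0 & -B_x \\ -E_z/c & -B_y & B_x & 0 \end{pmatrix}. \tag{9.125b}$$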



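If this expression looks a bit too bulky, note that as a reward, the pair of inhomogeneous Maxwell equations, i.e. the first two equations of the system (6.93), which in free space ($\mathbf{D} = \varepsilon_0\mathbf{E}$, $\mathbf{B} = \mu_0\mathbf{H}$) may be rewritten as

$$\nabla\cdot\frac{\mathbf{E}}{c} = \mu_0 c\rho, \qquad \nabla\times\mathbf{B} - \frac{\partial}{\partial(ct)}\frac{\mathbf{E}}{c} = \mu_0\mathbf{j}, \tag{9.126}$$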



44 In Gaussian units, this formula, as well as Eq. (131) for $G^{\alpha\beta}$, has no factors $c$ in the denominators.
may now be rewritten in a very simple (and manifestly Lorentz-invariant) form,

$$\partial_\alpha F^{\alpha\beta} = \mu_0 j^\beta, \tag{9.127}$$

that is comparable with Eq. (119) in beauty and simplicity. Somewhat counter-intuitively, the pair of homogeneous Maxwell equations,

$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0, \qquad \nabla\cdot\mathbf{B} = 0, \tag{9.128}$$

look, in the 4-vector notation, a bit more complicated:45

$$\partial_\alpha F_{\beta\gamma} + \partial_\beta F_{\gamma\alpha} + \partial_\gamma F_{\alpha\beta} = 0. \tag{9.129}$$

Note, however, that Eq. (128) may also be represented in a much simpler form,

$$\partial_\alpha G^{\alpha\beta} = 0, \tag{9.130}$$

using the so-called dual (and also antisymmetric) tensor

$$G^{\alpha\beta} = \begin{pmatrix} 0 & -B_x & -B_y & -B_z \\ B_x & 0 & E_z/c & -E_y/c \\ B_y & -E_z/c & 0 & E_x/c \\ B_z & E_y/c & -E_x/c & 0 \end{pmatrix}, \tag{9.131}$$

which may be obtained from $F^{\alpha\beta}$, given by Eq. (125), by the following replacements:

$$\frac{\mathbf{E}}{c} \to \mathbf{B}, \qquad \mathbf{B} \to -\frac{\mathbf{E}}{c}. \tag{9.132}$$



Besides the proof of the Lorentz-invariance of the Maxwell equations, the 4-vector formalism allows us to achieve our initial goal: to find out how the electric and magnetic field components change at the transfer between reference frames. Let us apply to tensor $F^{\alpha\beta}$ the reciprocal Lorentz transform given by the second of Eqs. (109). Generally, it gives, for each field component, a sum of 16 terms, but since (for our choice of coordinates, shown in Fig. 1) there are many zeros in the Lorentz transform matrix, and the diagonal components of $F^{\gamma\delta}$ equal zero as well, the calculations are quite doable. Let us calculate, for example, $E'_x = -cF'^{01}$. The only nonvanishing terms in the right-hand part are

$$E'_x = -cF'^{01} = -c\left(\frac{\partial x'^0}{\partial x^1}\frac{\partial x'^1}{\partial x^0}F^{10} + \frac{\partial x'^0}{\partial x^0}\frac{\partial x'^1}{\partial x^1}F^{01}\right) = -c\gamma^2\left(\beta^2 - 1\right)\frac{E_x}{c} = E_x. \tag{9.133}$$

Repeating the calculation for the other 5 components of the fields, we get very important relations

$$\begin{aligned} E'_x &= E_x, & B'_x &= B_x, \\ E'_y &= \gamma\left(E_y - vB_z\right), & B'_y &= \gamma\left(B_y + vE_z/c^2\right), \\ E'_z &= \gamma\left(E_z + vB_y\right), & B'_z &= \gamma\left(B_z - vE_y/c^2\right), \end{aligned} \tag{9.134}$$



45 To be fair, note that just as Eq. (127), Eq. (129) is also a set of four scalar equations; in the latter case, with indices $\alpha$, $\beta$, and $\gamma$ taking any three different values of the set {0, 1, 2, 3}.
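whose more compact ("semi-vector") form is

$$\mathbf{E}'_\parallel = \mathbf{E}_\parallel, \qquad \mathbf{B}'_\parallel = \mathbf{B}_\parallel, \qquad \mathbf{E}'_\perp = \gamma\left(\mathbf{E} + \mathbf{v}\times\mathbf{B}\right)_\perp, \qquad \mathbf{B}'_\perp = \gamma\left(\mathbf{B} - \frac{\mathbf{v}\times\mathbf{E}}{c^2}\right)_\perp, \tag{9.135}$$

where indices $\parallel$ and $\perp$ stand, respectively, for the field components parallel and perpendicular to the relative velocity of the two reference frames. In the non-relativistic limit, the Lorentz factor $\gamma$ tends to 1, and Eqs. (135) acquire an even simpler form

$$\mathbf{E}' \approx \mathbf{E} + \mathbf{v}\times\mathbf{B}, \qquad \mathbf{B}' \approx \mathbf{B} - \frac{1}{c^2}\mathbf{v}\times\mathbf{E}. \tag{9.136}$$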
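The agreement between the tensor route (the second of Eqs. (109) applied to $F^{\alpha\beta}$ of Eq. (125a)) and the component formulas (134) can be spot-checked numerically. The sketch below works in SI units with arbitrarily chosen sample fields and a boost along x; all function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def field_tensor(E, B):
    """Contravariant field-strength tensor F^{ab} of Eq. (9.125a), SI units."""
    ex, ey, ez = (e / C for e in E)
    bx, by, bz = B
    return [[0.0, -ex, -ey, -ez],
            [ex, 0.0, -bz, by],
            [ey, bz, 0.0, -bx],
            [ez, -by, bx, 0.0]]


def fields_from_tensor(F):
    """Invert (9.125a): E_i = -c F^{0i}; B_x = -F^{23}, B_y = F^{13}, B_z = -F^{12}."""
    E = (-C * F[0][1], -C * F[0][2], -C * F[0][3])
    B = (-F[2][3], F[1][3], -F[1][2])
    return E, B


def transform_via_tensor(E, B, v):
    """Second of Eqs. (9.109): F'^{ab} = L^a_k L^b_m F^{km}, with L = matrix (9.98)."""
    beta = v / C
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    L = [[g, -g * beta, 0.0, 0.0], [-g * beta, g, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    F = field_tensor(E, B)
    Fp = [[sum(L[a][k] * L[b][m] * F[k][m] for k in range(4) for m in range(4))
           for b in range(4)] for a in range(4)]
    return fields_from_tensor(Fp)


def transform_explicit(E, B, v):
    """The component formulas (9.134) for a boost with velocity v along x."""
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    (Ex, Ey, Ez), (Bx, By, Bz) = E, B
    E_p = (Ex, g * (Ey - v * Bz), g * (Ez + v * By))
    B_p = (Bx, g * (By + v * Ez / C ** 2), g * (Bz - v * Ey / C ** 2))
    return E_p, B_p


E = (1.0e3, -2.0e4, 5.0e3)        # V/m, sample values
B = (1.0e-4, 3.0e-5, -2.0e-4)     # T, sample values
v = 0.8 * C
E1, B1 = transform_via_tensor(E, B, v)
E2, B2 = transform_explicit(E, B, v)
print(E1, E2)
print(B1, B2)
```

Doing the full 16-term double sum, rather than hand-picking the nonvanishing terms as in Eq. (133), makes the check independent of which terms one expects to vanish.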

Thus we see that the electric and magnetic fields actually transform into each other even in the first order of the $v/c$ ratio. For example, if we fly across the field lines of a uniform, static, purely electric field $\mathbf{E}$ (e.g., the one in a plane capacitor), we will see not only the electric field re-normalization (in the second order of the $v/c$ ratio), but also a nonvanishing dc magnetic field $\mathbf{B}'$ perpendicular to both vector $\mathbf{E}$ and vector $\mathbf{v}$, the direction of our motion. This is of course what might be expected from the relativity principle: from the point of view of the moving observer (which is as legitimate as that of a stationary observer), the surface charges of the capacitor plates, which create field $\mathbf{E}$, move back, creating dc currents (114) that induce the apparent magnetic field. Similarly, motion across a magnetic field creates, from the point of view of the moving observer, an electric field.

This fact is very important philosophically. One can say there is no such thing in Mother Nature as an electric field (or a magnetic field) all by itself. Not only can the electric field induce the magnetic field (and vice versa) in dynamics, but even in an apparently static configuration, what exactly we measure depends on our speed relative to the field sources; hence the very appropriate term for the whole field we are studying: electromagnetism.

Another simple but very important application of Eqs. (134)-(135) is the calculation of the fields created by a charged particle moving in free space by inertia, i.e. along a straight line with constant velocity $\mathbf{u}$, at the impact parameter46 (the closest distance) $b$ from the observer. Selecting frame 0' to move with the particle in its origin, and frame 0 to reside in the "lab" (in which fields $\mathbf{E}$ and $\mathbf{B}$ are measured), we can take $\mathbf{v} = \mathbf{u}$. In this case fields $\mathbf{E}'$ and $\mathbf{B}'$ may be calculated from, respectively, electro- and magnetostatics, because in frame 0' the particle does not move:

$$\mathbf{E}' = \frac{q}{4\pi\varepsilon_0}\frac{\mathbf{r}'}{r'^3}, \qquad \mathbf{B}' = 0. \tag{9.137}$$

Selecting the coordinate axes so that at the measurement point $x = 0$, $y = b$, $z = 0$ (Fig. 11a), we may write $x' = -ut'$, $y' = b$, $z' = 0$, so that $r' = (u^2t'^2 + b^2)^{1/2}$, and the field components are as follows:

$$E'_x = -\frac{q}{4\pi\varepsilon_0}\frac{ut'}{(u^2t'^2 + b^2)^{3/2}}, \quad E'_y = \frac{q}{4\pi\varepsilon_0}\frac{b}{(u^2t'^2 + b^2)^{3/2}}, \quad E'_z = 0, \quad B'_x = B'_y = B'_z = 0. \tag{9.138}$$

Now using the last of Eqs. (19b), with $x = 0$, for the time transform, and the equations reciprocal to Eqs. (134) for the field transform (it is evident that they are similar to the direct transform with $v$ replaced with $-v = -u$), in the lab frame we get



46 This term is very popular in the theory of particle scattering; see, e.g., CM Sec. 3.7.
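$$E_x = E'_x = -\frac{q}{4\pi\varepsilon_0}\frac{u\gamma t}{(u^2\gamma^2t^2 + b^2)^{3/2}}, \quad E_y = \gamma E'_y = \frac{q}{4\pi\varepsilon_0}\frac{\gamma b}{(u^2\gamma^2t^2 + b^2)^{3/2}}, \quad E_z = 0, \tag{9.139}$$

$$B_x = 0, \quad B_y = 0, \quad B_z = \frac{\gamma u}{c^2}E'_y = \frac{q}{4\pi\varepsilon_0}\frac{u}{c^2}\frac{\gamma b}{(u^2\gamma^2t^2 + b^2)^{3/2}} = \frac{u}{c^2}E_y. \tag{9.140}$$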
Fig. 9.11. Field pulses induced by a uniformly moving charge.



These results,47 plotted in Fig. 11b, reveal two major effects. First, the charge passage by the observer generates not only an electric field pulse, but also a magnetic field pulse. This is natural, because, as was repeatedly discussed in Chapter 5, charge motion is essentially an electric current.48 Second, Eqs. (139)-(140) show that the pulse duration scale is

$$\Delta t \sim \frac{b}{\gamma u} = \frac{b}{u}\left(1 - \frac{u^2}{c^2}\right)^{1/2}, \tag{9.141}$$

i.e. it shrinks to zero as the charge velocity $u$ approaches the speed of light. This is of course a direct corollary of the relativistic length contraction: in the frame 0' moving with the charge, the longitudinal spread of its electric field at distance $b$ from the motion line is of the order of $\Delta x' \sim b$. When observed from the lab frame 0, this interval, in accordance with Eq. (20), shrinks to $\Delta x \sim \Delta x'/\gamma = b/\gamma$, and so does the pulse duration scale, $\Delta t \sim \Delta x/u = b/\gamma u$.
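Eqs. (139)-(141) are easy to explore numerically. In the sketch below (SI units; the function names, and the choice of an elementary charge passing at $b = 1\ \mu$m, are illustrative assumptions), the half-maximum time of the $E_y$ pulse is computed in closed form from Eq. (139): setting $(1 + (\gamma u t/b)^2)^{-3/2} = 1/2$ gives a time proportional to $b/\gamma u$, in line with estimate (141).

```python
import math

C = 299_792_458.0          # speed of light, m/s
EPS0 = 8.8541878128e-12    # vacuum permittivity, F/m
K = 1.0 / (4.0 * math.pi * EPS0)


def lab_fields(t, q, u, b):
    """Eqs. (9.139)-(9.140): lab-frame fields at the point (0, b, 0) of a charge q
    moving with speed u along x; t = 0 is the moment of closest approach."""
    g = 1.0 / math.sqrt(1.0 - (u / C) ** 2)
    den = (u ** 2 * g ** 2 * t ** 2 + b ** 2) ** 1.5
    Ex = -K * q * u * g * t / den
    Ey = K * q * g * b / den
    Bz = u * Ey / C ** 2
    return Ex, Ey, Bz


def half_width(u, b):
    """Time at which E_y of Eq. (9.139) falls to half its peak value:
    t = (b / (gamma*u)) * sqrt(2**(2/3) - 1)."""
    g = 1.0 / math.sqrt(1.0 - (u / C) ** 2)
    return (b / (g * u)) * math.sqrt(2.0 ** (2.0 / 3.0) - 1.0)


q, b = 1.602176634e-19, 1.0e-6     # elementary charge passing at 1 micron
for beta in (0.5, 0.9, 0.99, 0.999):
    print(f"u = {beta}c: E_y half-width {half_width(beta * C, b):.3e} s")
```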



9.6. Relativistic particles in electric and magnetic fields 

Now let us analyze the dynamics of charged particles in electric and magnetic fields. Inspired by "our" success of forming the 4-vector (75) of energy-momentum,

$$p^\alpha = \left\{\frac{\mathcal{E}}{c}, \mathbf{p}\right\} = mu^\alpha = m\frac{dx^\alpha}{d\tau}, \tag{9.142}$$

where $u^\alpha$ is the contravariant form of the 4-velocity (63) of the particle,



47 In the next chapter, we will re-derive them in a different way.

48 It is straightforward to use Eq. (140) and the linear superposition principle to calculate, for example, the magnetic field of a string of charges moving along the same line, and separated by equal distances $\Delta x = a$ (so that the average current, as measured in frame 0, is $qu/a$), and to show that the time-average of the magnetic field is given by Eq. (5.20) of magnetostatics, with $b$ instead of $\rho$.
$$u^\alpha \equiv \frac{dx^\alpha}{d\tau} = \gamma\{c, \mathbf{u}\}, \qquad u_\alpha \equiv \frac{dx_\alpha}{d\tau} = \gamma\{c, -\mathbf{u}\}, \tag{9.143}$$



we may notice that the nonrelativistic equation of motion, resulting from the Lorentz-force formula (5.10) for the three spatial components of $p^\alpha$ at charged particle's motion in an electromagnetic field,

$$\frac{d\mathbf{p}}{dt} = q\left(\mathbf{E} + \mathbf{u}\times\mathbf{B}\right), \tag{9.144}$$

is fully consistent with the following 4-vector equality (which is evidently Lorentz-invariant):

$$\frac{dp^\alpha}{d\tau} = qF^{\alpha\beta}u_\beta. \tag{9.145}$$



For example, the $\alpha = 1$ component of this equation reads

$$\frac{dp^1}{d\tau} = qF^{1\beta}u_\beta = q\left[\frac{E_x}{c}\gamma c + 0\cdot(-\gamma u_x) + (-B_z)(-\gamma u_y) + B_y(-\gamma u_z)\right] = q\gamma\left[\mathbf{E} + \mathbf{u}\times\mathbf{B}\right]_x, \tag{9.146}$$

and similarly for the two other spatial components ($\alpha = 2$ and $\alpha = 3$). We see that these expressions differ from the Newton law (144) by the extra factor $\gamma$. However, plugging into Eq. (146) the definition of the proper time interval, $d\tau = dt/\gamma$, and canceling $\gamma$ in both parts, we recover Eq. (144) exactly, for any velocity of the particle! The only caveat is that if $u$ is comparable with $c$, $\mathbf{p}$ in Eq. (144) has to be understood as the relativistic momentum (70), proportional to the velocity-dependent mass $M \equiv \gamma m$ rather than to the rest mass $m$.

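The only remaining task is to examine the meaning of the 0th component of Eq. (145). Let us spell it out:

$$\frac{dp^0}{d\tau} = qF^{0\beta}u_\beta = q\left[0\cdot\gamma c + \left(-\frac{E_x}{c}\right)(-\gamma u_x) + \left(-\frac{E_y}{c}\right)(-\gamma u_y) + \left(-\frac{E_z}{c}\right)(-\gamma u_z)\right] = q\gamma\frac{\mathbf{E}\cdot\mathbf{u}}{c}. \tag{9.147}$$

Recalling that $p^0 = \mathcal{E}/c$, and using $d\tau = dt/\gamma$ again, we see that Eq. (147) looks exactly like the non-relativistic relation for the kinetic energy change,49

$$\frac{d\mathcal{E}}{dt} = q\mathbf{E}\cdot\mathbf{u}, \tag{9.148}$$

except that in the relativistic case the energy has to be taken in the general form (73).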
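A numerical spot check of Eqs. (146)-(147): the sketch below (SI units; the sample values and function names are illustrative) forms $qF^{\alpha\beta}u_\beta$ from the tensor (125a) and the covariant 4-velocity (143), and compares its spatial part with $\gamma$ times the 3D Lorentz force, and its time part with $q\gamma\,\mathbf{E}\cdot\mathbf{u}/c$.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def field_tensor(E, B):
    """Contravariant F^{ab} of Eq. (9.125a)."""
    ex, ey, ez = (e / C for e in E)
    bx, by, bz = B
    return [[0.0, -ex, -ey, -ez],
            [ex, 0.0, -bz, by],
            [ey, bz, 0.0, -bx],
            [ez, -by, bx, 0.0]]


def four_force(q, E, B, u):
    """Right-hand side of Eq. (9.145), q F^{ab} u_b, with u_b = gamma*{c, -u} from (9.143)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in u) / C ** 2)
    u_cov = [gamma * C] + [-gamma * x for x in u]
    F = field_tensor(E, B)
    return [q * sum(F[a][b] * u_cov[b] for b in range(4)) for a in range(4)], gamma


def lorentz_3d(q, E, B, u):
    """3D Lorentz force q(E + u x B) and power q E.u."""
    cross = (u[1] * B[2] - u[2] * B[1],
             u[2] * B[0] - u[0] * B[2],
             u[0] * B[1] - u[1] * B[0])
    force = [q * (E[i] + cross[i]) for i in range(3)]
    power = q * sum(E[i] * u[i] for i in range(3))
    return force, power


q = 1.0
E = (1.0e3, -2.0e3, 5.0e2)          # V/m
B = (1.0e-5, 2.0e-5, -3.0e-5)       # T
u = (1.0e8, -1.5e8, 5.0e7)          # m/s, |u| is about 0.62c
f4, gamma = four_force(q, E, B, u)
force, power = lorentz_3d(q, E, B, u)
print(f4[1:], [gamma * f for f in force])
print(f4[0], gamma * power / C)
```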

No question, the 4-component equation (145) of relativistic dynamics is beautiful in its simplicity. However, for the solution of particular problems, Eqs. (144) and (148) are frequently preferable. As an illustration of this point, let us now use these equations to explore the relativistic effects in charged particle motion in uniform, time-independent electric and magnetic fields. In doing that, we will, for the time being, neglect the contribution of the particle itself to the field.50



49 See, e.g., CM Eq. (1.20) with $d\mathbf{p}/dt = \mathbf{F} = q\mathbf{E}$. (As a reminder, the magnetic field cannot affect particle's energy, because the magnetic component of the Lorentz force is perpendicular to its velocity.)

50 As was emphasized earlier in this course, in statics this contribution has to be ignored. In dynamics, this is 
generally not true; these self-action effects will be discussed in Sec. 10.6. 
(i) Uniform magnetic field. Let the magnetic field be constant and uniform in the "lab" reference frame 0. Then in this frame, Eqs. (144) and (148) yield

$$\frac{d\mathbf{p}}{dt} = q\mathbf{u}\times\mathbf{B}, \qquad \frac{d\mathcal{E}}{dt} = 0. \tag{9.149}$$

From the second equation, $\mathcal{E} = \text{const}$, we get $u = \text{const}$, $\beta = u/c = \text{const}$, $\gamma = (1 - \beta^2)^{-1/2} = \text{const}$, and $M \equiv \gamma m = \text{const}$, so that the first of Eqs. (149) may be rewritten as

$$\frac{d\mathbf{u}}{dt} = \mathbf{u}\times\boldsymbol{\omega}_c, \tag{9.150}$$



where $\boldsymbol{\omega}_c$ is the vector directed along the magnetic field $\mathbf{B}$, with the magnitude equal to the cyclotron frequency (sometimes called the "gyrofrequency")

$$\omega_c \equiv \frac{qB}{M} = \frac{qB}{\gamma m} = \frac{qc^2B}{\mathcal{E}}. \tag{9.151}$$

If particle's initial velocity $\mathbf{u}_0$ is perpendicular to the magnetic field, Eq. (150) evidently describes its circular motion, with the constant speed $u = u_0$, in a plane perpendicular to $\mathbf{B}$, and frequency (151). In the nonrelativistic limit $u \ll c$, when $M \to m$, the cyclotron frequency is independent of $u$, but as the kinetic energy is increased to values comparable to the rest energy of the particle, the frequency decreases, and in the ultrarelativistic limit,

$$\omega_c \approx qc\frac{B}{p}, \quad \text{at } u \approx c. \tag{9.152}$$

In the nonrelativistic limit, the cyclotron motion radius, which may be calculated as $R = u/\omega_c$, is proportional to particle's speed, i.e. to the square root of its kinetic energy. However, in the general case the radius is proportional to particle's relativistic momentum rather than its speed:

$$R = \frac{u}{\omega_c} = \frac{Mu}{qB} = \frac{p}{qB}, \tag{9.153}$$

so that in the ultrarelativistic limit, when $p \approx \mathcal{E}/c$, $R$ is proportional to the kinetic energy.
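To see the energy dependence of Eqs. (151) and (153) in numbers, here is a short sketch for a proton in a 1 T field (SI units with CODATA values of the constants; the function name and the energy grid are assumptions made for the illustration). It shows $\omega_c$ staying close to $qB/m$ at low energy and dropping once the kinetic energy approaches and exceeds $mc^2$, while $R$ keeps growing with the momentum.

```python
import math

C = 299_792_458.0              # speed of light, m/s
E_CHARGE = 1.602176634e-19     # elementary charge, C
M_PROTON = 1.67262192369e-27   # proton mass, kg


def cyclotron(kinetic_eV, B, m=M_PROTON, q=E_CHARGE):
    """Return (gamma, omega_c, R) for motion perpendicular to a uniform field B (tesla)."""
    rest = m * C ** 2                                   # rest energy, J
    energy = rest + kinetic_eV * E_CHARGE               # full energy, gamma*m*c^2
    gamma = energy / rest
    p = math.sqrt(energy ** 2 - rest ** 2) / C          # relativistic momentum
    omega_c = q * B / (gamma * m)                       # Eq. (9.151)
    R = p / (q * B)                                     # Eq. (9.153)
    return gamma, omega_c, R


for T in (1e3, 15e6, 1e9, 7e12):                        # kinetic energies, eV
    gamma, w, R = cyclotron(T, B=1.0)
    print(f"T = {T:.0e} eV: gamma = {gamma:.4g}, omega_c = {w:.4g} rad/s, R = {R:.4g} m")
```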



These dependences of $\omega_c$ and $R$ on energy are the major factors in the design of circular accelerators of charged particles. In the simplest of these machines (the cyclotron, invented in 1929 by E. Lawrence), frequency $\omega$ of the accelerating ac electric field is constant, so that even if it is tuned to $\omega_c$ of the initially injected particles, the drop of the cyclotron frequency with energy eventually violates this tuning. For this reason, the maximum particle speed is limited to just ~0.1$c$ (for protons, corresponding to the kinetic energy of just ~15 MeV). This problem may be addressed in several ways. In particular, in synchrotrons (such as Fermilab's Tevatron and CERN's LHC) the magnetic field is gradually increased in time to compensate the momentum increase ($B \propto p$), so that both $R$ (153) and $\omega_c$ (152) stay constant, enabling proton acceleration to energies as high as ~7 TeV, i.e. ~7,500 $mc^2$.51



51 For more on this topic, I have to refer the interested reader to special literature, for example either S. Lee, 
Accelerator Physics, 2 nd ed., World Scientific, 2004, or E. Wilson, An Introduction to Particle Accelerators, 
Oxford U. Press, 2001. 
Returning to our initial problem, if particle's initial velocity has a component $u_\parallel$ along the magnetic field, it is conserved in time, so that the trajectory is a spiral around the magnetic field lines. As Eqs. (149) show, in this case Eq. (150) remains valid, but in Eqs. (151) and (153) the full speed and momentum have to be replaced with the magnitudes of their (also time-conserved) components, $u_\perp$ and $p_\perp$, normal to $\mathbf{B}$, while the Lorentz factor $\gamma$ in those formulas still requires the full speed of the particle.

Finally, in the special case when particle's initial velocity is directed exactly along the magnetic field, it continues to move along a straight line parallel to vector $\mathbf{B}$. In this case, the cyclotron frequency (151) remains finite, but does not correspond to any real motion, because $R = 0$.

(ii) Uniform electric field. This problem is (technically) more complex than the previous one, because in the electric field, particle's kinetic energy may change. Directing axis $z$ along the field, from Eq. (144) we get

$$\frac{dp_z}{dt} = qE, \qquad \frac{d\mathbf{p}_\perp}{dt} = 0. \tag{9.154}$$

If the field does not change in time, the first integration of these equations is trivial,

$$p_z(t) = p_z(0) + qEt, \qquad \mathbf{p}_\perp(t) = \text{const} = \mathbf{p}_\perp(0), \tag{9.155}$$

but the further integration requires care, because the effective mass $M = \gamma m$ of the particle depends on its full speed,

$$u^2 = u_z^2 + u_\perp^2, \tag{9.156}$$

making the two motions, along and across the field, mutually dependent.



If the initial velocity is perpendicular to field $\mathbf{E}$, i.e. if $p_z(0) = 0$, $p_\perp(0) = p(0) \equiv p_0$, the easiest way to proceed is to calculate the particle's energy first:

$$\mathcal{E}^2 = (mc^2)^2 + c^2p^2(t) = \mathcal{E}_0^2 + c^2(qEt)^2, \quad \text{where } \mathcal{E}_0 \equiv \left[(mc^2)^2 + c^2p_0^2\right]^{1/2}. \tag{9.157}$$

On the other hand, we can calculate the same energy by integrating Eq. (148),

$$\frac{d\mathcal{E}}{dt} = qEu_z = qE\frac{dz}{dt}, \tag{9.158}$$

over time, with a simple result:

$$\mathcal{E} = \mathcal{E}_0 + qEz(t), \tag{9.159}$$

where (for notation simplicity) I took $z(0) = 0$. Requiring Eq. (159) to give the same $\mathcal{E}^2$ as Eq. (157), we get a quadratic equation for $z(t)$,

$$\mathcal{E}_0^2 + c^2(qEt)^2 = \left[\mathcal{E}_0 + qEz(t)\right]^2, \tag{9.160}$$

whose solution (with the sign before the square root corresponding to $qE > 0$, i.e. $z > 0$) is

$$z(t) = \frac{\mathcal{E}_0}{qE}\left\{\left[1 + \left(\frac{cqEt}{\mathcal{E}_0}\right)^2\right]^{1/2} - 1\right\}. \tag{9.161}$$
Now let us find particle's trajectory. Selecting axis $x$ so that the initial velocity vector (and hence the velocity vector at any further instant) is within the [x, z] plane, i.e. $y(t) = 0$, we may use Eqs. (155) to calculate trajectory's slope, at its arbitrary point, as

$$\frac{dz}{dx} = \frac{dz/dt}{dx/dt} = \frac{Mu_z}{Mu_x} = \frac{p_z}{p_x} = \frac{qEt}{p_0}. \tag{9.162}$$

Now let us use Eq. (160) to express the numerator of this fraction, $qEt$, as a function of $z$:

$$qEt = \frac{1}{c}\left[\left(\mathcal{E}_0 + qEz\right)^2 - \mathcal{E}_0^2\right]^{1/2}. \tag{9.163}$$

Plugging this expression into Eq. (162), we get

$$\frac{dz}{dx} = \frac{1}{cp_0}\left[\left(\mathcal{E}_0 + qEz\right)^2 - \mathcal{E}_0^2\right]^{1/2}. \tag{9.164}$$



This differential equation may be readily integrated, separating variables $z$ and $x$, and using the substitution $\xi = \text{arccosh}(qEz/\mathcal{E}_0 + 1)$. Selecting the origin of axis $x$ at the initial point, so that $x(0) = 0$, we finally get the trajectory:

$$z = \frac{\mathcal{E}_0}{qE}\left[\cosh\left(\frac{qEx}{cp_0}\right) - 1\right]. \tag{9.165}$$



At the initial part of the trajectory, where $qEx \ll cp_0$, this expression may be approximated by the first nonvanishing term of its Taylor series, giving a parabola:

$$z \approx \frac{\mathcal{E}_0\,qE}{2c^2p_0^2}x^2, \tag{9.166}$$

so that if the initial velocity of the particle is much less than $c$ (i.e. $p_0 \approx mu_0$, $\mathcal{E}_0 \approx mc^2$), we get the familiar nonrelativistic formula:

$$z \approx \frac{qE}{2mu_0^2}x^2 = \frac{a}{2}t^2, \quad \text{with } x = u_0t, \quad a = \frac{F}{m} = \frac{qE}{m}. \tag{9.167}$$



This solution may be readily generalized to the case of an arbitrary direction of particle's initial 
velocity; this generalization is left for reader's exercise. 
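The closed-form results (161) and (165) can be cross-checked by numerically integrating the velocity $\mathbf{u} = c^2\mathbf{p}/\mathcal{E}$, with $\mathbf{p}(t)$ taken from Eq. (155). This sketch uses dimensionless units $c = m = qE = 1$ (an assumption made for convenience) and the composite Simpson rule; the helper names are hypothetical.

```python
import math

# Dimensionless units, an assumption of this sketch: c = m = 1 and qE = 1.
C, M, QE = 1.0, 1.0, 1.0


def energy(t, p0):
    """Eq. (9.157): E(t) = [E0^2 + c^2 (qEt)^2]^(1/2), E0^2 = (mc^2)^2 + c^2 p0^2."""
    e0_sq = (M * C ** 2) ** 2 + (C * p0) ** 2
    return math.sqrt(e0_sq + (C * QE * t) ** 2)


def velocity(t, p0):
    """u = c^2 p / E with p_x = p0 and p_z = qEt, Eq. (9.155)."""
    e = energy(t, p0)
    return C ** 2 * p0 / e, C ** 2 * QE * t / e


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return s * h / 3.0


def trajectory_numeric(t, p0):
    """x(t), z(t) by numerical integration of the velocity, starting from x = z = 0."""
    x = simpson(lambda s: velocity(s, p0)[0], 0.0, t)
    z = simpson(lambda s: velocity(s, p0)[1], 0.0, t)
    return x, z


def z_of_t(t, p0):
    """Closed form (9.161)."""
    e0 = energy(0.0, p0)
    return (e0 / QE) * (math.sqrt(1.0 + (C * QE * t / e0) ** 2) - 1.0)


def z_of_x(x, p0):
    """Closed form (9.165)."""
    e0 = energy(0.0, p0)
    return (e0 / QE) * (math.cosh(QE * x / (C * p0)) - 1.0)


p0, t = 0.5, 3.0
x, z = trajectory_numeric(t, p0)
print(z, z_of_t(t, p0), z_of_x(x, p0))   # the three values should agree closely
```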

(iii) Crossed uniform magnetic and electric fields ($\mathbf{E} \perp \mathbf{B}$). In view of how bulky the solution of the previous problem (i.e. of the particular case of the current problem for $B = 0$) was, one might think that this problem should be forbiddingly complex for an analytical solution. Counter-intuitively, this is not the case, thanks to the field transform relations (135). Let us consider two possible cases.

Case I: $E/c < B$. Let us consider an inertial frame moving (relative to the "lab" reference frame 0 in which fields $\mathbf{E}$ and $\mathbf{B}$ are defined) with velocity

$$\mathbf{v} = \frac{\mathbf{E}\times\mathbf{B}}{B^2}, \tag{9.168}$$

whose magnitude $v = c\,(E/c)/B < c$. Selecting the coordinate axes as shown in Fig. 11, so that
$$E_x = 0, \quad E_y = E, \quad E_z = 0; \qquad B_x = 0, \quad B_y = 0, \quad B_z = B, \tag{9.169}$$

we see that the Cartesian components of this velocity are $v_x = v$, $v_y = v_z = 0$.
Fig. 9.11. Particle's trajectory in crossed electric and magnetic fields (at $E/c < B$).



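Since this choice of coordinates complies with that used to derive Eqs. (134), we can readily use that simple form of the Lorentz transform to calculate field components in the moving reference frame:

$$E'_x = 0, \quad E'_y = \gamma\left(E - vB\right) = \gamma\left(E - \frac{E}{B}B\right) = 0, \quad E'_z = 0, \tag{9.170}$$

$$B'_x = 0, \quad B'_y = 0, \quad B'_z = \gamma\left(B - \frac{vE}{c^2}\right) = \gamma B\left(1 - \frac{vE}{Bc^2}\right) = \gamma B\left(1 - \frac{v^2}{c^2}\right) = \frac{B}{\gamma}, \tag{9.171}$$

where the Lorentz factor $\gamma = (1 - v^2/c^2)^{-1/2}$ corresponds to velocity (168) rather than to that of the particle.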

Thus in this special reference frame the particle only sees a (re-normalized) uniform magnetic field $B' < B$, parallel to the initial field, i.e. perpendicular to velocity (168). Using the result of the above example (i), we see that in this frame the particle will move along either a circle or a spiral winding about the direction of the magnetic field, with angular speed (151),

$$\omega'_c = \frac{qB'}{M'} = \frac{qc^2B'}{\mathcal{E}'}, \tag{9.172}$$

and radius (153):

$$R' = \frac{p'_\perp}{qB'}, \tag{9.173}$$

where the primes on the particle's mass, energy, and momentum denote their values in the moving frame. Hence in the lab frame, the particle will perform such orbital motion plus a "drift" with constant velocity $\mathbf{v}$ (Fig. 11). As a result, the lab-frame trajectory of the particle (or rather its projection onto the plane perpendicular to the magnetic field) is a trochoid-like curve52 that, depending on the initial velocity, may be either prolate (self-crossing), as in Fig. 11, or curtate (stretched so much that it is not self-crossing).
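A direct numerical integration of Eq. (144) offers an independent check of this drift picture. The sketch below rests on stated assumptions: dimensionless units $c = m = q = 1$, fields oriented as in Eq. (169) with $E = 0.5$ and $B = 1$ (so that $E/c < B$), a particle starting at rest at the origin, and a fixed-step classical fourth-order Runge-Kutta integrator, which is a standard numerical choice rather than anything taken from these notes. The average $x$-velocity should approach $v = E/B$ of Eq. (168), while $y$ stays bounded between 0 and twice the orbit radius (173).

```python
import math

# Dimensionless units (an assumption of this sketch): c = m = q = 1.
C = 1.0


def velocity(p):
    """u = p c^2 / E_total, with E_total = sqrt((mc^2)^2 + (pc)^2) and m = 1."""
    energy = math.sqrt(C ** 4 + C ** 2 * sum(pi * pi for pi in p))
    return [C ** 2 * pi / energy for pi in p]


def rhs(state, E, B):
    """d/dt of (x, y, z, p_x, p_y, p_z) from Eq. (9.144), E = (0, E, 0), B = (0, 0, B)."""
    ux, uy, uz = velocity(state[3:])
    force = [uy * B, E - ux * B, 0.0]          # q(E + u x B) with q = 1
    return [ux, uy, uz] + force


def rk4_step(state, dt, E, B):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state, E, B)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)], E, B)
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)], E, B)
    k4 = rhs([s + dt * k for s, k in zip(state, k3)], E, B)
    return [s + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
            for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4)]


def run(E, B, t_end, dt=1e-2):
    """Integrate from rest at the origin; return the final state and the y-range visited."""
    state = [0.0] * 6
    y_min = y_max = 0.0
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, E, B)
        y_min, y_max = min(y_min, state[1]), max(y_max, state[1])
    return state, y_min, y_max


T_END = 200.0
state, y_lo, y_hi = run(E=0.5, B=1.0, t_end=T_END, dt=0.01)
print("mean drift velocity:", state[0] / T_END, " expected E/B = 0.5")
print("y range:", y_lo, y_hi)
```

For these parameters the drift-frame orbit radius (173) works out to 2/3, so the lab-frame $y$-coordinate should oscillate between 0 and about 4/3.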



52 As a reminder, a trochoid may be described as the trajectory of a point on a rigid disk rolled along a straight line. Its canonical parametric presentation is $x = \varphi + a\cos\varphi$, $y = a\sin\varphi$. (For $a > 1$, the trochoid is prolate, if $a < 1$, it is curtate, and if $a = 1$, it is called the cycloid.) Note, however, that for our problem, the trajectory in the lab frame is exactly trochoidal only in the nonrelativistic limit $v \ll c$ (i.e. $E/c \ll B$), because otherwise the Lorentz contraction in the drift direction squeezes the cyclotron orbit from a circle into an ellipse.
Such looped motion of electrons (in practice, with $v \ll c$) is used, in particular, in magnetrons, generators of microwave radiation. In these devices (Fig. 12a), the magnetic field, usually created by specially-shaped permanent magnets, is nearly uniform (in the region of electron motion) and directed along the magnetron's axis, while the electric field of magnitude $E \ll cB$, created by the dc voltage applied between the anode and cathode, is virtually radial. As a result, the above simple theory is only approximately valid, and electron trajectories are close to epicycloids rather than trochoids. The applied electric field is adjusted so that these trajectories pass close to the gap openings to cylindrical microwave cavities drilled in the magnetron's bulk anode (Fig. 12b). The fundamental mode of each cavity is quasistationary, with cylindrical walls working mostly as lumped inductances, and gaps as lumped capacitances, with the microwave electric field concentrated in the gap openings. This is why the mode is strongly coupled to the passing electrons, and their interaction creates large positive feedback (equivalent to negative damping) that results in intensive microwave self-oscillations at the cavities' eigenfrequency.53 The oscillation energy, of course, is taken from the dc-field-accelerated electrons; due to the energy loss, each electron gradually moves closer to the anode and finally lands on its surface. The wide use of such generators (in particular, in microwave ovens, which operate in a narrow frequency band around 2.45 GHz, allocated for these devices to avoid their interference with wireless communication systems) is due to their simplicity and high (up to 65%) efficiency.




Case II: $E/c > B$. In this case, the speed given by Eq. (168) would be above the speed of light, so let us introduce a reference frame moving with a different velocity,

$$\mathbf{v} = \frac{\mathbf{E}\times\mathbf{B}}{E^2/c^2}, \tag{9.174}$$

whose direction is the same as before (Fig. 11), and whose magnitude $v = cB/(E/c)$ is again below $c$. A calculation absolutely similar to the one performed above for Case I yields

$$E'_x = 0, \quad E'_y = \gamma\left(E - vB\right) = \gamma E\left(1 - \frac{vB}{E}\right) = \gamma E\left(1 - \frac{v^2}{c^2}\right) = \frac{E}{\gamma}, \quad E'_z = 0, \tag{9.175}$$



53 See, e.g., CM Sec. 4.4. 
$$B'_x = 0, \quad B'_y = 0, \quad B'_z = \gamma\left(B - \frac{vE}{c^2}\right) = \gamma\left(B - \frac{c^2B}{E}\frac{E}{c^2}\right) = 0, \tag{9.176}$$



so that in the moving frame the particle sees only an electric field E' < E. According to the solution of 
our previous problem (ii), the trajectory of the particle in the moving frame is hyperbolic, so that in the 
lab frame it has an "open", hyperbolic character as well. 
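Both special frames, (168) for $E/c < B$ and (174) for $E/c > B$, can be checked against Eqs. (170)-(171) and (175)-(176) with a few lines of code. The sketch below (SI units; the function name and sample field values are illustrative assumptions) picks the frame velocity according to the case and then applies Eqs. (134) with the field orientation (169).

```python
import math

C = 299_792_458.0  # speed of light, m/s


def special_frame(E, B):
    """For E along y and B along z (Eq. 9.169), choose the frame velocity along x from
    Eq. (9.168) if E/c < B, or from Eq. (9.174) if E/c > B, then apply Eqs. (9.134).
    Returns (v, E'_y, B'_z, gamma)."""
    v = E / B if E / C < B else C ** 2 * B / E
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return v, g * (E - v * B), g * (B - v * E / C ** 2), g


# Case I: E/c is about 3.3e-4 T < B = 1e-3 T, so only a magnetic field B/gamma remains.
print(special_frame(1.0e5, 1.0e-3))
# Case II: E/c is about 3.3e-3 T > B = 1e-3 T, so only an electric field E/gamma remains.
print(special_frame(1.0e6, 1.0e-3))
```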

To conclude this section, let me note that if the electric and magnetic fields are nonuniform, the particle motion is much more complex, and in most cases the integration of equations (144), (148) may be carried out only numerically. However, if the field nonuniformity is small, (approximate) analytical methods may be very effective. For example, if the magnetic field has a small transverse gradient $\nabla B$, i.e. one directed perpendicular to vector $\mathbf{B}$ itself, such that

$$\eta \equiv \frac{|\nabla B|}{B} \ll \frac{1}{R}, \tag{9.177}$$

where $R$ is the cyclotron radius (153), then it is straightforward to use Eq. (150) to show54 that the cyclotron orbit drifts perpendicular to both $\mathbf{B}$ and $\nabla B$, with speed



$$u_d = \frac{\eta}{\omega_c}\left(\frac{u_\perp^2}{2} + u_\parallel^2\right) \ll u. \tag{9.178}$$

The physics of this drift is rather simple: according to Eq. (153), the instantaneous curvature of the cyclotron orbit is proportional to the local value of the field. Hence if the field is nonuniform, the trajectory bends more on its parts passing through stronger field, thus acquiring a shape close to a curtate trochoid.

For engineering and experimental practice, the effects of longitudinal gradients of the magnetic field on charged particle motion are much more important, but let me postpone their discussion until we have a few more analytical tools, in the next section.



9.7. Analytical mechanics of charged particles 

Equation (145) gives a full description of relativistic particle dynamics in electric and magnetic 
fields, just as the 2 nd Newton law (1) does it in the nonrelativistic limit. However, we know that in the 
latter case, the Lagrange formalism of analytical mechanics allows an easier solution of many 
problems. 55 We can fully expect that to be true in relativistic mechanics as well, so let us expand the 
analysis of Sec. 3 to particles in the field. 

Let us recall that for a free particle, our main result was Eq. (68), which may be rewritten as

$$\gamma L = -mc^2, \tag{9.179}$$

showing that this product is Lorentz-invariant. How can the electromagnetic field affect this relation? In electrostatics, we could write

$$L = T - U = T - q\phi. \tag{9.180}$$



54 See, e.g., Sec. 12.4 in J. D. Jackson, Classical Electrodynamics, 3rd ed., Wiley, 1999.

55 See, e.g., CM Sec. 2.2 and beyond. 
However, in relativity the scalar potential φ is just one component of the potential 4-vector (116). The only way to get a Lorentz-invariant contribution to γL from the full 4-vector that would also be proportional to the first power of the particle's velocity (to account for the magnetic component of the Lorentz force) is evidently

$$-mc^2 + \text{const}\times u_\alpha A^\alpha, \tag{9.181}$$

where u_α is the 4-velocity (63). In order to comply with Eq. (180) in electrostatics, the constant factor should be equal to (−q), so that Eq. (181) becomes

$$\gamma L = -mc^2 - q\,u_\alpha A^\alpha, \tag{9.182}$$

and, finally,

$$L = -\frac{mc^2}{\gamma} - q\phi + q\mathbf{u}\cdot\mathbf{A}, \tag{9.183}$$

i.e., in the Cartesian form,

$$L = -mc^2\left(1 - \frac{u_x^2 + u_y^2 + u_z^2}{c^2}\right)^{1/2} - q\phi + q\left(u_xA_x + u_yA_y + u_zA_z\right). \tag{9.184}$$

Let us see whether this relation (which admittedly was obtained above by an educated guess rather than by a strict derivation) passes a natural sanity check. For the case of unconstrained motion of a particle, we can select its three Cartesian coordinates r_j (j = 1, 2, 3) as the generalized coordinates, and the linear velocity components u_j as the corresponding generalized velocities. In this case, the Lagrange equations of motion are 56

$$\frac{d}{dt}\frac{\partial L}{\partial u_j} - \frac{\partial L}{\partial r_j} = 0. \tag{9.185}$$



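For example, for r_1 = x, Eq. (184) yields

$$\frac{\partial L}{\partial u_x} = \frac{mu_x}{\left(1 - u^2/c^2\right)^{1/2}} + qA_x, \qquad \frac{\partial L}{\partial x} = -q\frac{\partial\phi}{\partial x} + q\mathbf{u}\cdot\frac{\partial\mathbf{A}}{\partial x}, \tag{9.186}$$

so that Eq. (185) takes the form

$$\frac{dp_x}{dt} = -q\frac{\partial\phi}{\partial x} + q\mathbf{u}\cdot\frac{\partial\mathbf{A}}{\partial x} - q\frac{dA_x}{dt}. \tag{9.187}$$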



In equations of motion, field values have to be taken at the instantaneous position of the particle, so that the last (full) derivative has components due to both the actual field change (at a fixed point of space) and the particle's motion. Such addition is described by the so-called convective derivative 57

$$\frac{d}{dt} = \frac{\partial}{\partial t} + \mathbf{u}\cdot\nabla. \tag{9.188}$$



56 See, e.g., CM Sec. 2.1. 

57 Alternatively called the "Lagrangian derivative"; for its (rather simple) derivation see, e.g., CM Sec. 8.3. 
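Spelling out both scalar products, we may group the terms remaining after cancellations as follows:

$$\frac{dp_x}{dt} = q\left[-\frac{\partial\phi}{\partial x} - \frac{\partial A_x}{\partial t} + u_y\left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right) - u_z\left(\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right)\right]. \tag{9.189}$$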



But taking into account relations (121) between the electric and magnetic fields and potentials, this expression is nothing more than

$$\frac{dp_x}{dt} = q\left(E_x + u_yB_z - u_zB_y\right) \equiv q\left(\mathbf{E} + \mathbf{u}\times\mathbf{B}\right)_x, \tag{9.190}$$

i.e. the x-component of Eq. (144). Since the other Cartesian coordinates participate in Eq. (184) in a similar way, it is evident that the Lagrange equations of motion along the other coordinates yield the other components of the same vector equation of motion.
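The regrouping of Eqs. (187)-(190) can also be checked numerically. The sketch below picks arbitrary smooth test potentials φ(r, t) and A(r, t); these are pure assumptions, chosen only for illustration. It evaluates the right-hand side of Eq. (187), with the full derivative taken as the convective derivative (188), using central finite differences. It then compares the result with q(E + u×B)_x, computed from E = −∇φ − ∂A/∂t and B = ∇×A. All helper names are hypothetical.

```python
import math

STEP = 1e-5  # finite-difference step (illustrative choice)


def phi(x, y, z, t):
    """Arbitrary smooth test scalar potential (an assumption)."""
    return x * y + t * z ** 2 + math.sin(x + t)


def vec_a(x, y, z, t):
    """Arbitrary smooth test vector potential (an assumption)."""
    return (y ** 2 * t + z, x * z + math.cos(t * y), math.sin(x) + t * y * z)


def a_comp(k):
    """The k-th Cartesian component of A as a function of (x, y, z, t)."""
    return lambda x, y, z, t: vec_a(x, y, z, t)[k]


def deriv(f, args, i):
    """Central finite difference of f with respect to its i-th argument."""
    plus, minus = list(args), list(args)
    plus[i] += STEP
    minus[i] -= STEP
    return (f(*plus) - f(*minus)) / (2.0 * STEP)


def rhs_187(q, r, t, u):
    """-q dphi/dx + q u.(dA/dx) - q dA_x/dt, with d/dt taken from Eq. (188)."""
    args = (*r, t)
    u_dot_da_dx = sum(u[k] * deriv(a_comp(k), args, 0) for k in range(3))
    convective = deriv(a_comp(0), args, 3) + sum(
        u[i] * deriv(a_comp(0), args, i) for i in range(3))
    return -q * deriv(phi, args, 0) + q * u_dot_da_dx - q * convective


def lorentz_x(q, r, t, u):
    """q (E + u x B)_x, with E = -grad(phi) - dA/dt and B = curl A."""
    args = (*r, t)
    e_x = -deriv(phi, args, 0) - deriv(a_comp(0), args, 3)
    b_y = deriv(a_comp(0), args, 2) - deriv(a_comp(2), args, 0)
    b_z = deriv(a_comp(1), args, 0) - deriv(a_comp(0), args, 1)
    return q * (e_x + u[1] * b_z - u[2] * b_y)


q, r, t, u = 1.3, (0.2, -0.7, 0.5), 0.9, (0.3, -0.4, 0.25)
print(rhs_187(q, r, t, u), lorentz_x(q, r, t, u))
```

Both printed numbers should coincide to within rounding error, at any test point and for any choice of smooth potentials.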

So, Eq. (183) does indeed give the correct Lagrangian function, and we can use it for further analysis, in particular to discuss the first of Eqs. (186). This relation shows that in the electromagnetic field, the generalized momentum corresponding to the particle's coordinate x is not p_x = γmu_x but 58

$$P_x \equiv \frac{\partial L}{\partial u_x} = \gamma mu_x + qA_x \equiv p_x + qA_x. \tag{9.191}$$



Thus, as was already mentioned briefly in Sec. 6.3, the particle's motion in a field may be described by two momentum vectors: the kinetic momentum p, defined by Eq. (70), and the canonical (or "conjugate") momentum 59

$$\mathbf{P} = \mathbf{p} + q\mathbf{A}. \tag{9.192}$$



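In order to facilitate the discussion of this notion, let us generalize expression (72) for the Hamiltonian function H of a free particle to the case of a particle in the field:

$$H \equiv \mathbf{P}\cdot\mathbf{u} - L = (\mathbf{p} + q\mathbf{A})\cdot\mathbf{u} - \left(-\frac{mc^2}{\gamma} + q\mathbf{u}\cdot\mathbf{A} - q\phi\right) = \mathbf{p}\cdot\mathbf{u} + \frac{mc^2}{\gamma} + q\phi. \tag{9.193}$$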



Merging the first two terms exactly as was done in Eq. (72), we get an extremely simple result,

$$H = \gamma mc^2 + q\phi, \tag{9.194}$$

which may leave us wondering: where is the vector-potential A here, and the field effects it has to describe? The resolution of this puzzle is easy: for practical use (e.g., for the alternative derivation of the equations of motion), H has to be presented as a function of the particle's generalized coordinates (in the case of unconstrained motion, these may be the Cartesian components of vector r that serves as an argument for potentials A and φ) and the generalized momenta, i.e. the Cartesian components of vector P (plus, generally, time). Hence, velocity u and factor γ should be eliminated from Eq. (194). This may be done using relation (192), γmu = P − qA. For such elimination, it is sufficient to notice that according



58 With regrets, I have to use the same (common) notation as was used earlier for the electric polarization - which 
is not discussed below. 

59 In Gaussian units, Eq. (192) has the form P = p + qA/c. 
to Eq. (193), the difference (H − qφ) is equal to the right-hand side of Eq. (72), so that the generalization of Eq. (78) is 60

$$(H - q\phi)^2 = (mc^2)^2 + c^2(\mathbf{P} - q\mathbf{A})^2. \tag{9.195}$$



It is straightforward to verify that the Hamilton equations of motion 61, using this H, result in the same equation of motion (144). In the nonrelativistic limit, the Taylor expansion of Eq. (195) to the first term in p² yields the following generalization of Eq. (74):

$$H - mc^2 \approx \frac{p^2}{2m} + U = \frac{1}{2m}(\mathbf{P} - q\mathbf{A})^2 + U, \qquad U = q\phi. \tag{9.196}$$
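A quick numerical consistency check of Eqs. (192)-(196) is sketched below. It uses an electron in SI units, with an arbitrarily chosen velocity, vector potential and scalar potential; all of these are assumed test values. The sketch builds P from Eq. (192) and evaluates H as a function of P from Eq. (195). It then compares the result with γmc² + qφ from Eq. (194) for a fast particle, and with the nonrelativistic form (196) for a slow one. The helper names are hypothetical.

```python
import math

C = 299_792_458.0           # speed of light, m/s
M = 9.1093837015e-31        # electron mass, kg (CODATA 2018)
Q = -1.602176634e-19        # electron charge, C


def h_from_canonical(p_can, a_vec, phi):
    """Eq. (195) solved for H (positive root), as a function of P, A, phi."""
    kin2 = sum((p_can[i] - Q * a_vec[i]) ** 2 for i in range(3))  # (P - qA)^2
    return Q * phi + math.sqrt((M * C ** 2) ** 2 + C ** 2 * kin2)


def compare(u, a_vec, phi):
    """Return H from Eq. (195), from Eq. (194), and from Eq. (196)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(v * v for v in u) / C ** 2)
    p_can = [gamma * M * u[i] + Q * a_vec[i] for i in range(3)]   # Eq. (192)
    h_195 = h_from_canonical(p_can, a_vec, phi)
    h_194 = gamma * M * C ** 2 + Q * phi
    kin2 = sum((p_can[i] - Q * a_vec[i]) ** 2 for i in range(3))
    h_196 = M * C ** 2 + kin2 / (2.0 * M) + Q * phi
    return h_195, h_194, h_196


A_TEST = (1e-3, -2e-3, 5e-4)   # T*m, arbitrary
PHI_TEST = 2.0                 # V, arbitrary
fast = compare((0.6 * C, 0.2 * C, -0.1 * C), A_TEST, PHI_TEST)
slow = compare((1e5, -2e4, 3e4), A_TEST, PHI_TEST)
print("fast particle, Eq. (195) vs Eq. (194):", fast[0], fast[1])
print("slow particle, H - mc^2 from Eq. (195) and Eq. (196):",
      slow[0] - M * C ** 2, slow[2] - M * C ** 2)
```

Note that A drops out of both comparisons once H is written via P, in agreement with the discussion of Eq. (194) above.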

This expression for H and Eq. (183) for L give a clear view of how analytical mechanics accounts for the electromagnetic field effects. The electric part of the total Lorentz force q(E + u×B) can perform work on the particle, i.e. change its kinetic energy; see Eq. (148) and its discussion. As a result, the scalar potential φ, whose gradient contributes to E, may be directly associated with the potential energy U = qφ. On the contrary, the magnetic component qu×B of the Lorentz force is always perpendicular to the particle's velocity u, cannot do work on it, and as a result cannot be described by a contribution to U. However, if A did not participate in functions L and/or H at all, analytical mechanics would be unable to describe the effects of magnetic field B = ∇×A on the particle's motion. Relations (183) and (196) show the wonderful way in which physics (or Mother Nature herself?) solves this problem: the vector-potential gives such contributions to both L and H (if the latter is considered, as it should be, a function of P rather than p) that cannot be uniquely attributed to either kinetic or potential energy, but ensure the correct equation of motion (144) in both the Lagrange and Hamilton formalisms.

I believe I still owe the reader a discussion of the physical sense of the canonical momentum P. For that, let us consider a particle moving near a region of localized magnetic field B(r, t), but not entering this region. If there is no electrostatic field (no other electric charges nearby), we can select such a local gauge that φ(r, t) = 0 and A = A(t), so that Eq. (144) is reduced to

$$\frac{d\mathbf{p}}{dt} = q\mathbf{E} = -q\frac{d\mathbf{A}}{dt}, \tag{9.197}$$

immediately giving

$$\frac{d\mathbf{P}}{dt} = 0. \tag{9.198}$$



Hence, even if the magnetic field is changed in time, so that the induced electric field accelerates the particle, its conjugate momentum does not change. Hence P is a variable more stable to magnetic field changes than its kinetic counterpart p. This conclusion may be criticized because it relies on a specific gauge, and generally P ≡ p + qA is not gauge-invariant, because the vector-potential A isn't. 62 However, as was already discussed in Sec. 5.3, the integral ∮A·dr over a closed contour does not depend on the chosen



60 This relation may also be obtained from the expression for the Lorentz-invariant norm, p_α p^α = (mc)², of the 4-momentum (75), p^α = {ℰ/c, p} = {(H − qφ)/c, P − qA}.

61 See, e.g., CM Sec. 10.1. 

62 The kinetic momentum p = γmu is just the usual product mu modified for relativistic effects, so that this variable is evidently gauge- (though not Lorentz-) invariant.
gauge and equals the magnetic flux Φ through the area limited by the contour; see Eq. (5.65). Integrating Eq. (197) over a closed trajectory of a particle (Fig. 13), and over the time of one orbit, we get

$$\Delta\oint\mathbf{p}\cdot d\mathbf{r} = -q\,\Delta\Phi, \qquad\text{so that}\qquad \Delta\oint\mathbf{P}\cdot d\mathbf{r} = 0, \tag{9.199}$$



where ΔΦ is the change of flux during that time. This gauge-invariant result confirms the above conclusion about the stability of the canonical momentum to magnetic field variations.

Fig. 9.13. Particle's motion around a localized magnetic flux.
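The gauge invariance of ∮A·dr, on which Eq. (199) relies, is easy to see numerically. The sketch below assumes, for illustration only, the vector potential of an ideal thin flux tube along the z-axis: A = Φ/(2πρ) in the azimuthal direction outside the tube. It adds the gradient of an arbitrary single-valued gauge function χ(x, y), and evaluates the contour integral along an off-center circle enclosing the tube by the midpoint rule. The flux value, χ, the contour and all names are hypothetical.

```python
import math

FLUX = 2.5e-3   # enclosed magnetic flux, Wb (illustrative value)


def a_tube(x, y):
    """A of an ideal thin flux tube along z, outside it: FLUX/(2 pi rho), azimuthal."""
    k = FLUX / (2.0 * math.pi * (x * x + y * y))
    return -k * y, k * x


def grad_chi(x, y):
    """Gradient of an arbitrary single-valued gauge function chi = x^2 y + sin(y)."""
    return 2.0 * x * y, x * x + math.cos(y)


def a_gauged(x, y):
    """Gauge-transformed potential A' = A + grad(chi)."""
    (ax, ay), (gx, gy) = a_tube(x, y), grad_chi(x, y)
    return ax + gx, ay + gy


def circulation(field, x0=0.3, y0=-0.2, radius=1.0, n=20000):
    """Midpoint-rule estimate of the counterclockwise integral of field . dr on a circle."""
    dth = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dth
        x, y = x0 + radius * math.cos(th), y0 + radius * math.sin(th)
        fx, fy = field(x, y)
        # dr/dtheta = radius * (-sin, cos)
        total += (fx * (-radius * math.sin(th)) + fy * radius * math.cos(th)) * dth
    return total


print(circulation(a_tube) / FLUX, circulation(a_gauged) / FLUX)
```

Both ratios come out as 1: the circulation gives the enclosed flux in either gauge, even though A itself differs between the two.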



Generally, Eq. (199) is invalid if a particle moves inside a magnetic field and/or changes its trajectory as the field varies. However, if the field is almost uniform, i.e. its gradient is small in the sense of Eq. (177), this result is (approximately) applicable. Indeed, analytical mechanics 63 tells us that for any canonical coordinate-momentum pair {q_j, p_j}, the corresponding action variable,

$$J_j \equiv \frac{1}{2\pi}\oint p_j\,dq_j, \tag{9.200}$$

is asymptotically constant at slow variations of motion conditions. According to Eq. (191), for a particle in a magnetic field, the generalized momentum corresponding to Cartesian coordinate r_j is P_j rather than p_j. Thus, forming the net action variable J = J_x + J_y + J_z, we may write

$$2\pi J = \oint\mathbf{P}\cdot d\mathbf{r} = \oint\mathbf{p}\cdot d\mathbf{r} + q\Phi = \text{const}. \tag{9.201}$$



Let us apply this relation to the motion of a nonrelativistic particle in an almost uniform magnetic field, with a small longitudinal velocity, u_∥/u_⊥ → 0 (Fig. 14).

Fig. 9.14. Particle in a magnetic field with a small longitudinal gradient ∇B ∥ B.



In this case, Φ in Eq. (201) is the flux encircled by a cyclotron orbit, equal to (−πR²B), where R is its radius given by Eq. (153), and the negative sign accounts for the fact that the "correct" direction of



63 See, e.g., CM Sec. 10.2. 
the normal vector n in the definition of flux, Φ = ∫B·n d²r, is antiparallel to vector B. At u ≪ c, the kinetic momentum is just p_⊥ = mu_⊥, while Eq. (153) yields

$$mu_\perp = qBR. \tag{9.202}$$

Plugging these relations into Eq. (201), we get

$$2\pi J = mu_\perp 2\pi R - q\pi R^2B = m\frac{qBR}{m}2\pi R - q\pi R^2B = (2 - 1)\,q\pi R^2B = -q\Phi. \tag{9.203}$$

This means that even if the circular orbit slowly moves in the magnetic field, the flux encircled by the cyclotron orbit should remain constant. One manifestation of this effect is the result already mentioned at the end of Sec. 6: if a small gradient of the magnetic field is perpendicular to the field itself, the orbit's drift is perpendicular to ∇B, so that Φ stays constant. Now let us analyze the case of a small longitudinal gradient, ∇B ∥ B (Fig. 14). If the small initial longitudinal velocity u_∥ is directed toward the higher-field region, then in order to keep Φ constant, the cyclotron orbit has to gradually shrink. Rewriting Eq. (202) as

$$mu_\perp = q\frac{\pi R^2B}{\pi R} = q\frac{|\Phi|}{\pi R}, \tag{9.204}$$

we see that this reduction of R (at constant Φ) should increase the orbiting speed u_⊥. But since the magnetic field cannot do work on the particle, its kinetic energy,

$$T = \frac{m}{2}\left(u_\perp^2 + u_\parallel^2\right), \tag{9.205}$$

should stay constant, so that the longitudinal velocity u_∥ has to decrease. Hence, eventually the orbit's drift has to stop, and then the orbit has to start moving back toward the region of lower fields, being essentially repelled from the high-field region. This effect is very important, in particular, for plasma confinement: two coaxial magnetic coils, inducing magnetic fields of the same direction (Fig. 15), naturally form a "magnetic bottle" that traps charged particles injected, with sufficiently low longitudinal velocities, into the region between the coils. Such bottles are the core components of the (generally, very complex) systems used for plasma confinement, in particular in the context of the long-term efforts to achieve controlled nuclear fusion. 64




Fig. 9.15. Magnetic bottle (VERY schematically). 
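Eqs. (202)-(205) can be turned into a simple reflection criterion. With R = mu_⊥/qB, a constant |Φ| = πR²B means a constant u_⊥²/B. Eq. (205) keeps u_⊥² + u_∥² constant. Hence a particle whose velocity makes the pitch angle θ with B at field B_0 (so that sin θ = u_⊥/u there) turns back where B reaches B_0/sin²θ. The minimal sketch below applies this to an illustrative bottle with an assumed field of 1 T at the center and 4 T near the coils. It assumes the nonrelativistic, adiabatic limit; the function names are hypothetical.

```python
import math


def orbit_flux(m, q, u_perp, b):
    """|Phi| = pi R^2 B for a cyclotron orbit with R = m u_perp / (|q| B), Eq. (202)."""
    r = m * u_perp / (abs(q) * b)
    return math.pi * r * r * b


def u_perp_at(b, b0, u_perp0):
    """Transverse speed at field b, keeping |Phi|, i.e. u_perp^2 / B, constant."""
    return u_perp0 * math.sqrt(b / b0)


def reflection_field(b0, pitch_deg):
    """Field at which u_parallel vanishes (u_perp = u), for pitch angle set at b0."""
    s = math.sin(math.radians(pitch_deg))
    return b0 / (s * s)


def is_trapped(b_min, b_max, pitch_deg):
    """True if the particle turns back before reaching the peak field b_max."""
    return reflection_field(b_min, pitch_deg) < b_max


# Illustrative bottle: 1 T at the center, 4 T near the coils (assumed values)
for pitch in (20.0, 30.0, 45.0, 70.0):
    print(pitch, reflection_field(1.0, pitch), is_trapped(1.0, 4.0, pitch))
```

For this assumed field ratio of 4, particles with pitch angles below about 30° at the center escape through the coils (the so-called loss cone), while those with larger pitch angles are reflected. This is consistent with the requirement of "sufficiently low longitudinal velocities" in the text.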



The constancy of the magnetic flux encircled by free particles reminds us of the Meissner-Ochsenfeld effect discussed in Sec. 6.3, and motivates a brief revisit of the electrodynamics of superconductivity. As was emphasized in that section, superconductivity is a



64 For further reading on this technology, the reader may be referred, for example, to an introductory monograph by F. F. Chen, Introduction to Plasma Physics and Controlled Fusion, vol. 1, 2nd ed., Springer, 1984, and/or a graduate-level theoretical treatment by R. D. Hazeltine and J. D. Meiss, Plasma Confinement, Dover, 2003.
substantially quantum phenomenon; nevertheless, the notion of the conjugate momentum P helps to understand its description. Indeed, the general rule of quantization of physical systems 65 is that each canonical pair {q_j, p_j} of a generalized coordinate and the corresponding momentum is described by quantum-mechanical operators that obey the following commutation relation:

$$\left[\hat q_j, \hat p_{j'}\right] = i\hbar\,\delta_{jj'}. \tag{9.206}$$



According to Eq. (191), for the Cartesian coordinates r_j of a particle in an electromagnetic field, the corresponding generalized momenta are P_j, so that their operators should obey the following commutation relations:

$$\left[\hat r_j, \hat P_{j'}\right] = i\hbar\,\delta_{jj'}. \tag{9.207}$$

In the coordinate representation of quantum mechanics, canonical momentum operators are described by the Cartesian components of the vector operator −iħ∇. As a result, ignoring the rest energy mc² (which gives an inconsequential phase factor exp{−imc²t/ħ} in the wave function), we can use Eq. (196) to rewrite the nonrelativistic Schrödinger equation,

$$i\hbar\frac{\partial\psi}{\partial t} = \hat H\psi, \tag{9.208}$$

as follows:

$$i\hbar\frac{\partial\psi}{\partial t} = \left[\frac{(\hat{\mathbf{P}} - q\mathbf{A})^2}{2m} + q\phi\right]\psi = \left[\frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2 + q\phi\right]\psi. \tag{9.209}$$

Thus, I believe I have finally delivered on my promise to justify the replacement (6.44), which had been used in Chapter 6 to discuss the electrodynamics of superconductors, including the Meissner-Ochsenfeld effect. 66



9.8. Analytical mechanics of electromagnetic field 

We have just seen that analytical mechanics of a particle in an electromagnetic field may be used to get some important results. The same is true for the analytical mechanics of the field alone, and of the field-particle system as a whole, which will be discussed in this section. For such a space-distributed system as the field, governed by local dynamics laws (the Maxwell equations), we need to apply analytical mechanics to the local densities ℒ and ℋ of the Lagrangian and Hamiltonian functions, defined by the relations

$$L = \int\mathcal{L}\,d^3r, \qquad H = \int\mathcal{H}\,d^3r. \tag{9.210}$$

Let us start, as usual, from the Lagrange formalism. A clue to the possible structure of the Lagrangian density ℒ may be obtained from the description of the particle-field interaction in this


65 See, e.g., CM Sec. 10.1. 

66 Equation (209) is also the basis for the discussion of numerous other magnetic field phenomena, including the Aharonov-Bohm and quantum Hall effects; see, e.g., QM Secs. 3.1-3.2.
formalism, which was discussed in the last section. For the case of a single particle, the interaction is described by the last two terms of Eq. (183):

$$L_\text{int} = -q\phi + q\mathbf{u}\cdot\mathbf{A}. \tag{9.211}$$

It is obvious that if charge q is continuously distributed over some volume, we may present L_int as a volume integral of the Lagrangian density

$$\mathcal{L}_\text{int} = -\rho\phi + \mathbf{j}\cdot\mathbf{A} = -j_\alpha A^\alpha. \tag{9.212}$$



Notice that this density (in contrast to L_int itself) is Lorentz-invariant. (This is due to the contraction of the longitudinal coordinate, and hence volume, at the Lorentz transform.) Hence we may expect the density of the field's Lagrangian to be Lorentz-invariant as well. Moreover, in view of the simple, local structure of the Maxwell equations (containing only first spatial and temporal derivatives of the fields), ℒ should be a simple function of the potential's 4-vector and its 4-derivative:

$$\mathcal{L} = \mathcal{L}\left(A^\alpha, \partial_\alpha A^\beta\right). \tag{9.213}$$



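Also, the density should be selected in such a way that the 4-vector analog of the Lagrangian equations of motion,

$$\partial_\alpha\frac{\partial\mathcal{L}}{\partial(\partial_\alpha A^\beta)} - \frac{\partial\mathcal{L}}{\partial A^\beta} = 0, \tag{9.214}$$

gives us the correct inhomogeneous Maxwell equations (127). 67, 68 It is clear that the field part ℒ_field of the total Lagrangian density ℒ should be a scalar, and a quadratic form of the field strength, i.e. of F^{αβ}, so that the natural choice is

$$\mathcal{L}_\text{field} = \text{const}\times F_{\alpha\beta}F^{\alpha\beta}, \tag{9.215}$$

with implied summation over both indices. Indeed, adding to this expression the interaction Lagrangian (212),

$$\mathcal{L} = \mathcal{L}_\text{field} + \mathcal{L}_\text{int} = \text{const}\times F_{\alpha\beta}F^{\alpha\beta} - j_\alpha A^\alpha, \tag{9.216}$$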



and performing the differentiation, we may check that Eq. (214) indeed yields Eqs. (127), provided that the constant factor equals (−1/4μ0). 69 With that, the field Lagrangian density is

$$\mathcal{L}_\text{field} = -\frac{1}{4\mu_0}F_{\alpha\beta}F^{\alpha\beta} = \frac{\varepsilon_0}{2}E^2 - \frac{1}{2\mu_0}B^2 \equiv u_e - u_m, \tag{9.217}$$

where u_e is the local density of the electric field energy (1.67), and u_m is the magnetic field energy density (5.57).



67 As a reminder, the homogeneous Maxwell equations (129) are satisfied by the very structure (125) of the field 
strength tensor. 

68 Here the implicit summation over index α plays a role similar to that of the convective derivative (188) in replacing the full derivative over time, in a way that reflects the symmetry of time and space in special relativity. I do not want to spend more time justifying Eq. (214), for reasons that will become clear very soon.

69 In Gaussian units, the coefficient is (−1/16π).
Let me hope the reader agrees that Eq. (217) is a wonderful result, because the Lagrangian function has a structure absolutely similar to the well-known expression L = T − U of classical mechanics. So, for the field alone, the "potential" and "kinetic" energies are separable again. 70

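As a sanity check, let us explore whether we can calculate a 4-vector analog of the Hamiltonian function H. In generic analytical mechanics,

$$H = \sum_j\frac{\partial L}{\partial\dot q_j}\dot q_j - L. \tag{9.218}$$

However, just as for the Lagrangian function, for a field we should find the spatial density ℋ of the Hamiltonian, defined by the second of Eqs. (210), for which a natural 4-form of Eq. (218) is

$$\mathcal{H}^{\alpha\beta} = \frac{\partial\mathcal{L}}{\partial(\partial_\alpha A^\gamma)}\,\partial^\beta A^\gamma - g^{\alpha\beta}\mathcal{L}. \tag{9.219}$$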

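Calculated for the field alone, i.e. using Eq. (217) for ℒ, this definition yields

$$\mathcal{H}^{\alpha\beta}_\text{field} = \Theta^{\alpha\beta} - T_D^{\alpha\beta}, \tag{9.220}$$

where the tensor

$$\Theta^{\alpha\beta} \equiv \frac{1}{\mu_0}\left(g^{\alpha\gamma}F_{\gamma\delta}F^{\delta\beta} + \frac{1}{4}g^{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta}\right) \tag{9.221}$$

is gauge-invariant, while the remaining term,

$$T_D^{\alpha\beta} \equiv \frac{1}{\mu_0}F^{\alpha\gamma}\partial_\gamma A^\beta, \tag{9.222}$$

is not, so that it cannot correspond to any measurable variables. Fortunately, it is straightforward to verify that, for the free field (j^α = 0), tensor T_D may be presented in the form

$$T_D^{\alpha\beta} = \frac{1}{\mu_0}\partial_\gamma\left(F^{\alpha\gamma}A^\beta\right), \tag{9.223}$$

and as a result obeys the following relations:

$$\partial_\alpha T_D^{\alpha\beta} = 0, \qquad \int T_D^{0\beta}\,d^3r = 0, \tag{9.224}$$

so that it does not interfere with the conservation properties of the gauge-invariant, symmetric energy-momentum tensor (also called the symmetric stress tensor) Θ^{αβ}, to be discussed below.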

Using Eq. (125), the components of the latter tensor may be expressed via the electric and magnetic fields. For α = β = 0,

$$\Theta^{00} = \frac{\varepsilon_0}{2}E^2 + \frac{1}{2\mu_0}B^2 = u_e + u_m \equiv u, \tag{9.225}$$



70 Since the Lagrange equations of motion are homogeneous, the simultaneous change of sign of T and U does not change them. Thus, it is not important which of the two energy densities, u_e or u_m, we count as the potential energy.
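i.e. the expression for the total energy density u. The remaining 3 components of the same row/column turn out to be just the Cartesian components of the Poynting vector, divided by c:

$$\Theta^{0j} = \Theta^{j0} = \frac{1}{c}\left(\mathbf{E}\times\frac{\mathbf{B}}{\mu_0}\right)_j \equiv \frac{1}{c}\left(\mathbf{E}\times\mathbf{H}\right)_j \equiv \frac{S_j}{c}, \qquad\text{for } j = 1, 2, 3. \tag{9.226}$$

The remaining 9 components Θ^{jj'} of the tensor, with j, j' = 1, 2, 3, are usually presented as

$$\Theta^{jj'} = -\tau^{(M)}_{jj'}, \tag{9.227}$$

where

$$\tau^{(M)}_{jj'} \equiv \varepsilon_0E_jE_{j'} + \frac{1}{\mu_0}B_jB_{j'} - \frac{\delta_{jj'}}{2}\left(\varepsilon_0E^2 + \frac{1}{\mu_0}B^2\right) \tag{9.228}$$

is the so-called Maxwell stress tensor, so that the whole symmetric energy-momentum tensor may be conveniently presented in the following symbolic (block) form:

$$\Theta^{\alpha\beta} = \begin{pmatrix} u & \mathbf{S}/c \\ \mathbf{S}/c & -\tau^{(M)} \end{pmatrix}. \tag{9.229}$$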
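The component formulas (225)-(229) can be checked numerically against definition (221), as in the sketch below. It assumes SI units, the metric g = diag(1, −1, −1, −1), and the sign convention F^{0j} = −E_j/c, F^{jk} = −ε_{jkl}B_l for the field tensor of Eq. (125). The sign convention is an assumption, but since Θ is quadratic in F, an overall sign flip of F would not change the result. The field values are arbitrary test numbers, and the function names are hypothetical. The sketch also confirms Eq. (217) for the invariant F_{αβ}F^{αβ}.

```python
import numpy as np

EPS0 = 8.8541878128e-12               # F/m (CODATA 2018)
MU0 = 1.25663706212e-6                # H/m (CODATA 2018)
C = 1.0 / np.sqrt(EPS0 * MU0)         # m/s
G = np.diag([1.0, -1.0, -1.0, -1.0])  # metric, g^{ab} = g_{ab}


def field_tensor(e, b):
    """Contravariant F^{ab}: F^{0j} = -E_j/c, F^{jk} = -eps_{jkl} B_l (assumed)."""
    bx, by, bz = b
    f = np.zeros((4, 4))
    f[0, 1:] = -np.asarray(e) / C
    f[1:, 0] = np.asarray(e) / C
    f[1:, 1:] = [[0.0, -bz, by], [bz, 0.0, -bx], [-by, bx, 0.0]]
    return f


def energy_momentum_tensor(e, b):
    """Theta^{ab} from Eq. (221), and the field Lagrangian density, Eq. (217)."""
    f_up = field_tensor(e, b)
    f_down = G @ f_up @ G                    # F_{ab}
    invariant = np.sum(f_down * f_up)        # F_{ab} F^{ab}
    theta = (G @ f_down @ f_up + 0.25 * G * invariant) / MU0
    lagrangian = -invariant / (4.0 * MU0)
    return theta, lagrangian


e = np.array([3e3, -1e3, 2e3])       # V/m, arbitrary test values
b = np.array([2e-5, 1e-5, -3e-5])    # T, arbitrary test values
theta, lag = energy_momentum_tensor(e, b)

u = 0.5 * EPS0 * e @ e + 0.5 * b @ b / MU0                  # Eq. (225)
s = np.cross(e, b) / MU0                                    # Poynting vector
tau = (EPS0 * np.outer(e, e) + np.outer(b, b) / MU0
       - 0.5 * np.eye(3) * (EPS0 * e @ e + b @ b / MU0))    # Eq. (228)
print(np.isclose(theta[0, 0], u), np.allclose(theta[0, 1:], s / C),
      np.allclose(theta[1:, 1:], -tau))
```

The three printed flags correspond to Eqs. (225), (226) and (227)-(228), respectively.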



The physical meaning of this tensor may be revealed in the following way. Considering Eq. (221) just as the definition of tensor Θ^{αβ}, 71 and using the 4-vector form of the Maxwell equations, given by Eqs. (127) and (129), it is straightforward to verify an extremely simple result for the 4-derivative of the symmetric tensor:

$$\partial_\alpha\Theta^{\alpha\beta} = -F^{\beta\gamma}j_\gamma. \tag{9.230}$$



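This expression is valid in the presence of the electromagnetic field sources, e.g., for any system of charged particles and the field they have created. Of these 4 equations (for the 4 values of index β), the temporal one (with β = 0) may be simply expressed via the energy density (225) and the Poynting vector (226):

$$\frac{\partial u}{\partial t} + \nabla\cdot\mathbf{S} = -\mathbf{j}\cdot\mathbf{E}, \tag{9.231}$$

while the 3 spatial equations (with β = j = 1, 2, 3) may be presented in the form

$$\frac{\partial}{\partial t}\left(\frac{S_j}{c^2}\right) + \left(\rho\mathbf{E} + \mathbf{j}\times\mathbf{B}\right)_j = \sum_{j'=1}^{3}\frac{\partial\tau^{(M)}_{jj'}}{\partial r_{j'}}. \tag{9.232}$$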



Integrated over a volume V limited by surface S, with the help of the divergence theorem, Eq. (231) returns us to the Poynting theorem (6.103):



71 In this way, we are using Eqs. (214) and (221) just as useful guesses leading to the definition of Θ^{αβ}, and may leave their strict justification for more serious field theory courses.
$$\frac{d}{dt}\int_V u\,d^3r + \oint_S S_n\,d^2r = -\int_V\mathbf{j}\cdot\mathbf{E}\,d^3r, \tag{9.233}$$

while Eq. (232) yields: 72

$$\frac{d}{dt}\int_V\frac{S_j}{c^2}\,d^3r + \int_V f_j\,d^3r = \sum_{j'=1}^{3}\oint_S\tau^{(M)}_{jj'}\,dA_{j'}, \qquad\text{with}\quad \mathbf{f} = \rho\mathbf{E} + \mathbf{j}\times\mathbf{B}, \tag{9.234}$$

where dA_j = n_j dA = n_j d²r is the jth component of the elementary area vector dA = n dA = n d²r that is normal to the volume's surface and directed out of the volume; see Fig. 16.

Fig. 9.16. Force dF exerted on a boundary element dA of volume V occupied by the field.

Since, according to Eq. (5.10), vector f is nothing else than the density of volume-distributed forces applied from the field to the particles, we can use the 2nd Newton law, in its relativistic form (144), to rewrite Eq. (234), for a stationary volume V, as

$$\frac{d}{dt}\left(\mathbf{p}_\text{part} + \int_V\frac{\mathbf{S}}{c^2}\,d^3r\right) = \mathbf{F}, \tag{9.235}$$

where p_part is the total mechanical (relativistic) momentum of all particles in volume V, and vector F is defined by its Cartesian components:

$$F_j = \sum_{j'=1}^{3}\oint_S\tau^{(M)}_{jj'}\,dA_{j'}. \tag{9.236}$$

Equations (235)-(236) are our main new results. The first of them shows that vector

$$\mathbf{g} \equiv \frac{\mathbf{S}}{c^2} \tag{9.237}$$



may be interpreted as the density of momentum of the electromagnetic field (per unit volume). This classical relation is consistent with the quantum-mechanical picture of photons as ultrarelativistic particles, with momentum magnitude ℰ/c, because then the total flux of the momentum carried by photons through a unit normal area per unit time may be presented either as S_n/c or as g_n c. It also allows us to revisit the Poynting vector paradox that was discussed in Sec. 6.7 - see Fig. 6.9 and its



72 Just like the Poynting theorem (233), Eq. (234) may be obtained directly from the Maxwell equations, without resorting to the 4-vector formalism; see, e.g., Sec. 8.2.2 in D. J. Griffiths, Introduction to Electrodynamics, 3rd ed., Prentice-Hall, 1999. However, the derivation discussed above is preferable, because it shows the wonderful unity between the laws of conservation of energy and momentum.
discussion. As was emphasized in that discussion, vector S = E×H in this case does not correspond to any measurable energy flow. However, the corresponding momentum (237) of the field is not only real, but may be measured by the recoil impulse 73 it gives to the field sources (say, to a magnetic coil inducing field H and to the capacitor plates creating field E).

Now let us turn to our second result, Eq. (236). It tells us that the 3×3-element Maxwell stress tensor complies with the general definition of the stress tensor 74 characterizing the force F exerted by external forces on the boundary of a volume, in this case occupied by the electromagnetic field (Fig. 16). 75 Let us use this important result to analyze two simple examples of static fields.

(i) Electrostatic field's effect on a perfect conductor. Since Eq. (235) has been derived for a free-space region, we have to select volume V outside the conductor, but we may align one of its faces with the conductor's surface (Fig. 17).

Fig. 9.17. Electrostatic field near conductor's surface.



From Chapter 2, we know that an electrostatic field has to be perpendicular to the conductor's surface. Selecting axis z in this direction, we have E_x = E_y = 0, E_z = ±E, so that only the diagonal components of tensor (228) differ from zero:

$$\tau^{(M)}_{xx} = \tau^{(M)}_{yy} = -\frac{\varepsilon_0}{2}E^2, \qquad \tau^{(M)}_{zz} = \frac{\varepsilon_0}{2}E^2. \tag{9.239}$$



Since the elementary surface area vector has just one nonvanishing component, dA_z, according to Eq. (236), only the last component (which is positive regardless of the sign of E) contributes to the surface force F. We see that the force exerted by the conductor (and eventually by the external forces that hold the conductor in its equilibrium position) on the field is normal to the conductor and directed out of the field volume: dF_z > 0. Hence, by the 3rd Newton law, the force exerted by the field on the conductor's surface is directed toward the field-filled space:

$$dF_\text{surface} = -dF_z = -\frac{\varepsilon_0}{2}E^2\,dA. \tag{9.240}$$



This important result could be obtained by simpler means as well. For example, one could argue, 
quite convincingly, that the local relation between the force and field should not depend on the global 



73 This impulse is sometimes called the hidden momentum; this term makes sense if the field sources have finite 
masses, so that their velocity change at the field variation is measurable. 

74 See, e.g., CM Sec. 7.2. 

75 Note that the field-to-particle interaction gives a vanishing contribution to the net integral, as it should for any interaction between internal parts of a system.
configuration creating the field, and consider a planar capacitor (Fig. 2.2) with the surfaces of both plates charged by equal and opposite charges of density σ = ±ε0E. According to the Coulomb law, the charges should attract each other, pulling each plate toward the field region, so that the Maxwell-tensor result gives the correct direction of the force. The force's magnitude (240) can be verified either by direct integration of the Coulomb law, or by the following simple reasoning. In the plane capacitor, the field E_z = σ/ε0 is equally contributed by the two surface charges; hence the field created by the negative charge of the counterpart plate (not shown in Fig. 17) is E_− = σ/2ε0, and the force it exerts on the elementary surface charge dQ = σdA of the positively charged plate is dF = E_−dQ = σ²dA/2ε0 = ε0E²dA/2, in accordance with Eq. (240). 76

Quantitatively, even for such a high electric field as E = 3 MV/m (close to the electric breakdown field of air), the "negative pressure" (dF/dA) given by Eq. (240) is of the order of 40 Pa (N/m²), i.e. below one thousandth of the ambient atmospheric pressure (1 bar ≈ 10⁵ Pa). Still, these forces may be substantial in some cases, especially in good dielectrics (such as high-quality SiO₂, grown at high temperature, which is broadly used in integrated circuits) that can withstand fields up to ~10⁹ V/m.
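The estimates above follow from Eq. (240) by simple arithmetic. The short sketch below reproduces them; the constant is the CODATA 2018 value of ε0, and the function name is hypothetical.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
BAR = 1e5                 # Pa


def electric_pull(e_field):
    """Magnitude of dF_surface/dA from Eq. (240), in Pa."""
    return 0.5 * EPS0 * e_field ** 2


for label, e in [("air, near breakdown", 3e6), ("good SiO2 film", 1e9)]:
    p = electric_pull(e)
    print(f"{label}: E = {e:.0e} V/m -> {p:.3g} Pa = {p / BAR:.2e} bar")
```

At 3 MV/m the result is close to 40 Pa, while at 10⁹ V/m it is a few tens of bars. The pressure is quadratic in E, so this difference is a factor of roughly 10⁵.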



(ii) Static magnetic field's effect on its source, 77 say, a solenoid's wall or a superconductor's surface (Fig. 18).

Fig. 9.18. Static magnetic field near a current-carrying surface.



With the choice of coordinates shown in Fig. 18, we have B_x = ±B, B_y = B_z = 0, so that the Maxwell stress tensor (228) is diagonal again:

$$\tau^{(M)}_{xx} = \frac{1}{2\mu_0}B^2, \qquad \tau^{(M)}_{yy} = \tau^{(M)}_{zz} = -\frac{1}{2\mu_0}B^2. \tag{9.241}$$



However, since for this geometry only dA_z differs from 0 in Eq. (236), the sign of the resulting force is opposite to that in electrostatics: dF_z < 0, and the force exerted by the magnetic field upon the conductor's surface,

$$dF_\text{surface} = -dF_z = \frac{1}{2\mu_0}B^2\,dA, \tag{9.242}$$



76 By the way, repeating these arguments for a plane capacitor filled with a linear dielectric, we may readily see that Eq. (240) may be generalized for this case by replacing ε0 with ε. A similar replacement (μ0 → μ) is valid for Eq. (242) in a linear magnetic medium.

77 The causal relation is not important here. Especially in the case of a superconductor, the magnetic field may be 
induced by another source, with the surface supercurrent j just shielding the superconductor's bulk from its 
penetration - see Sec. 6. 
corresponds to a positive pressure. For good laboratory magnets (5-10 T), this pressure is of the order of 10⁷ Pa, reaching ~4×10⁷ Pa ≈ 400 bar at 10 T, i.e. it is very substantial, so the magnets require a solid mechanical design.
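The magnet figures follow from Eq. (242) in the same way. The sketch below uses the traditional value μ0 = 4π×10⁻⁷ H/m, which is an assumption accurate to about 10⁻¹⁰. As a side remark that follows from comparing Eqs. (240) and (242), it also prints the electric field E = cB that would give the same |dF/dA|. The function name is hypothetical.

```python
import math

MU0 = 4e-7 * math.pi     # vacuum permeability, H/m (traditional value)
C = 299_792_458.0        # speed of light, m/s
BAR = 1e5                # Pa


def magnetic_push(b_field):
    """Magnitude of dF_surface/dA from Eq. (242), in Pa."""
    return b_field ** 2 / (2.0 * MU0)


for b in (5.0, 10.0):
    p = magnetic_push(b)
    # Electric field with the same |dF/dA| via Eq. (240): eps0 E^2 = B^2 / mu0, so E = cB
    print(f"B = {b:4.1f} T -> {p:.2e} Pa = {p / BAR:.0f} bar "
          f"(same as E = {C * b:.1e} V/m)")
```

The equivalent electric fields, a few 10⁹ V/m, exceed what even good dielectrics withstand. This is one way to see why magnetic pressures in practical devices are so much larger than electric ones.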

The direction of force (242) could also be readily predicted from elementary magnetostatics arguments. Indeed, we can imagine the magnetic field volume limited by another, parallel wall with the opposite direction of surface current. According to the starting point of magnetostatics, Eq. (5.1), such surface currents of opposite directions have to repel each other, doing that via the magnetic field.

Another explanation of the fundamental sign difference between the electric and magnetic field pressures may be given in the language of electric circuits. As we know from Chapter 2, the potential energy of the electric field stored in a capacitor may be presented in two equivalent forms,

$$U_e = \frac{CV^2}{2} = \frac{Q^2}{2C}. \tag{9.243}$$

Similarly, the magnetic field energy in an inductive coil is

$$U_m = \frac{LI^2}{2} = \frac{\Phi^2}{2L}. \tag{9.244}$$

If we do not want to consider the work of external sources on a virtual change of the system dimensions, we should use the latter forms of these relations, i.e. consider a galvanically detached capacitor (Q = const) and an externally shorted inductance (Φ = const). 78 Now if we let the electric field forces (240) drag the capacitor's plates in the direction they "want", i.e. toward each other, this would lead to a reduction of the capacitor's thickness, and hence to an increase of capacitance C, and hence to a decrease of U_e. Similarly, for a solenoid, allowing pressure (242) to move its walls would lead to an increase of the solenoid's volume, and hence of its inductance L, so that the potential energy U_m would also be reduced, as it should be. It is remarkable (actually, beautiful) how the local field formulas (240) and (242) "know" about these global circumstances.
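This energy argument can be made quantitative with a finite-difference "virtual displacement". The minimal sketch below rests on two assumptions: an ideal parallel-plate capacitor, C = ε0A/d, at fixed Q, and an ideal long solenoid, L = μ0N²πa²/l with fringe fields neglected, at fixed flux linkage Φ = LI. Both formulas and all numbers are illustrative assumptions, and the function names are hypothetical. The generalized force −∂U/∂(dimension), divided by the area on which it acts, should reproduce the magnitudes of Eqs. (240) and (242), with the signs described in the text.

```python
import math

EPS0 = 8.8541878128e-12   # F/m
MU0 = 4e-7 * math.pi      # H/m


def capacitor_pressure(q, area, gap, rel_step=1e-6):
    """-(dU_e/d gap)/area at fixed Q, with U_e = Q^2/2C and C = eps0*area/gap."""
    def u_e(d):
        return q * q * d / (2.0 * EPS0 * area)
    h = rel_step * gap
    force = -(u_e(gap + h) - u_e(gap - h)) / (2.0 * h)
    return force / area          # < 0: the plates are pulled toward each other


def solenoid_pressure(flux, turns, radius, length, rel_step=1e-6):
    """-(dU_m/d radius)/(wall area) at fixed flux linkage, U_m = flux^2/2L."""
    def u_m(a):
        inductance = MU0 * turns ** 2 * math.pi * a * a / length
        return flux ** 2 / (2.0 * inductance)
    h = rel_step * radius
    force = -(u_m(radius + h) - u_m(radius - h)) / (2.0 * h)
    return force / (2.0 * math.pi * radius * length)  # > 0: walls pushed out


# Capacitor: compare with -eps0 E^2 / 2, where E = Q / (eps0 A)
q, area, gap = 1e-8, 1e-2, 1e-3
e_field = q / (EPS0 * area)
print(capacitor_pressure(q, area, gap), -0.5 * EPS0 * e_field ** 2)

# Solenoid: compare with B^2 / 2 mu0, where B = mu0 N I / l
turns, radius, length, current = 1000, 0.05, 0.5, 10.0
inductance = MU0 * turns ** 2 * math.pi * radius ** 2 / length
b_field = MU0 * turns * current / length
print(solenoid_pressure(inductance * current, turns, radius, length),
      b_field ** 2 / (2.0 * MU0))
```

Each printed pair should agree: a negative value (pull) for the capacitor and a positive one (push) for the solenoid. Here the signs come out of the global energies alone.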

Finally, let us see whether the major results (237) and (242), obtained in this section, match each 
other. For that, let us return to the normal incidence of a plane, monochromatic wave from free space on 
the plane surface of a perfect conductor (see Fig. 7.8 and its discussion), and use those results to 
calculate the time average of pressure dF sur ^ ce /dA imposed by the wave on the surface. At elastic 
reflection from conductor's surface, electromagnetic field's momentum retains its amplitude but 
changes its sign, so that the momentum transferred to a unit area of the surface (i.e. average pressure) is 

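$$\frac{\langle dF_{\rm surface}\rangle}{dA} = 2c\langle g\rangle_{\rm incident} = 2c\,\frac{\langle S\rangle_{\rm incident}}{c^2} = \frac{2}{c}\,\frac{E_\omega H_\omega^*}{2} = \frac{E_\omega H_\omega^*}{c}, \qquad (9.245)$$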

where E_ω and H_ω are the complex amplitudes of the incident wave. Using relation (7.7) between these
amplitudes (which, for ε = ε0 and μ = μ0, gives E_ω = cB_ω), we get

$$\frac{\langle dF_{\rm surface}\rangle}{dA} = \frac{cB_\omega B_\omega^*}{\mu_0 c} = \frac{|B_\omega|^2}{\mu_0}. \qquad (9.246)$$



78 Of course, this condition may hold "forever" only for solenoids with superconducting wiring, but even in
normal-metal solenoids with practicable inductances, the flux relaxation time constant L/R may be rather large
(practically, up to a few minutes), quite sufficient to carry out force measurements.
On the other hand, as was discussed in Sec. 7.4, at the surface of the perfect mirror the electric
field vanishes while the magnetic field doubles, so that we can use Eq. (242) with B → B(t) = 2Re[B_ω exp{−iωt}].
Averaging the pressure over time, we get

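$$\frac{\langle dF_{\rm surface}\rangle}{dA} = \frac{\left\langle\left(2\,{\rm Re}\left[B_\omega e^{-i\omega t}\right]\right)^2\right\rangle}{2\mu_0} = \frac{|B_\omega|^2}{\mu_0}, \qquad (9.247)$$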

i.e. the same result as Eq. (246). 
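The time average in Eq. (247) rests on ⟨cos²⟩ = 1/2. As a minimal numerical sketch (Python, standard library only; the amplitude value and the sample count are arbitrary assumptions), one can sample B(t) = 2Re[B_ω exp{−iωt}] uniformly over one period and compare the average of B²/2μ0 with |B_ω|²/μ0:

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classic value, adequate for this check)


def avg_magnetic_pressure(b_omega: complex, samples: int = 1000) -> float:
    """Average of B(t)**2 / (2*MU0) over one period, with B(t) = 2 Re[b_omega exp(-i*omega*t)]."""
    total = 0.0
    for k in range(samples):
        phase = 2.0 * math.pi * k / samples  # omega*t, uniformly sampled over one period
        b_t = 2.0 * (b_omega * cmath.exp(-1j * phase)).real
        total += b_t**2 / (2.0 * MU0)
    return total / samples


b_omega = 1.0e-6 * cmath.exp(0.3j)  # arbitrary complex amplitude, T
print(avg_magnetic_pressure(b_omega), abs(b_omega) ** 2 / MU0)
```

Uniform sampling over a full period averages a squared sinusoid exactly (up to rounding), so the two printed numbers should agree to many digits.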

To develop physical intuition, it is useful to estimate the magnitude of the electromagnetic radiation
pressure. Even for the relatively high wave intensity ⟨S_n⟩ of 1 kW/m² (close to that of the
direct sunlight at the Earth's orbit), the pressure 2c⟨g_n⟩ = 2⟨S_n⟩/c is somewhat below 10⁻⁵ Pa ~ 10⁻¹⁰ bar. Still, this
extremely small effect was observed experimentally (by P. Lebedev) as early as 1899, providing one of
the most important confirmations of Maxwell's theory.
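The estimate above is simple arithmetic; the sketch below assumes ⟨S⟩ = 1 kW/m² and a perfectly reflecting surface at normal incidence:

```python
C = 299_792_458.0  # speed of light, m/s
S_AVG = 1.0e3      # assumed time-averaged intensity, W/m^2 (roughly direct sunlight)

pressure_pa = 2.0 * S_AVG / C        # perfect mirror, normal incidence: 2<S>/c
pressure_bar = pressure_pa / 1.0e5   # 1 bar = 1e5 Pa
print(f"{pressure_pa:.2e} Pa = {pressure_bar:.2e} bar")  # -> 6.67e-06 Pa = 6.67e-11 bar
```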



9.9. Exercise problems 
9.1. Use the nonrelativistic Doppler effect picture to derive Eq. (3).



9.2. Show that two successive Lorentz space/time transforms in the same direction, with
velocities u′ and v, are equivalent to a single transform with velocity u given by Eq. (25).



9.3. A photon with wavelength λ is scattered by an electron initially at rest. Considering the
photon as an ultrarelativistic particle (with the rest mass m = 0), find the wavelength λ′ of the scattered
photon as a function of the scattering angle α - see Fig. below.

[Figure: scattering geometry, showing the scattering angle α and the scattered photon of wavelength λ′.]

9.4. Calculate the threshold energy of a γ-photon for the reaction

γ + p → p + π⁰,

if the proton was initially at rest.



9.5. A relativistic particle with energy ℰ and rest mass m collides with a similar particle, initially
at rest in the laboratory frame. Find:

(i) the final velocity of the center of mass of the system, in the lab frame, 

(ii) the total energy of the system, in the center-of-mass frame, and 
(iii) the final velocities of both particles (in the lab frame), if they move along the same
direction.

9.6. Static fields E and B are uniform but arbitrary (both in magnitude and in direction). What
should the velocity of an inertial reference frame be for the vectors E′ and B′, observed from that
frame, to be parallel? Is this solution unique?

9.7. Each of two very thin, long, parallel beams of electrons moving with the same velocity u carries
electric charge of density λ per unit length (as measured in the coordinate frame moving with the electrons).

(i) Calculate the distribution of the electric and magnetic fields in the system (outside the 
beams), as measured in the lab frame. 

(ii) Calculate the interaction force between the beams (per particle) and the resulting 
acceleration, both in the lab frame, and in the system moving with the electrons. Compare the results 
and give a brief discussion of the comparison. 

9.8. Find the trajectory of a relativistic particle in a uniform electrostatic field E for the case of an
arbitrary direction of its initial velocity u(0).

Hint: The reader is encouraged to explore ways of integrating the equation of motion different
from the one used in Sec. 6 for the case u(0) ∥ E.

9.9. Analyze the motion of a nonrelativistic particle in a region where the electric and magnetic
fields are both constant and uniform, but not necessarily parallel or perpendicular to each other.

9.10. Find the law of motion of a relativistic particle in parallel, static electric and magnetic
fields.

Hint: You may like to use the proper time of the particle. 



9.11. Consider the simple model of the charging of a plane capacitor by a
lumped current source, shown in the figure on the right, and prove that the
momentum given by the constant, uniform external magnetic field B to the
current-carrying conductor is equal and opposite to the momentum of the
electromagnetic field that current I(t) builds up in the capacitor. (You may let
the capacitor be planar and very broad, and neglect the fringe field effects.)

9.12. Calculate the pressure imposed on the well-conducting walls of a waveguide with a rectangular
(a×b) cross-section by a wave propagating along it in the fundamental (H₁₀) mode. Give an
interpretation of the result.
Chapter 10. Radiation by Relativistic Charges

In this chapter, we return to the radiation of electromagnetic waves by moving charges, because the review
of the special relativity background in the previous chapter enables an analysis of the radiation effects
for an arbitrary speed of the charged particle. After an analysis of such important particular cases as
synchrotron radiation and "Bremsstrahlung" (brake radiation), we will discuss the apparently
unrelated effect of Coulomb losses, which nevertheless will lead us to such important phenomena as
Cherenkov radiation and transition radiation. At the end of the chapter, I will briefly review the
effects of the back action of the emitted radiation on the emitting particle, and the resulting limits of
classical electrodynamics.



10.1. Lienard-Wiechert potentials 

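A convenient starting point for the discussion of radiation by relativistically moving charges is
provided by Eqs. (8.17) for the retarded potentials. In free space, these formulas are reduced to

$$\phi(\mathbf r,t) = \frac{1}{4\pi\varepsilon_0}\int\frac{\rho(\mathbf r', t-R/c)}{R}\,d^3r', \qquad \mathbf A(\mathbf r,t) = \frac{\mu_0}{4\pi}\int\frac{\mathbf j(\mathbf r', t-R/c)}{R}\,d^3r'. \qquad (10.1)$$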

Here R is the magnitude of the vector

$$\mathbf R = \mathbf r - \mathbf r', \qquad (10.2)$$

that connects the source point r′ to the observation point r. As a reminder, Eqs. (1) were derived from
the Maxwell equations without any restrictions, and are very convenient for situations with continuous
distributions of charge and current. On the other hand, for point charges, with delta-functional ρ and j, it
is more convenient to recast these relations into a simpler form that would not require integration
over the r′ space.

This reduction, however, requires care. Indeed, for a single point charge q moving with velocity
u, such integration of Eqs. (1), if carried out naively, would yield the following apparent result:

$$\phi(\mathbf r,t) = \frac{q}{4\pi\varepsilon_0 R_r}, \qquad \mathbf A(\mathbf r,t) = \frac{\mu_0}{4\pi}\,\frac{q\mathbf u_r}{R_r}, \qquad \text{[WRONG!]} \qquad (10.3)$$

where the index r marks the variables to be calculated at the time t − R_r/c. This is a good example of how the
science of relativity (even the special one :-) cannot be taken too lightly. Indeed, the 4-vectors (9.84)-(9.85),
formed from potentials (3), would not obey the Lorentz transform rule (9.91), because the distance R_r also
depends on the reference frame it is measured in.

In order to correct the error, we need, first of all, to specify what exactly R_r is for a point charge.
Evidently, in this case, only one space-time point {r′, t′} may contribute to integrals (1) for any
observation point {r, t}. This point should be found from the retardation condition t′ = t − R_r/c, i.e.

$$c(t-t') = |\mathbf r(t) - \mathbf r'(t')|. \qquad (10.4)$$

Figure 1 depicts the graphical solution of this self-consistency equation as the point of intersection of
the light cone of the observation point (see Fig. 9.9 and its discussion) and the trajectory (world line) of the charged
particle.¹ As in Eq. (3), I will use the index r to mark all variables corresponding to the retarded point {r′,
t′} that satisfies Eq. (4); for example, t′ = t_r, c(t − t_r) = R_r (see Fig. 1), u(r′, t_r) = u_r, etc., as measured in
the "lab" reference frame - generally, any inertial frame that moves with the same velocity as the
observation point at the moment t we are considering.




Fig. 10.1. Graphical solution of Eq. (4). 
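Equation (4) rarely has a closed-form solution for an arbitrary trajectory, but since the particle speed is below c, the function c(t − t′) − |r − r′(t′)| decreases monotonically in t′, so a simple bisection finds the unique causal root. The sketch below is one possible implementation (the function name `retarded_time` and the units with c = 1 are assumptions for illustration); it is checked against the closed-form result for uniform motion, derived later in this section as Eq. (26):

```python
import math
from typing import Callable, Sequence

Vec3 = Sequence[float]


def retarded_time(r: Vec3, t: float, trajectory: Callable[[float], Vec3],
                  c: float = 1.0, tol: float = 1e-13) -> float:
    """Solve Eq. (4), c*(t - t_r) = |r - r'(t_r)|, for the retarded time t_r <= t.

    Assumes the particle speed stays below c: then g(t') = c*(t - t') - |r - r'(t')|
    decreases monotonically, so the causal root is unique and bisection finds it.
    """
    def g(tp: float) -> float:
        return c * (t - tp) - math.dist(r, trajectory(tp))

    hi, step = t, 1.0          # g(t) = -|r - r'(t)| <= 0
    lo = t - step
    while g(lo) < 0.0:         # move back in time until the past light cone is crossed
        step *= 2.0
        lo = t - step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid           # the root lies later than mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Check against uniform motion along x (units with c = 1), observer at (0, 0, b):
u, b, t = 0.8, 1.0, 0.5
g2 = 1.0 / (1.0 - u**2)                                   # gamma squared
t_r_closed = g2 * t - math.sqrt((g2 * t) ** 2 - g2 * (t**2 - b**2))
t_r_numeric = retarded_time((0.0, 0.0, b), t, lambda tp: (u * tp, 0.0, 0.0))
print(t_r_numeric, t_r_closed)
```

The doubling search for the lower bracket keeps the solver independent of any prior guess of how far in the past the retarded point lies.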



Now let us write Eqs. (1) for a point charge in another inertial reference frame 0′, whose velocity
(as measured in the lab frame) coincides, at the moment t_r, with the velocity u_r of the point charge.
In that frame the charge rests, so that

$$\phi' = \frac{q}{4\pi\varepsilon_0 R'}, \qquad \mathbf A' = 0, \qquad (10.5)$$

but let us remember that this R′ may not be equal to R, because the latter distance is measured in the
"lab" reference frame. Let us use the identity 1/ε0 = μ0c² to rewrite Eq. (5) in the form of components of
a 4-vector similar in structure to Eq. (3):

$$\frac{\phi'}{c} = \frac{\mu_0}{4\pi}\,\frac{qc}{R'}, \qquad \mathbf A' = 0. \qquad (10.6)$$

Now it is easy to guess the correct answer for the whole 4-potential:

$$A^\alpha = \frac{\mu_0}{4\pi}\,\frac{qc\,u^\alpha}{u_\beta R^\beta}, \qquad (10.7)$$

where (just as a reminder) A^α = {φ/c, A}, u^α = γ{c, u}, and R^α is the 4-vector of the event distance, formed
similarly to that of a single event - cf. Eq. (9.48):

$$R^\alpha = \{c(t-t'), \mathbf R\} = \{c(t-t'), \mathbf r - \mathbf r'\}. \qquad (10.8)$$

Indeed, we need the 4-vector A^α that would:

(i) obey the Lorentz transform,

(ii) have its spatial components A_j scaling as the velocity components u_j, and

(iii) be reduced to the correct result (5) in the reference frame moving with the charge.



1 As Fig. 1 shows, there is always another point {r″, t″}, with t″ > t, that is formally also a solution of Eq. (4),
but it does not fit Eqs. (1), because the field it would induce at the observation point {r, t} would violate the causality principle.
Formula (7) evidently satisfies all these requirements, because the scalar product in its denominator is
just

$$u_\beta R^\beta = \gamma\{c, -\mathbf u\}\cdot\{c(t-t'), \mathbf R\} = \gamma\left[c^2(t-t') - \mathbf u\cdot\mathbf R\right] = \gamma c\,(R - \boldsymbol\beta\cdot\mathbf R) = \gamma cR\,(1-\boldsymbol\beta\cdot\mathbf n), \qquad (10.9)$$
where $\mathbf n = \mathbf R/R$ is the unit vector in the observer's direction, β = u/c is the normalized velocity of the
particle, and γ = 1/(1 − u²/c²)^{1/2};² at the third step, Eq. (4), c(t − t′) = R, has been used. In the reference frame of the charge (in which β = 0 and γ = 1),
expression (9) is reduced to cR, so that Eq. (7) is correctly reduced to Eq. (6). Now let us spell out the
components of Eq. (7) in the lab frame (in which t′ = t_r and R = R_r):



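$$\phi(\mathbf r,t) = \frac{q}{4\pi\varepsilon_0}\left[\frac{1}{R - \boldsymbol\beta\cdot\mathbf R}\right]_r = \frac{q}{4\pi\varepsilon_0}\left[\frac{1}{R(1-\boldsymbol\beta\cdot\mathbf n)}\right]_r, \qquad (10.10a)$$

$$\mathbf A(\mathbf r,t) = \frac{\mu_0}{4\pi}\,q\left[\frac{\mathbf u}{R - \boldsymbol\beta\cdot\mathbf R}\right]_r = \frac{\mu_0}{4\pi}\,qc\left[\frac{\boldsymbol\beta}{R(1-\boldsymbol\beta\cdot\mathbf n)}\right]_r. \qquad (10.10b)$$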
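Relation (9), and the reduction of Eq. (7) to Eq. (10a), can be spot-checked numerically for arbitrary vectors β and R. The sketch assumes units with c = 1, the metric signature (+, −, −, −) used in Eq. (9), the light-cone condition c(t − t′) = R from Eq. (4), and arbitrary sample vectors (with |β| < 1); φ is expressed in units of q/4πε0:

```python
import math


def minkowski_dot(a, b):
    """a_mu b^mu with metric (+, -, -, -), both arguments given by contravariant components."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))


c = 1.0                               # units with c = 1
beta = (0.3, -0.5, 0.6)               # assumed particle velocity / c, |beta| < 1
R_vec = (2.0, 1.0, -0.5)              # assumed retarded vector r - r'
R = math.sqrt(sum(x * x for x in R_vec))
n = tuple(x / R for x in R_vec)
gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in beta))
beta_dot_n = sum(bi * ni for bi, ni in zip(beta, n))

u4 = (gamma * c,) + tuple(gamma * c * bi for bi in beta)  # u^alpha = gamma {c, u}
R4 = (R,) + R_vec                     # R^alpha = {c(t - t'), R}, with c(t - t') = R by Eq. (4)

lhs = minkowski_dot(u4, R4)                 # u_beta R^beta
rhs = gamma * c * R * (1.0 - beta_dot_n)    # right-hand side of Eq. (9)

# Time component of Eq. (7): phi = c*A^0 = (mu0 c^2 / 4 pi) q u^0 / (u_beta R^beta),
# i.e. phi = u^0 / (u_beta R^beta) in units of q / (4 pi eps0), since mu0 c^2 = 1/eps0.
phi_eq7 = u4[0] / lhs
phi_eq10a = 1.0 / (R * (1.0 - beta_dot_n))
print(lhs, rhs, phi_eq7, phi_eq10a)
```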



These formulas are called the Lienard-Wiechert potentials.³ In the nonrelativistic limit, they
coincide with the naive guess (3), but in the general case they include the additional factor (1 − β·n) in the
denominator, which describes the apparent increase of the effective charge density of the source due to
the apparent change of the distance R at β ~ 1. In order to understand its origin, let us consider a simple 1D
model of the radiation: a uniformly charged rod of length l, moving directly toward an observer located
at point r with a constant speed u (Fig. 2). As a result of this motion, the observer may measure the
field (1) induced by the rod within a certain time interval [t_start, t_stop].



[Figure: two panels showing the rod, moving with speed u toward the observation point r, at the emission moments t′_stop and t′_start, with the distances c(t_stop − t′_stop), u(t′_start − t′_stop), and c(t_start − t′_start) marked.]

Fig. 10.2. Geometric effect behind
factor (1 − β·n) in the Lienard-Wiechert
potentials.



The trailing end of this field pulse, observed at t = t_stop, is emitted by the far (in Fig. 2, leftmost)
end of the rod at the moment t′_stop. Due to the limited speed of the rod, u < c, the moment t′_stop comes earlier
than the moment t′_start at which the front end of the rod emits the field that starts the observed pulse.
During the positive time interval (t′_start − t′_stop), the rod passes an additional distance u(t′_start − t′_stop) - see
the bottom panel of Fig. 2. Using the evident relations shown on each of the two panels of Fig. 2 to
express r, and requiring them to give the same result, we get the following relation:

$$c(t_{\rm stop} - t'_{\rm stop}) = u(t'_{\rm start} - t'_{\rm stop}) + l + c(t_{\rm start} - t'_{\rm start}). \qquad (10.11)$$



2 Note the following identities: γ² = 1/(1 − β²) and (γ² − 1) = β²/(1 − β²) = γ²β², which may be very handy for the
relativity-related algebra.

3 They were derived in 1898 by A.-M. Lienard and (apparently, independently) in 1900 by E. Wiechert. 
Using it to express the difference Δt′(u) ≡ t′_start − t′_stop > 0 in the limit t_stop → t_start, i.e. when the
observed radiation pulse is short, we get

$$\Delta t'(u) = \frac{l}{c-u} = \frac{l/c}{1-\beta} = \frac{\Delta t'(0)}{1-\beta}, \qquad \text{where} \quad \Delta t'(0) = \frac{l}{c}, \qquad (10.12)$$

i.e. a factor of 1/(1 − β) larger than it would be at a negligible source speed. Hence, for the same observation
moment, the interval between the retarded moments t_r for the two ends of the rod is stretched as u is increased,
so that the observer effectively "sees" the rod's charge spread over the length l + uΔt′(u) = l/(1 − β). Since the total
charge of the rod does not depend on u, the effective charge contributing to the field at the observation point is
increased by the factor 1/(1 − β), and the field is increased accordingly. Somewhat counter-intuitively, Eq. (12) shows that this field
renormalization is independent of the source size l, and hence takes place even in the limit l → 0, i.e.
for a point source.⁴
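A small numerical sketch of the rod model (hypothetical parameters: c = 1, u = 0.6, l = 1, with the observer at x = 10) solves the retardation condition separately for the two ends of the rod, for the same observation moment, and reproduces Eq. (12) together with the stretched effective length l/(1 − β):

```python
def emission_time(x_of, x_obs, t_obs, c=1.0, tol=1e-13):
    """Emission (retarded) time t' of a point moving along x toward an observer at x_obs.

    Solves c*(t_obs - t') = x_obs - x_of(t') by bisection, assuming the point is still
    short of the observer at t_obs and moves slower than c (so the root is unique).
    """
    def g(tp):
        return c * (t_obs - tp) - (x_obs - x_of(tp))

    lo, hi = t_obs - 1.0, t_obs
    while g(lo) < 0.0:           # g decreases with t'; step back until g(lo) > 0
        lo -= 2.0 * (t_obs - lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c, u, l = 1.0, 0.6, 1.0          # assumed: units with c = 1, rod speed u, rod length l
x_obs, t_obs = 10.0, 5.0         # assumed observer position and observation moment


def near_end(tp):                # front end of the rod, at x = u t'
    return u * tp


def far_end(tp):                 # rear end, a distance l behind
    return u * tp - l


t_start = emission_time(near_end, x_obs, t_obs, c)   # t'_start
t_stop = emission_time(far_end, x_obs, t_obs, c)     # t'_stop, earlier
dt_prime = t_start - t_stop
effective_length = near_end(t_start) - far_end(t_stop)
print(dt_prime, l / (c - u))                    # Eq. (12)
print(effective_length, l / (1.0 - u / c))      # rod length "seen" at one observation moment
```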

So, the 4-vector formalism has provided big help for the calculation of the field potentials. Now
the electric and magnetic fields corresponding to these potentials may be found by plugging Eqs. (10)
into the general formulas (6.106). This operation should also be performed very carefully, because Eqs.
(6.106) require differentiation over the coordinates {r, t} of the observation point, while we want the
fields to be expressed via the particle's velocity u_r = (dr′/dt′)_r that participates in Eqs. (10). In order to find
the relation between the derivatives over t and t′, let us differentiate Eq. (4), rewritten as

$$R_r = c(t - t_r), \qquad (10.13)$$

over t and t_r. In order to calculate the derivative ∂R_r/∂t_r, let us first differentiate the identity R² = R·R:

$$2R_r\frac{\partial R_r}{\partial t_r} = 2\mathbf R_r\cdot\frac{\partial\mathbf R_r}{\partial t_r}. \qquad (10.14)$$



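Since ∂R_r/∂t_r = ∂(r − r′)/∂t_r = −∂r′/∂t_r = −u_r, Eq. (14) yields

$$\frac{\partial R_r}{\partial t_r} = \frac{\mathbf R_r}{R_r}\cdot\frac{\partial\mathbf R_r}{\partial t_r} = -(\mathbf n\cdot\mathbf u)_r. \qquad (10.15)$$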



Now let us differentiate the same function R_r over t, keeping r fixed. On one hand, Eq. (13) yields

$$\frac{\partial R_r}{\partial t} = c - c\,\frac{\partial t_r}{\partial t}. \qquad (10.16)$$

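On the other hand, according to Eq. (4), if r is fixed, t′ is a function of t alone, so that, using Eq. (15),
we may write

$$\frac{\partial R_r}{\partial t} = \frac{\partial R_r}{\partial t_r}\,\frac{\partial t_r}{\partial t} = -(\mathbf n\cdot\mathbf u)_r\,\frac{\partial t_r}{\partial t}. \qquad (10.17)$$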

Requiring Eqs. (16) and (17) to give the same result, we get the same factor that participates in the 
Lienard-Wiechert potentials (10) and Eq. (12): 



4 Note that this effect (linear in β) has nothing to do with the Lorentz time dilation (9.21),
which is quadratic in β. (Indeed, all our arguments above refer to the same, lab frame.) Rather, it is close in
nature to the Doppler effect.
$$\frac{\partial t_r}{\partial t} = \frac{c}{c - (\mathbf n\cdot\mathbf u)_r} = \left[\frac{1}{1-\boldsymbol\beta\cdot\mathbf n}\right]_r. \qquad (10.18)$$



This relation may be readily interpreted, at least semi-quantitatively. At fixed r, a variation dt of
the observation time corresponds to a small vertical shift of the light cone in Fig. 1, while dt_r is the
corresponding shift of the retarded time t_r, i.e. of the point where the world line r′(t′) crosses the light
cone of the observation point {r, t}. It is evident from that figure that if the particle does not move (i.e. its
world line is a vertical straight line), then dt_r = dt. On the other hand, if the particle moves fast
(with speed u ≈ c) toward the observation point, its world line crosses the light cone at a small
("grazing") angle, so that dt_r ≫ dt, in accordance with Eq. (18).

Since the retarded time t_r, as the solution of Eq. (4), depends not only on the observation time t
but also on the observation point r, we also need to calculate its spatial derivative, i.e. the gradient in r-space.
A calculation absolutely similar to the one carried out above yields



$$\nabla t_r = -\left[\frac{\mathbf n}{c\,(1-\boldsymbol\beta\cdot\mathbf n)}\right]_r. \qquad (10.19)$$
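Relations (18) and (19) can be verified by finite differences, with Eq. (4) solved numerically by bisection for a particle on a hypothetical circular orbit (radius 1, speed 0.7c, units with c = 1). The sketch below is only an illustration under these assumptions; the central-difference step h is an arbitrary choice:

```python
import math

C = 1.0                  # units with c = 1 (assumption)
A, W = 1.0, 0.7          # hypothetical circular orbit: radius A, angular frequency W (speed 0.7c)


def trajectory(tp):
    return (A * math.cos(W * tp), A * math.sin(W * tp), 0.0)


def velocity(tp):
    return (-A * W * math.sin(W * tp), A * W * math.cos(W * tp), 0.0)


def retarded_time(r, t, tol=1e-14):
    """Bisection solution of Eq. (4), c(t - t_r) = |r - r'(t_r)|, assuming speed < c."""
    def g(tp):
        return C * (t - tp) - math.dist(r, trajectory(tp))

    lo, hi = t - 1.0, t
    while g(lo) < 0.0:
        lo -= 2.0 * (t - lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r, t, h = (3.0, 1.0, 2.0), 4.0, 1e-5
t_r = retarded_time(r, t)
R_vec = [ri - xi for ri, xi in zip(r, trajectory(t_r))]
R = math.hypot(*R_vec)
n = [x / R for x in R_vec]
beta_n = sum(v * k for v, k in zip(velocity(t_r), n)) / C

# Eq. (18): dt_r/dt = 1/(1 - beta.n), compared with a central difference
dtr_dt_fd = (retarded_time(r, t + h) - retarded_time(r, t - h)) / (2 * h)
dtr_dt_eq18 = 1.0 / (1.0 - beta_n)

# Eq. (19): grad t_r = -n / (c (1 - beta.n)), compared component by component
grad_fd = []
for i in range(3):
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    grad_fd.append((retarded_time(rp, t) - retarded_time(rm, t)) / (2 * h))
grad_eq19 = [-k / (C * (1.0 - beta_n)) for k in n]
print(dtr_dt_fd, dtr_dt_eq18)
print(grad_fd, grad_eq19)
```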



Using Eqs. (6.106), (18), and (19), the calculation of the fields from Eqs. (10) is straightforward but
tedious, and is left for the reader's exercise. For the electric field, the result is:



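$$\mathbf E(\mathbf r,t) = \frac{q}{4\pi\varepsilon_0}\left[\frac{\mathbf n - \boldsymbol\beta}{\gamma^2(1-\boldsymbol\beta\cdot\mathbf n)^3R^2} + \frac{\mathbf n\times\{(\mathbf n-\boldsymbol\beta)\times\dot{\boldsymbol\beta}\}}{(1-\boldsymbol\beta\cdot\mathbf n)^3\,cR}\right]_r. \qquad (10.20a)$$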



The only good news about this uncomfortably bulky result is that a similar differentiation gives
essentially the same formula for the magnetic field, which may be expressed via Eq. (20a):⁵

$$\mathbf B = \mathbf n_r\times\frac{\mathbf E}{c}, \qquad \text{i.e.} \qquad \mathbf H = \frac{1}{Z_0}\,\mathbf n_r\times\mathbf E. \qquad (10.20b)$$



Thus the magnetic and electric fields are always perpendicular to each other, and related just as in a
plane wave - cf. Eq. (7.6),⁶ with the only difference that now the vector n_r may be a function of time.

As a sanity check, let us use Eq. (20a) as an alternative way to find the electric field of a charge
moving without acceleration, i.e. uniformly along a straight line - see Fig. 9.11 (reproduced in Fig. 3)
and its discussion in Sec. 9.5. (This example will also exhibit the challenges of the practical application of the
Lienard-Wiechert formulas.) In this case the vector β does not change in time, so that the second term in Eq.
(20a) vanishes, and all we need to do is to spell out the Cartesian components of the first term. Let us
select the coordinate axes and time origin as shown in Fig. 3, and make a clear
distinction between the actual position r′(t) = {ut, 0, 0} of the charged particle at the instant t we are



5 An alternative way to derive Eqs. (20) is to plug the 4-vector of potentials, given by Eq. (7), into Eq. (9.124) to
calculate the field strength tensor. This calculation yields

$$F^{\alpha\beta} = \frac{\mu_0 qc}{4\pi}\,\frac{1}{u_\gamma R^\gamma}\,\frac{d}{d\tau}\left[\frac{R^\alpha u^\beta - R^\beta u^\alpha}{u_\delta R^\delta}\right]_r.$$

Now the elements of this tensor may be identified with field components in accordance with Eq. (9.125).

6 Superficially, Eq. (20b) contradicts electrostatics, where B should vanish while E stays finite. However, note
that according to the Coulomb law for a point charge, in this case E = En = En_r, so that B ∝ n_r×E ∝ n_r×n_r = 0.
considering, and its retarded position r′(t_r), where t_r is the solution of Eq. (13), i.e. the moment when the
particle's field, moving with the speed of light, reaches the observation point r. In these coordinates

$$\boldsymbol\beta = \{\beta, 0, 0\}, \qquad \mathbf r = \{0, 0, b\}, \qquad \mathbf r'(t_r) = \{ut_r, 0, 0\}, \qquad \mathbf n_r = \{\cos\theta, 0, \sin\theta\}, \qquad (10.21)$$

with cos θ = −ut_r/R_r, so that [(n − β)_x]_r = −ut_r/R_r − β, and for the longitudinal component of the electric
field, Eq. (20a) yields




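$$E_x = \frac{q}{4\pi\varepsilon_0}\left[\frac{-ut_r/R - \beta}{\gamma^2(1-\boldsymbol\beta\cdot\mathbf n)^3R^2}\right]_r = \frac{q}{4\pi\varepsilon_0}\left[\frac{-ut_r - \beta R}{\gamma^2(1-\boldsymbol\beta\cdot\mathbf n)^3R^3}\right]_r. \qquad (10.22)$$

Fig. 10.3. Geometry of the linearly
moving charge problem.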



But according to Eq. (13), the product βR_r may be presented as βc(t − t_r) = u(t − t_r). Plugging this
expression into Eq. (22), we may eliminate the explicit dependence of E_x on the time t_r:

$$E_x = \frac{q}{4\pi\varepsilon_0}\,\frac{-ut}{\gamma^2\left[(1-\boldsymbol\beta\cdot\mathbf n)R\right]_r^3}. \qquad (10.23)$$

The nonvanishing transverse component of the field also has a similar form:



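$$E_z = \frac{q}{4\pi\varepsilon_0}\left[\frac{\sin\theta}{\gamma^2(1-\boldsymbol\beta\cdot\mathbf n)^3R^2}\right]_r = \frac{q}{4\pi\varepsilon_0}\,\frac{b}{\gamma^2\left[(1-\boldsymbol\beta\cdot\mathbf n)R\right]_r^3}, \qquad (10.24)$$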



while E_y = 0. Hence, the only combination of t_r and R_r we still need to calculate is [(1 − β·n)R]_r. From
Fig. 3, β·n_r = β cos θ = −βut_r/R_r, so that (1 − β·n)R_r = R_r + βut_r = c(t − t_r) + cβ²t_r = ct − ct_r/γ². What
remains is to find the time t_r from the self-consistency equation (13) that in our case (Fig. 3) takes the form



$$R_r^2 \equiv c^2(t-t_r)^2 = b^2 + (ut_r)^2. \qquad (10.25)$$



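After solving this quadratic equation (with the negative sign before the square root, in order
to get t_r < t),

$$t_r = \gamma^2 t - \left[(\gamma^2 t)^2 - \gamma^2\left(t^2 - \frac{b^2}{c^2}\right)\right]^{1/2} = \gamma^2 t - \frac{\gamma}{c}\left(u^2\gamma^2t^2 + b^2\right)^{1/2}, \qquad (10.26)$$

we obtain a simple result:

$$\left[(1-\boldsymbol\beta\cdot\mathbf n)R\right]_r = \frac{1}{\gamma}\left(u^2\gamma^2t^2 + b^2\right)^{1/2}, \qquad (10.27)$$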



so that the electric field components are 
$$E_x = -\frac{q}{4\pi\varepsilon_0}\,\frac{\gamma ut}{\left(b^2 + \gamma^2u^2t^2\right)^{3/2}}, \qquad E_z = \frac{q}{4\pi\varepsilon_0}\,\frac{\gamma b}{\left(b^2 + \gamma^2u^2t^2\right)^{3/2}}, \qquad E_y = 0. \qquad (10.28)$$

These are exactly Eqs. (9.139),⁷ which had been obtained in Sec. 9.5 by simpler means, without
the necessity to solve the self-consistency equation for t_r. However, that alternative approach was
essentially based on the inertial motion of the particle, and cannot be used in problems in which the particle
moves with acceleration. In those problems, the second term in Eq. (20a), describing wave radiation, is
essential and most important.
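As a numerical companion to this sanity check, the sketch below evaluates the first term of Eq. (20a) directly, with t_r taken from Eq. (26), and compares it with the closed-form field components. The units c = 1 and q/4πε0 = 1 are assumptions made for brevity; the coordinates follow Eq. (21), with the observer at {0, 0, b}:

```python
import math


def lw_velocity_field(u, b, t):
    """First term of Eq. (20a) for a charge moving along x with speed u, observer at (0, 0, b).
    Units: c = 1 and q/(4 pi eps0) = 1 (assumptions for this sketch)."""
    g2 = 1.0 / (1.0 - u**2)                                        # gamma^2
    t_r = g2 * t - math.sqrt((g2 * t) ** 2 - g2 * (t**2 - b**2))   # Eq. (26)
    R = t - t_r                                                    # Eq. (13) with c = 1
    n = (-u * t_r / R, 0.0, b / R)                                 # unit vector from r'(t_r) to r
    beta = (u, 0.0, 0.0)
    denom = g2 * (1.0 - u * n[0]) ** 3 * R**2
    return tuple((ni - bi) / denom for ni, bi in zip(n, beta))


def closed_form_field(u, b, t):
    """Closed-form components (x, y, z) in the same units, with the observer on the z axis."""
    g = 1.0 / math.sqrt(1.0 - u**2)
    d = (b**2 + g**2 * u**2 * t**2) ** 1.5
    return (-g * u * t / d, 0.0, g * b / d)


for t in (-2.0, 0.0, 0.7):
    print(lw_velocity_field(0.9, 1.0, t), closed_form_field(0.9, 1.0, t))
```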



10.2. Radiation power 

Let us calculate the angular distribution of the particle's radiation. For that, we need to use
Eqs. (20) to find the Poynting vector S = E×H, and in particular its component S_n = S·n_r, at large
distances R from the particle. Following tradition, let us express the result as the radiated energy per
unit solid angle per unit time interval dt_r of the radiation emission (rather than its measurement), using Eq. (18):



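$$\frac{d\mathcal P}{d\Omega} \equiv \frac{d\mathcal E}{d\Omega\,dt_r} = R^2S_n\,\frac{dt}{dt_r} = \left[R^2(\mathbf E\times\mathbf H)\cdot\mathbf n\,(1-\boldsymbol\beta\cdot\mathbf n)\right]_r. \qquad (10.29)$$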



At sufficiently large distances from the particle, i.e. in the limit R → ∞, the first (essentially, the Coulomb field)
term in the square brackets of Eq. (20a), which scales as 1/R², gives a vanishing contribution to R²S_n, so that we get
a key formula valid for an arbitrary law of particle motion:⁸



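$$\frac{d\mathcal P}{d\Omega} = \frac{Z_0q^2}{(4\pi)^2}\,\frac{\left|\mathbf n\times\{(\mathbf n-\boldsymbol\beta)\times\dot{\boldsymbol\beta}\}\right|^2}{(1-\mathbf n\cdot\boldsymbol\beta)^5}. \qquad (10.30)$$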



Now let us apply this important result to some simple cases. First of all, Eq. (30) says that a
charge moving with a constant velocity β does not radiate at all. This might be expected from our analysis
of this case in Sec. 9.5, because in the reference frame moving with the charge it produces only the
Coulomb electrostatic field, i.e. no radiation.

Next, let us consider a linear motion of a point charge with a nonvanishing acceleration,
evidently directed along the motion line. With the coordinate axes directed as shown in Fig. 4a, each of
the vectors involved in Eq. (30) has at most two nonvanishing Cartesian components:

$$\mathbf n = \{\sin\theta, 0, \cos\theta\}, \qquad \boldsymbol\beta = \{0, 0, \beta\}, \qquad \dot{\boldsymbol\beta} = \{0, 0, \dot\beta\}, \qquad (10.31)$$



where θ is the angle between the directions of the particle's motion and the radiation propagation. Plugging
these expressions into Eq. (30) and performing the vector multiplications, we get

$$\frac{d\mathcal P}{d\Omega} = \frac{Z_0q^2}{(4\pi)^2}\,\dot\beta^2\,\frac{\sin^2\theta}{(1-\beta\cos\theta)^5}. \qquad (10.32)$$



7 A similar calculation of the magnetic field components from Eq. (20b) gives results identical to Eqs. (9.140).

8 If the direction of radiation, n, does not change in time, this formula does not contain the observation point r.
Hence, from this point on, the index r may be safely dropped for brevity, though we should always remember that β (and β̇)
in Eq. (30) refer to the particle's motion at the instant of the radiation's emission, not its detection.
Fig. 10.4. Radiation at linear
acceleration: (a) geometry of
the problem, and (b) the last
fraction in Eq. (32) as a
function of angle θ.



Figure 4b shows the angular distribution of this radiation for three values of the particle's speed. If
it is relatively low (β ≪ 1), the denominator in Eq. (32) is close to 1 for all observation angles θ, so that
the angular distribution of the radiation power is close to sin²θ, just as it follows from the general
nonrelativistic formula (8.26). However, as the velocity is increased, the denominator becomes less than 1 for
θ < π/2, i.e. for the forward-looking directions, and larger than 1 for the backward directions. As a result, the
radiation toward the particle's velocity is increased (somewhat counter-intuitively, regardless of the
acceleration sign!), while that in the backward direction is suppressed. For ultrarelativistic particles (β → 1),
this trend is enormously exacerbated, and radiation to very small forward angles dominates. In order to
describe this main part of the distribution, we may expand the trigonometric functions of θ participating
in Eq. (32) into the Taylor series in small θ, and keep only their leading terms: sin θ ≈ θ, cos θ ≈ 1 − θ²/2,
so that (1 − β cos θ) ≈ (1 + γ²θ²)/2γ². The resulting expression,

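$$\frac{d\mathcal P}{d\Omega} \approx \frac{Z_0q^2}{(4\pi)^2}\,\dot\beta^2\,\frac{32\gamma^{10}\theta^2}{(1+\gamma^2\theta^2)^5}, \qquad (10.33)$$

describes a narrow distribution of radiation, with a maximum at the angle

$$\theta_0 = \frac{1}{2\gamma} \ll 1. \qquad (10.34)$$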

Note that due to the axial symmetry of the result, and the fact that according to Eq. (33), d𝒫/dΩ = 0 in
the exact direction of the particle's propagation (θ = 0), Eq. (33) describes a narrow circular "hollow cone"
of radiation. Another important aspect of this result is how fast the maximum radiation brightness
grows with the Lorentz factor γ, i.e. with the particle's energy ℰ = γmc²: at θ = θ0, and at fixed β̇, the last fraction in Eq. (33) is proportional to γ⁸.
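The exact position of the maximum of the last fraction in Eq. (32) follows from setting its θ-derivative to zero, which gives the quadratic equation 3β cos²θ + 2cos θ − 5β = 0. The sketch below (a brute-force grid search; the grid size is an arbitrary choice) compares that exact root with the small-angle estimate θ0 = 1/2γ for a few values of γ:

```python
import math


def pattern(theta, beta):
    """Last fraction of Eq. (32): sin^2(theta) / (1 - beta cos(theta))^5."""
    return math.sin(theta) ** 2 / (1.0 - beta * math.cos(theta)) ** 5


def peak_angle_grid(beta, n=200_000):
    """Brute-force maximum of pattern() on a uniform grid over 0 < theta < pi."""
    k_best = max(range(1, n), key=lambda k: pattern(math.pi * k / n, beta))
    return math.pi * k_best / n


def peak_angle_exact(beta):
    """Root of d/dtheta = 0, i.e. of 3 beta cos^2 + 2 cos - 5 beta = 0."""
    return math.acos((math.sqrt(1.0 + 15.0 * beta**2) - 1.0) / (3.0 * beta))


for gamma in (2.0, 10.0, 100.0):
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    print(gamma, peak_angle_grid(beta), peak_angle_exact(beta), 1.0 / (2.0 * gamma))
```

For γ = 2 the exact peak is still noticeably off 1/2γ, while for γ = 100 the two agree to a small fraction of a percent, as expected from an expansion in small θ.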

Still, the total radiated power 𝒫 (into all observation angles) at linear acceleration is not too high
for any practicable values of parameters. In order to show this, it is convenient to calculate 𝒫 for an
arbitrary motion of the particle first. It is possible to do this by a straightforward integration of Eq. (30)
over the full solid angle, but let me demonstrate how 𝒫 may be found (or rather guessed) from
general relativistic arguments. In Sec. 8.2, we derived Eq. (8.27) for the electric dipole radiation
at nonrelativistic particle motion. That result is valid, in particular, for one charged particle whose
electric dipole moment's time derivative may be expressed as d(qr)/dt = (q/m)p, where p is the
particle's mechanical momentum (not its electric dipole moment). As a result, Eq. (8.27) (in free
space, i.e. with v = c) reduces to
$$\mathcal P = \frac{Z_0}{6\pi c^2}\left(\frac{q}{m}\,\frac{d\mathbf p}{dt}\right)^2 = \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{d\mathbf p}{dt}\right)^2. \qquad (10.35)$$



This is evidently not a Lorentz-invariant result, but it gives a clear hint of how such an invariant, reducing
to Eq. (35) in the nonrelativistic limit, may be formed:



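$$\mathcal P = -\frac{Z_0q^2}{6\pi m^2c^2}\,\frac{dp_\alpha}{d\tau}\,\frac{dp^\alpha}{d\tau} = \frac{Z_0q^2}{6\pi m^2c^2}\left[\left(\frac{d\mathbf p}{d\tau}\right)^2 - \frac{1}{c^2}\left(\frac{d\mathcal E}{d\tau}\right)^2\right]. \qquad (10.36)$$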



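Plugging in the relativistic expressions p = γmcβ, ℰ = γmc², and dτ = dt/γ, the last formula may
be recast in the form

$$\mathcal P = \frac{Z_0q^2}{6\pi}\,\gamma^6\left[\left(\dot{\boldsymbol\beta}\right)^2 - \left(\boldsymbol\beta\times\dot{\boldsymbol\beta}\right)^2\right], \qquad (10.37)$$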



that may also be obtained by a direct integration of Eq. (30), confirming our guess. However, for most
applications, it is beneficial to express 𝒫 via the time evolution of the particle's momentum alone. For
that, we may differentiate the fundamental relativistic relation (9.78), ℰ² = (mc²)² + (pc)², over the
proper time τ to get

$$2\mathcal E\,\frac{d\mathcal E}{d\tau} = 2c^2p\,\frac{dp}{d\tau}, \qquad \text{i.e.} \qquad \frac{d\mathcal E}{d\tau} = \frac{c^2p}{\mathcal E}\,\frac{dp}{d\tau} = u\,\frac{dp}{d\tau}, \qquad (10.38)$$



where, at the last transition, the magnitude of the relativistic vector relation mentioned in Chapter 9,
c²p/ℰ = u, has been used. Plugging this relation into Eq. (36), we may rewrite it as

$$\mathcal P = \frac{Z_0q^2}{6\pi m^2c^2}\left[\left(\frac{d\mathbf p}{d\tau}\right)^2 - \beta^2\left(\frac{dp}{d\tau}\right)^2\right]. \qquad (10.39)$$



Note the difference between the squared derivatives in this expression: in the first of them, we have to
differentiate the momentum vector p and only then form a scalar by squaring the resulting vector
derivative, while in the second case, only the magnitude p of the vector is differentiated. For example, for
a circular motion with constant speed (to be analyzed in detail in the next section), the second term is
zero, while the first one is not.
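The statement that Eq. (37) may also be obtained by integrating Eq. (30) over the full solid angle can be spot-checked numerically. The sketch below drops the common factor Z0q²/(4π)², so that the integral should equal (8π/3)γ⁶[β̇² − (β×β̇)²]; the midpoint rule, the grid size, and the sample vectors (including one constant-speed case with β̇ ⊥ β) are arbitrary choices:

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def angular_factor(n, beta, beta_dot):
    """|n x ((n - beta) x beta_dot)|^2 / (1 - n.beta)^5, the n-dependent part of Eq. (30)."""
    n_minus_beta = tuple(ni - bi for ni, bi in zip(n, beta))
    v = cross(n, cross(n_minus_beta, beta_dot))
    return dot(v, v) / (1.0 - dot(n, beta)) ** 5


def integrate_over_sphere(beta, beta_dot, n_theta=240, n_phi=240):
    """Midpoint rule in theta and phi over the full solid angle."""
    dth, dph = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        s, c = math.sin(th), math.cos(th)
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            total += angular_factor((s * math.cos(ph), s * math.sin(ph), c), beta, beta_dot) * s
    return total * dth * dph


def lienard_bracket(beta, beta_dot):
    """(8 pi/3) gamma^6 [beta_dot^2 - (beta x beta_dot)^2], i.e. Eq. (37) times (4 pi)^2/(Z0 q^2)."""
    g2 = 1.0 / (1.0 - dot(beta, beta))
    bxbd = cross(beta, beta_dot)
    return 8.0 * math.pi / 3.0 * g2**3 * (dot(beta_dot, beta_dot) - dot(bxbd, bxbd))


cases = [
    ((0.0, 0.0, 0.6), (0.0, 0.0, 1.0)),   # linear acceleration, beta_dot parallel to beta
    ((0.0, 0.0, 0.6), (1.0, 0.0, 0.0)),   # constant speed, beta_dot perpendicular to beta
    ((0.2, 0.0, 0.5), (0.3, -0.4, 1.0)),  # generic orientation
]
results = [(integrate_over_sphere(b, bd), lienard_bracket(b, bd)) for b, bd in cases]
for numeric, analytic in results:
    print(f"{numeric:.6f}  {analytic:.6f}")
```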

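However, if we return to the simplest case of linear acceleration (Fig. 4), then $(d\mathbf p/d\tau)^2 = (dp/d\tau)^2$,
and Eq. (39) is reduced to

$$\mathcal P = \frac{Z_0q^2}{6\pi m^2c^2}\,(1-\beta^2)\left(\frac{dp}{d\tau}\right)^2 = \frac{Z_0q^2}{6\pi m^2c^2\gamma^2}\left(\frac{dp}{d\tau}\right)^2 = \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{dp}{dt'}\right)^2 \qquad (10.40)$$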



(where t′ = t_r is the time of radiation emission as measured in the lab frame), i.e. formally coincides
with the nonrelativistic Eq. (35). In order to get a better feeling of the magnitude of this radiation, we may
use the fact that dp/dt′ = dℰ/dz′ (both equal the accelerating force). This allows us to rewrite Eq. (40) in the following form:



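$$\mathcal P = \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{d\mathcal E}{dz'}\right)^2 = \frac{Z_0q^2}{6\pi m^2c^2}\,\frac{d\mathcal E}{dz'}\,\frac{d\mathcal E}{dt'}\,\frac{dt'}{dz'} = \frac{Z_0q^2}{6\pi m^2c^2u}\,\frac{d\mathcal E}{dz'}\,\frac{d\mathcal E}{dt'}. \qquad (10.41)$$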
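The equivalence of Eqs. (37) and (39), used in this derivation, can be checked by finite differences. The sketch assumes units m = c = 1 (so that p = γβ), drops the common factor Z0q²/6π, and differentiates along a straight path β(t) = β0 + β̇t in velocity space, which is enough to fix the first derivatives at t = 0; the sample vectors and the step h are arbitrary choices:

```python
import math


def gamma_of(beta):
    return 1.0 / math.sqrt(1.0 - sum(b * b for b in beta))


def momentum(beta):
    """p/(mc) = gamma * beta (units m = c = 1)."""
    g = gamma_of(beta)
    return tuple(g * b for b in beta)


def bracket_eq39(beta0, beta_dot, h=1e-5):
    """(dp/dtau)^2 - beta^2 (d|p|/dtau)^2 via central differences along beta(t) = beta0 + beta_dot*t."""
    plus = momentum(tuple(b + h * d for b, d in zip(beta0, beta_dot)))
    minus = momentum(tuple(b - h * d for b, d in zip(beta0, beta_dot)))
    g = gamma_of(beta0)                    # d/dtau = gamma d/dt
    dp_dtau = [g * (a - b) / (2 * h) for a, b in zip(plus, minus)]
    dpmag_dtau = g * (math.hypot(*plus) - math.hypot(*minus)) / (2 * h)
    beta2 = sum(b * b for b in beta0)
    return sum(x * x for x in dp_dtau) - beta2 * dpmag_dtau**2


def bracket_eq37(beta, beta_dot):
    """gamma^6 [beta_dot^2 - (beta x beta_dot)^2]."""
    bx, by, bz = beta
    dx, dy, dz = beta_dot
    cross = (by * dz - bz * dy, bz * dx - bx * dz, bx * dy - by * dx)
    return gamma_of(beta) ** 6 * (dx * dx + dy * dy + dz * dz - sum(x * x for x in cross))


cases = [
    ((0.0, 0.0, 0.9), (0.0, 0.0, 0.1)),    # linear: the second term of Eq. (39) matters
    ((0.8, 0.0, 0.0), (0.0, 0.3, 0.0)),    # constant speed: d|p|/dtau = 0
    ((0.3, -0.2, 0.5), (0.1, 0.4, -0.2)),  # generic
]
for beta0, beta_dot in cases:
    print(bracket_eq39(beta0, beta_dot), bracket_eq37(beta0, beta_dot))
```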
For the most important case of ultrarelativistic motion (u → c), this result may be presented as

$$\frac{\mathcal P}{d\mathcal E/dt'} \approx \frac{2}{3}\,\frac{d(\mathcal E/mc^2)}{d(z'/r_c)}, \qquad (10.42)$$



where r_c is the classical radius of the particle, given by Eq. (8.41). This formula shows that the radiated
power, i.e. the rate of the particle's energy loss due to radiation, is much smaller than the rate of its energy gain from the
accelerating field, unless an energy as large as mc² is gained over the classical radius of the particle. For
example, for an electron, such acceleration would require an accelerating electric field of the order of
(0.5 MV)/(3×10⁻¹⁵ m) ~ 10¹⁴ MV/m, while practicable accelerating fields are below 10³ MV/m, limited
by electric breakdown effects. Such smallness of radiative energy losses is actually a large
advantage of linear electron accelerators, such as the famous 2-mile-long SLAC,⁹ that can accelerate
electrons or positrons to energies up to 50 GeV, i.e. to γ ~ 10⁵.
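The numbers in this paragraph are easy to reproduce; the sketch below uses the rounded electron values mc² ≈ 0.511 MeV and r_c ≈ 2.82×10⁻¹⁵ m, and takes 10³ MV/m as the practicable field scale quoted above:

```python
ME_C2_EV = 0.511e6    # electron rest energy, eV
R_C = 2.82e-15        # classical electron radius, m

field_mv_per_m = ME_C2_EV / R_C / 1e6      # gain of m c^2 over one r_c: volts per meter -> MV/m
ratio_to_practical = field_mv_per_m / 1e3  # compared with ~1e3 MV/m breakdown-limited fields
gamma_slac = 50e9 / ME_C2_EV               # 50 GeV electrons
print(f"{field_mv_per_m:.1e} MV/m, ratio {ratio_to_practical:.0e}, gamma {gamma_slac:.1e}")
```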



10.3. Synchrotron radiation 

Now let me show that in circular accelerators, the radiation is much larger. Consider a charged
particle being accelerated in the direction perpendicular to its velocity u (for example, by the magnetic
component of the Lorentz force), so that its speed u, and hence the magnitude p of its momentum, do not
change. In this case, the second term in Eq. (39) vanishes, and it yields



$$\mathcal P = \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{d\mathbf p}{d\tau}\right)^2 = \frac{Z_0q^2\gamma^2}{6\pi m^2c^2}\left(\frac{d\mathbf p}{dt'}\right)^2. \qquad (10.43)$$



Comparing this expression with Eq. (40), we see that for the same magnitude of dp/dt′, the
electromagnetic radiation is a factor of γ² larger. For modern accelerators, with γ ~ 10⁴-10⁵, such a factor
creates an enormous difference. For example, if a particle is on a cyclotron orbit in a constant magnetic
field (as was analyzed in Sec. 9.6), both u and p = γmu obey Eq. (9.150), so that

$$\left|\frac{d\mathbf p}{dt'}\right| = \omega_c\,p = \frac{u}{R}\,p = \frac{\beta c}{R}\,p \qquad (10.44)$$

(where R is the orbit's radius), so that for the power of this synchrotron radiation, Eq. (43) yields

$$\mathcal P = \frac{Z_0q^2c^2}{6\pi R^2}\,\beta^4\gamma^4. \qquad (10.45)$$



According to Eq. (9.153), at a fixed magnetic field (in particle accelerators, limited to a few tesla
produced by the beam-bending magnets), the synchrotron orbit radius R scales as γ, so that according to
Eq. (45), 𝒫 scales as γ², i.e. grows fast with the particle's energy ℰ ∝ γ. For example, for typical parameters
of the first electron synchrotrons (such as the General Electric machine in which the synchrotron radiation
was first noticed in 1947), R ~ 1 m and ℰ ~ 0.3 GeV (γ ~ 600), Eq. (45) gives a very modest electron energy
loss per revolution: 𝒫Δt′ ≈ 2πR𝒫/c ~ 1 keV. However, already by the mid-1970s, electron
accelerators with R ~ 100 m had reached energies ℰ ~ 10 GeV, and the energy loss per revolution had


9 See, e.g., https://www6.slac.stanford.edu/ . 
grown to ~10 MeV, becoming the major energy loss mechanism.¹⁰ However, what is bad for particle
accelerators and storage rings is good for the so-called synchrotron light sources, i.e. electron
accelerators designed specially for the generation of intensive synchrotron radiation, with a spectrum
extending well beyond the visible light range. Let us now analyze the angular and spectral distributions
of such radiation.
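Multiplying Eq. (45) by the revolution period 2πR/βc gives the energy lost per turn, ΔE = q²β³γ⁴/3ε0R. A short sketch (electron parameters; the function name `loss_per_turn_ev` is hypothetical) reproduces the order-of-magnitude estimates quoted above, about 1 keV for R ~ 1 m, ℰ ~ 0.3 GeV and about 10 MeV for R ~ 100 m, ℰ ~ 10 GeV:

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
ME_C2_EV = 0.51099895e6      # electron rest energy, eV


def loss_per_turn_ev(energy_ev, radius_m, rest_energy_ev=ME_C2_EV):
    """Energy radiated per revolution, Eq. (45) times the period 2*pi*R/(beta*c), in eV.

    P * 2 pi R / (beta c) = e^2 beta^3 gamma^4 / (3 eps0 R); dividing by e converts J to eV.
    """
    gamma = energy_ev / rest_energy_ev
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return E_CHARGE * beta**3 * gamma**4 / (3.0 * EPS0 * radius_m)


print(f"{loss_per_turn_ev(0.3e9, 1.0):.0f} eV")           # early synchrotron, R ~ 1 m
print(f"{loss_per_turn_ev(10e9, 100.0) / 1e6:.1f} MeV")   # mid-1970s machine, R ~ 100 m
```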

To calculate the angular distribution, let us select the coordinate axes as shown in Fig. 5, with
the origin at the current location of the orbiting particle, axis z along its instant velocity (i.e. the vector β),
and axis x toward the orbit's center.



Fig. 10.5. Geometry of the synchrotron radiation problem.



In the general case, the unit vector n toward the radiation observer is not within any of the coordinate planes, and hence should be described by two angles - the polar angle θ and the azimuthal angle φ between the x axis and the projection OP of vector n on plane [x, y]. Since the length of segment OP is sin θ, the Cartesian coordinates of the relevant vectors are as follows:



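$$\mathbf{n} = \{\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta\}, \quad \boldsymbol\beta = \{0,\ 0,\ \beta\}, \quad \dot{\boldsymbol\beta} = \{\dot\beta,\ 0,\ 0\}. \qquad (10.46)$$

Plugging these coordinates into the general Eq. (30), we get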



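$$\frac{d\mathcal{P}}{d\Omega} = \frac{Z_0 q^2}{16\pi^2}\,\dot\beta^2 f(\theta,\varphi), \quad\text{with}\quad f(\theta,\varphi) \equiv \frac{1}{(1-\beta\cos\theta)^3}\left[1 - \frac{\sin^2\theta\cos^2\varphi}{\gamma^2(1-\beta\cos\theta)^2}\right]. \qquad (10.47)$$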



According to this result, just as in the case of linear acceleration, in the ultrarelativistic limit most radiation goes into a narrow cone (of width Δθ ~ γ⁻¹ ≪ 1) around vector β, i.e. around the instant direction of the particle's propagation. For such small angles, and γ ≫ 1, the second of Eqs. (47) is reduced to



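$$f(\theta,\varphi) \approx \frac{(2\gamma^2)^3}{(1+\gamma^2\theta^2)^3}\left[1 - \frac{4\gamma^2\theta^2\cos^2\varphi}{(1+\gamma^2\theta^2)^2}\right]. \qquad (10.48)$$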



The left panel of Fig. 6 shows the angular distribution f(θ, φ), color-coded, on the plane perpendicular to the particle's instant velocity (in Fig. 5, plane [x, y]), while its right panel shows the intensity as a function of θ in two perpendicular directions: within the particle rotation plane (along axis



10 For proton accelerators, such energy loss is much less of a problem, because γ of an ultrarelativistic particle (at fixed ℰ) is inversely proportional to its mass m, so that the estimates, at the same R, should be scaled down by a factor of (m_p/m_e)⁴ ~ 10¹³.

Nevertheless, in giant modern accelerators such as the LHC (with R ≈ 4.3 km and ℰ ≈ 7 TeV), the synchrotron radiation loss per revolution is rather noticeable (𝒫Δt' ~ 6 keV), leading not so much to particle deceleration as to substantial photoelectron emission from the beam tube walls, which creates harmful defocusing effects.
x) and perpendicular to this plane (along axis y). The result shows, first of all, that, in contrast to the case of linear acceleration, the narrow radiation cone is now not hollow: the intensity maximum is reached exactly at θ = 0, i.e. in the particle's motion direction. Second, the radiation cone is not axially symmetric: the intensity drops faster within the particle rotation plane (and even has nodes at θ = ±1/γ).
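The angular factor is easy to explore numerically. The minimal sketch below assumes Eq. (47) in the form f(θ, φ) = (1 − β cos θ)⁻³[1 − sin²θ cos²φ/γ²(1 − β cos θ)²] (the standard result for β ⊥ β̇), compares it with the small-angle form (2γ²)³(1 + γ²θ²)⁻³[1 − 4γ²θ² cos²φ/(1 + γ²θ²)²], and checks the two features just described: the maximum on the axis, and the in-plane node, which for the exact formula sits at sin θ = 1/γ (i.e. where cos θ = β). The value γ = 100 is arbitrary.

```python
import math


def f_exact(theta, phi, gamma):
    """Angular factor f(theta, phi) of Eq. (47), beta perpendicular to d(beta)/dt'."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    d = 1.0 - beta * math.cos(theta)
    bracket = 1.0 - (math.sin(theta) * math.cos(phi))**2 / (gamma**2 * d**2)
    return bracket / d**3


def f_small_angle(theta, phi, gamma):
    """Small-angle, ultrarelativistic form of the same factor, Eq. (48)."""
    x = (gamma * theta)**2
    return (2.0 * gamma**2)**3 / (1.0 + x)**3 * (
        1.0 - 4.0 * x * math.cos(phi)**2 / (1.0 + x)**2)


gamma = 100.0
f0 = f_exact(0.0, 0.0, gamma)              # on-axis value (the maximum)
theta_node = math.asin(1.0 / gamma)        # exact in-plane node: cos(theta) = beta

print("gamma*theta   in-plane f/f0   perpendicular f/f0")
for g_theta in (0.0, 0.5, 1.0, 1.5, 2.0):
    th = g_theta / gamma
    print(f"{g_theta:8.1f}  {f_exact(th, 0.0, gamma) / f0:14.4f}"
          f"  {f_exact(th, math.pi / 2, gamma) / f0:18.4f}")
```

The in-plane column drops to zero at γθ ≈ 1 and then partly recovers, while the perpendicular column decreases monotonically, which is the asymmetry visible in Fig. 6.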
Fig. 10.6. Angular distribution of the synchrotron radiation at γ ≫ 1.



Let us now consider the time/frequency structure of the synchrotron radiation, from the point of view of the observer rather than of the particle itself. (In the latter picture, due to the axial symmetry of the problem, the total radiation power 𝒫 is evidently constant.) A semi-quantitative picture may be obtained from the angular distribution we have just analyzed. Indeed, if an ultrarelativistic particle's radiation is observed from a point in (or close to) the rotation plane, 11 the observer is "struck" by the narrow radiation cone once each rotation period, each "strike" giving a pulse of a short duration Δt ≪ ω_c⁻¹ - see Fig. 7.




Fig. 10.7. (a) Synchrotron radiation cones at γ ≫ 1, and (b) the in-plane component of their electric field, observed in the rotation plane, as a function of observation time t - schematically.



11 If the observation point is off-plane, or if the rotation speed is much less than c, the radiation is virtually monochromatic, with frequency ω_c.
The evaluation of the time duration Δt of each pulse requires some care: its estimate Δt' ~ 1/γω_c is correct for the duration of the particle's motion while its cone is aimed at the observer. However, due to the time compression effect, discussed in detail in Sec. 1 and described by Eqs. (12) and (18), the pulse duration as seen by the observer is a factor of 1/(1 - β) shorter, so that

$$\Delta t = (1-\beta)\,\Delta t' \sim \frac{1}{\gamma^2}\,\frac{1}{\gamma\omega_c} = \frac{1}{\gamma^3\omega_c}. \qquad (10.49)$$
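Here is a quick numerical sketch of the time-compression estimate (49). It uses illustrative parameters of the early machines mentioned above (γ = 600, R = 1 m, u ≈ c; these are assumptions of the example, and 1/γω_c sets only the order of Δt'). Writing 1 − β as 1/γ²(1 + β) avoids the round-off that a direct subtraction would suffer at large γ.

```python
import math

C = 2.99792458e8  # speed of light, m/s


def one_minus_beta(gamma):
    """1 - beta, written as 1/(gamma^2 (1 + beta)) to avoid round-off cancellation."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return 1.0 / (gamma**2 * (1.0 + beta))


gamma = 600.0
radius = 1.0                        # m, illustrative value
omega_c = C / radius                # rotation frequency u/R, with u ~ c
dt_particle = 1.0 / (gamma * omega_c)              # Delta t': cone sweeps past the observer
dt_observer = one_minus_beta(gamma) * dt_particle  # Eq. (49): time compression

print(f"Delta t' ~ {dt_particle:.2e} s, Delta t ~ {dt_observer:.2e} s")
print(f"compression factor 1/(1 - beta) = {1 / one_minus_beta(gamma):.3e}")
```

Since 1 − β ≈ 1/2γ², the observed pulse is about γ³ω_c/2 times shorter than a rotation radian, in line with the order-of-magnitude form of Eq. (49).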

From the Fourier theorem, we can expect that the frequency spectrum of the radiation consists of numerous (N ~ γ³ ≫ 1) harmonics of the rotation frequency ω_c, with comparable amplitudes. However, if the orbital frequency fluctuates even slightly (δω_c/ω_c > 1/N ~ 1/γ³), as happens in most practical systems, the radiation pulses are not coherent, so that the average radiation power spectrum may be calculated as that of one pulse, multiplied by the number of pulses per second. In this case, the spectrum is continuous, extending from low frequencies all the way to approximately

$$\omega_{\max} \sim \frac{1}{\Delta t} \sim \gamma^3\omega_c. \qquad (10.50)$$

In order to verify this estimate, let us calculate the spectrum of the radiation due to a single pulse. For that, we should first make the general notion of spectrum quantitative. Let us present an arbitrary electric field (say, that of the synchrotron radiation we are studying now), considered as a function of the observation time t (at fixed r), as a Fourier integral: 12

$$\mathbf{E}(t) = \int_{-\infty}^{+\infty}\mathbf{E}_\omega\,e^{-i\omega t}\,d\omega. \qquad (10.51)$$



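This expression may be plugged into the following formula for the total energy of the radiation pulse (i.e. of the particle's energy loss) per unit solid angle: 13

$$\frac{d\mathcal{E}}{d\Omega} = \int_{-\infty}^{+\infty} S_n(t)\,R^2\,dt = \frac{R^2}{Z_0}\int_{-\infty}^{+\infty}|\mathbf{E}(t)|^2\,dt. \qquad (10.52)$$

This substitution, plus a natural change of the integration order, yields

$$\frac{d\mathcal{E}}{d\Omega} = \frac{R^2}{Z_0}\int_{-\infty}^{+\infty}d\omega\int_{-\infty}^{+\infty}d\omega'\,\mathbf{E}_\omega\cdot\mathbf{E}_{\omega'}\int_{-\infty}^{+\infty}dt\,e^{-i(\omega+\omega')t}. \qquad (10.53)$$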

But the inner integral (over t) is just 2πδ(ω + ω'). 14 This delta function kills one of the frequency integrals (say, the one over ω'), and Eq. (53) gives a result which may be recast as



12 In contrast to the single-frequency case (i.e. a monochromatic wave), we may avoid taking the real part of the complex function E_ω e^{-iωt} if we require that E_{-ω} = E_ω*. However, it is important to remember the factor ½ required for the transition to a monochromatic wave of frequency ω_0: E_ω = E_0[δ(ω - ω_0) + δ(ω + ω_0)]/2.

13 Note that the expression under the integral differs from d𝒫/dΩ defined by Eq. (29) by the absence of the factor (1 - β·n) = dt'/dt. This is natural, because this is the wave energy arriving at the observation point r during the time interval dt rather than dt'.

14 See, e.g., MA Eq. (14.3a).
$$\frac{d\mathcal{E}}{d\Omega} = \int_0^{\infty} I(\omega)\,d\omega, \quad\text{with}\quad I(\omega) \equiv \frac{4\pi R^2}{Z_0}\,\mathbf{E}_\omega\cdot\mathbf{E}_{-\omega} = \frac{4\pi R^2}{Z_0}\,|\mathbf{E}_\omega|^2, \qquad (10.54)$$

where the evident frequency symmetry of the scalar product E_ω·E_{-ω} has been utilized to fold the integral of I(ω) to positive frequencies only. The first of Eqs. (51) and the first of Eqs. (54) make the physical sense of function I(ω) clear: this is the so-called spectral density of the electromagnetic radiation (per unit solid angle, per unit pulse). 15

In order to calculate the spectral density, we need to express function E_ω via E(t), using the Fourier transform reciprocal to Eq. (51):

$$\mathbf{E}_\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\mathbf{E}(t)\,e^{i\omega t}\,dt. \qquad (10.55)$$
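The bookkeeping of Eqs. (51) to (55), in particular the factor 4π that appears once the frequency integral is folded to ω > 0, can be checked on a test pulse. The sketch below works in units where R²/Z₀ = 1 and uses a hypothetical Gaussian-envelope pulse (not a synchrotron pulse). It computes E_ω from Eq. (55) by direct summation, and compares the energy obtained from I(ω) = 4π|E_ω|² with the time-domain integral of Eq. (52) and with its analytic value.

```python
import cmath
import math

# Units with R^2/Z0 = 1, so Eq. (52) reduces to the time integral of E(t)^2.
TAU, OMEGA0 = 1.0, 3.0          # hypothetical pulse width and carrier frequency
DT = 0.02
t_grid = [-10.0 + k * DT for k in range(1001)]
field = [math.exp(-t**2 / (2 * TAU**2)) * math.cos(OMEGA0 * t) for t in t_grid]


def fourier_component(omega):
    """E_omega = (1/2 pi) * integral E(t) exp(+i omega t) dt, Eq. (55), as a Riemann sum."""
    s = sum(e * cmath.exp(1j * omega * t) for e, t in zip(field, t_grid))
    return s * DT / (2 * math.pi)


# Energy from the time domain, Eq. (52)
energy_time = sum(e * e for e in field) * DT

# Energy from the spectral density I(omega) = 4 pi |E_omega|^2, Eq. (54), trapezoid rule on [0, 10]
D_OMEGA = 0.02
omegas = [k * D_OMEGA for k in range(501)]
spectral = [4 * math.pi * abs(fourier_component(w))**2 for w in omegas]
energy_freq = (sum(spectral) - 0.5 * (spectral[0] + spectral[-1])) * D_OMEGA

# Analytic value for this particular pulse
exact = math.sqrt(math.pi) * TAU / 2 * (1 + math.exp(-(OMEGA0 * TAU)**2))
print(energy_time, energy_freq, exact)
```

All three numbers agree to many digits, which confirms that only positive frequencies need to be kept once the 4π factor is included.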



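In the particular case of radiation by a single point charge, we should use the second term of Eq. (20a):

$$\mathbf{E}_\omega = \frac{1}{2\pi}\,\frac{q}{4\pi\varepsilon_0 cR}\int_{-\infty}^{+\infty}\left[\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol\beta)\times\dot{\boldsymbol\beta}\}}{(1-\boldsymbol\beta\cdot\mathbf{n})^3}\right]_{\rm ret}e^{i\omega t}\,dt. \qquad (10.56)$$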



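Since vectors n and β are natural functions of the radiation (retarded) time t', let us use Eqs. (18) to change the integration in Eq. (56) from the observation time t to the time t':

$$\mathbf{E}_\omega = \frac{1}{2\pi}\,\frac{q}{4\pi\varepsilon_0 cR}\int_{-\infty}^{+\infty}\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol\beta)\times\dot{\boldsymbol\beta}\}}{(1-\boldsymbol\beta\cdot\mathbf{n})^2}\exp\left\{i\omega\left[t'+\frac{R(t')}{c}\right]\right\}dt'. \qquad (10.57)$$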



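The strong inequality R ≈ r ≫ r', implied from the beginning of this section, allows us to consider the unit vector n as constant and, moreover, to use approximation (8.19) to reduce Eq. (57) to

$$\mathbf{E}_\omega = \frac{1}{2\pi}\,\frac{q}{4\pi\varepsilon_0 cr}\,e^{i\omega r/c}\int_{-\infty}^{+\infty}\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol\beta)\times\dot{\boldsymbol\beta}\}}{(1-\boldsymbol\beta\cdot\mathbf{n})^2}\exp\left\{i\omega\left(t'-\frac{\mathbf{n}\cdot\mathbf{r}'}{c}\right)\right\}dt'. \qquad (10.58)$$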



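Plugging this expression into Eq. (54), we get 16

$$I(\omega) = \frac{Z_0 q^2}{16\pi^3}\left|\int_{-\infty}^{+\infty}\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol\beta)\times\dot{\boldsymbol\beta}\}}{(1-\boldsymbol\beta\cdot\mathbf{n})^2}\exp\left\{i\omega\left(t'-\frac{\mathbf{n}\cdot\mathbf{r}'}{c}\right)\right\}dt'\right|^2. \qquad (10.59)$$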



Let me remind the reader that β̇ inside this integral is supposed to be taken at the retarded point {r', t'}, so that Eq. (59) is fully sufficient for finding the spectral density from the law r'(t') of the particle's motion. However, this result may be further simplified by noticing that the fraction before the exponent may be presented as a full derivative over t',



15 The notion of spectral density may be readily generalized to random processes - see, e.g., SM Sec. 5.4.

16 Note that for our current purpose of calculating the spectral density of radiation by a single particle, the factor exp{iωr/c} has cancelled. However, as we have seen in Chapter 8, this factor plays the central role in the interference of radiation from several (many) sources. In the context of synchrotron radiation, such interference becomes important in undulators and free-electron lasers - the devices to be (qualitatively) discussed below.
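$$\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol\beta)\times d\boldsymbol\beta/dt'\}}{(1-\boldsymbol\beta\cdot\mathbf{n})^2} = \frac{d}{dt'}\left[\frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta)}{1-\boldsymbol\beta\cdot\mathbf{n}}\right], \qquad (10.60)$$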



and working out the resulting integral by parts. In this operation, the time differentiation of the parentheses in the exponent, d(t' - n·r'/c)/dt' = 1 - n·u/c = 1 - β·n, leads to the cancellation of the denominator's remains and hence to a surprisingly simple result: 17



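$$I(\omega) = \frac{Z_0 q^2\omega^2}{16\pi^3}\left|\int_{-\infty}^{+\infty}\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta)\exp\left\{i\omega\left(t'-\frac{\mathbf{n}\cdot\mathbf{r}'(t')}{c}\right)\right\}dt'\right|^2. \qquad (10.61)$$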



Returning to the particular case of synchrotron radiation, it is beneficial to choose the origin of time t' so that at t' = 0, angle θ takes its smallest value θ_0, i.e., in terms of Fig. 5, vector n is within plane [y, z]. Fixing this direction of axes in time, we can redraw that figure as shown in Fig. 7. In these coordinates,

$$\mathbf{n} = \{0,\ \sin\theta_0,\ \cos\theta_0\}, \quad \mathbf{r}' = \{R(1-\cos\alpha),\ 0,\ R\sin\alpha\}, \quad \boldsymbol\beta = \{\beta\sin\alpha,\ 0,\ \beta\cos\alpha\}, \qquad (10.62)$$

where α = ω_c t', and an easy multiplication yields

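$$\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta) = \beta\,\{-\sin\alpha,\ \sin\theta_0\cos\theta_0\cos\alpha,\ -\sin^2\theta_0\cos\alpha\}, \qquad (10.63)$$

so that Eq. (61) takes the form

$$I(\omega) = \frac{Z_0 q^2\omega^2}{16\pi^3}\left|\int_{-\infty}^{+\infty}\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta)\exp\left\{i\omega\left(t'-\frac{R}{c}\cos\theta_0\sin\omega_c t'\right)\right\}dt'\right|^2. \qquad (10.64)$$

Fig. 10.7. Deriving the spectral density of synchrotron radiation. Vector n is fixed in plane [y, z], while vectors r'(t') and β(t') rotate in plane [x, z] with angular velocity ω_c.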



As we already know, in the (most interesting) ultrarelativistic limit γ ≫ 1, most radiation is confined to short pulses, so that only small angles α ~ ω_cΔt' ~ γ⁻¹ may contribute to the integral in Eq. (61). Moreover, since most radiation goes to small angles θ ~ γ⁻¹, it makes sense to consider only small angles θ_0 ~ γ⁻¹ ≪ 1. Expanding both trigonometric functions of these small angles, participating in the parentheses of Eq. (64), into Taylor series, and keeping only terms up to O(γ⁻³), we can present them as



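$$t' - \frac{R}{c}\cos\theta_0\sin\omega_c t' \approx t' - \frac{R}{c}\,\omega_c t' + \frac{R\theta_0^2}{2c}\,\omega_c t' + \frac{R}{6c}\,(\omega_c t')^3. \qquad (10.65)$$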



17 Actually, this simplification is not accidental. According to Eq. (10b), the expression under the derivative is just the transverse component of the vector-potential A (give or take a constant factor), and from the discussion in Sec. 8.2 we know that this component determines the electric dipole radiation of the particle (which dominates the radiation field in our current case of uncompensated electric charge).
Since (R/c)ω_c = u/c = β ≈ 1, in the two last terms we may approximate this parameter by 1. However, it is crucial to distinguish the difference of the two first terms, proportional to (1 - β)t', from zero, and as we have done before, we may approximate it with t'/2γ². In Eq. (63), which does not have such critical differences, we may be bolder, taking 18

$$\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta) \approx \{-\alpha,\ \theta_0,\ 0\} = \{-\omega_c t',\ \theta_0,\ 0\}. \qquad (10.66)$$

As a result, Eq. (61) is reduced to

(10.67)



where a_x and a_y are the dimensionless factors

(10.68)



which describe the frequency spectra of the two components of the synchrotron radiation, with mutually perpendicular directions of polarization. Defining a dimensionless parameter



$$\nu \equiv \frac{\omega}{3\omega_c}\left(\theta_0^2+\gamma^{-2}\right)^{3/2}, \qquad (10.69)$$



proportional to the observation frequency, and changing the integration variable to ω_c t'/(θ_0² + γ⁻²)^{1/2}, integrals (68) may be reduced to the modified Bessel functions of the second kind:

(10.70)



Figure 8a shows the dependence of the amplitudes a_x and a_y on the normalized observation frequency ν. It is clear that the in-plane component, proportional to a_x, is larger. (The off-plane component disappears altogether at θ_0 = 0, i.e. at observation within the particle rotation plane [x, z], due to the evident mirror symmetry of the problem about this plane.) It is also clear that the spectrum changes rather slowly (note the log-log scale of the plot!) until the normalized frequency, defined by Eq. (69), reaches ~1. For the most important observation angles θ_0 ~ γ⁻¹, this means that our estimate (50) is indeed correct, though theoretically the frequency spectrum extends to infinity. 19



18 By the way, this expression shows that the in-plane (x) component of the electric field is an odd function of t' (and hence of t - see its sketch in Fig. 7), while the perpendicular component is an even function of time. Also notice that for an observer exactly in the rotation plane (θ_0 = 0) the latter component vanishes.

19 The law of the spectral density decrease at large ν may be readily obtained from the second of Eqs. (2.158), which is valid for any (even non-integer) Bessel function index n: a_x ∝ a_y ∝ ν^{-1/2} exp{-ν}. Here the exponential factor is certainly the most important.
Fig. 10.8. Synchrotron radiation frequency spectra of: (a) two polarization amplitudes and (b) the total (polarization- and angle-averaged) radiation.



Naturally, a similar frequency behavior is valid for the spectral density integrated over the full solid angle. Without performing the integration, 20 let me give the result (also valid for γ ≫ 1 only) for the reader's reference:

$$\oint I(\omega)\,d\Omega = \frac{\sqrt{3}}{4\pi}\,Z_0 q^2\gamma\,\xi\int_\xi^\infty K_{5/3}(\xi')\,d\xi', \qquad \xi \equiv \frac{2\omega}{3\omega_c\gamma^3}. \qquad (10.71)$$

Figure 8b shows the dependence of this integral on the normalized frequency ξ. (This plot is sometimes called the "universal flux curve".) In accordance with estimate (50), it reaches its maximum at

$$\xi_{\max} \approx 0.3, \quad\text{i.e.}\quad \omega_{\max} \approx \frac{1}{2}\gamma^3\omega_c. \qquad (10.72)$$
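The spectral shape in Eq. (71), F(ξ) = ξ∫_ξ^∞K_{5/3}(ξ')dξ' with ξ ≡ 2ω/3ω_cγ³, can be evaluated without a special-function library. The sketch below relies on the standard integral representation K_ν(x) = ∫_0^∞ exp(−x cosh t) cosh(νt) dt (x > 0). The helper names and the trapezoid/Simpson quadratures are illustrative choices, not the method behind Fig. 8b. The code locates the maximum of the curve and checks that its area equals 8π/(9√3). By our own consistency check, this value makes the ω-integral of Eq. (71) reproduce the loss per revolution Z₀q²cγ⁴/3R that follows from Eq. (45).

```python
import math


def k53_tail(xi, n=1500):
    """Integral of K_{5/3}(x) from xi to infinity, for xi > 0.

    Uses K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt and swaps the order of
    integration, which gives int_0^inf exp(-xi cosh t) cosh(5t/3) / cosh t dt.
    The upper limit is where xi*cosh(t) = 50, beyond which the integrand is negligible.
    """
    t_max = math.acosh(max(50.0 / xi, 2.0))
    h = t_max / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 0.5 if k in (0, n) else 1.0   # trapezoid rule
        total += weight * math.exp(-xi * math.cosh(t)) * math.cosh(5.0 * t / 3.0) / math.cosh(t)
    return total * h


def flux_curve(xi):
    """Spectral shape xi * int_xi^inf K_{5/3}(x) dx of Eq. (71)."""
    return xi * k53_tail(xi) if xi > 0 else 0.0


# Locate the maximum on a grid of xi values
grid = [0.1 + 0.002 * k for k in range(251)]
xi_peak = max(grid, key=flux_curve)
f_peak = flux_curve(xi_peak)

# Area under the curve; xi = s^3 tames the xi^(1/3) behavior near xi = 0 (Simpson rule in s)
S_MAX, N_S = 2.6, 200
hs = S_MAX / N_S
area = 0.0
for k in range(N_S + 1):
    s = k * hs
    weight = 1 if k in (0, N_S) else (4 if k % 2 else 2)
    area += weight * 3 * s**2 * flux_curve(s**3)
area *= hs / 3

print(f"peak at xi ~ {xi_peak:.3f}, F_max ~ {f_peak:.3f}")
print(f"area = {area:.4f} vs 8*pi/(9*sqrt(3)) = {8 * math.pi / (9 * math.sqrt(3)):.4f}")
```

The peak near ξ ≈ 0.29 corresponds to ω_max ≈ 0.29·(3/2)γ³ω_c ≈ 0.43γ³ω_c, consistent with the rounded form of Eq. (72).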

For the new National Synchrotron Light Source (NSLS-II), which is under construction at the Brookhaven National Laboratory very close to our campus, with a ring circumference of 792 m, the electron revolution period T will be 2.64 μs. Calculating ω_c as 2π/T ≈ 2.4×10⁶ s⁻¹, for the planned γ ~ 6×10³ (ℰ ~ 3 GeV), 21 we get ω_max ~ 3×10¹⁷ s⁻¹, corresponding to the photon energy ħω_max ~ 200 eV, i.e. to soft X-rays. In the light of this estimate, the reader may be surprised by Fig. 9, which shows the projected spectra of radiation that this facility is designed to produce, with maximum photon energies up to a few keV.
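The NSLS-II arithmetic can be reproduced in a few lines. The sketch assumes u ≈ c for the revolution period and takes ξ_max ≈ 0.29 for the peak of the universal flux curve, with ξ ≡ 2ω/3ω_cγ³ as in Eq. (71). The ring circumference and γ are the values quoted in the text.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C
C = 2.99792458e8            # speed of light, m/s

circumference = 792.0               # m, NSLS-II ring (value from the text)
period = circumference / C          # revolution period T, assuming u ~ c
omega_c = 2 * math.pi / period      # rotation frequency
gamma = 6.0e3                       # planned value quoted in the text

# Peak of the universal flux curve, xi_max ~ 0.29, with xi = 2 omega / (3 omega_c gamma^3)
omega_max = 0.29 * 1.5 * gamma**3 * omega_c
photon_energy_ev = HBAR * omega_max / E_CHARGE

print(f"T = {period * 1e6:.2f} us, omega_c = {omega_c:.2e} 1/s")
print(f"omega_max ~ {omega_max:.1e} 1/s, hbar*omega_max ~ {photon_energy_ev:.0f} eV")
```

The result, about 150 eV, sits on the same soft-X-ray scale as the rounded estimate above and is still far below the keV-range spectra of Fig. 9.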

The reason for this discrepancy is that in NSLS-II, and in all modern synchrotron light sources, most radiation is produced not by the circular orbit itself, but rather by special devices inserted into the electron beam path. These devices include bend magnets with a magnetic field stronger than the average field on the orbit (which, according to Eq. (9.112), produce a higher effective value of ω_c and



20 For that, and many other details, the interested reader may be referred, for example, to the fundamental review 
collection by E. E. Koch et al. (eds.) Handbook on Synchrotron Radiation (in 5 vols.), North-Holland, 1983-1991, 
or a more concise monograph by A. Hofmann, The Physics of Synchrotron Radiation, Cambridge U. Press, 2007. 

21 By modern standards, this energy is not too high. The distinguishing feature of NSLS-II is its unprecedented electron beam intensity (planned average beam current up to 500 mA), which should allow an extremely high synchrotron "brightness" I(ω).
hence of ω_max), and wigglers and undulators: strings of several strong magnets with alternating field direction (Fig. 10), which induce periodic bending of the electron trajectory, with radiation emitted at each bend.
Fig. 10.9. Design brightness of various synchrotron radiation sources of the NSLS-II facility. For bend magnets and wigglers, the "brightness" may be obtained by multiplication of the spectral density I(ω) from one electron pulse, calculated above, by the number of electrons passing the source per second. (Note the non-SI units, commonly used in the synchrotron radiation community.) However, for undulators, there is an additional factor due to the partial coherence of radiation - see below. (Data from the document NSLS-II Source Properties and Floor Layout, available online at http://www.nsls.bnl.gov/ .)
Fig. 10.10. The generic magnetic structure common for wigglers, undulators and free-electron lasers. (Adapted from http://www-xfel.spring8.or.jp/cband/e/Undulator.htm .)
The difference between wigglers and undulators is more quantitative than qualitative: the former devices have a larger spatial period λ (the distance between adjacent magnets of the same polarity, see Fig. 10), giving enough space for the electron beam to bend by an angle larger than γ⁻¹, i.e. larger than the radiation cone angle. As a result, the pulses radiated at each period arrive at an in-plane observer as a series of individual pulses (Fig. 11a). The shape of each pulse, and hence its frequency spectrum, are similar to those discussed above, 22 but with much higher local values of ω_c and ω_max - see Fig. 9. Another difference is a much higher repetition frequency of the pulses. Indeed, the fundamental Eq. (18) allows us to calculate the time interval between them, for the observer, as



$$\Delta t \approx \frac{dt}{dt'}\,\Delta t' \approx (1-\beta)\,\frac{\lambda}{u} \approx \frac{\lambda}{2\gamma^2 c} \ll \frac{\lambda}{c}, \qquad (10.73)$$



where the first two relations are valid at λ ≪ R (a relation typically satisfied very well, see Fig. 9), and the last two relations also require the ultrarelativistic limit. As a result, the radiation intensity, which is proportional to the number of poles, is much higher than that from the bend magnets - in the NSLS-II case, by more than 2 orders of magnitude, clearly visible in Fig. 9.
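The chain of estimates in Eq. (73) can be checked numerically. In this sketch the wiggler period λ = 0.1 m is a hypothetical value (the text gives no number for wigglers), while γ = 6×10³ is the NSLS-II value quoted above. The point is only to compare (1 − β)λ/u with its ultrarelativistic form λ/2γ²c, and to see how much shorter than λ/c the observed pulse spacing is.

```python
import math

C = 2.99792458e8   # speed of light, m/s


def pulse_spacing_exact(gamma, period_m):
    """(1 - beta) * lambda / u: the first approximate relation of Eq. (73)."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    one_minus_beta = 1.0 / (gamma**2 * (1.0 + beta))   # avoids round-off cancellation
    return one_minus_beta * period_m / (beta * C)


def pulse_spacing_ultrarel(gamma, period_m):
    """lambda / (2 gamma^2 c): the ultrarelativistic form of Eq. (73)."""
    return period_m / (2.0 * gamma**2 * C)


gamma = 6.0e3
wiggler_period = 0.1   # m, hypothetical value chosen only for illustration
dt_exact = pulse_spacing_exact(gamma, wiggler_period)
dt_approx = pulse_spacing_ultrarel(gamma, wiggler_period)
print(f"pulse spacing ~ {dt_approx:.2e} s; lambda/c = {wiggler_period / C:.2e} s")
```

The two forms agree to about one part in 10⁸, and the spacing is 2γ² ≈ 7×10⁷ times shorter than λ/c.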



Fig. 10.11. Radiation (with in-plane polarization) from (a) a wiggler and (b) an undulator - schematically.



The situation is different in undulators - similar structures with a smaller spatial period λ, in which the electron's velocity vector oscillates with an angular amplitude smaller than γ⁻¹. As a result, the radiation pulses overlap (Fig. 11b), and the radiation waveform is closer to a sinusoidal one. Consequently, the radiation spectrum narrows to the central frequency 23



$$\omega_0 = \frac{2\pi}{\Delta t} \approx 2\gamma^2\,\frac{2\pi c}{\lambda}. \qquad (10.74)$$



For example, for the NSLS-II undulators with λ = 20 mm, this formula predicts the radiation peak at the photon energy ħω_0 ~ 4 keV, in reasonable agreement with the results of quantitative calculations, shown



22 A small problem for the reader: use Eqs. (20) and (63) to explain the difference between the shapes of pulses generated at opposite magnetic poles of the wiggler, which is schematically shown in Fig. 11a.

23 This important formula may also be interpreted in the following way. Due to the relativistic length contraction (9.20), the undulator period as perceived by beam electrons is λ' = λ/γ, so that the central frequency of radiation is ω_0' = 2πc/λ' = 2πcγ/λ. For the lab-frame observer, this frequency is Doppler-upshifted according to Eq. (9.44): ω_0 = ω_0'[(1 + β)/(1 - β)]^{1/2} ≈ 2γω_0', giving the same result as Eq. (74).
in Fig. 9. 24 Due to the spectrum narrowing, the intensity of undulator radiation is higher than that of wigglers using the same electron beam.
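Eq. (74) and its alternative derivation in footnote 23 can be compared directly. The sketch takes γ = 6×10³ and λ = 20 mm from the text. The Doppler factor [(1 + β)/(1 − β)]^{1/2} is written in the algebraically identical form γ(1 + β) to avoid round-off at large γ.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C
C = 2.99792458e8            # speed of light, m/s


def omega0_eq74(gamma, period_m):
    """Central undulator frequency, omega_0 ~ 2 gamma^2 * 2 pi c / lambda, Eq. (74)."""
    return 2.0 * gamma**2 * 2.0 * math.pi * C / period_m


def omega0_doppler(gamma, period_m):
    """Same quantity via footnote 23: contracted period lambda/gamma, then exact Doppler upshift."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    omega_beam_frame = 2.0 * math.pi * C * gamma / period_m
    doppler = gamma * (1.0 + beta)   # identical to sqrt((1 + beta)/(1 - beta))
    return omega_beam_frame * doppler


gamma, undulator_period = 6.0e3, 0.020   # values quoted in the text for NSLS-II
w74 = omega0_eq74(gamma, undulator_period)
wdop = omega0_doppler(gamma, undulator_period)
print(f"hbar*omega_0 ~ {HBAR * w74 / E_CHARGE / 1e3:.2f} keV")
```

The two routes differ only by the factor (1 + β)/2 ≈ 1 − 1/4γ², and both give about 4.5 keV, consistent with the ~4 keV estimate in the text.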

This spectrum-narrowing trend is brought to its logical conclusion in the so-called free-electron lasers, 25 whose basic structure is the same as that of wigglers and undulators (Fig. 10), but the radiation at each beam bend is so intense and narrowly focused that it affects the electron motion downstream of the radiation cone. As a result, the radiation of all bends becomes synchronized, so that its spectrum is a narrow line at frequency (74), with the electromagnetic wave amplitude proportional to the number N of electrons in the structure, and hence its power proportional to N² (rather than to N, as in wigglers and undulators).

Finally, note that wigglers, undulators, and free-electron lasers may also be used at the end of a linear electron accelerator (such as SLAC) that, as was noted above, may provide extremely high values of γ, and hence of radiation frequencies (74), due to the absence of radiation energy losses at the electron acceleration stage.



10.4. Bremsstrahlung and Coulomb losses 

Surprisingly, a very similar mechanism of radiation by charged particles works at a much smaller spatial scale, namely at their scattering by charged particles of the propagation medium; this is the so-called bremsstrahlung (German for "brake radiation"). This effect is responsible, in particular, for the continuous part of the frequency spectrum of the radiation produced by standard vacuum X-ray tubes, at the incidence of their electron beam on a solid "anticathode". 26

Bremsstrahlung in condensed matter is generally a rather complicated phenomenon, because of the simultaneous involvement of many particles, and of some quantum electrodynamic effects. This is why I will give only a very brief glimpse at the theoretical description of this effect, for the simplest case when the scattering of incoming, relatively light charged particles (such as electrons, protons, α-particles, etc.) is produced by atomic nuclei that remain virtually immobile during the scattering event (Fig. 12). This is a reasonable approximation if the energy of the incoming particles is not too low; otherwise most scattering is produced by atomic electrons whose dynamics is substantially quantum - see below.




Fig. 10.12. Basic geometry of the bremsstrahlung and Coulomb loss problems in (a) direct and (b) reciprocal space.



24 Much of the difference is due to the fact that those plots show the spectral density of the number of photons, ℰ/ħω per second, which peaks above the density of power, i.e. of energy ℰ per second.

25 This name is somewhat misleading, because in contrast to the usual ("quantum") lasers, the free-electron laser operation is essentially classical and very similar to that of vacuum-tube microwave generators (such as the magnetrons briefly discussed in Sec. 9.6) - see, e.g., E. Saldin et al., The Physics of Free Electron Lasers, Springer, 2000.

26 Such X-ray radiation had been observed experimentally (though not correctly interpreted) by N. Tesla in 1887, i.e. before the radiation was studied in detail and much publicized by W. Röntgen.
To calculate the frequency spectrum of the radiation emitted during a single scattering event, it is convenient to use a byproduct of the last section's analysis, namely Eq. (59) with replacement (60): 27



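$$I(\omega) = \frac{1}{4\pi^2 c}\,\frac{q^2}{4\pi\varepsilon_0}\left|\int_{-\infty}^{+\infty}\frac{d}{dt'}\left[\frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta)}{1-\boldsymbol\beta\cdot\mathbf{n}}\right]\exp\left\{i\omega\left(t'-\frac{\mathbf{n}\cdot\mathbf{r}'}{c}\right)\right\}dt'\right|^2. \qquad (10.75)$$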



The typical duration τ of a single scattering event described by this formula is of the order of a_0/c ~ (10⁻¹⁰ m)/(3×10⁸ m/s) ~ 10⁻¹⁸ s in solids, and only an order of magnitude longer in gases at ambient conditions. This is why, for most frequencies of interest, from zero all the way up to at least soft X-rays, 28 we can use the so-called low-frequency approximation, taking the exponent in Eq. (75) equal to 1 throughout the whole collision event, i.e. the integration interval. This approximation immediately yields



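$$I(\omega) \approx \frac{1}{4\pi^2 c}\,\frac{q^2}{4\pi\varepsilon_0}\left|\frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta_{\rm fin})}{1-\boldsymbol\beta_{\rm fin}\cdot\mathbf{n}} - \frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol\beta_{\rm ini})}{1-\boldsymbol\beta_{\rm ini}\cdot\mathbf{n}}\right|^2. \qquad (10.76)$$

In the nonrelativistic limit (β_ini, β_fin ≪ 1), this formula is reduced to 29

$$I(\omega) = \frac{1}{4\pi^2 c}\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{\mathfrak{q}}{mc}\right)^2\sin^2\theta, \qquad (10.77)$$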



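where 𝔮 is the momentum transferred from the scattering center to the scattered charge (Fig. 12): 30

$$\boldsymbol{\mathfrak{q}} \equiv \mathbf{p}_{\rm fin} - \mathbf{p}_{\rm ini} = m\,\Delta\mathbf{u} = mc\,\Delta\boldsymbol\beta = mc\,(\boldsymbol\beta_{\rm fin}-\boldsymbol\beta_{\rm ini}), \qquad (10.78)$$

and θ is the angle between vector 𝔮 and the direction n toward the observer.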

The most important feature of result (77) is the frequency-independent ("white") spectrum of the radiation, very typical for any rapid leaps, which may be approximated as theta-functions of time. (Note, however, that this is only valid for a fixed value of 𝔮, so that the statistics of this parameter, to be discussed in a minute, "colors" the radiation.) Note also the angular distribution of the radiation, forming the usual "doughnut" shape about the momentum transfer vector 𝔮. In particular, this means that in typical cases when 𝔮 ~ p, the bremsstrahlung produces a significant radiation flow in the direction back to the particle source - a fact significant for the operation of X-ray tubes.
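The "white" spectrum of Eq. (77) is a direct consequence of the low-frequency approximation. In the nonrelativistic dipole limit the radiated field is proportional to the acceleration, so |E_ω| ∝ |∫a(t)e^{iωt}dt|, and for a velocity leap lasting τ this integral stays equal to the total velocity change as long as ωτ ≪ 1. The minimal illustration below assumes, purely for this example, a Gaussian acceleration profile, for which the normalized spectrum is exactly exp(−ω²τ²/2).

```python
import math

TAU = 1.0          # collision duration (arbitrary units); hypothetical Gaussian profile
DELTA_U = 1.0      # total velocity leap
DT = 0.01
t_grid = [-10.0 + k * DT for k in range(2001)]
# Acceleration du/dt: a Gaussian pulse whose time integral equals DELTA_U
accel = [DELTA_U / (math.sqrt(2 * math.pi) * TAU) * math.exp(-t**2 / (2 * TAU**2))
         for t in t_grid]


def leap_spectrum(omega):
    """|integral a(t) exp(i omega t) dt| / DELTA_U; a(t) is even, so only the cosine part survives."""
    return abs(sum(a * math.cos(omega * t) for a, t in zip(accel, t_grid)) * DT) / DELTA_U


for w_tau in (0.0, 0.1, 0.3, 1.0, 3.0):
    print(f"omega*tau = {w_tau:3.1f}: {leap_spectrum(w_tau / TAU):.4f}"
          f" (analytic {math.exp(-w_tau**2 / 2):.4f})")
```

The spectrum is flat to better than 1% for ωτ ≤ 0.1 and is cut off only at ωτ ≳ 1, which is why the exact collision profile does not matter for the frequencies discussed above.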

Now, integrating over all wave propagation directions, just as we did for the instant radiation power in Sec. 8.2, we get the spectral density of the full energy loss,



27 In publications on this topic (whose development peak was in the 1920s and 1930s), Gaussian units are more common, and the letter Z is usually reserved for expressing charges as multiples of the fundamental charge e, rather than for the wave impedance. This is why, in order to avoid confusion, in this section I will use 1/ε_0c instead of Z_0 for the free-space wave impedance and, still sticking to the same SI units as used throughout my lecture notes, will write the coefficients in a form that makes the transfer to the Gaussian units trivial: it is sufficient to replace all (qq'/4πε_0)_SI with (qq')_Gaussian. In the (rare) cases when I spell out the charge values, I will use a different font: q = ze, q' = z'e.

28 A more careful analysis shows that this approximation is actually quite reasonable up to much higher 
frequencies of the order of •fir. 

29 Evidently, this result (but not the general Eq. (76)!) may be derived from Eq. (8.27) as well. 

30 Please note the font-marked difference between this variable (𝔮) and the particle's electric charge (q).
$$\frac{d\mathcal{E}}{d\omega} = \oint I(\omega)\,d\Omega = \frac{2}{3\pi c}\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{\mathfrak{q}}{mc}\right)^2. \qquad (10.79)$$



The main new feature of bremsstrahlung (as of most scattering problems 31) is the necessity to take into account the randomness of the impact parameter b (Fig. 12). For elastic (β_ini = β_fin ≡ β) Coulomb collisions we can use the so-called Rutherford formula for the differential cross-section of scattering 32



$$\frac{d\sigma}{d\Omega'} = \left(\frac{qq'}{4\pi\varepsilon_0}\,\frac{1}{2pc\beta}\right)^2\frac{1}{\sin^4(\theta'/2)}. \qquad (10.80)$$



Here dσ = 2πb db is the elementary area of the sample cross-section (as visible from the direction of incident particles) corresponding to particle scattering into an elementary solid angle 33



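$$d\Omega' = 2\pi\sin\theta'\,|d\theta'|. \qquad (10.81)$$

Differentiating the geometric relation that is evident from Fig. 12,

$$\mathfrak{q} = 2p\sin\frac{\theta'}{2}, \qquad (10.82)$$

we may present Eq. (80) in a more convenient form:

$$d\sigma = 8\pi\left(\frac{qq'}{4\pi\varepsilon_0\,c\beta}\right)^2\frac{d\mathfrak{q}}{\mathfrak{q}^3}. \qquad (10.83)$$

Now combining Eqs. (79) and (83), we get

$$\frac{d\mathcal{E}}{d\omega}\,\frac{d\sigma}{d\mathfrak{q}} = \frac{16}{3}\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{qq'}{4\pi\varepsilon_0\,mc^2}\right)^2\frac{1}{c\beta^2}\,\frac{1}{\mathfrak{q}}. \qquad (10.84)$$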



This product is called the differential radiation cross-section. When averaged over all values of 𝔮 (which is equivalent to averaging over all values of the impact parameter), it gives a convenient measure of radiation intensity. Indeed, after multiplication by the volume density n of independent scattering centers, the integral gives the particle's energy loss per unit bandwidth of radiation per unit path length, d²ℰ/dωdx. A technical problem here is that the integral of 1/𝔮 formally diverges at both infinite and zero values of 𝔮. However, these divergences are very weak (logarithmic), and the integral converges due to virtually any reason unaccounted for by our simple analysis. The standard simple way to account for these effects is to write



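$$\frac{d^2\mathcal{E}}{d\omega\,dx} = \frac{16}{3}\,n\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{qq'}{4\pi\varepsilon_0\,mc^2}\right)^2\frac{1}{c\beta^2}\,\ln\frac{\mathfrak{q}_{\max}}{\mathfrak{q}_{\min}}, \qquad (10.85)$$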



31 See, e.g., CM Sec. 3.7. 

32 See, e.g., CM Eq. (3.72) with the constant α = qq'/4πε_0. In the form used in Eq. (80), the Rutherford formula is
also valid for small-angle scattering of relativistic particles, the criterion being | 1 « 21 y. 

33 The angle θ' and the differential dΩ', describing the direction of scattered particles, should not be confused with θ and dΩ, describing the directions of the radiation emitted at the scattering event.
and then plug in, instead of 𝔮_max and 𝔮_min, the scales of the most important effects limiting the momentum range. In the classical analysis, according to Eq. (82), 𝔮_max = 2p. To estimate 𝔮_min, let us note that a very small momentum transfer takes place when the impact parameter b is very large, and hence the effective scattering time τ ~ b/u is very long. Recalling the condition of the low-frequency approximation, we may associate this limit with τ ~ 1/ω, and hence with b ~ uτ ~ u/ω. Since, for small scattering angles, 𝔮 may be estimated as the impulse Fτ ~ (qq'/4πε_0b²)τ of the Coulomb force, we get 𝔮_min ~ (qq'/4πε_0)ω/u², and Eq. (85) becomes
$$\frac{d^2\mathcal{E}}{d\omega\,dx} \approx \frac{16}{3}\,n\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{qq'}{4\pi\varepsilon_0\,mc^2}\right)^2\frac{1}{c\beta^2}\,\ln\frac{4\pi\varepsilon_0\,2mu^3}{qq'\,\omega}. \qquad (10.86)$$



This is Bohr's formula for what is called classical bremsstrahlung. We see that the low-momentum cutoff indeed makes the spectrum colored, with more energy going to lower frequencies. There is even a formal divergence at ω → 0; however, this divergence is integrable, so it does not present a problem in finding the total radiative energy losses (-dℰ/dx) as an integral of Eq. (86) over all radiated frequencies ω. A larger problem for this procedure is the upper integration limit, ω → ∞, at which the integral diverges. This means that our approximate description, which considers the collision as an elastic process, becomes wrong, and needs to be amended by taking into account the difference between the initial and final kinetic energies of the particle due to the radiation of the energy quantum ħω of the emitted photon:



$$\frac{p_{\rm ini}^2}{2m} - \frac{p_{\rm fin}^2}{2m} = \hbar\omega. \qquad (10.87)$$



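As a result, taking into account that the minimum and maximum values of 𝔮 correspond to, respectively, the parallel and antiparallel alignments of vectors p_ini and p_fin, we get

$$\frac{\mathfrak{q}_{\max}}{\mathfrak{q}_{\min}} = \frac{p_{\rm ini}+p_{\rm fin}}{p_{\rm ini}-p_{\rm fin}} = \frac{(p_{\rm ini}+p_{\rm fin})^2}{2m\hbar\omega}. \qquad (10.88)$$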



Plugged into Eq. (85), this expression yields the so-called Bethe-Heitler formula for quantum bremsstrahlung. 34 Note that in this approach, 𝔮_max is close to that of the classical approximation, but 𝔮_min ≈ ħω/u, so that

$$\frac{\mathfrak{q}_{\min}|_{\rm classical}}{\mathfrak{q}_{\min}|_{\rm quantum}} \sim \frac{zz'\alpha}{\beta}, \qquad (10.89)$$



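where z and z' are the particles' charges in units of e, and α is the fine structure constant:

$$\alpha \equiv \left.\frac{e^2}{4\pi\varepsilon_0\hbar c}\right|_{\rm SI} = \left.\frac{e^2}{\hbar c}\right|_{\rm Gaussian} \approx \frac{1}{137} \ll 1. \qquad (10.90)$$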



34 The modifications of this formula necessary for the description of the relativistic case are surprisingly minor - see, e.g., Chapter 15 of J. D. Jackson, Classical Electrodynamics, 3rd ed., Wiley, 1999. For more detail, the standard reference monograph on bremsstrahlung is W. Heitler, The Quantum Theory of Radiation, 3rd ed., Oxford U. Press, 1954 (reprinted in 2010 by Dover).
For most cases of practical interest, ratio (89) is smaller than 1, and since we have to keep the larger of the two values of ℘_min, the Bethe-Heitler formula should be used.
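As a quick numerical illustration of this criterion, the sketch below evaluates the order-of-magnitude ratio (89), zz'α/β, for an electron (z = 1) scattered by a singly charged center (z' = 1) at several speeds. The charges, the chosen speeds, and the helper name `classical_to_quantum_ratio` are illustrative assumptions; since Eq. (89) holds only up to factors of order one, the crossover speed β ≈ zz'α should be read as an estimate.

```python
# Order-of-magnitude criterion (89): ratio of the classical to the quantum
# lower cutoffs of the momentum transfer, ~ z z' alpha / beta.
ALPHA = 7.2973525693e-3  # fine-structure constant (CODATA 2018)


def classical_to_quantum_ratio(z: int, z_prime: int, beta: float) -> float:
    """Hypothetical helper: estimate of wp_min(classical) / wp_min(quantum)."""
    return abs(z * z_prime) * ALPHA / beta


for beta in (0.001, 0.01, 0.1, 0.5):
    r = classical_to_quantum_ratio(1, 1, beta)
    # The larger wp_min must be kept: ratio < 1 means the quantum cutoff wins.
    regime = "Bethe-Heitler (quantum)" if r < 1 else "Bohr (classical)"
    print(f"beta = {beta:5.3f}: ratio ~ {r:.3g} -> {regime}")
```

For an electron, the classical description thus survives only at β below roughly 10⁻², i.e. at kinetic energies of the order of ten electron-volts or less.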

Now nothing prevents us from calculating the total radiative losses of energy per unit length:
where ħω_max = mu²/2 is the maximum energy of the radiation quantum. By introducing the dimensionless integration variable ξ ≡ ħω/ℰ = 2ħω/mu², this integral is reduced to the table one,35 and we get
(10.92)
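The value of the table integral can be checked numerically. The sketch below assumes that, after the substitution ξ = ħω/ℰ, the logarithm ln[(p_i + p_f)/(p_i − p_f)] following from Eq. (88) becomes ln[(1 + s)/(1 − s)] with s = (1 − ξ)^{1/2}, i.e. that p_f/p_i = (1 − ξ)^{1/2} as implied by Eq. (87); the integral of this function over ξ from 0 to 1 equals 2. The midpoint rule is used because it never evaluates the integrable logarithmic singularity at ξ = 0; the function names are illustrative.

```python
import math


def bethe_heitler_log(xi: float) -> float:
    """ln[(1 + s)/(1 - s)] with s = sqrt(1 - xi), i.e. ln[(p_i + p_f)/(p_i - p_f)].

    Written as 2 ln(1 + s) - ln(xi), using 1 - s = xi / (1 + s), to avoid
    cancellation at small xi.
    """
    s = math.sqrt(1.0 - xi)
    return 2.0 * math.log1p(s) - math.log(xi)


def midpoint_integral(f, a: float, b: float, n: int) -> float:
    """Composite midpoint rule; never touches the end points."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


value = midpoint_integral(bethe_heitler_log, 0.0, 1.0, 200_000)
print(f"integral = {value:.5f}")  # close to the exact value 2
```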



In my usual style, I would give you an estimate of the losses for a typical case; however, let me compare them with a parallel energy loss mechanism, the so-called Coulomb losses, due to the transfer of mechanical impulse from the scattered particle to the scattering center. (This energy eventually goes into an increase of the thermal energy of the scattering medium.) Using Eqs. (9.139) for the electric field of a linearly moving charge, we can readily find the momentum it transfers to charge q':36

$$\Delta p' = \int_{-\infty}^{+\infty} q'E_y\,dt = \frac{2qq'}{4\pi\varepsilon_0 bu}. \tag{10.93}$$
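The independence of the transferred momentum from γ can be checked numerically. The sketch below assumes that the transverse field of the uniformly moving charge (cf. Eq. (9.139)) has the standard form E_y = γqb/[4πε_0(b² + γ²u²t²)^{3/2}] at distance b from the trajectory, integrates q'E_y over time, and compares the result with the analytic value 2qq'/(4πε_0bu). Units are scaled so that q = q' = b = u = 4πε_0 = 1, so the expected answer is 2; the time window and grid size are arbitrary choices.

```python
def transverse_impulse(gamma: float, half_window: float = 2.0e3, n: int = 200_000) -> float:
    """Trapezoidal integral of E_y(t) over t in [-T, T], in units q = q' = b = u = 4*pi*eps0 = 1.

    E_y(t) = gamma / (1 + gamma**2 * t**2) ** 1.5 is the transverse field of a
    charge moving uniformly at speed u, seen at impact parameter b.
    """
    T = half_window / gamma  # the pulse duration scales as b / (gamma * u)
    h = 2.0 * T / n
    total = 0.0
    for k in range(n + 1):
        t = -T + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * gamma / (1.0 + (gamma * t) ** 2) ** 1.5
    return total * h


for gamma in (1.0, 10.0, 100.0):
    print(gamma, transverse_impulse(gamma))  # each close to 2 = 2qq'/(4 pi eps0 b u)
```

A faster particle produces a stronger but proportionally shorter field pulse, so the impulse stays the same.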



Hence, the kinetic energy acquired by the scattering center (equal to the energy lost by the incident particle) is

$$\Delta\mathcal E = \frac{(\Delta p')^2}{2m'} = \frac{2}{m'}\left(\frac{qq'}{4\pi\varepsilon_0 bu}\right)^2. \tag{10.94}$$



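Such energy losses have to be summed up over all collisions, with random values of the impact parameter b. At the scattering center density n, the number of collisions per small path length dx per small range db is dN = n2πb db dx, so that

$$-\frac{d\mathcal E}{dx}\bigg|_{\rm Coulomb} = \int \Delta\mathcal E\; n\,2\pi b\,db = \frac{4\pi n}{m'u^2}\left(\frac{qq'}{4\pi\varepsilon_0}\right)^2\ln B, \quad\text{where } B \equiv \frac{b_{\max}}{b_{\min}}. \tag{10.95}$$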



Here the logarithmic integral over b was treated similarly to that over ℘ in the bremsstrahlung theory. This approach is adequate, because the ratio b_max/b_min is much larger than 1. Indeed, b_min may be estimated from (Δp')_max ~ p = γmu. For this value, Eq. (93) with q' ~ q gives b_min ~ r_c (see Eq. (8.41) and its discussion), which is, for elementary particles, of the order of 10⁻¹⁵ m. On the other hand, for the most important case when charges q' belong to electrons (which, according to Eq. (94), are the most efficient Coulomb energy absorbers, due to their extremely low mass m'), b_max may be estimated from the condition that the collision time, b/γu, is of the order of 1/ω_max, where ω_max ~ 10¹⁶ s⁻¹ is the characteristic frequency of electron transitions in atoms.



35 See, e.g., MA Eq. (6.13).

36 According to Eq. (9.139), E_z = 0, and the net impulse of the longitudinal force q'E_x is zero.
(Below this frequency, our classical analysis of the scatterer's motion is invalid.) From here, we have the estimate b_max ~ γu/ω_max, so that

$$B \sim \frac{\gamma u}{\omega_{\max} r_c}, \tag{10.96}$$

for γ ~ 1 and u ~ c ≈ 3×10⁸ m/s giving b_max ~ 3×10⁻⁸ m, and B ~ 10⁷ (give or take a couple of orders of magnitude; this does not change the estimate ln B ~ 20 too much).37
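The numbers in this estimate are easy to reproduce. The sketch below assumes u = c, γ = 1, ω_max = 10¹⁶ s⁻¹, and b_min equal to the classical electron radius r_c (CODATA values are hard-coded); all of these are the rough choices of the text, so only the order of magnitude of the result is meaningful.

```python
import math

C = 2.99792458e8          # speed of light, m/s
R_C = 2.8179403262e-15    # classical electron radius, m
OMEGA_MAX = 1.0e16        # characteristic atomic frequency, 1/s (rough estimate)

gamma, u = 1.0, C
b_max = gamma * u / OMEGA_MAX  # collision time b/(gamma u) ~ 1/omega_max
b_min = R_C                    # momentum transfer ~ gamma m u
B = b_max / b_min
print(f"b_max = {b_max:.2e} m, B = {B:.2e}, ln B = {math.log(B):.1f}")
```

The result, ln B ≈ 16, is consistent with the estimate ln B ~ 20, given the weak logarithmic dependence on the poorly defined cutoffs.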

Now we can compare the Coulomb losses (95) with those due to the bremsstrahlung, given by Eq. (92):

$$\frac{(-d\mathcal E/dx)_{\rm bremsstrahlung}}{(-d\mathcal E/dx)_{\rm Coulomb}} \sim \alpha z^2\frac{m'}{m}\beta^2\frac{1}{\ln B}. \tag{10.97}$$

Since α ~ 10⁻² ≪ 1, for nonrelativistic particles (β ≪ 1) the Coulomb losses of energy are much higher, and only for ultrarelativistic particles may the relation be opposite.

According to Eq. (95), for electron-electron scattering (q = q' = −e, m' = m_e),38 at the value n ≈ 6×10²⁶ m⁻³ typical for air at ambient conditions, the characteristic length of energy loss,

$$l_c \equiv \frac{\mathcal E}{-d\mathcal E/dx} = \frac{\mathcal E^2}{2\pi n\,(e^2/4\pi\varepsilon_0)^2\ln B}, \tag{10.98}$$

for electrons with kinetic energy 6 keV is close to 2×10⁻⁴ m = 0.2 mm. (This is why you need vacuum in CRT monitors and electron microscope columns!) Since l_c ∝ ℰ², more energetic particles penetrate deeper, until the bremsstrahlung steps in at very high energies.
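The 0.2 mm figure can be reproduced with a few lines of arithmetic. The sketch below assumes that l_c ≡ ℰ/(−dℰ/dx), with the loss rate taken from Eq. (95) for q = q' = −e and m' = m = m_e (so that mu² = 2ℰ and l_c = ℰ²/[2πn(e²/4πε_0)² ln B]), and it further assumes n = 6×10²⁶ m⁻³ and ln B = 20. The function name is hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def coulomb_loss_length(kinetic_energy_ev: float, n: float, ln_b: float) -> float:
    """Hypothetical helper: l_c = E / (-dE/dx) for electron-electron Coulomb losses.

    Uses -dE/dx = (2*pi*n/E) * (e^2/(4*pi*eps0))^2 * ln B, i.e. Eq. (95)
    with m' = m and m*u^2 = 2E.
    """
    energy = kinetic_energy_ev * E_CHARGE          # J
    e2 = E_CHARGE**2 / (4.0 * math.pi * EPS0)      # J*m
    return energy**2 / (2.0 * math.pi * n * e2**2 * ln_b)


l_c = coulomb_loss_length(6.0e3, n=6.0e26, ln_b=20.0)
print(f"l_c = {l_c * 1e3:.2f} mm")
```

Doubling the energy quadruples l_c, which is the ℰ² scaling mentioned in the text.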



10.5. Density effects and the Cherenkov radiation 

For condensed matter, the Coulomb loss estimate made in the last section is not quite suitable, because it is based on the upper cutoff b_max ~ γu/ω_max. For the example given above, the incoming electron's velocity u is close to 5×10⁷ m/s, and for the typical value ω_max ~ 10¹⁶ s⁻¹ (ħω_max ~ 10 eV), this cutoff is b_max ~ 5×10⁻⁹ m = 5 nm. Even for air at ambient conditions, this is larger than the average distance (~2 nm) between the molecules, so that at the high end of the impact parameter range, at b ~ b_max, the Coulomb loss events in adjacent molecules are not quite independent, and the theory needs corrections. For condensed matter, with much higher particle density n, most collisions satisfy the condition

$$nb^3 \gg 1, \tag{10.99}$$



37 A quantum analysis (carried out by H. Bethe in 1940) replaces, in Eq. (95), ln B with ln(2γ²mu²/ħ⟨ω⟩) − β², where ⟨ω⟩ is the average frequency of the atomic quantum transitions, weighted by their oscillator strengths. This refinement does not change the estimate given below. Note that both the classical and quantum formulas describe a fast increase (as 1/β²) of the energy loss rate (−dℰ/dx) at γ → 1, and its slow increase (as ln γ) at γ → ∞, so that the losses have a minimum at (γ − 1) ~ 1.

38 Actually, the above analysis has neglected the change of momentum of the incident particle. This is legitimate at m' ≪ m, but for m = m' this change approximately doubles the energy losses. Still, it does not change the order of magnitude of the estimate.
and the treatment of Coulomb collisions as independent events is completely inadequate. However, condition (99) enables the opposite approach: treating the medium as a continuum. In the time-domain formulation, used in the previous sections of this chapter, this would be a very complex problem, because it would require an explicit description of the medium's dynamics. Here the frequency-domain approach, based on the Fourier transform in both time and space, helps a lot, provided that the functions ε(ω) and μ(ω) are considered known, either calculated or taken from experiment. Let us have a good look at this approach, because it gives some interesting (and practically important) results.

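In Chapter 6, we have used the macroscopic Maxwell equations to derive Eqs. (6.109), which describe the time evolution of the potentials in a medium with frequency-independent ε and μ. Looking for all functions participating in Eqs. (6.109) in the form of the plane-wave expansion39

$$f(\mathbf r,t) = \int d^3k\int d\omega\, f_{\mathbf k,\omega}\,e^{i(\mathbf k\cdot\mathbf r-\omega t)}, \tag{10.100}$$

and requiring all coefficients at similar exponents to be balanced, we get their Fourier images:40

$$(k^2-\omega^2\varepsilon\mu)\,\phi_{\mathbf k,\omega} = \frac{\rho_{\mathbf k,\omega}}{\varepsilon}, \qquad (k^2-\omega^2\varepsilon\mu)\,\mathbf A_{\mathbf k,\omega} = \mu\,\mathbf j_{\mathbf k,\omega}. \tag{10.101}$$

As was discussed in Chapter 7, in such a Fourier form the Maxwell theory remains valid even for dispersive media, so that Eq. (101) is generalized as

$$[k^2-\omega^2\varepsilon(\omega)\mu(\omega)]\,\phi_{\mathbf k,\omega} = \frac{\rho_{\mathbf k,\omega}}{\varepsilon(\omega)}, \qquad [k^2-\omega^2\varepsilon(\omega)\mu(\omega)]\,\mathbf A_{\mathbf k,\omega} = \mu(\omega)\,\mathbf j_{\mathbf k,\omega}. \tag{10.102}$$

The evident advantage of these equations is that their formal solution is trivial: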
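$$\phi_{\mathbf k,\omega} = \frac{\rho_{\mathbf k,\omega}}{\varepsilon(\omega)[k^2-\omega^2\varepsilon(\omega)\mu(\omega)]}, \qquad \mathbf A_{\mathbf k,\omega} = \frac{\mu(\omega)\,\mathbf j_{\mathbf k,\omega}}{k^2-\omega^2\varepsilon(\omega)\mu(\omega)}, \tag{10.103}$$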
so that the "only" remaining things to do are to calculate the Fourier transforms of the functions ρ(r, t) and j(r, t), describing stand-alone charges and currents, using the transform reciprocal to Eq. (100), with one factor 1/2π per each scalar dimension,

$$f_{\mathbf k,\omega} = \frac{1}{(2\pi)^4}\int d^3r\int dt\, f(\mathbf r,t)\,e^{-i(\mathbf k\cdot\mathbf r-\omega t)}, \tag{10.104}$$

and then carry out the integration (100).

For our current problem of a single charge q, uniformly moving in the medium with velocity u,

$$\rho(\mathbf r,t) = q\,\delta(\mathbf r-\mathbf ut), \qquad \mathbf j(\mathbf r,t) = q\mathbf u\,\delta(\mathbf r-\mathbf ut), \tag{10.105}$$

the first task is easy:



39 All integrals here and below are in infinite limits, unless specified otherwise.

40 As was discussed in Sec. 7.2, the Ohmic conductivity of the medium (generally, also a function of frequency) may be readily incorporated into the dielectric permittivity: ε(ω) → ε(ω) + iσ(ω)/ω. In this section, I will assume that such incorporation, which is especially natural for high frequencies, has been performed, so that the current density j(r, t) describes only stand-alone currents, for example, the current (105) of the incident particle.
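$$\rho_{\mathbf k,\omega} = \frac{1}{(2\pi)^4}\int d^3r\int dt\; q\,\delta(\mathbf r-\mathbf ut)\,e^{-i(\mathbf k\cdot\mathbf r-\omega t)} = \frac{q}{(2\pi)^4}\int dt\, e^{i(\omega-\mathbf k\cdot\mathbf u)t} = \frac{q}{(2\pi)^3}\,\delta(\omega-\mathbf k\cdot\mathbf u). \tag{10.106}$$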
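Since expressions (105) for ρ(r, t) and j(r, t) differ only by a constant factor u, it is clear that an absolutely similar calculation for the current gives

$$\mathbf j_{\mathbf k,\omega} = \frac{q\mathbf u}{(2\pi)^3}\,\delta(\omega-\mathbf k\cdot\mathbf u). \tag{10.107}$$

Let us summarize what we have got by now, plugging Eqs. (106) and (107) into Eqs. (103):

$$\phi_{\mathbf k,\omega} = \frac{q}{(2\pi)^3}\,\frac{\delta(\omega-\mathbf k\cdot\mathbf u)}{\varepsilon(\omega)[k^2-\omega^2\varepsilon(\omega)\mu(\omega)]}, \qquad \mathbf A_{\mathbf k,\omega} = \varepsilon(\omega)\mu(\omega)\,\mathbf u\,\phi_{\mathbf k,\omega}. \tag{10.108}$$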



Now, at the last step of the calculation, namely the integration (100), we start to pay a heavy price for the easiness of the first steps. This is why let us think well about what exactly we need from it. First of all, for the calculation of power losses, the electric field is more convenient to use than the potentials, so let us calculate the Fourier images of E and B. Plugging expansion (100) into the fundamental relations (6.106), and again requiring the balance of the exponents' coefficients, we get

$$\mathbf E_{\mathbf k,\omega} = -i\mathbf k\,\phi_{\mathbf k,\omega} + i\omega\mathbf A_{\mathbf k,\omega} = i[\omega\varepsilon(\omega)\mu(\omega)\mathbf u-\mathbf k]\,\phi_{\mathbf k,\omega}, \qquad \mathbf B_{\mathbf k,\omega} = i\mathbf k\times\mathbf A_{\mathbf k,\omega} = i\varepsilon(\omega)\mu(\omega)\,\mathbf k\times\mathbf u\,\phi_{\mathbf k,\omega}, \tag{10.109}$$

so that Eqs. (100) and (108) yield
With the notation used in Eq. (51), this integral may be partitioned as
Let us calculate the Cartesian components of the partial Fourier image E_ω at a point separated by distance b from the particle's trajectory. Selecting the coordinates and time origin as shown in Fig. 9.11a, we have r = {0, b, 0}, so that only E_x and E_y do not vanish. In particular, according to Eq. (111),

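$$(\mathbf E_\omega)_x = \frac{iq}{(2\pi)^3\varepsilon(\omega)}\int dk_x\int dk_y\int dk_z\,\frac{\omega\varepsilon(\omega)\mu(\omega)u-k_x}{k_x^2+k_y^2+k_z^2-\omega^2\varepsilon(\omega)\mu(\omega)}\,\delta(\omega-k_xu)\,e^{ik_yb}. \tag{10.112}$$

The delta-function kills one integral (over k_x) of three, and we get:

$$(\mathbf E_\omega)_x = \frac{iq}{(2\pi)^3\varepsilon(\omega)u}\left[\omega\varepsilon(\omega)\mu(\omega)u-\frac{\omega}{u}\right]\int dk_y\int dk_z\,\frac{e^{ik_yb}}{\omega^2/u^2+k_y^2+k_z^2-\omega^2\varepsilon(\omega)\mu(\omega)}. \tag{10.113}$$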



The last integral (over k_z) may be readily reduced to the table integral ∫dξ/(1 + ξ²), in infinite limits, equal to π.41 The result may be presented as



41 See, e.g., MA Eq. (6.5a). 
$$(\mathbf E_\omega)_x = -\frac{i\pi q\kappa^2}{(2\pi)^3\omega\varepsilon(\omega)}\int\frac{e^{ik_yb}}{(k_y^2+\kappa^2)^{1/2}}\,dk_y, \tag{10.114}$$

where parameter κ (generally, a complex function of frequency) is defined as

$$\kappa^2 \equiv \omega^2\left[\frac{1}{u^2}-\varepsilon(\omega)\mu(\omega)\right]. \tag{10.115}$$



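The last integral may be expressed via the modified Bessel function of the second kind:42

$$(\mathbf E_\omega)_x = -\frac{iq}{(2\pi)^2}\,\frac{\kappa^2}{\omega\varepsilon(\omega)}\,K_0(\kappa b). \tag{10.116}$$

A similar calculation yields

$$(\mathbf E_\omega)_y = \frac{q}{(2\pi)^2}\,\frac{\kappa}{\varepsilon(\omega)u}\,K_1(\kappa b). \tag{10.117}$$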



Now, instead of rushing to make the final integration (111) over frequency to calculate E(t), let us realize that what we need for power losses is only the total energy loss during the whole time of the particle's passage. The energy loss per unit volume is

$$\frac{d\mathcal E}{dV} = \int \mathbf j\cdot\mathbf E\,dt, \tag{10.118}$$

where j is the current of bound charges in the medium, and should not be confused with the free particle's current (105). This integral may be readily expressed via the partial Fourier image E_ω and the similarly defined image j_ω, just as it was done at the derivation of Eq. (54):

$$\frac{d\mathcal E}{dV} = \int dt\int d\omega\,e^{-i\omega t}\int d\omega'\,e^{-i\omega' t}\,\mathbf j_\omega\cdot\mathbf E_{\omega'} = 2\pi\int d\omega\int d\omega'\,\mathbf j_\omega\cdot\mathbf E_{\omega'}\,\delta(\omega+\omega') = 2\pi\int \mathbf j_\omega\cdot\mathbf E_{-\omega}\,d\omega. \tag{10.119}$$

In our approach, the Ohmic conductivity is incorporated into the complex permittivity ε(ω), so that, according to the discussion at the end of Sec. 7.2, the current's Fourier image is

$$\mathbf j_\omega = \sigma_{\rm ef}(\omega)\,\mathbf E_\omega = -i\omega\varepsilon(\omega)\,\mathbf E_\omega. \tag{10.120}$$

As a result, Eq. (119) yields

$$\frac{d\mathcal E}{dV} = -2\pi i\int\varepsilon(\omega)\,\mathbf E_\omega\cdot\mathbf E_{-\omega}\,\omega\,d\omega = 4\pi\,{\rm Im}\int_0^\infty\varepsilon(\omega)\,|\mathbf E_\omega|^2\,\omega\,d\omega. \tag{10.121}$$

(The last transition is possible due to the property ε(−ω) = ε*(ω), which was discussed in Sec. 7.2, together with the relation E_{−ω} = E*_ω valid for any real field E(t).)

Finally, just as in the last section, we have to calculate the energy loss rate averaged over random 
values of the impact parameter b: 



42 As a reminder, the main properties of these functions are listed in Sec. 2.5 of these notes - see, in particular, 
Fig. 2.20b and Eqs. (2.157)-(2.158). 
$$-\frac{d\mathcal E}{dx} = \int\frac{d\mathcal E}{dV}\,d^2b = 2\pi\int_{b_{\min}}^{\infty}\frac{d\mathcal E}{dV}\,b\,db = 8\pi^2\,{\rm Im}\int_{b_{\min}}^{\infty}b\,db\int_0^\infty\left(|(\mathbf E_\omega)_x|^2+|(\mathbf E_\omega)_y|^2\right)\varepsilon(\omega)\,\omega\,d\omega. \tag{10.122}$$

Note that we are cutting the resulting integral over b from below at some b_min, where our theory loses legitimacy. (In this respect, we are not doing much better than in the previous section.) Plugging in the calculated expressions (116) and (117) for the field components, swapping the integrals, and using the recurrence relations (2.142), which are valid for any Bessel functions, we finally get:



Radiation 
intensity 



dx n 1 



cos{co) 



(10.123) 



This general result is valid for an arbitrary linear medium, with arbitrary dispersion relations ε(ω) and μ(ω). (The latter function participates in Eq. (123) only via Eq. (115), which defines parameter κ.) To get more concrete results, some particular model of the medium should be used. Let us explore the model of independent harmonic oscillators, which was discussed in Sec. 7.2, in its form (7.33), suitable for the transition to the quantum-mechanical description of atoms:

$$\varepsilon(\omega) = \varepsilon_0 + \frac{nq^2}{m}\sum_j\frac{f_j}{(\omega_j^2-\omega^2)-2i\omega\delta_j}. \tag{10.124}$$



If the damping of the effective atomic oscillators is low, δ_j ≪ ω_j, and the particle's speed u is much lower than the typical wave phase velocity v (and hence than c), then for most frequencies Eq. (115) gives

$$\kappa^2 = \omega^2\left[\frac{1}{u^2}-\frac{1}{v^2(\omega)}\right] \approx \frac{\omega^2}{u^2}, \tag{10.125}$$

i.e. κ ≈ κ* ≈ ω/u is real. In this case, Eq. (123) may be shown to give Eq. (95) with

$$B = \frac{1.123\,u}{\langle\omega\rangle\,b_{\min}}. \tag{10.126}$$



The good news here is that both approaches (the microscopic analysis of Sec. 4 and the macroscopic analysis of this section) give essentially the same result. This fact may also be perceived as bad news: the treatment of the medium as a continuum does not give any new results here. The situation changes somewhat at relativistic velocities, at which such treatment provides noticeable corrections (called density effects), in particular reducing the energy loss estimates.

Let me, however, skip these details and focus on a much more important effect described by our formulas. Consider the dependence of the electric field components on the impact parameter b, i.e. on the closest distance between the particle's trajectory and the field observation point. If κ² > 0, then κ is real, and we can use, in Eqs. (116)-(117), the asymptotic formula (2.158),

$$K_\nu(\xi) \to \left(\frac{\pi}{2\xi}\right)^{1/2}e^{-\xi}, \quad\text{at } \xi\to\infty, \tag{10.127}$$

to conclude that the complex amplitudes E_ω of both components E_x and E_y of the electric field decrease exponentially, starting from b ~ u/ω. However, let us consider what happens at frequencies where κ² < 0, i.e.
$$\varepsilon(\omega)\mu(\omega) \equiv \frac{1}{v^2(\omega)} > \frac{1}{u^2} > \frac{1}{c^2} \equiv \varepsilon_0\mu_0. \tag{10.128}$$

(This condition means that the particle's velocity is larger than the phase velocity of waves at this particular frequency.) In these intervals, κ is purely imaginary,43 the functions exp{−κb} become just phase factors, and

$$|(\mathbf E_\omega)_x| \propto |(\mathbf E_\omega)_y| \propto \frac{1}{b^{1/2}}. \tag{10.129}$$

This means that the Poynting vector drops as 1/b, so that its flux through the surface of a round cylinder of radius b, with the axis on the particle's trajectory (i.e. the power flow), does not depend on b. Hence, this is wave emission: the famous Cherenkov radiation.44
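The exponential decay at κ² > 0 rests on the large-argument asymptotics (127). As a quick check, the sketch below evaluates K_0 and K_1 from the standard integral representation K_ν(x) = ∫_0^∞ exp(−x cosh t) cosh(νt) dt (an assumption of this sketch, not a formula from the text) and compares them with (π/2x)^{1/2}e^{−x}. The ratios approach 1 as x grows, roughly as 1 + (4ν² − 1)/(8x); the function names are illustrative.

```python
import math


def bessel_k(nu: float, x: float, t_max: float = 8.0, n: int = 4000) -> float:
    """Modified Bessel function K_nu(x), x > 0, from its integral representation
    K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt, by the trapezoidal rule."""
    h = t_max / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def k_asymptotic(x: float) -> float:
    """Leading term of Eq. (127): (pi / 2x)^(1/2) exp(-x), the same for all nu."""
    return math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)


for x in (1.0, 5.0, 20.0):
    r0 = bessel_k(0, x) / k_asymptotic(x)
    r1 = bessel_k(1, x) / k_asymptotic(x)
    print(f"x = {x:4.1f}: K0/asym = {r0:.4f}, K1/asym = {r1:.4f}")
```

With x = κb and κ ≈ ω/u, the fields are thus exponentially small beyond b ~ u/ω, while for imaginary κ the same exponent turns into a phase factor.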

The direction of its propagation may be readily found taking into account that at large distances from the particle's trajectory the emitted wave has to be locally planar, so that the Cherenkov angle θ may be found from the ratio of the field components (Fig. 13a):

Fig. 10.13. (a) Cherenkov radiation's propagation angle θ, and (b) its interpretation.



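This ratio may be calculated by plugging the asymptotic formula (127) into Eqs. (116) and (117) and calculating their ratio:

$$\tan\theta = \left|\frac{(\mathbf E_\omega)_x}{(\mathbf E_\omega)_y}\right| = \left|\frac{\kappa u}{\omega}\right| = [\varepsilon(\omega)\mu(\omega)u^2-1]^{1/2} = \left[\frac{u^2}{v^2(\omega)}-1\right]^{1/2}, \tag{10.131a}$$

so that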



43 Strictly speaking, the inequality κ² < 0 does not make sense for a medium with complex ε(ω)μ(ω), and hence complex κ²(ω). However, in a typical medium where particles can propagate over substantial distances, the imaginary part of the product ε(ω)μ(ω) is noticeable only in very limited frequency intervals, much narrower than the intervals we are now discussing; please have one more look at Fig. 7.5.

44 This radiation was observed experimentally by P. Cherenkov (in older Western texts, "Cerenkov") in 1934, 
with the observations explained by I. Frank and E. Tamm in 1937. Note, however, that the effect had been 
predicted theoretically as early as in 1889 by the same O. Heaviside whose name was mentioned so many times 
above - and whose genius I believe is still underappreciated. 
$$\cos\theta = \frac{v(\omega)}{u}. \tag{10.131b}$$



Remarkably, this direction does not depend on the emission time t', so that the radiation of frequency ω, at each instant, forms a hollow cone led by the particle. This simple result allows an evident interpretation (Fig. 13b): the cone is just the set of all observation points that may be reached by "signals" propagating with speed v(ω) < u from all previous points of the particle's trajectory.

This phenomenon is closely related to the so-called Mach cone in fluid dynamics,45 except that in the Cherenkov radiation there is a separate cone for each frequency (in the range where v(ω) < u): the smaller the product ε(ω)μ(ω), i.e. the larger the wave velocity v(ω) = 1/[ε(ω)μ(ω)]^{1/2}, the broader the cone, i.e. the earlier the corresponding "shock wave" arrives at an observer. Please note that the Cherenkov radiation is a unique radiative phenomenon: it takes place even if a particle moves without acceleration, and (in agreement with our analysis in Sec. 2) it is impossible in free space, where v = c is always larger than u.
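For a concrete feel for Eq. (131b), cos θ = v(ω)/u, the sketch below computes the Cherenkov angle for a medium with a frequency-independent refractive index n_r, so that v = c/n_r, together with the threshold speed u = v and the corresponding electron kinetic energy (γ − 1)m_ec². The value n_r = 1.33 (roughly that of water in the visible range) and the function names are assumptions for illustration; real media are dispersive, so θ generally depends on ω.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.51099895  # m_e c^2


def cherenkov_angle_deg(beta: float, n_r: float) -> float:
    """theta from cos(theta) = v/u = 1/(n_r * beta); raises below threshold."""
    cos_theta = 1.0 / (n_r * beta)
    if cos_theta > 1.0:
        raise ValueError("u < v: no Cherenkov radiation at this speed")
    return math.degrees(math.acos(cos_theta))


def threshold_kinetic_energy_mev(n_r: float, rest_energy_mev: float) -> float:
    """Kinetic energy (gamma - 1) m c^2 at which u equals v = c / n_r."""
    beta_min = 1.0 / n_r
    gamma_min = 1.0 / math.sqrt(1.0 - beta_min**2)
    return (gamma_min - 1.0) * rest_energy_mev


n_water = 1.33
print(f"theta(beta -> 1) = {cherenkov_angle_deg(1.0, n_water):.1f} deg")
print(f"electron threshold = {threshold_kinetic_energy_mev(n_water, ELECTRON_REST_ENERGY_MEV):.3f} MeV")
```

The wavefront cone's half-angle, measured from the trajectory, is 90° − θ, which is why a smaller θ (a faster wave) corresponds to a broader cone.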

The intensity of the Cherenkov radiation may also be readily found by plugging the asymptotic expression (127), with imaginary κ, into Eq. (123). The result is

(10.132)
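A standard closed form that such a result should reduce to is the Frank-Tamm formula; in SI units and for a nonmagnetic medium it gives the energy radiated per unit path as (q²μ_0/4π)∫ω[1 − v²(ω)/u²]dω, and the photon number per unit path as 2πα(q/e)²∫[1 − v²/u²]dλ/λ², both over the band where v(ω) < u. The sketch below evaluates both for q = e, β → 1, and a constant refractive index n_r = 1.33 (roughly water) over the visible band, 400 to 700 nm. These parameters and the function name are illustrative assumptions, and real dispersion would change the numbers somewhat.

```python
import math

ALPHA = 7.2973525693e-3        # fine-structure constant
E_CHARGE = 1.602176634e-19     # C
MU0 = 1.25663706212e-6         # vacuum permeability, H/m
C = 2.99792458e8               # m/s


def frank_tamm_visible(n_r: float, beta: float, lam_short: float, lam_long: float):
    """Photons and energy per metre for a singly charged particle, assuming a
    nondispersive, nonmagnetic medium; returns (photons/m, eV/m)."""
    sin2 = 1.0 - 1.0 / (n_r * beta) ** 2   # 1 - v^2/u^2 = sin^2(theta)
    if sin2 <= 0.0:                        # below threshold: no emission
        return 0.0, 0.0
    photons = 2.0 * math.pi * ALPHA * sin2 * (1.0 / lam_short - 1.0 / lam_long)
    w_hi, w_lo = 2 * math.pi * C / lam_short, 2 * math.pi * C / lam_long
    # (q^2 mu0 / 4 pi) * sin2 * integral of omega d(omega) over [w_lo, w_hi]
    energy_j = E_CHARGE**2 * MU0 / (4.0 * math.pi) * sin2 * (w_hi**2 - w_lo**2) / 2.0
    return photons, energy_j / E_CHARGE


photons, energy_ev = frank_tamm_visible(1.33, 1.0, 400e-9, 700e-9)
print(f"{photons / 100:.0f} photons/cm, {energy_ev / 100:.0f} eV/cm")
```

In this nondispersive model the spectral energy density grows linearly with ω, so the short-wavelength end of the band dominates the radiated power, consistent with the bluish color of the glow noted in the text.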



For nonrelativistic particles (u ≪ c), the Cherenkov radiation condition u > v(ω) may be fulfilled only in relatively narrow frequency intervals where the product ε(ω)μ(ω) is very large (usually due to optical resonance peaks of the electric permittivity; see Fig. 7.5 and its discussion). In this case the emitted light consists of a few nearly monochromatic components. On the contrary, if the condition u > v(ω), i.e. u²ε(ω)μ(ω) > 1, is fulfilled in a broad frequency range (as it is for ultrarelativistic particles in condensed media), the radiated power is clearly dominated by the higher frequencies of the range; hence the famous bluish color of the Cherenkov radiation glow in the water of nuclear reactors (see Fig. 14).
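The statement about narrow intervals for slow particles can be illustrated with a one-resonance version of the oscillator model (124). The sketch below assumes ε(ω)/ε_0 = 1 + ω_p²/(ω_0² − ω² − 2iωδ) with μ = μ_0, and the illustrative (hypothetical) values ω_p = 0.5ω_0, δ = 10⁻³ω_0, and u = 0.2c; it scans a frequency grid for the band where Re[ε(ω)]μ_0u² > 1, i.e. where u exceeds the phase velocity. In the spirit of footnote 43, the small imaginary part of ε is ignored in this criterion.

```python
def cherenkov_band(omega_p: float, delta: float, beta: float,
                   w_min: float = 0.5, w_max: float = 1.5, n: int = 200_000):
    """Return (lowest, highest) frequency, in units of omega_0, at which
    Re[eps_r(omega)] * beta^2 > 1 for eps_r = 1 + omega_p^2 / (1 - w^2 - 2i w delta).
    Returns None if the condition is never met on the grid."""
    threshold = 1.0 / beta**2
    hits = []
    step = (w_max - w_min) / n
    for k in range(n + 1):
        w = w_min + k * step
        eps_r = 1.0 + omega_p**2 / complex(1.0 - w * w, -2.0 * w * delta)
        if eps_r.real > threshold:
            hits.append(w)
    return (min(hits), max(hits)) if hits else None


band = cherenkov_band(omega_p=0.5, delta=1e-3, beta=0.2)
print(band)  # a sliver just below the resonance frequency omega_0
```

Raising β to 0.9 in this toy model makes the condition hold from ω = 0 almost up to the resonance, a crude analog of the broad-band case.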




Fig. 10.14. The Cherenkov radiation glow coming from the
Advanced Test Reactor of the Idaho National Laboratory.
Adapted from http://en.wikipedia.org/wiki/Cherenkov_radiation.



45 See, e.g., a brief discussion in CM Sec. 8.6.
The Cherenkov radiation is broadly used for the detection of radiation in high-energy experiments, for particle identification and speed measurement (since it is easy to pass particles through media of various density, and hence of various dielectric constants); for example, in the so-called Ring Imaging Cherenkov (RICH) detectors that were designed for the DELPHI experiment46 at the Large Electron-Positron Collider (LEP) at CERN.

A little bit counter-intuitively, the formalism described in this section is also very useful for the description of an apparently rather different effect, the so-called transition radiation, which takes place when a charged particle crosses a border between two media.47 The effect may be understood as a result of the time dependence of the electric dipole formed by the moving charge and its mirror image in the counterpart medium; see Fig. 15. In the nonrelativistic limit, the effect allows a straightforward description combining the electrostatics picture of Sec. 3.4 (see Fig. 3.9 and its discussion) and Eq. (8.27), slightly corrected for the polarization effects of the media. However, if the particle's velocity u is comparable with the phase velocity of waves in either medium, the adequate theory of the transition radiation becomes very close to that of the Cherenkov radiation.
Fig. 10.15. Physics of the transition radiation.



In comparison with the Cherenkov radiation, the transition radiation is rather weak, and its practical use (mostly for the measurement of the relativistic factor γ, to which the radiation intensity is proportional) requires multi-layered stacks.48 In these systems, the radiation emitted at sequential borders may be coherent, and the system's physics becomes close to that of the undulators discussed in Sec. 4.



10.6. Radiation's back-action 

An attentive reader could notice that so far our treatment of charged particle dynamics has never been fully self-consistent. Indeed, in Sec. 9.6 we have analyzed the particle's motion in various external fields, ignoring the fields radiated by the particle itself, while in Sec. 8.2 and earlier in this chapter these fields have been calculated (admittedly, just for a few simple cases), but, again, their back-action on the emitting particle has been ignored. Only in a few cases have we taken the back effects of the radiation



46 See, e.g., http://delphiwww.cern.ch/offline/physics/delphi-detector.html. For a broader view of radiation detectors (including Cherenkov ones), the reader may be referred to the classical text by G. F. Knoll, Radiation Detection and Measurement, 4th ed., Wiley, 2010, and a newer treatment by K. Kleinknecht, Detectors for Particle Radiation, Cambridge U. Press, 1999.

47 The effect was predicted theoretically in 1946 by V. Ginzburg and I. Frank, and only later observed experimentally.

48 See, e.g., Sec. 5.3 in K. Kleinknecht's monograph cited above.
implicitly, via the energy conservation. However, even in these cases, the near-field components of the fields (such as the first term in Eq. (20a)), which affect the moving particle most, have been ignored.

At the same time, it is clear that the interaction of a point charge with its own field cannot always be ignored. As the simplest example, if an electron is made to fly through a resonant cavity, thus inducing oscillations in it, and then is forced to return to it before the oscillations have decayed, its motion will certainly be affected by the oscillating fields, just as if they had been induced by another source. There is no conceptual problem with applying the Maxwell theory to such "field-particle rendezvous" effects; moreover, this is the basis of the engineering design of such electron devices as klystrons, magnetrons, and undulators.

A problem arises only when no finite "rendezvous" point is enforced by boundary conditions, so that the most important self-field effects are at R ≡ |r − r'| → 0, the most evident example being the radiation of a particle in free space, described earlier in this chapter. We already know that radiation takes away a part of the charge's kinetic energy, i.e. has to cause its deceleration. One may wonder, however, whether such self-action effects might be described in a more direct, nonperturbative way.

As the first attempt, let us try a phenomenological approach based on the already derived formulas for the radiation power 𝒫. For the sake of simplicity, let us consider a nonrelativistic point charge q in free space, so that 𝒫 is described by Eq. (8.27), with the second time derivative of the electric dipole moment equal to q(du/dt):

$$\mathcal P = \frac{q^2}{6\pi\varepsilon_0c^3}\,\dot u^2 \equiv \frac{2}{3c^3}\frac{q^2}{4\pi\varepsilon_0}\,\dot u^2. \tag{10.133}$$

The most naive approach would be to write the equation of the particle's motion in the form

$$m\dot{\mathbf u} = \mathbf F_{\rm ext}+\mathbf F_{\rm self}, \tag{10.134}$$

and try to calculate the radiation back-action force by requiring its instant power, −F_self·u, to be equal to 𝒫. However, with Eq. (133), this approach (say, for 1D motion) would give a very unnatural result,

$$F_{\rm self} = -\frac{2}{3c^3}\frac{q^2}{4\pi\varepsilon_0}\frac{\dot u^2}{u}, \tag{10.135}$$

which might diverge at some points of the particle's trajectory. This failure is clearly due to the retardation effect: as the reader may recall, Eq. (133) results from the analysis of radiation fields at large distances from the particle, e.g., from the second term in Eq. (20a), i.e. when the non-radiative first term (which is much larger at small distances, R → 0) is ignored.

Before exploring the effects of this term, let us, however, make one more try with Eq. (133), considering its average effect on some periodic motion of the particle. To calculate the average, let us write

$$\overline{\dot u^2} = \frac{1}{T}\int_0^T\dot u\,\dot u\,dt, \tag{10.136}$$

and integrate this identity, over the motion period T, by parts; since the boundary term u(du/dt)|_0^T vanishes for a periodic motion, we get

$$\overline{\mathcal P} = \frac{2}{3c^3}\frac{q^2}{4\pi\varepsilon_0}\,\overline{\dot u^2} = -\frac{2}{3c^3}\frac{q^2}{4\pi\varepsilon_0}\,\frac{1}{T}\int_0^T\ddot u\,u\,dt. \tag{10.137}$$



On the other hand, the back-action force would give

$$\overline{\mathcal P} = -\frac{1}{T}\int_0^T F_{\rm self}\,u\,dt. \tag{10.138}$$



These two averages coincide if49

$$F_{\rm self} = \frac{2}{3c^3}\frac{q^2}{4\pi\varepsilon_0}\,\ddot u. \tag{10.139}$$



This is the so-called Abraham-Lorentz force for self-action. Before going after a more serious derivation of this formula, let us estimate its scale, presenting Eq. (139) as

$$F_{\rm self} = m\tau\ddot u, \quad\text{with}\quad \tau \equiv \frac{2}{3}\frac{q^2}{4\pi\varepsilon_0mc^3}, \tag{10.140}$$

where the constant τ evidently has the dimension of time. Recalling definition (8.41) of the classical radius r_c of the particle, Eq. (140) for τ may be rewritten as

$$\tau = \frac{2}{3}\frac{r_c}{c}. \tag{10.141}$$



For the electron, τ is of the order of 10⁻²³ s. This means that in most cases the Abraham-Lorentz force is either negligible or leads to the same results as the perturbative treatments of energy loss we have used earlier in this chapter.
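Eq. (141), τ = 2r_c/3c, makes this estimate a one-liner. The sketch below uses CODATA values for the classical electron radius and the other constants, and cross-checks the result against the SI form τ = 2q²/(3·4πε_0mc³) of Eq. (140).

```python
import math

C = 2.99792458e8                # speed of light, m/s
R_C = 2.8179403262e-15          # classical electron radius, m
E_CHARGE = 1.602176634e-19      # C
EPS0 = 8.8541878128e-12         # F/m
M_E = 9.1093837015e-31          # kg

tau_from_rc = 2.0 * R_C / (3.0 * C)                                           # Eq. (141)
tau_from_si = 2.0 * E_CHARGE**2 / (3.0 * M_E * C**3 * 4.0 * math.pi * EPS0)   # Eq. (140)
print(f"tau = {tau_from_rc:.3e} s  (check: {tau_from_si:.3e} s)")
```

The result, about 6×10⁻²⁴ s, is indeed of the order of 10⁻²³ s.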

However, Eq. (140) brings some unpleasant surprises. For example, let us consider a 1D oscillator of eigenfrequency ω_0. For it, Eq. (134), with the back-action force given by Eq. (140), is

$$m\ddot x + m\omega_0^2x = m\tau\dddot x. \tag{10.142}$$

Looking for the solution to this linear differential equation in the usual exponential form, x(t) ∝ exp{λt}, we get the following characteristic equation,

$$\lambda^2+\omega_0^2 = \tau\lambda^3. \tag{10.143}$$



It may seem that for any "reasonable" value of ω_0 ≪ 1/τ ~ 10²³ s⁻¹, the right-hand side of this nonlinear algebraic equation may be treated as a perturbation. Indeed, looking for its solutions in the natural form λ_± = ±iω_0 + λ', with |λ'| ≪ ω_0, expanding both parts of Eq. (143) in the Taylor series in the small parameter λ', and keeping only linear terms, we get



49 This formula may be readily generalized to the relativistic case: 



pa 

-'self 



3mc 4xs n 
the so-called Abraham-Lorentz-Dirac force.
$$\lambda' = -\frac{\omega_0^2\tau}{2}, \quad\text{i.e.}\quad \lambda_\pm = \pm i\omega_0 - \frac{\omega_0^2\tau}{2}. \tag{10.144}$$

This means that the energy of free oscillations decreases in time as exp{2λ't} = exp{−ω_0²τt}; this is exactly the radiative damping analyzed earlier. However, Eq. (143) is deceiving: it has a third root, corresponding to unphysical, exponentially growing (so-called run-away) solutions. This is easiest to see for a free particle, with ω_0 = 0. Then Eq. (143) becomes very simple,

$$\lambda^2 = \tau\lambda^3, \tag{10.145}$$

and it is easy to find all its 3 roots explicitly: λ_1 = λ_2 = 0 and λ_3 = 1/τ. While the first 2 roots correspond to the values λ_± found earlier, the last one describes an exponential (and extremely fast!) acceleration.
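The three roots of the characteristic equation (143) are easy to find numerically, which makes the run-away root explicit even at ω_0 ≠ 0. The sketch below works in units τ = 1, takes an illustrative ω_0 = 0.01/τ (far larger than any physical value, to keep the numbers readable), and solves τλ³ − λ² − ω_0² = 0 with the Durand-Kerner (Weierstrass) iteration, a generic polynomial root finder chosen here only to stay within the standard library. Two roots should come out close to ±iω_0 − ω_0²τ/2, the damped oscillations, and the third close to 1/τ.

```python
def durand_kerner(coeffs, iterations: int = 500):
    """All complex roots of a monic polynomial with coefficients
    [1, a_1, ..., a_n] (highest power first), by Durand-Kerner iteration."""
    degree = len(coeffs) - 1

    def poly(z: complex) -> complex:
        acc = 0j
        for c in coeffs:  # Horner's scheme
            acc = acc * z + c
        return acc

    roots = [(0.4 + 0.9j) ** k for k in range(degree)]  # conventional starting points
    for _ in range(iterations):
        updated = []
        for i, z in enumerate(roots):
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            updated.append(z - poly(z) / denom)
        roots = updated
    return roots


tau, omega0 = 1.0, 0.01
# tau*l^3 - l^2 - omega0^2 = 0, divided by tau to make it monic
roots = sorted(durand_kerner([1.0, -1.0 / tau, 0.0, -omega0**2 / tau]), key=lambda z: z.real)
for r in roots:
    print(f"{r.real:+.6e} {r.imag:+.6e}i")
```

The first two printed roots have a small negative real part (damping), while the last one is real and positive, the run-away solution growing as exp{t/τ}.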

In order to remove this artifact, let us try to develop a self-consistent approach to back-action, taking into account the near-field terms of the particle's fields. For that, we need somehow to overcome the divergence of Eqs. (10) and (20) at R → 0. The most reasonable way to do this is to spread the particle's charge over a ball of radius a, with a spherically symmetric (but not necessarily constant) density ρ(r), and at the end of the calculations trace the limit a → 0.50 Again sticking to the nonrelativistic case (so that the magnetic component of the Lorentz force is not important), we should calculate

$$\mathbf F_{\rm self} = \int_V\rho(\mathbf r)\,\mathbf E(\mathbf r,t)\,d^3r, \tag{10.146}$$

where the electric field is that of the charge itself, with the field of any elementary charge dq = ρ(r)d³r described by Eqs. (20a).

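In order to make analytical calculations doable, we need to make the assumption a ≪ r_c, treat the ratio R/r_c ~ a/r_c as a small parameter, and expand the result in the Taylor series in small R. This procedure yields

$$\mathbf F_{\rm self} = -\frac{2}{3c^2}\frac{1}{4\pi\varepsilon_0}\sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,c^n}\,\frac{d^{n+1}\mathbf u}{dt^{n+1}}\int d^3r\int d^3r'\,\rho(\mathbf r)R^{n-1}\rho(\mathbf r'). \tag{10.147}$$

Distance R cancels only in the term with n = 1,

$$\mathbf F_1 = \frac{2}{3c^3}\frac{\ddot{\mathbf u}}{4\pi\varepsilon_0}\int d^3r\int d^3r'\,\rho(\mathbf r)\rho(\mathbf r') = \frac{q^2}{6\pi\varepsilon_0c^3}\,\ddot{\mathbf u}, \tag{10.148}$$

showing that we have recovered (now in an apparently legitimate fashion) Eq. (139) for the Abraham-Lorentz force. One could argue that in the limit a → 0 the terms of higher order in R ~ a (with n > 1) could be ignored. However, we have to notice that the main contribution to series (147) is not described by Eq. (148) for n = 1, but is given by the larger term with n = 0:

$$\mathbf F_0 = -\frac{2}{3}\frac{\dot{\mathbf u}}{4\pi\varepsilon_0c^2}\int d^3r\int d^3r'\,\frac{\rho(\mathbf r)\rho(\mathbf r')}{R} = -\frac{4}{3}\frac{\dot{\mathbf u}}{c^2}\,\frac{1}{8\pi\varepsilon_0}\int d^3r\int d^3r'\,\frac{\rho(\mathbf r)\rho(\mathbf r')}{R} \equiv -\frac{4}{3}\frac{U}{c^2}\,\dot{\mathbf u}. \tag{10.149}$$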


50 Note: this operation cannot be interpreted as describing a quantum spread due to the finite extent of the point particle's wavefunction. In quantum mechanics, different parts of the wavefunction of the same charged particle do not interact with each other!
This term may be interpreted as the inertial "force" −m_ef(du/dt),51 with the effective electromagnetic mass

$$m_{\rm ef} = \frac{4}{3}\frac{U}{c^2}. \tag{10.150}$$
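For a concrete charge distribution, the double integral in Eq. (149) can be evaluated by Monte Carlo sampling. The sketch below assumes a uniformly charged ball of radius a, for which ∫∫ρ(r)ρ(r')/R d³r d³r' = q²⟨1/R⟩, with ⟨1/R⟩ the mean inverse distance between two random points of the ball. Its exact value, 6/(5a), gives U = (3/5)q²/(4πε_0a) and hence m_ef = (4/5)q²/(4πε_0ac²); equating m_ef to the electron mass would then require a = (4/5)r_c. The sampling scheme, seed, sample size, and function name are arbitrary choices.

```python
import math
import random


def mean_inverse_distance_unit_ball(samples: int, seed: int = 1) -> float:
    """Monte Carlo estimate of <1/R> for two independent uniform points in a ball of radius 1."""
    rng = random.Random(seed)

    def random_point():
        while True:  # rejection sampling from the enclosing cube
            x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
            if x * x + y * y + z * z <= 1.0:
                return x, y, z

    total = 0.0
    for _ in range(samples):
        p1, p2 = random_point(), random_point()
        total += 1.0 / math.dist(p1, p2)
    return total / samples


estimate = mean_inverse_distance_unit_ball(100_000)
print(f"<1/R> = {estimate:.4f} (exact 6/5 = 1.2)")
# m_ef = (4/3) U / c^2 with U = (1/2)(q^2/4 pi eps0)<1/R>, in units of q^2/(4 pi eps0 a c^2)
print(f"m_ef = {2.0 / 3.0 * estimate:.4f} (exact 4/5)")
```

Comparing m_ef with the purely electrostatic estimate U/c² exposes the extra factor 4/3 discussed next.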

This is the famous (or rather infamous :-) 4/3 problem that does not allow one to interpret the electron's mass as that of its electric field. The (admittedly, rather formal) resolution of this paradox is possible only in quantum electrodynamics with its renormalization techniques, beyond the framework of this course. Note that these issues are only important for motions with frequencies of the order of 1/τ ~ 10²³ s⁻¹, i.e. at energies ℰ ~ ħ/τ ~ 10⁻¹¹ J ~ 10⁸ eV, while other quantum electrodynamics effects may be observed at much lower frequencies, starting from ~10¹⁰ s⁻¹. Hence the 4/3 problem is by no means the only motivation for the transfer from classical to quantum electrodynamics.

However, the reader should not think that his or her time spent on this course has been lost: 
quantum electrodynamics incorporates virtually all classical electrodynamics results, and transition 
between them is surprisingly straightforward. 52 

10.7. Exercise problems

10.1. Find the time dependence of the kinetic energy of a charged relativistic particle performing synchrotron motion in a constant and uniform magnetic field B, and hence emitting synchrotron radiation. Sketch the particle's trajectory.

Hint: You may assume that the energy loss is relatively slow (−dℰ/dt ≪ ω_cℰ), but should spell out the condition of validity of this assumption.

10.2. Calculate the power spectrum of the intensity of radiation emitted, in a certain direction, by a relativistic particle performing harmonic 1D oscillations.

10.3. Find the polarization of the synchrotron radiation propagating within the particle rotation plane.

10.4. An electron, launched directly toward a plane surface of a perfect conductor, is instantly absorbed by it at the collision. Find the angular distribution and frequency spectrum of electromagnetic waves radiated at this collision, if the initial kinetic energy T of the particle is much larger than the conductor's workfunction φ. Give a semi-quantitative discussion of the limitations of your result.



51 See, e.g., CM Sec. 6.6. 

52 See, e.g., QM Chapter 9 and references therein. 
Chapter 1. Introduction 

This introductory chapter briefly reviews the major motivations for quantum mechanics. Then its 
simplest formalism, Schrödinger's wave mechanics, is described, and its main features are discussed. 
Much of this material (perhaps except for the last section) may be found in undergraduate textbooks. 1 



1.1. Experimental motivations 

By the beginning of the 1900s, physics (which by that time included what we now know as 
nonrelativistic classical mechanics, classical statistics and thermodynamics, and classical 
electrodynamics including geometric and wave optics) looked like an almost completed discipline, with a 
lot of experimental observations explained, and just a couple of mysterious "dark clouds" 2 on the 
horizon. However, rapid technological progress and the resulting fast development of experimental 
techniques led to a fast multiplication of observed phenomena that could not be explained on the 
classical basis. Let me list the most consequential of those experimental findings. 

(i) Blackbody radiation measurements, started by G. Kirchhoff in 1859, have shown that in 
thermal equilibrium, the power of electromagnetic radiation by a fully absorbing ("black") surface 
per unit frequency interval drops exponentially at high frequencies. This is not what could be expected 
from the combination of classical electrodynamics and statistics, which predicted an infinite growth 
of the radiation density with frequency. Indeed, classical electrodynamics shows 3 that electromagnetic 
field modes in free space evolve in time as harmonic oscillators, and that the density of these modes in a 
large volume $V \gg \lambda^3$ per small frequency interval is 

$$dN = 2\,\frac{V}{(2\pi)^3}\,d^3k = 2\,\frac{V}{(2\pi)^3}\,4\pi k^2\,dk = V\,\frac{\omega^2}{\pi^2 c^3}\,d\omega, \qquad (1.1)$$

where $c \approx 3\times10^{8}$ m/s is the free-space speed of light, $\omega$ the radiation frequency, $k = \omega/c$ the free-space wave 
number, and $\lambda = 2\pi/k$ the radiation wavelength. On the other hand, classical statistics 4 predicts that in 
thermal equilibrium at temperature $T$, the average energy $E$ of each 1D harmonic oscillator should 
equal $k_B T$, where $k_B$ is the Boltzmann constant. 5 
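Eq. (1.1) can be sanity-checked by brute-force counting. The sketch below is a minimal illustration under assumptions made only for this purpose: the modes are those of a cube of side $L$ with periodic boundary conditions, so that $\mathbf{k} = 2\pi\mathbf{n}/L$ with integer $\mathbf{n}$, and each wavevector carries two polarizations. It counts all modes with $|\mathbf{k}| < K = 2\pi R/L$ and compares the result with the integral of Eq. (1.1), $N(K) = VK^3/3\pi^2$. The helper names are hypothetical.

```python
import numpy as np

def count_modes(R: int) -> int:
    """Count field modes with |n| < R in a periodic cube (k = 2*pi*n/L),
    including the factor 2 for the two transverse polarizations.
    The k = 0 point is included, which is negligible for large R."""
    n = np.arange(-R, R + 1)
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
    inside = nx**2 + ny**2 + nz**2 < R**2
    return 2 * int(inside.sum())

def continuum_estimate(R: int, L: float = 1.0) -> float:
    """Integral of Eq. (1.1) from 0 to K = 2*pi*R/L: N = V*K**3 / (3*pi**2)."""
    V = L**3
    K = 2 * np.pi * R / L
    return V * K**3 / (3 * np.pi**2)

for R in (10, 20, 40):
    exact, approx = count_modes(R), continuum_estimate(R)
    print(f"R = {R}: counted {exact}, Eq. (1.1) gives {approx:.0f}, "
          f"relative difference {exact / approx - 1:+.2%}")
```

The relative difference shrinks as $R$ grows, which is the sense in which Eq. (1.1) holds for $V \gg \lambda^3$.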



1 For remedial reading, I can recommend the following textbooks (in alphabetical order): S. Gasiorowicz, 
Quantum Physics, 3rd ed., Wiley, 2003; D. Griffiths, Quantum Mechanics, 2nd ed., Pearson Prentice Hall, 2005; R. 
Liboff, Introductory Quantum Mechanics, 3rd ed., Addison-Wesley, 1998; and also lecture notes by B. Simons, 
Advanced Quantum Mechanics II, available online at www.tcm.phy.cam.ac.uk/~bds10/aqp.html . 

2 This expression was used in a 1900 talk by Lord Kelvin (born W. Thomson) in reference to the blackbody 
radiation measurements and the Michelson-Morley experiment results, i.e. the precursors of quantum mechanics 
and relativity theory. 

3 See, e.g., EM Sec. 7.9. The degeneracy factor 2 in Eq. (1) is due to two possible polarizations of transverse 
electromagnetic waves. For waves of other physical nature that obey the linear ("acoustic") dispersion 
law, similar relations are also valid, though possibly with a different degeneracy factor - see, e.g., CM Sec. 7.7. 

4 See, e.g., SM Sec. 2.2. 

5 In SI units, used throughout these notes, $k_B \approx 1.38\times10^{-23}$ J/K. Note that in many theoretical papers (and in the 
SM part of my notes), $k_B$ is taken to be 1, i.e. temperature is measured in energy units. 
Combining these two results, we readily get the so-called Rayleigh-Jeans formula for the 
average electromagnetic wave energy per unit volume and unit frequency interval: 

$$u \equiv \frac{1}{V}\frac{dE}{d\omega} = \frac{k_B T}{V}\frac{dN}{d\omega} = \frac{\omega^2}{\pi^2 c^3}\,k_B T, \qquad (1.2)$$



that diverges at $\omega \to \infty$. On the other hand, the blackbody radiation measurements, improved by O. 
Lummer and E. Pringsheim, and also by H. Rubens and F. Kurlbaum, to reach a 1%-scale accuracy, were 
compatible with the phenomenological law suggested in 1900 by Max Planck: 

$$u = \frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{1}{\exp(\hbar\omega/k_B T) - 1}. \qquad (1.3a)$$



The law may be reconciled with the fundamental Eq. (1) if the following replacement is made for the 
average energy of each field oscillator: 

$$k_B T \;\to\; \frac{\hbar\omega}{\exp(\hbar\omega/k_B T) - 1}, \qquad (1.3b)$$

with a constant factor 

$$\hbar \approx 1.055\times10^{-34}\ \text{J}\cdot\text{s}, \qquad (1.4)$$



now called Planck's constant. 6 At low frequencies ($\hbar\omega \ll k_B T$), the denominator in Eq. (3) may be 
approximated as $\hbar\omega/k_B T$, so that the average energy (3b) tends to its classical value $k_B T$, and the Planck 
law (3a) reduces to the Rayleigh-Jeans formula (2). However, at higher frequencies ($\hbar\omega \gg k_B T$), Eq. (3) 
describes the experimentally observed rapid decrease of the radiation density - see Fig. 1. 
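The two limits of Eq. (1.3b) are easy to see numerically. In this minimal sketch (helper names are hypothetical), everything is expressed through the dimensionless frequency $x = \hbar\omega/k_B T$: the Planck average energy divided by $k_B T$ is $x/(e^x - 1)$, and Eqs. (1.2) and (1.3a), written in units of $u_0 = (k_B T)^3/\pi^2\hbar^2c^3$, become $x^2$ and $x^3/(e^x - 1)$.

```python
import math

def mean_energy_ratio(x: float) -> float:
    """Planck's average oscillator energy, Eq. (1.3b), divided by the
    classical value k_B*T, as a function of x = hbar*omega/(k_B*T)."""
    return x / math.expm1(x)          # expm1(x) = exp(x) - 1, accurate at small x

def planck_u(x: float) -> float:
    """Planck law (1.3a) in units of u0 = (k_B*T)**3 / (pi**2 * hbar**2 * c**3)."""
    return x**3 / math.expm1(x)

def rayleigh_jeans_u(x: float) -> float:
    """Rayleigh-Jeans formula (1.2) in the same units."""
    return x**2

for x in (0.01, 0.1, 1.0, 3.0, 10.0, 30.0):
    print(f"x = {x:5}: <E>/k_BT = {mean_energy_ratio(x):.3e}, "
          f"u_Planck = {planck_u(x):.3e}, u_RJ = {rayleigh_jeans_u(x):.3e}")
```

Using `math.expm1` rather than `math.exp(x) - 1` avoids losing precision at $x \ll 1$, exactly where the comparison with the classical value matters.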




Fig. 1.1. Blackbody radiation density $u$, expressed 
in units of $u_0 \equiv (k_B T)^3/\pi^2\hbar^2 c^3$, as a function of 
frequency (horizontal axis: $\hbar\omega/k_B T$), according to the Rayleigh-Jeans 
formula (blue line) and the Planck law (red line). 



(ii) The photoelectric effect, experimentally discovered in 1887 by H. Hertz, shows a sharp 
lower bound on the frequency of light that may kick electrons out of metallic surfaces, regardless of 



6 M. Planck himself wrote $\hbar\omega$ as $h\nu$, where $\nu = \omega/2\pi$ is the "cyclic" frequency, measured in Hz (periods per 
second), so that in early texts the term "Planck's constant" referred to $h = 2\pi\hbar$, while $\hbar$ was called "the Dirac 
constant" for a while. 
the light intensity. Albert Einstein, in the first of his three famous 1905 papers, noticed that this 
threshold $\omega_{\min}$ could be readily explained assuming that light consisted of certain particles (now called 
photons) with energy 

$$E = \hbar\omega = h\nu, \qquad (1.5)$$



with the same Planck's constant that participates in Eq. (3). 7 Indeed, with this assumption, at the photon 
absorption by the surface, its energy $E = \hbar\omega$ is divided between a fixed energy $W$ (now called the 
workfunction) of electron binding inside the metal, and the residual kinetic energy $mv^2/2 \geq 0$ of the freed 
electron - see Fig. 2. In this picture, the frequency threshold finds a natural explanation as $\omega_{\min} = W/\hbar$. 8 
Moreover, as was shown by S. Bose in 1924, Eq. (5) readily explains 9 Planck's law (3). 
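Einstein's energy balance can be turned into numbers directly. This sketch (helper names are hypothetical; the constants are CODATA SI values) computes the threshold frequency $\omega_{\min} = W/\hbar$ and the corresponding wavelength $2\pi c/\omega_{\min}$ for workfunctions in the 4-5 eV range quoted in footnote 8, plus the residual kinetic energy $\hbar\omega - W$ of the freed electron.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
C = 2.99792458e8         # speed of light, m/s
EV = 1.602176634e-19     # joules per electron-volt

def threshold(W_eV: float) -> tuple[float, float]:
    """Return (omega_min in s^-1, lambda_max in nm) for workfunction W (in eV)."""
    omega_min = W_eV * EV / HBAR            # hbar * omega_min = W
    lambda_max = 2 * math.pi * C / omega_min
    return omega_min, lambda_max * 1e9

def kinetic_energy_eV(omega: float, W_eV: float):
    """Residual kinetic energy m*v**2/2 = hbar*omega - W of the freed electron (eV),
    or None below the threshold, where no electron is emitted."""
    T = HBAR * omega / EV - W_eV
    return T if T >= 0 else None

for W in (4.0, 4.5, 5.0):
    w_min, lam = threshold(W)
    print(f"W = {W} eV: omega_min = {w_min:.2e} s^-1, lambda_max = {lam:.0f} nm")
```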




Fig. 1.2. Einstein's explanation of the photoelectric 
effect's frequency threshold. 



(iii) The discrete frequency spectra of radiation by excited atomic gases, known since the 1600s, 
could not be explained by classical physics. (Applied to the planetary model of atoms, proposed by E. 
Rutherford, classical physics predicts the collapse of electrons onto nuclei in $\sim 10^{-10}$ s due to electric dipole radiation of 
electromagnetic waves. 10 ) Especially challenging was the observation by J. Balmer (in 1885) that the 
radiation frequencies of simple atoms may be described by simple formulas. For example, for the 
simplest atom, hydrogen, all radiation frequencies may be numbered with just two positive integers $n$ 
and $n'$: 



$$\omega_{n,n'} = \omega_0\left(\frac{1}{n^2} - \frac{1}{n'^2}\right), \qquad (1.6)$$

with $\omega_0 \approx 2.07\times10^{16}\ \text{s}^{-1}$. The Balmer series, including the value of $\omega_0$, found its first 
explanation in the famous 1913 theory by Niels Bohr, which was a semi-phenomenological precursor 
of quantum mechanics. In this theory, $\omega_{n,n'}$ is interpreted as the frequency of a photon that obeys 
Einstein's formula (5), with its energy $E_{n,n'}$ being the difference between two quantized (discrete) energy 
levels of the atom (Fig. 3): 

$$E_{n,n'} = E_{n'} - E_n > 0. \qquad (1.7)$$
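Eqs. (1.6) and (1.7) can be turned into actual line positions. The sketch below (hypothetical helper names, CODATA SI constants) evaluates $\omega_0 = E_H/2\hbar$ using the expression for the Hartree energy $E_H$ given below in Eq. (1.9), and then the wavelengths of the transitions ending on $n = 2$, i.e. the Balmer series proper. It ignores the finite proton mass (cf. footnote 11), so the results differ slightly from measured line positions.

```python
import math

HBAR = 1.054571817e-34       # J*s
C = 2.99792458e8             # m/s
M_E = 9.1093837015e-31       # kg
E_CHARGE = 1.602176634e-19   # C
EPS0 = 8.8541878128e-12      # F/m

E_H = M_E * (E_CHARGE**2 / (4 * math.pi * EPS0 * HBAR))**2   # Eq. (1.9), in J
OMEGA0 = E_H / (2 * HBAR)                                    # in s^-1

def omega(n: int, n_prime: int) -> float:
    """Transition frequency of Eq. (1.6), for n' > n."""
    return OMEGA0 * (1 / n**2 - 1 / n_prime**2)

def wavelength_nm(n: int, n_prime: int) -> float:
    """Free-space wavelength 2*pi*c/omega of the emitted photon, in nm."""
    return 2 * math.pi * C / omega(n, n_prime) * 1e9

print(f"omega_0 = {OMEGA0:.3e} s^-1")
for n_prime in range(3, 7):          # the visible Balmer lines end on n = 2
    print(f"n' = {n_prime} -> n = 2: {wavelength_nm(2, n_prime):.1f} nm")
```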



7 As a reminder, A. Einstein received his only Nobel Prize (in 1922) for exactly this work, which essentially started 
quantum mechanics, rather than for his relativity theory. 

8 For most metals, $W$ is between 4 and 5 electron-volts (eV), so that the threshold corresponds to $\lambda_{\max} = 2\pi c/\omega_{\min} = 
ch/W \approx 300$ nm, approximately at the border between visible light and ultraviolet radiation. 

9 See, e.g., SM Sec. 2.5. 

10 See, e.g., EM Sec. 8.2. 



Chapter 1 



Page 3 of 26 



Essential Graduate Physics 



QM: Quantum Mechanics 



Fig. 1.3. Electromagnetic wave radiation at a 
system's transition between its two quantized 
energy levels. 



Bohr showed that the correct 11 expression for the levels (relative to the free electron energy), 

$$E_n = -\frac{E_H}{2n^2} \equiv -\frac{\hbar\omega_0}{n^2}, \qquad (1.8)$$

and the correct value of the so-called Hartree energy 12 

$$E_H \equiv 2\hbar\omega_0 = m_e\left(\frac{e^2}{4\pi\varepsilon_0\hbar}\right)^2 \approx 27.2\ \text{eV}, \qquad (1.9)$$



(where $e \approx 1.602\times10^{-19}$ C is the fundamental electric charge, and $m_e \approx 0.911\times10^{-30}$ kg is the electron's rest 
mass) could be obtained, with a virtually one-line calculation, from classical mechanics plus just one 
additional postulate. According to this postulate, the angular momentum $L = m_e v r$ of the electron 
moving on a circular trajectory of radius $r$ about the hydrogen nucleus (i.e. the proton, assumed to stay at rest) 
is quantized as 



$$L = \hbar n, \qquad (1.10)$$

where $\hbar$ is again the same Planck's constant (4), and $n$ is an integer. Indeed, to derive Eq. (8), it 
is sufficient to solve Eq. (10) together with Newton's 2nd law for the rotating electron, 



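$$m_e\frac{v^2}{r} = \frac{e^2}{4\pi\varepsilon_0 r^2}, \qquad (1.11)$$

for the electron velocity $v$ and radius $r$, and then plug the results into the nonrelativistic expression for 
the full energy of the electron 

$$E = \frac{m_e v^2}{2} - \frac{e^2}{4\pi\varepsilon_0 r}. \qquad (1.12)$$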



(This nonrelativistic approach to the problem is justified a posteriori by the fact that the relevant energy 
scale $E_H$ is much smaller than the electron's rest energy, $m_e c^2 \approx 0.5$ MeV.) By the way, the value of $r$ 
corresponding to $n = 1$, i.e. to the smallest possible electron orbit, 

$$r_B \equiv \frac{\hbar^2}{m_e\,(e^2/4\pi\varepsilon_0)} \approx 0.053\ \text{nm}, \qquad (1.13)$$


11 Besides very small corrections due to the finite ratio of the electron mass $m_e$ to that of the nucleus, and minor 
spin-orbital and relativistic effects - see Secs. 6.3 and 9.7 below. 

12 Unfortunately, another name, "Rydberg constant", is also frequently used for either this atomic energy unit or 
its half, $E_H/2 \approx 13.6$ eV. To add to the confusion, the same term "Rydberg constant" is sometimes used for the 
reciprocal free-space wavelength ($1/\lambda_0 = \omega_0/2\pi c$) corresponding to frequency $\omega_0 = E_H/2\hbar$. 
and called the Bohr radius, defines the most important spatial scale of phenomena in atomic, molecular 
and condensed matter physics - as well as in chemistry and biochemistry. 
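The "virtually one-line calculation" can also be spelled out in code. In this sketch (hypothetical helper names, CODATA SI constants), Eq. (1.11) is rewritten as $m_e v^2 r = e^2/4\pi\varepsilon_0$ and Eq. (1.10) as $m_e v r = n\hbar$; dividing the first relation by the second gives $v = e^2/(4\pi\varepsilon_0 n\hbar)$, hence $r = n^2 r_B$, and Eq. (1.12) then reproduces Eq. (1.8).

```python
import math

HBAR = 1.054571817e-34       # J*s
M_E = 9.1093837015e-31       # kg
E_CHARGE = 1.602176634e-19   # C
EPS0 = 8.8541878128e-12      # F/m
C = 2.99792458e8             # m/s
KAPPA = E_CHARGE**2 / (4 * math.pi * EPS0)   # e^2/(4*pi*eps0), J*m

def bohr_orbit(n: int) -> tuple[float, float, float]:
    """Solve Eqs. (1.10)-(1.11) for orbit n; return (r, v, E) with E from Eq. (1.12).
    Eq. (1.11) gives m*v**2*r = KAPPA and Eq. (1.10) gives m*v*r = n*hbar;
    dividing the first by the second yields v = KAPPA/(n*hbar)."""
    v = KAPPA / (n * HBAR)
    r = n * HBAR / (M_E * v)                 # equals n**2 * hbar**2 / (m_e * KAPPA)
    energy = M_E * v**2 / 2 - KAPPA / r      # Eq. (1.12)
    return r, v, energy

E_H = M_E * KAPPA**2 / HBAR**2               # Hartree energy, Eq. (1.9)
r1, v1, _ = bohr_orbit(1)
print(f"E_H = {E_H / E_CHARGE:.2f} eV, r_B = {r1 * 1e9:.4f} nm, v_1/c = {v1 / C:.5f}")
for n in (1, 2, 3):
    r, v, energy = bohr_orbit(n)
    print(f"n = {n}: E_n = {energy / E_CHARGE:.3f} eV vs -E_H/(2n^2) = "
          f"{-E_H / (2 * n**2) / E_CHARGE:.3f} eV")
```

The printed ratio $v_1/c \approx 1/137$ is another way to see why the nonrelativistic treatment is adequate.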

Now note that Bohr's quantization postulate (10) may be presented as the condition that an 
integer number ($n$) of certain waves 13 fits the circular orbit's perimeter: $2\pi r = n\lambda$. Dividing both parts of 
this relation by $\lambda$, we get $kr = n$, where $k = 2\pi/\lambda$ is the wave number of these (then hypothetical) de Broglie 
waves. Comparing this with Eq. (10), $L = m_e v r = \hbar n$, we see that for this statement to be true, the wave number 
should be proportional to the electron's momentum $p = mv$: 

$$p = \hbar k. \qquad (1.14)$$



(iv) The Compton effect 14 is the reduction of the frequency of X-rays at their scattering on free (or 
nearly-free) electrons - see Fig. 4. 

Fig. 1.4. Compton effect. 



The effect may be explained assuming that the X-ray photon also has a momentum that obeys the 
vector-generalized version of Eq. (14): 

$$\mathbf{p}_{\text{photon}} = \hbar\mathbf{k} = \frac{\hbar\omega}{c}\,\mathbf{n}, \qquad (1.15)$$

where $\mathbf{k}$ is the wavevector (whose magnitude is equal to the wave number $k$, and whose direction coincides 
with that, $\mathbf{n}$, of the wave propagation), and that the momenta $p$ of both the photon and the electron are 
related to their energies $E$ by the classical relativistic formula 15 

$$E^2 = (cp)^2 + (mc^2)^2. \qquad (1.16)$$

(For a photon, the rest energy is zero, and this relation is reduced to Eq. (5): $E = cp = c\hbar k = \hbar\omega$.) Indeed, 
a straightforward solution of the following system of three equations, 



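$$\hbar\omega + m_e c^2 = \hbar\omega' + \left[(cp)^2 + (m_e c^2)^2\right]^{1/2}, \qquad (1.17)$$

$$\frac{\hbar\omega}{c} = \frac{\hbar\omega'}{c}\cos\theta + p\cos\varphi, \qquad (1.18)$$

$$0 = \frac{\hbar\omega'}{c}\sin\theta - p\sin\varphi, \qquad (1.19)$$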



13 This fact was noticed and discussed in detail in 1923 by L. de Broglie, so that instead of discussing 
wavefunctions, especially of free particles, we are still frequently speaking of de Broglie waves. 

14 This effect was observed (in 1922) and explained a year later by A. Compton. 

15 See, e.g., EM Sec. 9.3. 
(which describe, respectively, the conservation of the full energy of the photon-electron system, and of 
two relevant Cartesian components of its full momentum, at the scattering event - see Fig. 4), yields the 
following result, 

$$\frac{1}{\hbar\omega'} - \frac{1}{\hbar\omega} = \frac{1}{m_e c^2}\,(1 - \cos\theta), \qquad (1.20a)$$



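which is usually presented as the relation between the initial and final values of the photon's wavelength 
$\lambda = 2\pi/k = 2\pi/(\omega/c)$: 16 

$$\lambda' = \lambda + \frac{h}{m_e c}\,(1 - \cos\theta) \equiv \lambda + \frac{2\pi}{\mu_e}\,(1 - \cos\theta), \quad \text{with } \mu_e \equiv \frac{m_e c}{\hbar}, \qquad (1.20b)$$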



and is in agreement with experiment. 
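Rather than solving Eqs. (1.17)-(1.19) symbolically, one can check Eq. (1.20a) by substituting it back. In this sketch (hypothetical helper names), energies are measured in units of $m_e c^2$ and momenta in units of $m_e c$; the electron's momentum components are taken from Eqs. (1.18) and (1.19), and the code reports how far Eq. (1.17) is from being satisfied. It also evaluates the combination $h/m_e c$ that appears in Eq. (1.20b).

```python
import math

H = 6.62607015e-34        # Planck constant h, J*s
M_E = 9.1093837015e-31    # kg
C = 2.99792458e8          # m/s

def scattered_energy(e_in: float, theta: float) -> float:
    """Photon energy after scattering by angle theta, from Eq. (1.20a).
    Energies are in units of m_e*c**2."""
    return 1.0 / (1.0 / e_in + (1.0 - math.cos(theta)))

def energy_residual(e_in: float, theta: float) -> float:
    """Insert the result of Eq. (1.20a) into the conservation laws, with momenta
    in units of m_e*c, and return the mismatch of energy conservation, Eq. (1.17)."""
    e_out = scattered_energy(e_in, theta)
    p_par = e_in - e_out * math.cos(theta)           # p*cos(phi), from Eq. (1.18)
    p_perp = e_out * math.sin(theta)                 # p*sin(phi), from Eq. (1.19)
    e_electron = math.sqrt(p_par**2 + p_perp**2 + 1.0)   # Eq. (1.16) for the electron
    return (e_in + 1.0) - (e_out + e_electron)

print(f"h/(m_e c) = {H / (M_E * C):.4e} m")
for e_in in (0.01, 0.1, 1.0, 10.0):
    worst = max(abs(energy_residual(e_in, math.radians(t))) for t in range(0, 181, 15))
    print(f"hbar*omega = {e_in} m_e c^2: max residual {worst:.1e}")
```

The residuals stay at the level of floating-point rounding, confirming that Eq. (1.20a) satisfies all three conservation laws.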

(v) De Broglie wave diffraction. In 1927, following the suggestion by W. Elsasser (who was 
excited by de Broglie's conjecture of "matter waves"), C. Davisson and L. Germer, and independently 
G. Thomson, succeeded in observing diffraction of electrons on crystals (Fig. 5). Specifically, they 
found that the intensity of the elastic reflection from a crystal increases sharply when the angle $\theta$ between 
the incident beam of electrons and the crystal's atomic planes, separated by distance $d$, satisfies the 
following relation: 

$$2d\sin\theta = n\lambda, \qquad (1.21)$$



where $\lambda = 2\pi/k = 2\pi\hbar/p$ is the de Broglie wavelength of electrons, and $n$ is an integer. As Fig. 5 shows, 
this is just the well-known condition 17 that the optical path difference $\Delta l = 2d\sin\theta$ between the de 
Broglie waves reflected from two adjacent crystal planes equals an integer number of wavelengths $\lambda$, i.e. 
the condition of constructive interference of the waves. 18 
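For a feel for the numbers in Eq. (1.21), this sketch (hypothetical helper names) computes the de Broglie wavelength $\lambda = 2\pi\hbar/p$ of a nonrelativistic electron, with $p = (2m_eT)^{1/2}$, and lists every angle that satisfies the Bragg condition for a given plane spacing $d$. The kinetic energy of 54 eV and the spacing $d = 0.2$ nm are illustrative choices, not data from the text.

```python
import math

HBAR = 1.054571817e-34   # J*s
M_E = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19     # joules per electron-volt

def de_broglie_wavelength(kinetic_eV: float, mass: float = M_E) -> float:
    """lambda = 2*pi*hbar/p with the nonrelativistic momentum p = sqrt(2*m*T), in m."""
    p = math.sqrt(2 * mass * kinetic_eV * EV)
    return 2 * math.pi * HBAR / p

def bragg_angles(wavelength: float, d: float) -> list[float]:
    """All angles theta (in degrees) with 2*d*sin(theta) = n*lambda, Eq. (1.21), n = 1, 2, ..."""
    angles = []
    n = 1
    while n * wavelength <= 2 * d:        # sin(theta) cannot exceed 1
        angles.append(math.degrees(math.asin(n * wavelength / (2 * d))))
        n += 1
    return angles

lam = de_broglie_wavelength(54.0)          # illustrative kinetic energy, 54 eV
print(f"lambda = {lam * 1e9:.3f} nm")
print([round(a, 1) for a in bragg_angles(lam, d=0.2e-9)])   # hypothetical spacing d = 0.2 nm
```

Electrons of a few tens of eV thus have wavelengths comparable to interatomic distances, which is why crystals act as natural diffraction gratings for them.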




Fig. 1.5. Electron scattering from a crystal 
lattice. 



16 The constant combination $h/m_e c = 2\pi/\mu_e$, which participates in this equation, is close to $2.426\times10^{-12}$ m and is 
sometimes called the Compton wavelength. This term is somewhat misleading, because no wave in the Compton 
problem has such a wavelength, either before or after the scattering. 

17 Frequently called the Bragg condition, due to the pioneering experiments by W. Bragg with X-ray scattering 
from crystals (that started in 1912). 

18 Later, spectacular experiments with diffraction and interference of heavier elementary particles, e.g., neutrons, 
have also been performed - see, e.g., A. Zeilinger et al., Rev. Mod. Phys. 60, 1067 (1988). Moreover, quantum 
interference between different states of truly macroscopic objects, e.g., of the superconducting condensate of millions of 
Cooper pairs in $10^{-3}$-cm-scale metallic rings, has been observed by now - see, e.g., the pioneering experiments by 
J. Friedman et al., Nature 406, 43 (2000), carried out here at Stony Brook University. 
To summarize, all the listed effects may be explained starting from two very simple (and 
similar-looking) formulas: Eq. (5) for photons, and Eq. (15) for both photons and electrons - both 
relations involving the same Planck's constant. This might give an impression of sufficient experimental 
evidence to declare light to consist of discrete particles (photons), and, on the contrary, electrons to be 
some "matter waves" rather than particles. However, by that time (the mid-1920s) physics had 
accumulated overwhelming evidence of wave properties of light, such as interference and diffraction. In 
addition, there was also strong evidence for lumped-particle ("corpuscular") behavior of electrons. It is 
sufficient to mention the famous oil-drop experiments by R. Millikan and H. Fletcher (1909-1913), in 
which only single (and whole!) electrons could be added to an oil drop, changing its total electric charge 
by multiples of the electron's charge ($-e$), and never by its fraction. It was apparently impossible to reconcile 
these observations with a purely wave picture, in which an electron and hence its charge need to be 
spread over the wave, so that an arbitrary part of it could be cut out using appropriate experimental 
setups. 

Thus the founding fathers of quantum mechanics faced a formidable task of reconciling the wave 
and corpuscular properties of electrons and photons - and other particles. The decisive breakthrough in 
that task was achieved in 1926 by Erwin Schrödinger and Max Born, who formulated what is now 
known as either the Schrödinger picture of nonrelativistic quantum mechanics in the coordinate 
representation, or simply as wave mechanics. I will now formulate that picture, somewhat disregarding 
the actual history of its development. 



1.2. Wave mechanics postulates 

Let us consider a spinless, 19 nonrelativistic point-like particle whose classical dynamics may be 
described by a certain Hamiltonian function $H(\mathbf{r}, \mathbf{p}, t)$, 20 where $\mathbf{r}$ is the particle's radius-vector and $\mathbf{p}$ is 
the momentum canonically conjugate to this coordinate. 21 Wave mechanics of such Hamiltonian particles may be based on the following set of 
postulates 22 that are comfortingly elegant, though their final justification is given only by the agreement 
of all their corollaries with experiment. 

(i) Wavefunction and probability. Such variables as $\mathbf{r}$ or $\mathbf{p}$ cannot always be measured exactly, 
even at "perfect conditions" when all external uncertainties, including measurement instrument 
imperfection, macroscopic fluctuations of the initial state preparation, and unintended particle 
interactions with its environment, have been removed. 23 Moreover, $\mathbf{r}$ and $\mathbf{p}$ of the same particle can 



19 Actually, in wave mechanics, the spin of the described particle does not have to be equal to zero. Rather, it is assumed 
that the spin effects are negligible, as they are, for example, for a nonrelativistic electron moving in a region 
without an appreciable magnetic field. 

20 As a reminder, for many systems (including those whose kinetic energy is a quadratic-homogeneous function of 
generalized velocities, like $mv^2/2$), $H$ coincides with the total energy $E$ - see, e.g., CM Sec. 2.3. 

21 Note that this restriction is very important. In particular, it excludes from our current discussion the particles 
whose interaction with the environment is irreversible, for example due to viscosity leading to the particle's energy 
decay. Such systems need a more general quantum-mechanical description that will be discussed in Chapter 7. 

22 Generally, quantum mechanics, as any theory, may be built on different sets of postulates ("axioms") leading to 
the same conclusions. In this text, I will not try to beat down the number of postulates to the absolute minimum, 
not only because this would require longer argumentation, but chiefly because such attempts typically result in 
making certain implicit assumptions hidden from the reader - the practice as common as regrettable. 

23 I will assume such perfect conditions until the discussion of the particle's interaction with its environment, and of realistic 
("physical") measurements, in Chapter 7. 
never be measured exactly simultaneously. Instead, even the most detailed description of the particle's 
state, allowed by Nature, 24 is given by a certain complex function $\Psi(\mathbf{r}, t)$, called the wavefunction, that 
generally enables only probabilistic predictions of measured values of $\mathbf{r}$, $\mathbf{p}$, and other directly 
measurable variables (in quantum mechanics, called observables). 

Specifically, the probability $dW$ of finding a particle inside an infinitesimal volume $dV = d^3r$ is 
proportional to this volume and may be characterized by the probability density $w \equiv dW/d^3r$ that in turn 
is related to the wavefunction as 

$$w = |\Psi(\mathbf{r},t)|^2 \equiv \Psi^*(\mathbf{r},t)\,\Psi(\mathbf{r},t), \qquad (1.22a)$$

where the sign * means the complex conjugate. As a result, the total probability of finding the particle 
somewhere inside a volume $V$ may be calculated as 

$$W = \int_V w\, d^3r = \int_V \Psi^*\Psi\, d^3r. \qquad (1.22b)$$



(ii) Observables and operators. To each observable $A$, quantum mechanics associates a certain 
linear operator $\hat{A}$, such that, in the perfect conditions mentioned above, the average measured value 
(also called the expectation value) of $A$ is expressed as 

$$\langle A\rangle = \int \Psi^*\hat{A}\Psi\, d^3r, \qquad (1.23)$$

where $\langle\ldots\rangle$ means the statistical average, i.e. the result of averaging the measurement results over a large 
ensemble (set) of macroscopically similar experiments. Note that for Eqs. (22) and (23) to be 
compatible, the identity ("unit") operator $\hat{I}$, defined by the relation 

$$\hat{I}\Psi = \Psi, \qquad (1.24)$$

has to be associated with a particular type of measurement, namely with the particle's detection. 

(iii) Hamiltonian operator and the Schrödinger equation. Another particular operator, the 
Hamiltonian $\hat{H}$, whose observable is the particle's energy $E$, also plays a very special role in wave mechanics, 
because it participates in the Schrödinger equation, 

$$i\hbar\frac{\partial\Psi}{\partial t} = \hat{H}\Psi, \qquad (1.25)$$

that determines the wavefunction's dynamics, i.e. its time evolution. 

(iv) Radius-vector and momentum operators. In the coordinate representation accepted in wave 
mechanics, the (vector) operator of the particle's radius-vector $\mathbf{r}$ just multiplies the wavefunction by this 
vector, while the operator of the particle's momentum 25 is represented by the spatial derivative: 



24 This is one more important caveat. As we will see in Chapter 7, in many cases even the Hamiltonian particles 
cannot be described by a certain wavefunction, and allow only a more general (and less precise) description, e.g., 
by the density matrix. 

25 For an electrically charged particle in a magnetic field, this relation is valid for its canonical momentum - see 
Sec. 3.1 below. 






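$$\hat{\mathbf{r}} = \mathbf{r}, \qquad \hat{\mathbf{p}} = -i\hbar\nabla, \qquad (1.26a)$$

where $\nabla$ is the del (or "nabla") vector operator. 26 Thus in the Cartesian coordinates, 

$$\hat{\mathbf{r}} = \mathbf{r} = \{x, y, z\}, \qquad \hat{\mathbf{p}} = -i\hbar\left\{\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right\}. \qquad (1.26b)$$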



(v) Correspondence principle. In the limit when quantum effects are insignificant, e.g., when the 
characteristic scale of action $S$ 27 (i.e. the product of the relevant energy and time scales of the problem) 
is much larger than Planck's constant $\hbar$, all wave mechanics results have to tend to those given by 
classical mechanics. Mathematically, the correspondence is achieved by duplicating the classical 
relations between observables by similar relations between the corresponding operators. For example, 
for a free particle, the Hamiltonian (which in this case corresponds to the kinetic energy alone) has the 
form 





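$$\hat{H} = \frac{\hat{p}^2}{2m} = -\frac{\hbar^2}{2m}\nabla^2, \qquad (1.27a)$$

so that, taking into account Eq. (26b), in the Cartesian coordinates, 

$$\hat{H} = -\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right). \qquad (1.27b)$$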



Even before a discussion of the physics of the postulates (offered in the next section), let me show 
that they indeed provide a way toward the resolution of the apparent contradiction between the wave and 
corpuscular properties of particles. For a free particle, the Schrödinger equation (25), with the 
substitution of Eq. (27), takes the form 

$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\Psi, \qquad (1.28)$$

whose particular (but most important) solution is a plane, monochromatic wave, 28 

$$\Psi(\mathbf{r},t) = a\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad (1.29)$$

where $a$, $\mathbf{k}$ and $\omega$ are constants. Indeed, since for the wave (29) $\partial\Psi/\partial t = -i\omega\Psi$ and $\nabla^2\Psi = -k^2\Psi$, plugging 
Eq. (29) into Eq. (28) shows that the plane wave is a solution, with an arbitrary constant $a$ (called the probability 
amplitude), provided that the wavevector $\mathbf{k}$ and the frequency $\omega$ obey a specific dispersion relation: 

$$\hbar\omega = \frac{\hbar^2 k^2}{2m}. \qquad (1.30)$$



Constant $a$ may be calculated, for example, assuming that solution (29) is extended over a certain 
volume $V$, while beyond it, $\Psi = 0$. If the particle is certainly inside that volume, then according to Eq. (22) 




26 See, e.g., Secs. 8-10 of the Selected Mathematical Formulas appendix (below, referred to as MA). 

27 See, e.g., CM Sec. 10.3. 

28 See, e.g., CM Sec. 7.7 and/or EM Sec. 7.1. 
the integral of $\Psi^*\Psi$ over this volume should equal 1, the so-called normalization condition. From here, 
using Eq. (29), we get 29 

$$\int_V \Psi^*\Psi\, d^3r = a^*a\,V = 1. \qquad (1.31)$$



Now we can use Eqs. (23), (26) and (27) to calculate the expectation values of the particle's 
momentum $\mathbf{p}$ and energy $E$ (which, for a free particle, coincides with its Hamiltonian function $H$). The 
result is 

$$\langle\mathbf{p}\rangle = \hbar\mathbf{k}, \qquad \langle E\rangle = \langle H\rangle = \frac{(\hbar k)^2}{2m}; \qquad (1.32)$$

according to Eq. (30), the last equality may be rewritten as $\langle E\rangle = \hbar\omega$. 

Next, Eq. (23) enables one to calculate not only the statistical average (in math speak, the 
first moment) of an observable, but also its higher moments, notably the second central moment (in physics, 
usually called either the variance or dispersion): 

$$\langle\tilde{A}^2\rangle \equiv \langle(A - \langle A\rangle)^2\rangle = \langle A^2\rangle - \langle A\rangle^2, \qquad (1.33)$$



and hence its root mean square (r.m.s.) fluctuation, 

$$\delta A \equiv \langle\tilde{A}^2\rangle^{1/2}, \qquad (1.34)$$

that characterizes the scale of deviations $\tilde{A} \equiv A - \langle A\rangle$ of measurement results from the average, i.e. the 
uncertainty of observable $A$. In application to wavefunction (29), these relations yield $\delta E = 0$ and $\delta p = 0$, 
while the particle's coordinate $\mathbf{r}$ (at $V \to \infty$) is completely uncertain. This means that in the plane-wave, 
monochromatic state (29), the energy and momentum of the particle are exactly defined, so that the 
signs of statistical average in Eqs. (32) might be removed. Thus these relations are reduced to the 
experimentally inferred Eqs. (5) and (15), though the relation of the frequency $\omega$ of the wavefunction's 
evolution in time to experimental observations still has to be clarified. 
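The statements $\langle p\rangle = \hbar k$ and $\delta p = 0$ for the plane wave (1.29) can be verified on a grid. This is a 1D sketch under assumptions made only for this illustration: units with $\hbar = 1$, a periodic box of length $L$ (so that $k_0$ must be a multiple of $2\pi/L$), and the momentum operator $-i\hbar\,d/dx$ applied spectrally via the FFT. Expectation values follow Eq. (1.23), and the uncertainty follows Eqs. (1.33)-(1.34); all variable and helper names are hypothetical.

```python
import numpy as np

HBAR = 1.0                      # assumption: units with hbar = 1
L, N = 10.0, 1024               # hypothetical box length and grid size
dx = L / N
x = np.arange(N) * dx
k0 = 2 * np.pi * 5 / L          # a wave number compatible with the periodic box

a = 1 / np.sqrt(L)              # Eq. (1.31) in 1D: |a|**2 * L = 1
psi = a * np.exp(1j * k0 * x)   # Eq. (1.29) at t = 0

k_grid = 2 * np.pi * np.fft.fftfreq(N, d=dx)

def apply_p(f):
    """Momentum operator -i*hbar*d/dx, Eq. (1.26), evaluated spectrally."""
    return np.fft.ifft(HBAR * k_grid * np.fft.fft(f))

def expect(op_psi):
    """Expectation value, Eq. (1.23): integral of psi* (A psi) dx."""
    return np.real(np.sum(np.conj(psi) * op_psi) * dx)

norm = np.sum(np.abs(psi)**2) * dx
p_mean = expect(apply_p(psi))
p2_mean = expect(apply_p(apply_p(psi)))
delta_p = np.sqrt(max(p2_mean - p_mean**2, 0.0))    # Eqs. (1.33)-(1.34)
print(f"norm = {norm:.12f}, <p> = {p_mean:.12f} (hbar*k0 = {HBAR * k0:.12f}), "
      f"delta_p = {delta_p:.1e}")
```

The guard `max(..., 0.0)` only protects the square root from a tiny negative value produced by rounding.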

Hence the wave mechanics postulates may indeed explain the observed wave properties of 
nonrelativistic particles. (For photons, we would need a relativistic formalism - see Ch. 9 below.) On 
the other hand, due to the linearity of the Schrödinger equation (25), any sum of its solutions is also a 
solution - the so-called linear superposition principle. For a free particle, this means that a sum of plane 
waves (29) is also a solution of this equation. Such sums, with close values of $\mathbf{k}$ and hence of $\mathbf{p} = \hbar\mathbf{k}$ (and, 
according to Eq. (30), of $\omega$ as well), may be used to describe spatially localized "pulses", called wave 
packets - see Fig. 6. In Sec. 2.1, I will prove (or rather reproduce H. Weyl's proof :-) that the wave 
packet's extension $\delta x$ in any direction (say, $x$) is related to the width $\delta k_x$ of the corresponding component 
of its wave vector distribution as $\delta x\,\delta k_x \geq 1/2$, and hence, according to Eq. (15), to the width $\delta p_x$ of the 
momentum component distribution as 



$$\delta x\cdot\delta p_x \geq \frac{\hbar}{2}. \qquad (1.35)$$
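A Gaussian wave packet is the standard example that turns inequality (1.35) into an equality. The sketch below, in units with $\hbar = 1$ and with a hypothetical grid, width $\sigma$, and central wave number $k_0$, builds such a packet, evaluates $\delta x$ and $\delta p_x$ through Eqs. (1.23), (1.33), and (1.34), and shows that their product equals $\hbar/2$ to numerical accuracy. The Gaussian shape is an assumption of this illustration; the general argument is the one referred to Sec. 2.1.

```python
import numpy as np

HBAR = 1.0                       # assumption: units with hbar = 1
N, L = 4096, 200.0               # hypothetical grid size and box length
dx = L / N
x = (np.arange(N) - N // 2) * dx
sigma, k0 = 2.0, 3.0             # hypothetical packet width and central wave number

psi = np.exp(-x**2 / (4 * sigma**2) + 1j * k0 * x)   # Gaussian wave packet at t = 0
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)          # normalize to unit probability

k_grid = 2 * np.pi * np.fft.fftfreq(N, d=dx)

def apply_p(f):
    """-i*hbar*d/dx evaluated spectrally (very accurate for smooth, localized f)."""
    return np.fft.ifft(HBAR * k_grid * np.fft.fft(f))

def expect(op_psi):
    """Expectation value, Eq. (1.23) in 1D."""
    return np.real(np.sum(np.conj(psi) * op_psi) * dx)

x_mean, x2_mean = expect(x * psi), expect(x**2 * psi)
p_psi = apply_p(psi)
p_mean, p2_mean = expect(p_psi), expect(apply_p(p_psi))
delta_x = np.sqrt(x2_mean - x_mean**2)               # Eq. (1.34) for x
delta_p = np.sqrt(p2_mean - p_mean**2)               # Eq. (1.34) for p_x
print(f"delta_x = {delta_x:.6f}, delta_p = {delta_p:.6f}, product = {delta_x * delta_p:.6f}")
```

Changing `sigma` trades $\delta x$ against $\delta p_x$ while their product stays at $\hbar/2$.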



29 For infinite space ($V \to \infty$), Eq. (31) yields $a \to 0$, i.e. wavefunction (29) vanishes. This formal problem may be 
readily resolved by considering sufficiently long wave packets - see Sec. 2.2 below. 
(Fig. 1.6 panels: (a) annotated "the particle is (somewhere :-) here!"; (b) plotted versus $p_x/\hbar$.) 

Fig. 1.6. (a) Snapshot of a typical wave packet 
propagating along axis $x$, and (b) the corresponding 
distribution of wave numbers $k_x$, i.e. momenta $p_x$. 



Relation (35) is the famous Heisenberg uncertainty principle, which quantifies the first 
postulate's point that coordinate and momentum cannot be defined exactly simultaneously. However, 
since Planck's constant is extremely small on the human scale of things, it still allows for the 
particle's localization in a very small volume even if the momentum spread in the wave packet is also 
small on that scale. For example, according to Eq. (35), a 0.1% spread of momentum of a 1 keV electron 
($p \approx 1.7\times10^{-23}$ kg·m/s) allows a wave packet to be as small as $\sim 3\times10^{-9}$ m. (For a heavier particle with the same energy, such as 
a proton, the packet could be even tighter.) As a result, wave packets may be used to describe particles 
that are point-like from the macroscopic point of view. 
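The arithmetic of this estimate is easy to reproduce. The sketch below (hypothetical helper name, CODATA SI constants) uses the nonrelativistic relation $p = (2mT)^{1/2}$, adequate since 1 keV is far below the electron's rest energy, and the minimal packet size $\hbar/2\delta p$ allowed by Eq. (1.35); the proton line illustrates the remark about heavier particles of the same kinetic energy.

```python
import math

HBAR = 1.054571817e-34    # J*s
EV = 1.602176634e-19      # joules per electron-volt
M_E = 9.1093837015e-31    # electron mass, kg
M_P = 1.67262192369e-27   # proton mass, kg

def min_packet_size(kinetic_eV: float, rel_spread: float, mass: float) -> tuple[float, float]:
    """Return (p, delta_x_min): the nonrelativistic momentum p = sqrt(2*m*T) and the
    smallest packet size allowed by Eq. (1.35), hbar/(2*delta_p), with delta_p = rel_spread*p."""
    p = math.sqrt(2 * mass * kinetic_eV * EV)
    return p, HBAR / (2 * rel_spread * p)

for name, mass in (("electron", M_E), ("proton", M_P)):
    p, dx_min = min_packet_size(1e3, 1e-3, mass)
    print(f"1 keV {name}: p = {p:.2e} kg*m/s, delta_x >= {dx_min:.1e} m")
```

For the electron this gives $p \approx 1.7\times10^{-23}$ kg·m/s and $\delta x \approx 3\times10^{-9}$ m.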

In a nutshell, this is the main idea of wave mechanics, and the first part of this course 
(Chapters 1-3) will be essentially a discussion of various manifestations of this approach. During this 
discussion, we will not only witness many triumphs of wave mechanics within its applicability domain, 
but will also gradually accumulate evidence of its handicaps, which force the eventual transfer to a 
more general formalism, to be discussed in Chapter 4 and beyond. 



1.3. Postulates' discussion 

The postulates listed in the previous section look very simple, and they are hopefully familiar to 
the reader from his or her undergraduate studies. However, the physics behind these axioms is very deep, 
they lead to several counter-intuitive conclusions, and their in-depth discussion requires solutions of 
several key problems using these axioms. This is why in this section I will give only an initial, 
admittedly superficial discussion of the postulates, and will be repeatedly returning to the conceptual 
foundations of quantum mechanics throughout the course, especially in Secs. 7.7, 10.1, and 10.2. 

First of all, the fundamental uncertainty of observables, which is at the core of postulate (i), is 
very foreign to the basic ideas of classical mechanics, and historically has made quantum mechanics so 
hard to swallow for many star physicists, notably including A. Einstein - despite his 1905 work which 
essentially launched the whole field! However, this fact has been confirmed by numerous experiments, 
and (more importantly) there has not been a single confirmed experiment that would contradict 
this postulate, so that quantum mechanics was long ago promoted from a theoretical hypothesis to the 
rank of a reliable scientific theory. 

One more remark in this context is that Eq. (25) itself is deterministic, i.e. conceptually enables 
an exact calculation of the wavefunction's distribution in space at any instant $t$, provided that its initial 
distribution, and the particle's Hamiltonian, are known exactly. In classical kinetics, the probability density 
distribution $w(\mathbf{r},t)$ may also be calculated from deterministic differential equations, e.g., the Fokker-Planck 
equation or the Boltzmann equation. 30 The quantum-mechanical description differs from those 
situations in two important aspects. First, in the perfect conditions outlined above (exact initial state 
preparation, no irreversible interaction with the environment, the best possible measurement), the Fokker-Planck 
equation reduces to Newton's 2nd law, i.e. the statistical uncertainty disappears. In quantum 
mechanics this is not true: the quantum uncertainty, such as Eq. (35), persists even in this limit. Second, 
the wavefunction $\Psi(\mathbf{r}, t)$ gives more information than just $w(\mathbf{r}, t)$, because besides the modulus of $\Psi$, 
involved in Eq. (22), this complex function also has the phase $\varphi \equiv \arg\Psi$, which may affect some observables, 
describing, in particular, the interference and diffraction of the de Broglie waves. 

Next, it is very important to understand that the relation of quantum mechanics to 
experiment, given by postulate (ii), necessarily involves another key notion: that of the corresponding 
statistical ensemble. Such an ensemble may be defined as a set of many experiments carried out at 
apparently (macroscopically) similar conditions, which nevertheless may lead to different measurement 
results (outcomes). Indeed, the probability of a certain ($n$-th) outcome of an experiment may be only 
defined for a certain ensemble, as the limit 

$$W_n \equiv \lim_{M\to\infty}\frac{M_n}{M}, \qquad \text{with } M \equiv \sum_{n=1}^{N} M_n, \qquad (1.36)$$



where M is the total number of experiments, M„ is the number of outcomes of the n-th type, and TV is the 
number of different outcomes. It is clear that a particular choice of an ensemble may affect probabilities 
W n very significantly. 

For example, if we pull out playing cards at random from a pack of 52 different cards of 4 suits, the probability $W_n$ of getting a certain card (e.g., the queen of spades) is 1/52. However, if the cards of a certain suit (say, hearts) had been taken out of the pack in advance, the probability of getting the queen of spades is higher, 1/39. It is important that we would also get the last number for the probability even if we had used the full 52-card pack, but for some reason ignored the results of all experiments giving us a card of hearts.
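A quick Monte Carlo sketch of the card example may make the role of the ensemble more tangible. It is only an illustration and assumes uniformly random draws from Python's `random` module. The function name `queen_of_spades_frequency` and its flags are hypothetical. The two flags implement the two ways of obtaining the 1/39 ensemble described above: removing the hearts from the pack before the draws, or drawing from the full pack and discarding every outcome that is a heart.

```python
import random

SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]

def queen_of_spades_frequency(trials, remove_hearts=False, ignore_hearts=False, seed=0):
    """Estimate W = M_n / M of Eq. (1.36) for the queen of spades (hypothetical helper).

    remove_hearts: take all hearts out of the pack before the experiments.
    ignore_hearts: use the full pack, but drop every experiment that yields a heart,
                   so that such experiments are not counted in M at all.
    """
    rng = random.Random(seed)
    pack = [(rank, suit) for suit in SUITS for rank in RANKS]
    if remove_hearts:
        pack = [card for card in pack if card[1] != "hearts"]
    counted = hits = 0
    for _ in range(trials):
        card = rng.choice(pack)
        if ignore_hearts and card[1] == "hearts":
            continue  # this outcome is excluded from the ensemble
        counted += 1
        hits += card == ("Q", "spades")
    return hits / counted

for label, flags in [("full pack", {}), ("hearts removed", {"remove_hearts": True}),
                     ("hearts ignored", {"ignore_hearts": True})]:
    print(f"{label:15s} {queen_of_spades_frequency(200_000, **flags):.4f}")
```

The first row should come out close to 1/52 ≈ 0.019. Both "hearts" rows should come out close to 1/39 ≈ 0.026. The last two ensembles are prepared differently but are statistically identical.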

Similarly, in quantum mechanics, the probability distributions (and hence the expectation values of the particle's coordinate and other observables) depend not only on the experimental setup, but also on the set of outcomes we count. Because of the fundamental relation (22) between $w$ and $\Psi$, this means that the wavefunction also depends on these factors, i.e. on both the preparation of the experiment set and the subset of outcomes taken into account. Insisting on attributing the wavefunction to a single experiment, both before and after the measurement, may lead to very unphysical interpretations of some experiments. These include the wavefunction's evolution not described by the Schrodinger equation (the so-called wave packet reduction), subluminal action at a distance, etc. Later in the course we will see that minding the statistical nature of quantum mechanics may readily explain some apparent paradoxes of quantum measurements. This applies in particular to the dependence of the wavefunction on the statistical ensemble's specification.

Let me also emphasize that statistics is intimately related to information theory, and not only via their common mathematical background, probability theory. For example, the question "What subset of experimental results will we count?" may be replaced by the question "What subset of results will we use information about?" As a result, the reader has to be prepared for the use of information-theory notions in the discussion of quantum mechanics, or at least of its relation to experiment, i.e. to "physical reality". This feature of quantum mechanics makes some physicists uncomfortable, because much of classical mechanics and electrodynamics may be discussed without any reference to information. In quantum mechanics (as in statistical mechanics), such an abstraction is impossible.

30 See, e.g., SM Secs. 5.8 and 6.2, respectively.

Proceeding to postulate (ii), and in particular to Eq. (23), a better feel for this definition may be obtained by comparing it with the general definition of the expectation value (i.e. the statistical average) in probability theory. Namely, let each of the $N$ possible outcomes in a set of $M$ macroscopically similar experiments give a certain value $A_n$ of observable $A$; then

$$\langle A\rangle \equiv \lim_{M\to\infty}\frac{1}{M}\sum_{n=1}^{N} A_n M_n = \sum_{n=1}^{N} A_n W_n. \tag{1.37}$$

Taking into account Eq. (22), which relates $w$ and $\Psi$, we see that the structure of Eq. (23) is similar to the final form of Eq. (37). Their exact relation will be further discussed in Sec. 4.1.
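Here is a minimal numerical illustration of Eq. (37). It assumes a fair six-sided die whose n-th outcome gives the value $A_n = n$, an example not taken from the text. The sketch computes the frequencies $M_n/M$ from simulated experiments and evaluates both forms of the average. For large $M$, both approach the exact $\sum_n W_nA_n = 3.5$. The helper name `ensemble_average` is hypothetical.

```python
import random
from collections import Counter

def ensemble_average(values, trials, seed=1):
    """Return (frequencies M_n/M, (1/M) sum A_n M_n, sum A_n W_n) for simulated experiments.

    `values` lists the distinct outcome values A_n, each assumed equally likely.
    """
    rng = random.Random(seed)
    counts = Counter(rng.choice(values) for _ in range(trials))   # M_n for each A_n
    freqs = {a: counts[a] / trials for a in values}               # estimates of W_n
    direct = sum(a * counts[a] for a in values) / trials          # (1/M) sum_n A_n M_n
    via_w = sum(a * freqs[a] for a in values)                     # sum_n A_n W_n
    return freqs, direct, via_w

freqs, direct, via_w = ensemble_average([1, 2, 3, 4, 5, 6], trials=100_000)
print(direct, via_w)
```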




1.4. Continuity equation 

The wave mechanics postulates survive one more sanity check: they satisfy the natural requirement that the particle does not appear or vanish in the course of quantum evolution. 31 Indeed, let us use Eq. (22) to calculate the rate of change of the probability $W$ to find the particle within a certain volume $V$:

$$\frac{dW}{dt} = \frac{d}{dt}\int_V w(\mathbf r,t)\,d^3r = \frac{d}{dt}\int_V \Psi^*\Psi\,d^3r. \tag{1.38}$$



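Assuming for simplicity that the boundaries of volume $V$ do not move, it is sufficient to carry out the partial differentiation of the product inside the integral. Using the time-dependent Schrodinger equation (25), together with its complex conjugate,

$$-i\hbar\frac{\partial\Psi^*}{\partial t} = (\hat H\Psi)^*, \tag{1.39}$$

we get

$$\frac{dW}{dt} = \int_V\left(\Psi^*\frac{\partial\Psi}{\partial t} + \Psi\frac{\partial\Psi^*}{\partial t}\right)d^3r = \frac{1}{i\hbar}\int_V\left[\Psi^*(\hat H\Psi) - \Psi(\hat H\Psi)^*\right]d^3r. \tag{1.40}$$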



Let the particle move in a field of external forces (not necessarily constant in time), so that its classical Hamiltonian function $H$ is the sum of the particle's kinetic energy $p^2/2m$ and its potential energy $U(\mathbf r,t)$. 32 According to the correspondence principle, the Hamiltonian operator may be presented as the sum 33



31 Note that this requirement is not extended to relativistic quantum theory - see Chapter 9 below.

32 As a reminder, such a description is valid not only for potential forces (in that case $U$ has to be time-independent), but also for any force $\mathbf F(\mathbf r,t)$ that may be presented via the gradient of $U(\mathbf r,t)$ - see, e.g., CM Chapters 2 and 10. (A good example when such a description is impossible is given by the magnetic component of the Lorentz force - see, e.g., EM Sec. 9.7, and also Sec. 3.1 of this course.)
$$\hat H = \frac{\hat p^2}{2m} + U(\mathbf r,t) = -\frac{\hbar^2}{2m}\nabla^2 + U(\mathbf r,t). \tag{1.41}$$



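At this stage we should notice that such an operator, when acting on a real function, returns a real function. 34 Hence, the result of its action on an arbitrary complex function $\Psi = a + ib$ (where $a$ and $b$ are real) is

$$\hat H\Psi = \hat H(a+ib) = \hat Ha + i\hat Hb, \tag{1.42}$$

where $\hat Ha$ and $\hat Hb$ are also real, while

$$(\hat H\Psi)^* = (\hat Ha + i\hat Hb)^* = \hat Ha - i\hat Hb = \hat H(a-ib) = \hat H\Psi^*. \tag{1.43}$$

This means that Eq. (40) may be rewritten as

$$\frac{dW}{dt} = \frac{1}{i\hbar}\int_V\left(\Psi^*\hat H\Psi - \Psi\hat H\Psi^*\right)d^3r = -\frac{\hbar^2}{2m}\,\frac{1}{i\hbar}\int_V\left(\Psi^*\nabla^2\Psi - \Psi\nabla^2\Psi^*\right)d^3r, \tag{1.44}$$

where the potential-energy terms have cancelled. Now, let us use general rules of vector calculus 35 to write the following identity:

$$\nabla\cdot\left(\Psi^*\nabla\Psi - \Psi\nabla\Psi^*\right) = \Psi^*\nabla^2\Psi - \Psi\nabla^2\Psi^*. \tag{1.45}$$

A comparison of Eqs. (44) and (45) shows that we may write

$$\frac{dW}{dt} = -\int_V(\nabla\cdot\mathbf j)\,d^3r, \tag{1.46}$$

where the vector $\mathbf j$ is defined as

$$\mathbf j \equiv \frac{i\hbar}{2m}\left(\Psi\nabla\Psi^* - \text{c.c.}\right) = \frac{\hbar}{m}\,\mathrm{Im}\left(\Psi^*\nabla\Psi\right), \tag{1.47}$$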



where c.c. means the complex conjugate of the previous expression, in this case $(\Psi\nabla\Psi^*)^*$, i.e. $\Psi^*\nabla\Psi$. Now, using the well-known divergence theorem, 36 Eq. (46) may be rewritten as the continuity equation

$$\frac{dW}{dt} + I = 0, \qquad \text{with } I \equiv \oint_S j_n\,d^2r, \tag{1.48}$$


where $j_n$ is the projection of vector $\mathbf j$ on the outwardly directed normal to the surface $S$ that limits volume $V$, i.e. the scalar product $\mathbf j\cdot\mathbf n$, where $\mathbf n$ is the unit vector along this normal.

Equations (47) and (48) show that if the wavefunction on the surface vanishes, the total probability $W$ of finding the particle within the volume does not change, providing the required sanity check. In the general case, Eq. (48) says that $dW/dt$ equals the flux $I$ of vector $\mathbf j$ through the surface, taken with the minus sign. It is clear that this vector may be interpreted as the probability current density, and $I$ as



33 Historically, this was the main step made (in 1926) by E. Schrodinger on the basis of L. de Broglie's idea. The probabilistic interpretation of the wavefunction was put forward, almost simultaneously, by M. Born.

34 In Chapter 4, we will discuss a more general family of Hermitian operators, which have this property. 

35 See, e.g., MA Eq. (11.4a), combined with the del operator's definition $\nabla^2 = \nabla\cdot\nabla$.

36 See, e.g., MA Eq. (12.2). 
the total probability current through surface $S$. This interpretation may be further supported by rewriting Eq. (47) for a wavefunction presented in the polar form $\Psi = a e^{i\varphi}$, with real $a$ and $\varphi$:

$$\mathbf j = \frac{\hbar}{m}\,a^2\nabla\varphi, \tag{1.49}$$

which is evidently a real quantity. Note that for a real wavefunction, or even for one with an arbitrary but space-constant phase $\varphi$, the probability current density vanishes. On the contrary, for the traveling wave (29), with a constant probability density $w = a^2$, Eq. (49) yields a nonvanishing (and physically very transparent) result:

$$\mathbf j = w\frac{\hbar}{m}\mathbf k = w\frac{\mathbf p}{m} = w\mathbf v, \tag{1.50}$$

where $\mathbf v = \mathbf p/m$ is the particle's velocity. If multiplied by the particle's mass $m$, the probability density $w$ turns into the (average) mass density $\rho$, and the probability current density into the mass flux density $\rho\mathbf v$. If instead multiplied by the total electric charge $q$ of the particle, $w$ turns into the charge density $\sigma$, and $\mathbf j$ becomes the electric current density. Both pairs satisfy classical continuity equations similar to Eq. (48). 37
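The statements just made about Eqs. (47), (49), and (50) are easy to check numerically in 1D. The sketch below relies on several assumptions: units with $\hbar = m = 1$, a uniform grid, and `numpy.gradient` (finite differences) standing in for $\partial/\partial x$. `probability_current` is a hypothetical helper name. The sketch evaluates $j = (\hbar/m)\,\mathrm{Im}(\Psi^*\partial\Psi/\partial x)$ for two cases. For a traveling wave it should reproduce $wv$. For a real standing wave multiplied by a constant phase factor it should vanish.

```python
import numpy as np

HBAR = 1.0  # illustrative units: hbar = 1
MASS = 1.0  # illustrative units: m = 1

def probability_current(psi, x, hbar=HBAR, m=MASS):
    """1D version of Eq. (1.47): j = (hbar/m) Im(psi* dpsi/dx), derivative by finite differences."""
    dpsi_dx = np.gradient(psi, x)
    return (hbar / m) * np.imag(np.conj(psi) * dpsi_dx)

x = np.linspace(0.0, 10.0, 2001)
a, k = 0.7, 3.0

traveling = a * np.exp(1j * k * x)            # traveling wave (1.29) at t = 0
j_trav = probability_current(traveling, x)
w_v = a**2 * HBAR * k / MASS                   # Eq. (1.50): w v, with v = hbar k / m

standing = np.cos(k * x) * np.exp(0.4j)        # real function times a space-constant phase factor
j_stand = probability_current(standing, x)

print(j_trav.mean(), w_v, np.abs(j_stand).max())
```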

Finally, let us recast the continuity equation, rewriting Eq. (46) as

$$\int_V\left(\frac{\partial w}{\partial t} + \nabla\cdot\mathbf j\right)d^3r = 0. \tag{1.51}$$

Now we may argue that this equality is true for any choice of volume $V$ only if the expression under the integral vanishes everywhere, i.e. if

$$\frac{\partial w}{\partial t} + \nabla\cdot\mathbf j = 0. \tag{1.52}$$

This differential form of the continuity equation is sometimes more convenient than its integral form (48).
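The differential form (52) can be checked, for instance, on a 1D free-particle superposition of plane waves, $\Psi(x,t) = \sum_k c_k\exp\{i(kx - \hbar k^2t/2m)\}$, which solves the free Schrodinger equation term by term. The sketch below assumes $\hbar = m = 1$ and randomly chosen complex coefficients $c_k$, and its helper names are hypothetical. The derivatives are taken analytically, term by term, rather than by finite differences. As a result, $\partial w/\partial t + \partial j/\partial x$ should vanish to rounding accuracy, even though $w$ itself changes in time.

```python
import numpy as np

HBAR, MASS = 1.0, 1.0  # illustrative units

def superposition_and_derivatives(x, t, ks, cs, hbar=HBAR, m=MASS):
    """Psi = sum_k c_k exp{i(k x - omega_k t)}, omega_k = hbar k^2 / 2m, with exact derivatives."""
    x = np.asarray(x, dtype=float)[:, None]     # shape (Nx, 1)
    ks = np.asarray(ks, dtype=float)[None, :]   # shape (1, Nk)
    cs = np.asarray(cs, dtype=complex)[None, :]
    omega = hbar * ks**2 / (2 * m)
    terms = cs * np.exp(1j * (ks * x - omega * t))
    psi = terms.sum(axis=1)
    psi_t = (-1j * omega * terms).sum(axis=1)
    psi_xx = (-(ks**2) * terms).sum(axis=1)
    return psi, psi_t, psi_xx

def continuity_terms(x, t, ks, cs, hbar=HBAR, m=MASS):
    """Return dw/dt and dj/dx, the two terms of the 1D version of Eq. (1.52)."""
    psi, psi_t, psi_xx = superposition_and_derivatives(x, t, ks, cs, hbar, m)
    dw_dt = 2 * np.real(np.conj(psi) * psi_t)             # d(psi* psi)/dt
    # d/dx Im(psi* psi_x) = Im(|psi_x|^2 + psi* psi_xx) = Im(psi* psi_xx)
    dj_dx = (hbar / m) * np.imag(np.conj(psi) * psi_xx)
    return dw_dt, dj_dx

rng = np.random.default_rng(0)
ks = rng.uniform(-3.0, 3.0, size=8)
cs = rng.normal(size=8) + 1j * rng.normal(size=8)
x = np.linspace(-5.0, 5.0, 1001)
dw_dt, dj_dx = continuity_terms(x, t=0.7, ks=ks, cs=cs)
print(np.abs(dw_dt).max(), np.abs(dw_dt + dj_dx).max())
```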



1.5. Eigenstates and eigenvalues 

Now let us discuss important corollaries of the linearity of wave mechanics. First of all, it uses only linear operators. This term means that the operators must obey the following two rules: 38

$$(\hat A_1 + \hat A_2)\Psi = \hat A_1\Psi + \hat A_2\Psi, \tag{1.53}$$

$$\hat A(c_1\Psi_1 + c_2\Psi_2) = \hat A(c_1\Psi_1) + \hat A(c_2\Psi_2) = c_1\hat A\Psi_1 + c_2\hat A\Psi_2, \tag{1.54}$$



37 See, e.g., respectively, CM Sec. 7.2 and EM Sec. 4.1.

38 By the way, if an equality involving operators is valid for an arbitrary wavefunction, the latter is frequently dropped from the notation, resulting in an operator equality. In particular, Eq. (53) may be readily used to prove that operator addition is commutative, $\hat A_2 + \hat A_1 = \hat A_1 + \hat A_2$, and associative, $(\hat A_1 + \hat A_2) + \hat A_3 = \hat A_1 + (\hat A_2 + \hat A_3)$.
where $\Psi_n$ are arbitrary wavefunctions, while $c_n$ are arbitrary constants (in quantum mechanics, frequently called c-numbers, to distinguish them from operators and wavefunctions). The most important examples of linear operators are given by:

(i) the multiplication by a function, such as for operator r in wave mechanics, and 

(ii) the spatial or temporal differentiation of the wavefunction, such as in Eqs. (25)-(27). 

Next, it is of key importance that the Schrodinger equation (25) is also linear. (We have already used this fact when we discussed wave packets in the last section.) This means that if each of the functions $\Psi_n$ is a (particular) solution of Eq. (25) with a certain Hamiltonian, then an arbitrary linear combination

$$\Psi = \sum_n c_n\Psi_n \tag{1.55}$$

is also a solution of the same equation. 39



Now let us use the linearity of wave mechanics to accomplish an apparently impossible feat: immediately find the general solution of the Schrodinger equation for the most important case when the system's Hamiltonian does not depend on time explicitly. Examples are Eq. (27), and Eq. (41) with a time-independent $U = U(\mathbf r)$. First of all, let us prove that the following product,

$$\Psi_n = a_n(t)\,\psi_n(\mathbf r), \tag{1.56}$$

qualifies as a (particular) solution of the Schrodinger equation. Indeed, let us plug Eq. (56) into Eq. (25), and use the fact that for a time-independent Hamiltonian

$$\hat H a_n(t)\psi_n(\mathbf r) = a_n(t)\hat H\psi_n(\mathbf r). \tag{1.57}$$

Dividing both parts of the equation by $\Psi_n = a_n\psi_n$, we get

$$\frac{i\hbar\dot a_n}{a_n} = \frac{\hat H\psi_n}{\psi_n}, \tag{1.58}$$



where (here and below) the dot denotes differentiation over time. The left-hand side of this equation may depend only on time, while the right-hand one may depend only on coordinates. These facts may only be reconciled if we assume that each of these parts is equal to (the same) constant of the dimension of energy, which I will denote as $E_n$. 40 As a result, we are getting two separate equations for the temporal and spatial parts of the wavefunction:

$$i\hbar\dot a_n = E_n a_n, \tag{1.59}$$

$$\hat H\psi_n = E_n\psi_n. \tag{1.60}$$

The first of these equations is readily integrable, giving



39 It may seem strange that the linear Schrodinger equation correctly describes quantum properties of systems 
whose classical dynamics is described by nonlinear equations of motion (e.g., an anharmonic oscillator - see, e.g., 
CM Sec. 4.2). Note, however, that equations of classical physical kinetics (see, e.g., SM Chapter 6) also have this 
property, so it is not specific to quantum mechanics. 

40 This argumentation, leading to variable separation, is very common in mathematical physics - see, e.g., its 
discussion in EM Sec. 2.5. 
$$a_n = \text{const}\times\exp\{-i\omega_n t\}, \qquad \text{with } \omega_n = \frac{E_n}{\hbar}, \tag{1.61}$$



thus substantiating the fundamental relation (5) between energy and frequency. Plugging Eqs. (56) and (61) into Eq. (22), we see that in such a state, the probability density $w$ of finding the particle at a certain location does not depend on time. Doing the same with Eq. (23) shows that the same is true for the expectation value of any operator that does not depend on time explicitly:

$$\langle A\rangle_n = \int\Psi_n^*\hat A\Psi_n\,d^3r = \int\psi_n^*\hat A\psi_n\,d^3r = \text{const}. \tag{1.62}$$

Due to this property, the states described by Eqs. (56), (60), and (61) are called stationary. In contrast to the simple and universal time dependence (61), the spatial distributions $\psi_n(\mathbf r)$ of the stationary states are often hard to find. These distributions are described by the stationary (or "time-independent") Schrodinger equation (60). 41 Solving this equation for various situations is a major focus of wave mechanics.

The stationary Schrodinger equation (60), with the time-independent Hamiltonian (41),

$$-\frac{\hbar^2}{2m}\nabla^2\psi_n + U(\mathbf r)\psi_n = E_n\psi_n, \tag{1.63}$$



falls into the mathematical category of linear eigenproblems, 42 in which the eigenfunctions $\psi_n$ and eigenvalues $E_n$ should be found simultaneously, i.e. self-consistently. 43 Mathematics tells us that for such problems with space-confined eigenfunctions $\psi_n$, tending to zero at $r\to\infty$, the spectrum of eigenvalues is discrete. It also proves that the eigenfunctions corresponding to different eigenvalues are orthogonal, i.e. that the space integrals of the products $\psi_n\psi^*_{n'}$ vanish for all pairs with $n\neq n'$. Moreover, due to the linearity of the Schrodinger equation, each of these functions may be multiplied by a constant coefficient to make this set orthonormal:

$$\int\psi_n\psi_{n'}^*\,d^3r = \delta_{n,n'} \equiv \begin{cases}1, & \text{if } n = n',\\ 0, & \text{if } n\neq n'.\end{cases} \tag{1.64}$$

Also, the eigenfunctions form a full set. This means that an arbitrary function may be presented as a unique expansion over the eigenfunction set. In particular, this applies to the actual wavefunction $\Psi$ of the system at the initial moment of its evolution (which I will take for $t = 0$, with a few exceptions):

$$\Psi(\mathbf r,0) = \sum_n c_n\psi_n(\mathbf r). \tag{1.65}$$

The expansion coefficients $c_n$ may be readily found by multiplying both parts of Eq. (65) by $\psi_n^*$, integrating the result over the space, and using Eq. (64). The result is

$$c_n = \int\psi_n^*(\mathbf r)\,\Psi(\mathbf r,0)\,d^3r. \tag{1.66}$$
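Eqs. (64)-(66) can be checked by numerical quadrature for the 1D version of the rectangular well discussed below, with the eigenfunctions $\psi_n(x) = (2/a)^{1/2}\sin(\pi nx/a)$ of Eq. (76). The sketch assumes $a = 1$ and the trapezoidal rule on a uniform grid. As an example initial state, it takes the normalized parabola $\Psi(x,0) = (30/a^5)^{1/2}x(a-x)$. For that choice, direct integration gives $c_n = 8\sqrt{15}/(\pi n)^3$ for odd $n$ and $c_n = 0$ for even $n$, which the code compares against. All helper names are hypothetical.

```python
import numpy as np

A = 1.0  # well width (illustrative)

def psi_n(n, x, a=A):
    """Normalized eigenfunction (2/a)^(1/2) sin(pi n x / a), cf. Eq. (1.76)."""
    return np.sqrt(2.0 / a) * np.sin(np.pi * n * x / a)

def integrate(y, x):
    """Trapezoidal rule on the grid x (written out to avoid NumPy version differences)."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

x = np.linspace(0.0, A, 4001)
psi0 = np.sqrt(30.0 / A**5) * x * (A - x)          # example initial wavefunction, normalized

# Eq. (1.64): the overlap matrix of the first few eigenfunctions should be the identity
overlaps = np.array([[integrate(np.conj(psi_n(n, x)) * psi_n(k, x), x) for k in range(1, 6)]
                     for n in range(1, 6)])

# Eq. (1.66): expansion coefficients; then Eq. (1.65): rebuild psi0 from them
n_max = 41
c = np.array([integrate(np.conj(psi_n(n, x)) * psi0, x) for n in range(1, n_max + 1)])
rebuilt = sum(c[n - 1] * psi_n(n, x) for n in range(1, n_max + 1))

print(np.round(overlaps, 6))
print(c[:4], 8 * np.sqrt(15) / np.pi**3)
print(np.sum(np.abs(c)**2), np.abs(rebuilt - psi0).max())
```

The sum $\sum_n|c_n|^2$ should be close to 1, as expected for a normalized $\Psi(x,0)$ expanded over an orthonormal full set.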
Now let us consider the following wavefunction 



41 In contrast, the initial Eq. (25) is frequently called the time-dependent or nonstationary Schrodinger equation.

42 From the German root eigen, meaning "particular" or "characteristic".

43 Eigenvalues of energy are frequently called eigenenergies, and it is often said that the eigenfunction $\psi_n$ and the eigenenergy $E_n$ together characterize the n-th stationary eigenstate of the system.
$$\Psi(\mathbf r,t) = \sum_n c_n a_n(t)\psi_n(\mathbf r) = \sum_n c_n\psi_n(\mathbf r)\exp\left\{-i\frac{E_n}{\hbar}t\right\}. \tag{1.67}$$

Since each term of the sum has the form (56) and satisfies the Schrodinger equation, so does the sum as a whole. Moreover, if the coefficients $c_n$ are derived in accordance with Eq. (66), then solution (67) satisfies the initial conditions as well. At this point we can again rely on mathematicians. They tell us that a partial differential equation of type (28), with the Hamiltonian operator (41) and fixed initial conditions, may have only one (unique) solution. This means that in our case of motion in a time-independent potential $U = U(\mathbf r)$, Eq. (67) gives the general solution of the time-dependent Schrodinger equation (25):

$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\Psi + U(\mathbf r)\Psi. \tag{1.68}$$

We will repeatedly use this key fact throughout the course. However, in many cases, following the physical sense of particular problems, we will be more interested in certain specific particular solutions of Eq. (68) than in the whole linear superposition (67).
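Here is a short sketch of Eq. (67) at work, again for the 1D version of the hard-wall well treated below, with the energies $E_n = \pi^2\hbar^2n^2/2ma^2$ of Eq. (77). It assumes $\hbar = m = a = 1$ and an initial equal-weight superposition of the two lowest eigenstates, an example chosen for illustration; the helper names are hypothetical. Each term only acquires the phase factor $\exp\{-iE_nt/\hbar\}$, so the norm stays constant. The probability density of the superposition, unlike that of a single stationary state, oscillates with the period $2\pi\hbar/(E_2 - E_1)$.

```python
import numpy as np

HBAR, MASS, A = 1.0, 1.0, 1.0  # illustrative units

def energy(n, a=A, hbar=HBAR, m=MASS):
    """E_n = pi^2 hbar^2 n^2 / (2 m a^2), Eq. (1.77)."""
    return (np.pi * hbar * n)**2 / (2 * m * a**2)

def psi_n(n, x, a=A):
    return np.sqrt(2.0 / a) * np.sin(np.pi * n * x / a)

def evolve(coeffs, x, t, a=A, hbar=HBAR, m=MASS):
    """Eq. (1.67): sum over n of c_n psi_n(x) exp(-i E_n t / hbar); coeffs maps n -> c_n."""
    psi = np.zeros_like(x, dtype=complex)
    for n, c in coeffs.items():
        psi += c * psi_n(n, x, a) * np.exp(-1j * energy(n, a, hbar, m) * t / hbar)
    return psi

def norm(psi, x):
    w = np.abs(psi)**2
    return np.sum((w[1:] + w[:-1]) * np.diff(x)) / 2.0   # trapezoidal rule

x = np.linspace(0.0, A, 2001)
coeffs = {1: 1 / np.sqrt(2), 2: 1 / np.sqrt(2)}
period = 2 * np.pi * HBAR / (energy(2) - energy(1))

w0 = np.abs(evolve(coeffs, x, 0.0))**2
w_half = np.abs(evolve(coeffs, x, period / 2))**2
w_full = np.abs(evolve(coeffs, x, period))**2
print(norm(evolve(coeffs, x, 0.37), x), np.abs(w_half - w0).max(), np.abs(w_full - w0).max())
```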

In order to get some feeling for the functions $\psi_n$, let us consider perhaps the simplest example, which nevertheless will be the basis for the discussion of many less trivial problems: a particle confined in a rectangular quantum well with a flat "bottom" and sharp, infinitely high "hard walls":

$$U(\mathbf r) = \begin{cases}0, & \text{for } 0 < x < a_x,\ 0 < y < a_y, \text{ and } 0 < z < a_z,\\ +\infty, & \text{otherwise.}\end{cases} \tag{1.69}$$

The only way to keep the product $U\psi_n$ in Eq. (63) finite outside the well is to have $\psi_n = 0$ in these regions. Also, the function has to be continuous everywhere, to avoid the divergence of its Laplace operator. Hence, we may solve the stationary Schrodinger equation (63) only inside the well, where it takes a simple form 44

$$-\frac{\hbar^2}{2m}\nabla^2\psi_n = E_n\psi_n, \tag{1.70a}$$

with zero boundary conditions on all the walls. For our particular geometry, it is natural to express the Laplace operator in the Cartesian coordinates $(x, y, z)$ aligned with the well sides, so that we get the following boundary problem:

$$\begin{aligned}&-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\psi_n = E_n\psi_n, \quad \text{for } 0 < x < a_x,\ 0 < y < a_y, \text{ and } 0 < z < a_z,\\ &\psi_n = 0, \quad \text{for } x = 0 \text{ and } a_x;\ y = 0 \text{ and } a_y;\ z = 0 \text{ and } a_z.\end{aligned} \tag{1.70b}$$



This problem may be readily solved using the same variable separation method which was used 
earlier in this section to separate the spatial and temporal variables, now to separate Cartesian spatial 
variables from each other. Let us look for a particular solution in the form 

$$\psi(\mathbf r) = X(x)\,Y(y)\,Z(z). \tag{1.71}$$



44 Rewritten as $\nabla^2 f + k^2 f = 0$, this is the Helmholtz equation, which describes scalar waves of any nature (with wave vector $k$) in a uniform, linear medium - see, e.g., CM Sec. 5.5 and/or EM Secs. 7.7-7.9.
(It is convenient to postpone taking care of proper indices for a minute.) Plugging this expression into Eq. (70b) and dividing by $\psi = XYZ$, we get

$$-\frac{\hbar^2}{2m}\left(\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2}\right) = E. \tag{1.72}$$



Now let us repeat the standard argumentation of the variable separation method. Each term in the parentheses may be only a function of the corresponding argument, so the equality is possible only if each term is a constant with the dimensionality of energy. Calling them $E_x$, etc., we get three 1D equations

$$-\frac{\hbar^2}{2m}\frac{1}{X}\frac{d^2X}{dx^2} = E_x, \qquad -\frac{\hbar^2}{2m}\frac{1}{Y}\frac{d^2Y}{dy^2} = E_y, \qquad -\frac{\hbar^2}{2m}\frac{1}{Z}\frac{d^2Z}{dz^2} = E_z, \tag{1.73}$$

with Eq. (72) turning into the energy-matching condition

$$E_x + E_y + E_z = E. \tag{1.74}$$



All three ordinary differential equations (73), and their solutions, are similar. For example, for $X(x)$, we have a 1D Helmholtz equation

$$\frac{d^2X}{dx^2} + k_x^2X = 0, \qquad \text{with } k_x^2 \equiv \frac{2mE_x}{\hbar^2}, \tag{1.75}$$

and simple boundary conditions: $X(0) = X(a_x) = 0$. Let me hope that the reader knows how to solve this well-known 1D boundary problem. It describes, for example, the usual mechanical waves on a guitar string, though with a very different expression for $k_x$. The problem allows an infinite number of sinusoidal standing-wave solutions, 45

$$X = \left(\frac{2}{a_x}\right)^{1/2}\sin k_xx = \left(\frac{2}{a_x}\right)^{1/2}\sin\frac{\pi n_xx}{a_x}, \qquad \text{with } n_x = 1, 2, \ldots, \tag{1.76}$$



corresponding to the eigenenergies

$$E_x = \frac{\hbar^2}{2m}k_x^2 = \frac{\pi^2\hbar^2}{2ma_x^2}n_x^2. \tag{1.77}$$
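One way to check Eq. (77) independently is to discretize the hard-wall problem. The sketch below assumes $\hbar = m = a_x = 1$ and uses the standard 3-point finite-difference approximation of $d^2/dx^2$ on interior grid points, so that $X = 0$ at both walls is built in. It diagonalizes the resulting real symmetric matrix with `numpy.linalg.eigvalsh` and compares its lowest eigenvalues with $\pi^2n_x^2/2$. The helper name is hypothetical, and the discretization error decreases as the square of the grid step.

```python
import numpy as np

def well_levels_fd(num_points, a=1.0, hbar=1.0, m=1.0, num_levels=4):
    """Lowest eigenvalues of -(hbar^2/2m) d^2/dx^2 on (0, a) with X(0) = X(a) = 0,
    using a 3-point finite-difference stencil on num_points interior grid points."""
    h = a / (num_points + 1)                     # grid step; the walls sit at x = 0 and x = a
    main = np.full(num_points, 2.0)
    off = np.full(num_points - 1, -1.0)
    minus_d2 = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2   # approximates -d^2/dx^2
    hamiltonian = hbar**2 / (2 * m) * minus_d2
    return np.linalg.eigvalsh(hamiltonian)[:num_levels]   # eigvalsh returns ascending order

n = np.arange(1, 5)
exact = np.pi**2 * n**2 / 2                       # Eq. (1.77) with hbar = m = a_x = 1
numeric = well_levels_fd(500)
print(numeric / exact)
```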



Figure 7 shows this result in a somewhat odd but very graphic, and hence common, way: the eigenenergy values (frequently called energy levels) are used as horizontal axes for plotting the eigenfunctions, despite their different dimensionality.

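Due to the similarity of all Eqs. (73), $Y(y)$ and $Z(z)$ are similar functions of their arguments. They may also be numbered by integers (say, $n_y$ and $n_z$) independent of $n_x$, so that the spectrum of the total energy (74) is

$$E_{n_x,n_y,n_z} = \frac{\pi^2\hbar^2}{2m}\left(\frac{n_x^2}{a_x^2} + \frac{n_y^2}{a_y^2} + \frac{n_z^2}{a_z^2}\right). \tag{1.78}$$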
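Consider the special case of a cubic well, $a_x = a_y = a_z \equiv a$, an assumption made here only for illustration. Then Eq. (78) reduces to $E = (\pi^2\hbar^2/2ma^2)(n_x^2 + n_y^2 + n_z^2)$, so different sets of quantum numbers may give the same energy. The hypothetical helper below lists the lowest levels in units of $\pi^2\hbar^2/2ma^2$. For each level it also gives the number of sets $\{n_x, n_y, n_z\}$ with that energy.

```python
from collections import Counter
from itertools import product

def cubic_well_levels(num_levels, n_max=6):
    """Lowest values of n_x^2 + n_y^2 + n_z^2 (Eq. 1.78 for a cube, in units of
    pi^2 hbar^2 / 2 m a^2), each with the number of quantum-number sets giving it.
    n_max must be large enough that all of the requested levels are complete."""
    counts = Counter(nx * nx + ny * ny + nz * nz
                     for nx, ny, nz in product(range(1, n_max + 1), repeat=3))
    return sorted(counts.items())[:num_levels]

for level, degeneracy in cubic_well_levels(6):
    print(level, degeneracy)
```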




45 The front coefficient is selected in a way that ensures the (ortho)normality condition (64).
Fig. 1.7. Eigenfunctions (solid lines) and eigenvalues (dashed lines) of the 1D wave equation (75) on a finite-length segment. Solid black lines show the potential energy profile of the problem.


Thus, in this 3D problem, the role of index $n$ in Eq. (67) is played by a set of 3 independent integers $\{n_x, n_y, n_z\}$. In quantum mechanics, such integers play a key role, and thus have a special name, quantum numbers. Now the general solution (67) of our simple problem may be presented as the sum

$$\Psi(\mathbf r,t) = \left(\frac{8}{a_xa_ya_z}\right)^{1/2}\sum_{n_x,n_y,n_z} c_{n_x,n_y,n_z}\sin\frac{\pi n_xx}{a_x}\sin\frac{\pi n_yy}{a_y}\sin\frac{\pi n_zz}{a_z}\exp\left\{-i\frac{E_{n_x,n_y,n_z}}{\hbar}t\right\}. \tag{1.79}$$

The coefficients may be readily calculated from the initial wavefunction $\Psi(\mathbf r,0)$, using Eq. (66), again with the replacement $n\to\{n_x, n_y, n_z\}$. This simplest problem is a good illustration of the basic features of wave mechanics for spatially confined motion. These include the discrete energy spectrum and (in this case, evidently) orthogonal eigenfunctions.

An example of the opposite limit of a continuous spectrum, for the unconfined motion of a free particle, is given by the plane waves (29). Taking into account the relations $E = \hbar\omega$ and $\mathbf p = \hbar\mathbf k$, they may be viewed as the product of the time-dependent factor (61) by the eigenfunction

$$\psi_{\mathbf k} = a_{\mathbf k}\exp\{i\mathbf k\cdot\mathbf r\}, \tag{1.80}$$

which is the solution of the stationary Schrodinger equation (70a) if that equation is valid in the whole space. 46

The reader should not be worried too much by the fact that the fundamental solution (80) in free space is a traveling wave, having, in particular, the nonvanishing value (50) of the probability current $\mathbf j$. The solutions inside a quantum well, by contrast, are standing waves with $\mathbf j = 0$. This is so even though free space may be legitimately considered as the ultimate limit of a quantum well with volume $V = a_xa_ya_z\to\infty$. Indeed, due to the linearity of wave mechanics, two traveling-wave solutions (80) with equal and opposite values of momentum (and hence with the same energy) may be readily combined to give a standing-wave solution. An example is $\exp\{i\mathbf k\cdot\mathbf r\} + \exp\{-i\mathbf k\cdot\mathbf r\} = 2\cos(\mathbf k\cdot\mathbf r)$, with the net current $\mathbf j = 0$. Thus, depending on which is more convenient for solving a particular problem, we can present the general solution as a sum of either traveling-wave or standing-wave eigenfunctions.

Since in free space there are no boundary conditions to satisfy, the Cartesian components of the wave vector $\mathbf k$ in Eq. (80) can take any real values. (This is why it is more convenient to label the wavefunctions and eigenenergies,

$$E_{\mathbf k} = \frac{\hbar^2k^2}{2m}, \tag{1.81}$$

by their wave vector $\mathbf k$ rather than by an integer index.) However, one aspect of systems with a continuous spectrum requires a bit more mathematical caution: summation (67) should be replaced by integration over a continuous index or indices (in this case, the 3 components of vector $\mathbf k$). The main rule of such replacement may be readily extracted from Eq. (76). According to this relation, for standing-wave solutions, the eigenvalues of $k_x$ are equidistant, i.e. separated by equal intervals $\Delta k_x = \pi/a_x$, with similar relations for the other two Cartesian components of vector $\mathbf k$. Hence the number of different eigenvalues of the standing wave vector $\mathbf k$ (with $k_x, k_y, k_z > 0$) within a volume $d^3k \gg \Delta k_x\Delta k_y\Delta k_z$ of the $\mathbf k$ space is just $dN = d^3k/(\Delta k_x\Delta k_y\Delta k_z) = Vd^3k/\pi^3$. Since in the continuum it is more convenient to work with traveling waves, we should take into account that there are two different traveling wave vectors ($\mathbf k$ and $\mathbf k' = -\mathbf k$) corresponding to each standing wave vector $\mathbf k$, as was just discussed. Hence the same number of physically different states corresponds to a $2^3 = 8$-fold larger $\mathbf k$ space (which now is infinite in all directions). Equivalently, it corresponds to a smaller number of states per unit volume $d^3k$:

$$dN = \frac{V}{(2\pi)^3}\,d^3k. \tag{1.82}$$

46 In some systems (e.g., a particle interacting with a finite-depth quantum well), a discrete energy spectrum within a certain interval of energies may coexist with a continuous spectrum in a complementary interval. However, the conceptual philosophy of eigenfunctions and eigenvalues remains the same in this case as well.

For $dN \gg 1$, this expression is independent of the boundary conditions, 47 and is frequently presented as the following summation rule

$$\sum_{\mathbf k} f(\mathbf k) = \int f(\mathbf k)\,dN = \frac{V}{(2\pi)^3}\int f(\mathbf k)\,d^3k, \tag{1.83}$$

where $f(\mathbf k)$ is an arbitrary function of $\mathbf k$. This rule is very important for statistical physics. Note also that the same wave vector $\mathbf k$ may correspond to several internal quantum states (such as spin - see Chapter 4). In that case, the right-hand part of Eq. (83) requires multiplication by the corresponding degeneracy factor.
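The boundary-condition independence claimed for Eq. (82) may be illustrated by counting the states with $|\mathbf k| \le K$ in a cube of side $a$ (volume $V = a^3$). Integrating Eq. (82) over a sphere gives $N = VK^3/6\pi^2 = (\pi/6)R^3$, with $R \equiv Ka/\pi$. The sketch below compares this with direct counts for two kinds of states:
- hard-wall standing waves of Eq. (76), $k_i = \pi n_i/a$ with $n_i = 1, 2, \ldots$;
- traveling waves obeying periodic boundary conditions, $k_i = 2\pi n_i/a$ with integer $n_i$ of either sign.

The periodic case is an assumption added here for comparison and is not discussed in the text. The helper names are hypothetical.

```python
import numpy as np

def count_states(R, boundary):
    """Count k-states with |k| <= K in a cube of side a, where R = K a / pi.

    'hard'     : standing waves, k_i = pi n_i / a, n_i = 1, 2, ...       (Eq. 1.76)
    'periodic' : traveling waves, k_i = 2 pi n_i / a, n_i = 0, +-1, ... (assumed BCs)
    """
    if boundary == "hard":
        n = np.arange(1, int(R) + 1)
        radius_sq = R**2                       # sum n_i^2 <= (K a / pi)^2
    elif boundary == "periodic":
        half = int(R / 2) + 1
        n = np.arange(-half, half + 1)
        radius_sq = (R / 2)**2                 # sum n_i^2 <= (K a / 2 pi)^2
    else:
        raise ValueError(f"unknown boundary type: {boundary}")
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij", sparse=True)
    return int(np.count_nonzero(nx**2 + ny**2 + nz**2 <= radius_sq))

def count_from_eq_1_82(R):
    """N = V K^3 / (6 pi^2) = (pi / 6) R^3, from integrating Eq. (1.82) over |k| <= K."""
    return np.pi * R**3 / 6

for R in (30, 60, 120):
    exact = count_from_eq_1_82(R)
    for bc in ("hard", "periodic"):
        print(R, bc, count_states(R, bc) / exact)
```

For the hard walls, the relative deviation should shrink roughly as $1/R$, since it comes mainly from the excluded lattice planes $n_i = 0$. Thus both counts approach the same leading term when the number of states is large.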



1.6. Dimensionality reduction 

To conclude this introductory chapter, let me discuss the conditions under which the spatial dimensionality of a wave mechanics problem may be reduced. 48 For example, following our discussion of the 3D rectangular, flat-bottom quantum well in Sec. 5, let us consider an infinitely deep quantum well whose bottom is flat only in one direction, say z:

$$U(\mathbf r) = \begin{cases}U(x, y), & \text{for } 0 < z < a_z,\\ +\infty, & \text{otherwise.}\end{cases} \tag{1.84}$$

In this case, we can separate variables only partly, by presenting the eigenfunction as $\psi(x,y)Z(z)$. Plugging such a solution into the corresponding form of the stationary Schrodinger equation (63), we see



47 For a more detailed discussion of this point, the reader may be referred, e.g., to CM Sec. 5.4 (in the context of 1D mechanical waves), because it is valid for waves of any nature.

48 Many textbooks on quantum mechanics jump to the solution of 1D problems without such a discussion. Most of my beginning graduate students did not understand that in realistic physical systems, such a dimensionality restriction is only possible under very specific conditions.
that the functions $Z(z)$ are again similar to those given by Eq. (76), while the function $\psi(x,y)$ satisfies the following 2D stationary Schrodinger equation:

$$-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\psi + U_{\rm ef}(x,y)\psi = E\psi, \tag{1.85}$$

where

$$U_{\rm ef}(x,y) \equiv U(x,y) + E_z = U(x,y) + \frac{\pi^2\hbar^2n_z^2}{2ma_z^2}. \tag{1.86}$$



Thus, we have arrived at a boundary problem similar to the initial one, but with the spatial dimensionality reduced from 3 to 2, due to what is called the partial quantum confinement in direction z. In addition, if all partial functions $Z(z)$ are normalized to unity, the wavefunction normalization condition becomes

$$W = \int_A\psi(x,y)\,\psi^*(x,y)\,dx\,dy, \tag{1.87}$$

where $A$ is the total area of the system on the $[x, y]$ plane. This condition is formally similar to the initial 3D normalization condition. However, the effective 2D potential energy $U_{\rm ef}(x,y)$ includes the term $E_z$, which depends on the quantum number $n_z$. 49 This makes the physical relevance of such variable separation much less general than might be naively expected. There are three possible cases:

(i) If there is no strong relation between the energy scale $E_{x,y}$ of the potential $U_{\rm ef}(x,y)$ and $E_z$, the solution of a typical problem has to be presented as a (typically, large) sum of partial solutions $\psi(x,y)Z(z)$, each with its own $n_z$, $U_{\rm ef}$, and $E_z$. In this general case, the variable separation may not provide much relief at all. The reason is that the eigenenergies of solutions with different $n_z$ may be close, so that several of them would simultaneously participate in realistic processes.

(ii) $E_z$ is much smaller than $E_{x,y}$ and may be neglected. This may be the case, for example, if the potential profile is steeper along the axes x and y than along direction z. Notice, however, that the condition $a_z\to\infty$ does not guarantee the smallness of $E_z$, because it may be compensated by large values of $n_z$. In this case (typical for solid state problems), either summation or integration over $n_z$ still may be needed. Sometimes it may be carried out analytically, because the functions $Z(z)$ are simple sinusoidal waves.

(iii) Counter-intuitively, the most robust dimensionality reduction is possible in the opposite limit, when $a_z$ is much smaller than the characteristic scale of motion within the $[x, y]$ plane (Fig. 8a). Indeed, in this case the distance between adjacent levels of the quantum confinement energy $E_z$ is much larger than the characteristic energy $E_{x,y}$ of motion within the plane. As a result, if the system was initially prepared to be on the lowest, ground level of $E_z$, a "soft" motion along x and y cannot excite the system to higher levels of $E_z$. 50 Hence, the system keeps the fixed quantum number $n_z = 1$ through



49 The last term in Eq. (86) is frequently referred to as the (partial) quantum confinement energy. Despite its inclusion in $U_{\rm ef}$, it is important to remember the kinetic-energy origin of this contribution.

50 In the frequent case when motion in the $[x, y]$ plane is free (or almost free), the set of quantum states with the same quantum number $n_z$ is frequently called a subband, because their energies form a (quasi-) continuum of eigenenergies $E_{x,y}$.
the motion, so that the confinement energy $E_z$ is constant and, according to Eq. (86), may be treated just as a fixed potential energy offset.

The last conclusion is true even if the quantum well's profile in direction z is not rectangular, provided that $E_z$ is still much larger than $E_{x,y}$. For example, many 2D quantum phenomena, such as the quantum Hall effect, 51 have been studied experimentally using electrons confined at semiconductor heterojunctions, e.g., epitaxial $\mathrm{GaAs/Al}_x\mathrm{Ga}_{1-x}\mathrm{As}$ interfaces. There, the potential well in the direction perpendicular to the interface has a nearly triangular shape, with the splitting of energies $E_z$ of the order of $10^{-2}$ eV. 52 This splitting energy corresponds to $k_BT$ at a temperature of about 100 K. Hence careful experimentation at liquid helium temperatures (4 K and below) may keep the electrons performing purely 2D motion in the "lowest subband" ($n_z = 1$).
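The numbers in the last paragraph can be checked with a rough Boltzmann estimate. This is only a sketch. It assumes nondegenerate (Boltzmann) statistics and ignores the degeneracy of the subbands. The ratio $\exp\{-\Delta E_z/k_BT\}$ is therefore only an order-of-magnitude guide to how strongly the second subband is populated relative to the first. The helper names are hypothetical, and $k_B$ is the exact SI value expressed in eV/K.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K (exact SI value, rounded)

def equivalent_temperature(energy_ev):
    """Temperature T at which k_B T equals the given energy."""
    return energy_ev / K_B_EV_PER_K

def boltzmann_ratio(delta_e_ev, temperature_k):
    """Rough relative occupation exp(-dE / k_B T) of the next subband (Boltzmann statistics assumed)."""
    return math.exp(-delta_e_ev / (K_B_EV_PER_K * temperature_k))

delta_e = 1e-2  # subband splitting of the order of 10^-2 eV, as quoted in the text
print(f"k_B T = {delta_e} eV at T = {equivalent_temperature(delta_e):.0f} K")
for T in (300.0, 77.0, 4.2):
    print(f"T = {T:5.1f} K: exp(-dE/k_B T) = {boltzmann_ratio(delta_e, T):.1e}")
```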




Fig. 1.8. Partial quantum confinement in: (a) one dimension, and (b) two dimensions. 



Now suppose that a quantum well is formed in two dimensions (say, y and z, see Fig. 8b): 53

$$U(\mathbf r) = \begin{cases}U(x), & \text{for } 0 < y < a_y \text{ and } 0 < z < a_z,\\ +\infty, & \text{otherwise.}\end{cases} \tag{1.88}$$

Repeating the variable separation procedure, we see that the 3D Schrodinger equation (68) may be satisfied with particular solutions of the type (71). Again, $Y(y)$ and $Z(z)$ are sinusoidal standing waves, but $X(x)$ is generally a more complex function, which has to satisfy the following 1D Schrodinger equation



$$-\frac{\hbar^2}{2m}\frac{d^2X}{dx^2} + U_{\rm ef}(x)X = EX, \tag{1.89}$$

with the effective potential energy

$$U_{\rm ef}(x) = U(x) + E_y + E_z. \tag{1.90}$$



Again, if the particle stays in the lowest subband, $n_y = n_z = 1$, both $E_y$ and $E_z$ retain their constant values $E_{y1}$ and $E_{z1}$. Repeating the above discussion of partial confinement in one dimension, we can expect that a wave mechanics problem may be substantially simplified if $E_{y1}$ and $E_{z1}$ are much larger than the energy scale $E_x$ of the motion in direction x. Namely, if:

(i) the potential profile within the 2D partial confinement plane $[y, z]$ is arbitrary, provided that it gives partial confinement scales $a_y$ and $a_z$ much smaller than the spatial scale of the motion in direction x, and



51 To be discussed in Sec. 3.2.

52 See, e.g., P. Harrison, Quantum Wells, Wires, and Dots, 3rd ed., Wiley, 2010.

53 This is a reasonable first approximation, for example, for the potential profile of electron motion in so-called quantum wires, such as the now-famous carbon nanotubes - see, e.g., the same monograph by P. Harrison.
(ii) the potential energy U is either constant in time or changes relatively slowly, at a time scale $\tau \gg \hbar/E_{yz1}$ (where $E_{yz1}$ is the lowest eigenenergy of motion within the [y, z] plane),

then a large range of experiments may be adequately described by looking for a solution of the general (time-dependent, 3D) Schrodinger equation in the form of the following product

$$\Psi(\mathbf{r},t) = \Psi(x,t)\,YZ_1(y,z)\exp\left\{-i\frac{E_{yz1}}{\hbar}t\right\}, \tag{1.91}$$

where $YZ_1$ is the lowest (ground-state) eigenfunction of the 2D problem in the [y, z] plane. Substituting this solution into the equation, and separating variables (y, z) from (x, t), we obtain the following time-dependent 1D equation:

$$i\hbar\frac{\partial\Psi(x,t)}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\Psi(x,t)}{\partial x^2} + U(x,t)\,\Psi(x,t). \tag{1.92}$$



The next chapter will be devoted to a detailed discussion of the wave mechanics described by this 1D equation. This equation lets us study the most basic phenomena and concepts of wave mechanics without involving overly complex math. In that chapter, for notational simplicity, the energy $E_x$ of the 1D motion will be referred to just as E. However, one should always remember that each "1D problem" has two hidden degrees of freedom. The genuine energy of the particle also includes a constant shift $E_{yz1}$, which is typically much larger than $E_x$. The Universe is (at least :-) 3-dimensional, and it shows!

Finally, note that in systems with reduced dimensionality, Eq. (82) for the number of states at large k (i.e., for an essentially free particle motion) should be replaced accordingly. In a 2D system of area $A \gg 1/k^2$,

$$dN = \frac{A}{(2\pi)^2}\,d^2k, \tag{1.93}$$

while in a 1D system of length $l \gg 1/k$,

$$dN = \frac{l}{2\pi}\,dk, \tag{1.94}$$

with the corresponding changes of the summation rule (83). This change has important implications for the density of states on the energy scale, dN/dE. It is straightforward (and hence left for the reader :-) to use Eqs. (82), (93), and (94) to show the following. For free 3D particles, the density increases with E (proportionally to $E^{1/2}$). For free 2D particles, it does not depend on energy. For free 1D particles, it scales as $E^{-1/2}$, i.e. decreases with energy.
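The three scaling laws can be checked with a short numerical sketch. It assumes a free spinless particle in a d-dimensional box of side L, in units with ħ = m = 1. It starts from the state count $dN = L^d\,d^dk/(2\pi)^d$, integrates over the directions of k, and converts dk to dE via $E = \hbar^2k^2/2m$. The function name `density_of_states` is hypothetical.

```python
import math

def density_of_states(E, d, L=1.0, m=1.0, hbar=1.0):
    """dN/dE for a free spinless particle in a d-dimensional box of side L.

    Starts from dN = L**d * d^d k / (2*pi)**d (cf. Eqs. (82), (93), (94)),
    integrates over the directions of k, and converts dk to dE using
    E = hbar**2 * k**2 / (2*m).
    """
    k = math.sqrt(2.0 * m * E) / hbar
    # "Area" of the unit sphere in d dimensions: 2 (d=1), 2*pi (d=2), 4*pi (d=3)
    unit_sphere = 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)
    dN_dk = L ** d * unit_sphere * k ** (d - 1) / (2.0 * math.pi) ** d
    dk_dE = m / (hbar ** 2 * k)
    return dN_dk * dk_dE

for d in (1, 2, 3):
    # Quadrupling E multiplies dN/dE by 4**((d - 2)/2): 1/2, 1, and 2
    print(d, density_of_states(4.0, d) / density_of_states(1.0, d))
```

Two points about the 1D case: both signs of k are counted, and the result is the $E^{-1/2}$ law. The 2D density comes out as the energy-independent constant $mL^2/2\pi\hbar^2$.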



1.7. Exercise problems 

1.1. Prove that the quantum-mechanical uncertainties of the momentum and kinetic energy in the monochromatic, plane-wave state, given by Eq. (1.29) of the lecture notes, equal zero at any time.



1.2. Use Eq. (1.53) of the lecture notes to prove that the addition of linear operators of quantum mechanics is commutative, $\hat A_2 + \hat A_1 = \hat A_1 + \hat A_2$, and associative, $(\hat A_1 + \hat A_2) + \hat A_3 = \hat A_1 + (\hat A_2 + \hat A_3)$.



Chapter 1 



Page 24 of 26 



Essential Graduate Physics 



QM: Quantum Mechanics 



1.3. Calculate $\langle x\rangle$, $\langle p_x\rangle$, $\delta x$, and $\delta p_x$ for eigenstate $\{n_x, n_y, n_z\}$ of a hard, infinitely deep quantum well (1.69). Compare product $\delta x\,\delta p_x$ with Heisenberg's uncertainty relation.

1.4. A 1D particle in a rectangular, infinitely deep quantum well,

$$U(x) = \begin{cases} 0, & \text{for } 0<x<a,\\ +\infty, & \text{otherwise,} \end{cases}$$

is initially put into the following state:

$$\Psi(x,0) = C\sin^3\frac{\pi x}{a}.$$

Find wavefunction $\Psi(x,t)$ for arbitrary $t > 0$.

1.5. Find the potential profile U(x) for which the following wavefunction,

$$\Psi = c\exp\{-ax^2 - ibt\},$$

(with real coefficients a > 0 and b), satisfies the Schrodinger equation for a particle with mass m. Calculate $\langle x\rangle$, $\langle p_x\rangle$, $\delta x$, and $\delta p_x$, and compare the product $\delta x\,\delta p_x$ with Heisenberg's uncertainty relation.

1.6. Calculate the density dN/dE of traveling-wave states in a large rectangular quantum well of various dimensions: d = 1, 2, and 3.
Chapter 2. 1D Wave Mechanics

The main goal of this chapter is the solution and discussion of a few conceptually most important problems of wave mechanics for the simplest, 1D case. Two things simplify the necessary calculations considerably without sacrificing the physical essence of the described phenomena: this lowest dimensionality, and a wide use of the approximation of potential profiles by sets of Dirac's delta functions. The reader is advised to pay special attention to Sections 6-9, which cover some important material not usually discussed in textbooks.



2.1. Probability current and uncertainty relations 



As was discussed at the end of Chapter 1, in several cases (most importantly, at strong quantum confinement within the [y, z] plane), the general (3D) Schrodinger equation may be reduced to the 1D equation (1.92):

$$i\hbar\frac{\partial\Psi(x,t)}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\Psi(x,t)}{\partial x^2} + U(x,t)\,\Psi(x,t). \tag{2.1}$$



Let the transverse factor, say, the function $YZ_1(y, z)$ that participates in Eq. (1.91), be normalized to unity. Then integrating Eq. (1.22a) over a segment $[x_1, x_2]$ gives the probability to find the particle on this segment:

$$W(t) \equiv \int_{x_1}^{x_2}\Psi(x,t)\,\Psi^*(x,t)\,dx. \tag{2.2}$$



If the particle under analysis is definitely inside the system, the normalization of its 1D wavefunction $\Psi(x,t)$ is provided by extending integral (2) to the whole axis x:

$$\int_{-\infty}^{+\infty} w(x,t)\,dx = 1, \quad\text{where } w(x,t) \equiv \Psi(x,t)\,\Psi^*(x,t). \tag{2.3}$$



A similar integration of Eq. (1.23) shows that the expectation value of any operator depending only on coordinate x (and possibly time) may be expressed as

$$\langle A\rangle(t) = \int_{-\infty}^{+\infty}\Psi^*(x,t)\,\hat A\,\Psi(x,t)\,dx. \tag{2.4}$$




It is also useful to introduce the probability current along the x-axis (a scalar):

$$I(x,t) \equiv \int j_x\,dy\,dz = \frac{\hbar}{m}\,\mathrm{Im}\left(\Psi^*\frac{\partial\Psi}{\partial x}\right), \tag{2.5}$$

where $j_x$ is the x-component of the probability current density vector $\mathbf{j}(\mathbf{r},t)$. Then the continuity equation (1.48) for the segment $[x_1, x_2]$ takes the form

$$\frac{dW}{dt} + I(x_2) - I(x_1) = 0. \tag{2.6}$$
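Eq. (5) is easy to evaluate numerically for simple wavefunctions. The sketch below works in units with ħ = m = 1 and uses a central-difference derivative. The helper name `probability_current` and the sample parameters C and k are illustrative assumptions. It checks two cases: a plane wave $Ce^{ikx}$, which should carry $I = \hbar k|C|^2/m$, and a real standing wave, whose current should vanish.

```python
import cmath
import math

HBAR, M = 1.0, 1.0  # units with hbar = m = 1 (an assumption of this sketch)

def probability_current(psi, x, h=1e-5):
    """I(x) = (hbar/m) Im[psi*(x) dpsi/dx], Eq. (2.5); derivative by central difference."""
    dpsi = (psi(x + h) - psi(x - h)) / (2 * h)
    return HBAR / M * (psi(x).conjugate() * dpsi).imag

# Plane wave C exp(ikx): expected I = hbar * k * |C|**2 / m
C, k = 0.7, 3.0
plane = lambda x: C * cmath.exp(1j * k * x)
print(probability_current(plane, 0.4), HBAR * k * abs(C) ** 2 / M)

# Standing wave sin(kx) (a real function): no net current
standing = lambda x: complex(math.sin(k * x))
print(probability_current(standing, 0.4))
```

A real wavefunction always gives $\mathrm{Im}(\Psi^*\partial\Psi/\partial x) = 0$. This is why standing waves transfer no probability.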
The above formulas are the basis for the analysis of 1D problems of wave mechanics. Before proceeding to particular cases, however, let me deliver on my earlier promise to prove that Heisenberg's uncertainty relation (1.35) is indeed valid for any wavefunction $\Psi(x,t)$. For that, let us consider an evidently positive (or at least non-negative) integral

$$J(\lambda) \equiv \int_{-\infty}^{+\infty}\left|x\Psi + \lambda\frac{\partial\Psi}{\partial x}\right|^2 dx \ge 0, \tag{2.7}$$

where λ is an arbitrary real constant. Let us assume that at $x \to \pm\infty$ the wavefunction vanishes, together with its first derivative. The left-hand part of Eq. (7) may be recast as



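$$\begin{aligned}\int\left|x\Psi + \lambda\frac{\partial\Psi}{\partial x}\right|^2 dx &= \int\left(x\Psi + \lambda\frac{\partial\Psi}{\partial x}\right)\left(x\Psi + \lambda\frac{\partial\Psi}{\partial x}\right)^* dx \\ &= \int x^2\Psi\Psi^*\,dx + \lambda\int x\left(\Psi\frac{\partial\Psi^*}{\partial x} + \frac{\partial\Psi}{\partial x}\Psi^*\right)dx + \lambda^2\int\frac{\partial\Psi}{\partial x}\frac{\partial\Psi^*}{\partial x}\,dx.\end{aligned} \tag{2.8}$$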



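According to Eq. (4), the first term in the last form of Eq. (8) is just $\langle x^2\rangle$. The second and the third integrals may be worked out by parts:

$$\int x\left(\Psi\frac{\partial\Psi^*}{\partial x} + \frac{\partial\Psi}{\partial x}\Psi^*\right)dx = \int x\,\frac{\partial(\Psi\Psi^*)}{\partial x}\,dx = \int x\,d(\Psi\Psi^*) = \Psi\Psi^* x\Big|_{x=-\infty}^{x=+\infty} - \int\Psi\Psi^*\,dx = -1, \tag{2.9}$$

$$\int\frac{\partial\Psi}{\partial x}\frac{\partial\Psi^*}{\partial x}\,dx = \Psi^*\frac{\partial\Psi}{\partial x}\Big|_{x=-\infty}^{x=+\infty} - \int\Psi^*\frac{\partial^2\Psi}{\partial x^2}\,dx = \frac{1}{\hbar^2}\int\Psi^*\hat p_x^2\Psi\,dx = \frac{\langle p_x^2\rangle}{\hbar^2}. \tag{2.10}$$

As a result, Eq. (7) takes the following form:

$$J(\lambda) = \langle x^2\rangle - \lambda + \lambda^2\frac{\langle p_x^2\rangle}{\hbar^2} \ge 0, \quad\text{i.e.}\quad \lambda^2 + a\lambda + b \ge 0, \quad\text{with } a \equiv -\frac{\hbar^2}{\langle p_x^2\rangle} \text{ and } b \equiv \frac{\hbar^2\langle x^2\rangle}{\langle p_x^2\rangle}. \tag{2.11}$$

This inequality should be valid for any real λ. Hence the corresponding quadratic equation, $\lambda^2 + a\lambda + b = 0$, can have either one (degenerate) real root or no real roots at all. This is only possible if its discriminant, $a^2 - 4b$, is non-positive, leading to the following requirement:

$$\langle x^2\rangle\langle p_x^2\rangle \ge \frac{\hbar^2}{4}. \tag{2.12}$$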



In particular, if $\langle x\rangle = 0$ and $\langle p_x\rangle = 0$,¹ then according to Eq. (1.33), Eq. (12) takes the form

$$\langle\tilde x^2\rangle\langle\tilde p_x^2\rangle \ge \frac{\hbar^2}{4}, \tag{2.13}$$

which, according to the definition (1.34) of r.m.s. uncertainties, is equivalent to Eq. (1.35).
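The inequality can be checked numerically on a grid. The sketch below works in units with ħ = 1 and evaluates $\langle p_x^2\rangle$ as $\hbar^2\int|\partial\Psi/\partial x|^2dx$, which is the content of Eq. (10). For that reason it assumes that the sampled wavefunction practically vanishes at both ends of the interval. The helper name `uncertainties` and the test functions are illustrative choices. A Gaussian packet should reach the bound $\delta x\,\delta p_x = \hbar/2$. The ground state of an infinitely deep well of width 1 should exceed it, with $\delta x\,\delta p_x = \hbar(\pi^2/12 - 1/2)^{1/2} \approx 0.568\hbar$.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

def uncertainties(psi, x_min, x_max, n=20001):
    """Return (delta_x, delta_p) for psi(x) sampled on [x_min, x_max].

    psi must (practically) vanish at both ends, so that the boundary terms of
    Eqs. (2.9)-(2.10) drop out and <p^2> = hbar^2 * integral |dpsi/dx|^2 dx.
    """
    h = (x_max - x_min) / (n - 1)
    xs = [x_min + i * h for i in range(n)]
    vals = [complex(psi(x)) for x in xs]
    ders = [0j] * n
    for i in range(1, n - 1):  # central differences in the interior
        ders[i] = (vals[i + 1] - vals[i - 1]) / (2 * h)
    ders[0] = (vals[1] - vals[0]) / h  # one-sided differences at the ends
    ders[-1] = (vals[-1] - vals[-2]) / h
    w = [abs(v) ** 2 for v in vals]
    norm = sum(w)  # the common factor h cancels in all the ratios below
    mean_x = sum(x * wi for x, wi in zip(xs, w)) / norm
    mean_x2 = sum(x * x * wi for x, wi in zip(xs, w)) / norm
    mean_p = HBAR * sum((v.conjugate() * d).imag for v, d in zip(vals, ders)) / norm
    mean_p2 = HBAR ** 2 * sum(abs(d) ** 2 for d in ders) / norm
    return math.sqrt(mean_x2 - mean_x ** 2), math.sqrt(mean_p2 - mean_p ** 2)

# Gaussian packet of the type (2.16) with delta_x = 1, k0 = 2: product ~ hbar/2
gauss = lambda x: math.exp(-x * x / 4) * cmath.exp(2j * x)
dx_g, dp_g = uncertainties(gauss, -12.0, 12.0)
print(dx_g * dp_g)

# Ground state of the infinitely deep well with a = 1: product ~0.568 hbar
well = lambda x: math.sin(math.pi * x)
dx_w, dp_w = uncertainties(well, 0.0, 1.0)
print(dx_w * dp_w)
```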



1 Eq. (13) may be proved even if ⟨x⟩ and ⟨p_x⟩ are not equal to zero. To do this, make the replacements $x \to x - \langle x\rangle$ and $\partial/\partial x \to \partial/\partial x - i\langle p_x\rangle/\hbar$ in Eq. (7), and then repeat all the calculations, which become rather bulky. We will re-derive the uncertainty relations, in a more efficient way, in Chapter 4.
Now let us notice that Heisenberg's uncertainty relation looks very similar to the commutation relation between the corresponding operators:

$$[\hat x, \hat p_x]\Psi \equiv (\hat x\hat p_x - \hat p_x\hat x)\Psi = -i\hbar x\frac{\partial\Psi}{\partial x} + i\hbar\frac{\partial}{\partial x}(x\Psi) = i\hbar\Psi. \tag{2.14a}$$

Since this relation is valid for an arbitrary wavefunction $\Psi(x,t)$, we may present it as an operator equality:

$$[\hat x, \hat p_x] = i\hbar \ne 0. \tag{2.14b}$$



In Sec. 4.5 we will see that the relation between Eqs. (13) and (14) is just a particular case of a general 
relation between the expectation values of non-commuting operators and their commutators. 
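Eq. (14a) may be illustrated numerically. The sketch below works in units with ħ = 1. It applies $\hat x\hat p_x - \hat p_x\hat x$ to a smooth test function, with $\hat p_x = -i\hbar\,\partial/\partial x$ approximated by a central difference, and checks that the result is $i\hbar\Psi$ at every sampled point. The test function and helper names are arbitrary, illustrative choices.

```python
import cmath

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

def p_op(f, h=1e-4):
    """Momentum operator -i*hbar*d/dx acting on f, with a central difference."""
    return lambda x: -1j * HBAR * (f(x + h) - f(x - h)) / (2 * h)

def commutator_xp(psi):
    """Return the function ([x, p] psi)(x) = x (p psi)(x) - (p (x psi))(x)."""
    p_psi = p_op(psi)
    p_x_psi = p_op(lambda x: x * psi(x))
    return lambda x: x * p_psi(x) - p_x_psi(x)

psi = lambda x: cmath.exp(-x * x + 2j * x)  # an arbitrary smooth test function
for x in (-0.7, 0.0, 0.3, 1.1):
    print(x, commutator_xp(psi)(x) / psi(x))  # ~ i*hbar at every point
```

With central differences, the two $x\,\partial\Psi/\partial x$ terms cancel identically. Only the term $i\hbar[\Psi(x+h)+\Psi(x-h)]/2 \approx i\hbar\Psi(x)$ survives, which mirrors the analytical cancellation in Eq. (14a).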



2.2. Free particle: Wave packets 

Let us start our discussion of particular problems with the free 1D motion, with U(x,t) = 0. From our discussion of Eq. (1.29) in Chapter 1, it is clear that in the 1D case, a similar "fundamental" (i.e. a particular but the most important) solution of the Schrodinger equation (1) is a monochromatic wave

$$\Psi_0(x,t) = \text{const}\times e^{i(k_0x - \omega_0t)}. \tag{2.15}$$

According to Eqs. (1.32), it corresponds to a particle with an exactly defined momentum² $p_0 = \hbar k_0$ and energy $E_0 = \hbar\omega_0 = \hbar^2k_0^2/2m$. However, for this wavefunction, the product $\Psi^*\Psi$ depends on neither x nor t. The particle is therefore completely delocalized, i.e. its probability is spread all over the axis x, at all times. (As a result, such a state is still compatible with Heisenberg's uncertainty relation (13), despite the exact value $p_0$ of momentum p.)

In order to describe a space-localized particle, let us form, at the initial moment of time (t = 0), a wave packet of the type shown in Fig. 1.6, by multiplying the sinusoidal waveform (15) by some smooth envelope function of x. As the most important particular example, consider a Gaussian packet

$$\Psi(x,0) = \frac{1}{(2\pi)^{1/4}(\delta x)^{1/2}}\exp\left\{-\frac{x^2}{(2\delta x)^2}\right\}e^{ik_0x}. \tag{2.16}$$

(By the way, Fig. 1.6 shows exactly such a packet.) The pre-exponential factor in Eq. (16) has been selected so that the initial probability density,

$$w(x,0) \equiv \Psi^*(x,0)\,\Psi(x,0) = \frac{1}{(2\pi)^{1/2}\delta x}\exp\left\{-\frac{x^2}{2(\delta x)^2}\right\}, \tag{2.17}$$

is normalized according to Eq. (3), for any parameters δx and $k_0$.³

In order to explore the evolution of this packet in time, we could try to solve Eq. (1) with the initial condition (16) directly. In the spirit of the discussion in Sec. 1.5, however, it is easier to proceed differently. Let us first present the initial wavefunction (16) as a sum (1.65) of eigenfunctions $\psi_k(x)$ of the corresponding stationary 1D Schrodinger equation (1.60), in our current case

$$-\frac{\hbar^2}{2m}\frac{d^2\psi_k}{dx^2} = E_k\psi_k, \quad\text{with } E_k = \frac{\hbar^2k^2}{2m}, \tag{2.18}$$

that are simply monochromatic waves,

$$\psi_k(x) = e^{ikx}, \tag{2.19}$$

with a continuum spectrum of possible wave numbers k. For that, sum (1.65) should be replaced with an integral:⁴

$$\Psi(x,0) = \int a_k\psi_k(x)\,dk = \int a_k e^{ikx}\,dk. \tag{2.20}$$

Now let us notice that from the point of view of mathematics, Eq. (20) is just the usual Fourier transform from variable k to the "conjugate" variable x. We can therefore use the well-known formula of the reciprocal Fourier transform to calculate

$$a_k = \frac{1}{2\pi}\int\Psi(x,0)\,e^{-ikx}\,dx = \frac{1}{2\pi}\frac{1}{(2\pi)^{1/4}(\delta x)^{1/2}}\int\exp\left\{-\frac{x^2}{(2\delta x)^2} - i\tilde kx\right\}dx, \quad\text{where } \tilde k \equiv k - k_0. \tag{2.21}$$

This Gaussian integral may be worked out by the following standard method. Let us complement the exponent to the full square of a linear combination of x and $\tilde k$, plus a term independent of x:

$$-\frac{x^2}{(2\delta x)^2} - i\tilde kx = -\frac{1}{(2\delta x)^2}\left[x + 2i\tilde k(\delta x)^2\right]^2 - \tilde k^2(\delta x)^2. \tag{2.22}$$

The integration in the right-hand part of Eq. (21) should be performed at constant $\tilde k$, in the infinite limits. Hence its result would not change if we replace dx by $dx' \equiv d[x + 2i\tilde k(\delta x)^2]$.⁵ As a result, we get

$$a_k = \frac{1}{2\pi}\frac{1}{(2\pi)^{1/4}(\delta x)^{1/2}}\exp\left\{-\tilde k^2(\delta x)^2\right\}\int\exp\left\{-\frac{x'^2}{(2\delta x)^2}\right\}dx' = \left(\frac{1}{2\pi}\right)^{1/2}\frac{1}{(2\pi)^{1/4}(\delta k)^{1/2}}\exp\left\{-\frac{\tilde k^2}{(2\delta k)^2}\right\}, \tag{2.23}$$

so that $a_k$ also has a Gaussian distribution, now along axis k, centered at value $k_0$ (Fig. 1.6b), with constant δk defined as

$$\delta k \equiv \frac{1}{2\delta x}. \tag{2.24}$$

Thus we may present the initial wave packet (16) as

$$\Psi(x,0) = \left(\frac{1}{2\pi}\right)^{1/2}\frac{1}{(2\pi)^{1/4}(\delta k)^{1/2}}\int\exp\left\{-\frac{(k-k_0)^2}{(2\delta k)^2}\right\}e^{ikx}\,dk. \tag{2.25}$$

Comparing this formula with Eq. (16) shows that the r.m.s. uncertainty of the wave number k in this packet is indeed equal to δk defined by Eq. (24), thus justifying the notation. Comparing that relation with Eq. (1.35) shows that the Gaussian packet represents the ultimate case, in which the product $\delta x\,\delta p = \delta x(\hbar\delta k)$ has the lowest possible value ($\hbar/2$). For any other envelope shape, the uncertainty product may only be larger. We could of course get the same result for δk from Eq. (16) using definitions (1.23), (1.33), and (1.34). The real advantage of Eq. (25) is that it can be readily generalized to t > 0.

2 From this point on, in this chapter I will drop index x in the notation for the x-components of vectors k and p.

3 This may be readily proven using the well-known integral of the Gaussian function ("bell curve") given by Eq. (17) - see, e.g., MA Eq. (6.9b). It is also straightforward to use MA Eq. (6.9c) to prove that for wave packet (16), parameter δx is indeed the r.m.s. uncertainty (1.34) of coordinate x, thus justifying its notation.

4 For notation's brevity, from this point on the infinite limit signs will be dropped in all 1D integrals.

5 The fact that the argument shift is imaginary is not important, because the function under the integral is analytic and tends to zero at Re x' → ±∞.
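The Fourier-transform result (23)-(24) may be checked by brute-force quadrature. The sketch below computes $a_k$ of the Gaussian packet (16) from Eq. (21) with a plain Riemann sum over a finite x-interval, and then measures the r.m.s. width of $|a_k|^2$. The helper names, the interval half-width `x_max`, and the grid sizes are illustrative assumptions. For δx = 1 the width should be close to δk = 1/(2δx) = 0.5, and $|a_{k_0}|$ should match the prefactor of Eq. (23), $(2\pi)^{-3/4}(\delta k)^{-1/2}$.

```python
import cmath
import math

def gaussian_packet(x, dx=1.0, k0=5.0):
    """Initial Gaussian packet, Eq. (2.16)."""
    return ((2 * math.pi) ** -0.25 * dx ** -0.5
            * math.exp(-x * x / (4 * dx * dx)) * cmath.exp(1j * k0 * x))

def amplitude(k, dx=1.0, k0=5.0, x_max=16.0, n=1921):
    """a_k = (1/2pi) * integral psi(x,0) exp(-ikx) dx, Eq. (2.21), by a Riemann sum."""
    h = 2 * x_max / (n - 1)
    total = 0j
    for i in range(n):
        x = -x_max + i * h
        total += gaussian_packet(x, dx, k0) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def rms_width_in_k(dx=1.0, k0=5.0, span=4.0, m=81):
    """r.m.s. width of |a_k|^2, to be compared with delta_k = 1/(2 delta_x), Eq. (2.24)."""
    ks = [k0 - span + 2 * span * j / (m - 1) for j in range(m)]
    w = [abs(amplitude(k, dx, k0)) ** 2 for k in ks]
    mean = sum(k * wi for k, wi in zip(ks, w)) / sum(w)
    return math.sqrt(sum((k - mean) ** 2 * wi for k, wi in zip(ks, w)) / sum(w))

print(rms_width_in_k(dx=1.0))  # close to 0.5
print(rms_width_in_k(dx=2.0))  # close to 0.25
```

Because the integrand decays rapidly, the plain Riemann sum is already very accurate here. No special quadrature rule is needed.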

Indeed, we already know that the time evolution of the wavefunction is given by Eq. (1.67), for our case giving⁶

$$\Psi(x,t) = \left(\frac{1}{2\pi}\right)^{1/2}\frac{1}{(2\pi)^{1/4}(\delta k)^{1/2}}\int\exp\left\{-\frac{(k-k_0)^2}{(2\delta k)^2}\right\}e^{ikx}\exp\left\{-i\frac{\hbar k^2}{2m}t\right\}dk. \tag{2.26}$$



Fig. 1 shows several snapshots of the real part of wavefunction (26), for a particular case $\delta k = 0.1\,k_0$.
Fig. 2.1. Time evolution of a 1D wave packet: Re Ψ as a function of x/δx at t = 0 and at two later time moments.

The plots clearly show the following effects:

(i) the wave packet as a whole (as characterized by its envelope) moves along the x axis with a certain group velocity $v_{gr}$,

(ii) the "carrier" wave inside the packet moves with a different velocity, the phase velocity $v_{ph}$. It may be defined as the velocity of the spatial points where the wave's phase $\varphi(x,t) \equiv \arg\Psi$ takes a certain fixed value (say, $\varphi = \pi/2$, where Re Ψ vanishes), and

(iii) the packet's spatial width gradually increases with time - the packet spreads.

6 Note that this packet is equivalent to Eq. (16) and hence is properly normalized to 1 - see Eq. (3). Hence the wave packet introduction offers a natural solution to the problem of infinite wave normalization, which was mentioned in Sec. 1.2.

All these effects are common for waves of any physical nature.⁷ Indeed, let us consider a 1D wave packet of the type (26),

$$\Psi(x,t) = \int a_k e^{i(kx - \omega t)}\,dk, \tag{2.27}$$

propagating in a medium with an arbitrary (but smooth!) dispersion relation ω(k). Let us assume that the wave number distribution $a_k$ is arbitrary but narrow: $\delta k \ll \langle k\rangle \equiv k_0$ - see Fig. 1.6b. Then we may expand function ω(k) into the Taylor series near the central point $k_0$, and keep only its three leading terms:

$$\omega(k) \approx \omega_0 + \frac{d\omega}{dk}\tilde k + \frac{1}{2}\frac{d^2\omega}{dk^2}\tilde k^2, \quad\text{where } \tilde k \equiv k - k_0,\quad \omega_0 \equiv \omega(k_0), \tag{2.28}$$

and both derivatives are also evaluated at point $k = k_0$. In this approximation,⁸ the expression in parentheses in the right-hand part of Eq. (27) may be rewritten as

$$kx - \omega t = k_0x + \tilde kx - \left(\omega_0 + \frac{d\omega}{dk}\tilde k + \frac{1}{2}\frac{d^2\omega}{dk^2}\tilde k^2\right)t \equiv (k_0x - \omega_0t) + \tilde k\left(x - \frac{d\omega}{dk}t\right) - \frac{1}{2}\frac{d^2\omega}{dk^2}\tilde k^2t, \tag{2.29}$$

so that Eq. (27) is reduced to the integral

$$\Psi(x,t) = e^{i(k_0x - \omega_0t)}\int a_k\exp\left\{i\left[\tilde k\left(x - \frac{d\omega}{dk}t\right) - \frac{1}{2}\frac{d^2\omega}{dk^2}\tilde k^2t\right]\right\}dk. \tag{2.30}$$

First, let us neglect the last term in the square brackets, which is much smaller than the first term if the dispersion relation is smooth enough and/or the time interval t is sufficiently small. Let us then compare the result with the initial form of wave packet (27):

$$\Psi(x,0) = \int a_k e^{ikx}\,dk = e^{ik_0x}\int a_k e^{i\tilde kx}\,dk. \tag{2.31}$$

The comparison shows that Eq. (30) is reduced to

$$\Psi(x,t) = \Psi(x - v_{gr}t, 0)\,e^{i(k_0v_{gr} - \omega_0)t} \equiv \Psi(x - v_{gr}t, 0)\,e^{ik_0(v_{gr} - v_{ph})t}, \tag{2.32}$$

where $v_{gr}$ and $v_{ph}$ are two constants with the dimension of velocity:

$$v_{gr} \equiv \left.\frac{d\omega}{dk}\right|_{k=k_0}, \qquad v_{ph} \equiv \left.\frac{\omega}{k}\right|_{k=k_0}. \tag{2.33}$$

It is clear that Eq. (32) describes effects (i) and (ii) listed above. Let us calculate the group and phase velocities for the particular case of de Broglie waves, whose dispersion law is given by Eq. (1.30):



7 See, e.g., brief discussions in CM Sec. 5.3 and EM Sec. 7.2.

8 By the way, in the particular case of de Broglie waves described by dispersion relation (1.30), Eq. (28) is exact. The reason is that $\omega = E/\hbar$ is a quadratic function of $k = p/\hbar$, so all higher derivatives of ω over k vanish for any $k_0$.
$$\omega = \frac{\hbar k^2}{2m}, \qquad v_{gr} = \left.\frac{d\omega}{dk}\right|_{k=k_0} = \frac{\hbar k_0}{m} = \frac{p_0}{m} \equiv v_0, \qquad v_{ph} = \frac{\omega_0}{k_0} = \frac{\hbar k_0}{2m} = \frac{v_0}{2}. \tag{2.34}$$

We see that (very fortunately!) the velocity of the wave packet envelope is constant and equals that of the classical particle moving by inertia, in accordance with the correspondence principle.
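Definitions (33) can be evaluated for any dispersion relation supplied as a function. The sketch below takes the derivative by a central difference and works in units with ħ = m = 1. The helper names are illustrative. For comparison it also includes a dispersion-free relation ω = c|k|, which is an example added here and not taken from the text. For the de Broglie waves the ratio $v_{gr}/v_{ph}$ equals 2, as in Eq. (34). For the linear relation the two velocities coincide.

```python
HBAR, M = 1.0, 1.0  # units with hbar = m = 1 (an assumption of this sketch)

def omega_de_broglie(k):
    """de Broglie dispersion relation, Eq. (1.30): omega = hbar k^2 / 2m."""
    return HBAR * k * k / (2 * M)

def omega_linear(k, c=2.0):
    """Dispersion-free waves, omega = c|k| (illustrative comparison case)."""
    return c * abs(k)

def group_velocity(w, k0, h=1e-5):
    """v_gr = d(omega)/dk at k0, Eq. (2.33), by a central difference."""
    return (w(k0 + h) - w(k0 - h)) / (2 * h)

def phase_velocity(w, k0):
    """v_ph = omega/k at k0, Eq. (2.33)."""
    return w(k0) / k0

k0 = 3.0
print(group_velocity(omega_de_broglie, k0), phase_velocity(omega_de_broglie, k0))  # ~3.0 and 1.5
print(group_velocity(omega_linear, k0), phase_velocity(omega_linear, k0))          # ~2.0 and 2.0
```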

The remaining term in the square brackets of Eq. (30) describes effect (iii), the wave packet's spread. It may be readily evaluated if the packet (27) is initially Gaussian, as in our example (25):

$$a_k = \text{const}\times\exp\left\{-\frac{\tilde k^2}{(2\delta k)^2}\right\}. \tag{2.34}$$

In this case integral (30) is Gaussian and may be worked out exactly as integral (21). We merge the exponents under the integral and present them as a full square of a linear combination of x and $\tilde k$:

$$-\frac{\tilde k^2}{(2\delta k)^2} + i\tilde k(x - v_{gr}t) - \frac{i}{2}\frac{d^2\omega}{dk^2}\tilde k^2t = -A(t)\left[\tilde k - i\frac{x - v_{gr}t}{2A(t)}\right]^2 - \frac{(x - v_{gr}t)^2}{4A(t)}, \tag{2.35}$$

where I have introduced the following complex function of time:

$$A(t) \equiv \frac{1}{4(\delta k)^2} + \frac{i}{2}\frac{d^2\omega}{dk^2}t = (\delta x)^2 + \frac{i}{2}\frac{d^2\omega}{dk^2}t, \tag{2.36}$$

and have used Eq. (24) in the second equality. Now integrating over $\tilde k$, we get

$$\Psi(x,t) \propto \exp\left\{-\frac{(x - v_{gr}t)^2}{4A(t)} + i(k_0x - \omega_0t)\right\}. \tag{2.37}$$

The imaginary part of ratio 1/A(t) in the exponent gives just an additional contribution to the wave's phase. It does not affect the resulting probability distribution

$$w(x,t) = \Psi\Psi^* \propto \exp\left\{-\frac{(x - v_{gr}t)^2}{2}\,\mathrm{Re}\frac{1}{A(t)}\right\}. \tag{2.38}$$

This is again a Gaussian bell curve spread over axis x, centered at point $\langle x\rangle = v_{gr}t$, with the r.m.s. width

$$(\delta x')^2 = \left[\mathrm{Re}\frac{1}{A(t)}\right]^{-1} = (\delta x)^2 + \left(\frac{1}{2}\frac{d^2\omega}{dk^2}t\right)^2\frac{1}{(\delta x)^2}. \tag{2.39a}$$

In the particular case of de Broglie waves, $d^2\omega/dk^2 = \hbar/m$, so that

$$(\delta x')^2 = (\delta x)^2 + \left(\frac{\hbar t}{2m}\right)^2\frac{1}{(\delta x)^2}. \tag{2.39b}$$

The physics of the spreading is very simple. If $d^2\omega/dk^2 \ne 0$, the group velocity dω/dk of each small group δk of monochromatic components of the wave packet is different. This results in the gradual (eventually, linear) accumulation of the differences of the distances traveled by the groups. The most curious feature of Eq. (39) is that the packet width at t > 0 depends on its initial width $\delta x'(0) = \delta x$ in a non-monotonic way, tending to infinity at both $\delta x \to 0$ and $\delta x \to \infty$. Because of that, for a fixed t, there is an optimal value of δx which minimizes δx':

$$(\delta x')_{\min} = \sqrt{2}\,(\delta x)_{\rm opt} = \left(\frac{\hbar t}{m}\right)^{1/2}. \tag{2.40}$$

This expression may be used for spreading effect estimates. Due to the smallness of the Planck constant ħ on the human scale of things, for macroscopic bodies this effect is extremely small even for very long time intervals. For light particles, however, it may be very noticeable: for the electron ($m = m_e \approx 10^{-30}$ kg) and t = 1 s, Eq. (40) yields $(\delta x')_{\min} \sim 1$ cm!
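The electron estimate can be reproduced directly from Eqs. (39b) and (40). The sketch below uses the CODATA values of ħ and $m_e$ in SI units. The helper names are illustrative. It evaluates the optimal initial width $(\delta x)_{\rm opt} = (\hbar t/2m)^{1/2}$ and the resulting minimal width $(\hbar t/m)^{1/2}$. It also confirms that starting narrower or wider than the optimum gives a broader packet at time t.

```python
import math

HBAR = 1.054571817e-34  # J*s (CODATA value)
M_E = 9.1093837015e-31  # electron mass, kg (CODATA value)

def packet_width(dx0, t, m, hbar=HBAR):
    """r.m.s. width of a free Gaussian packet at time t, Eq. (2.39b)."""
    return math.sqrt(dx0 ** 2 + (hbar * t / (2 * m)) ** 2 / dx0 ** 2)

def optimal_initial_width(t, m, hbar=HBAR):
    """Initial width that minimizes packet_width at time t (cf. Eq. (2.40))."""
    return math.sqrt(hbar * t / (2 * m))

t = 1.0  # s
dx_opt = optimal_initial_width(t, M_E)
w_min = packet_width(dx_opt, t, M_E)
print(dx_opt, w_min)  # w_min = (hbar*t/m)**0.5, about 1 cm
print(packet_width(0.5 * dx_opt, t, M_E) > w_min, packet_width(2 * dx_opt, t, M_E) > w_min)
```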

Note also that for any t ≠ 0, the wave packet retains its Gaussian envelope, but the ultimate relation (24) is no longer satisfied: δx'δp > ħ/2. This is due to a gradually accumulated phase shift between the component monochromatic waves. The last remark on this topic: in quantum mechanics, wave packet spreading is not a ubiquitous effect! For example, in Chapter 5 we will see that in a quantum oscillator, the spatial width of a Gaussian packet (for that system, called the Glauber state) does not grow monotonically. Instead, it either stays constant or oscillates in time.

Now let us briefly discuss the case when the initial wave packet is not Gaussian, but is described by an arbitrary initial wavefunction. In order to make the forthcoming result more appealing, it is beneficial to generalize our calculations to an arbitrary initial time $t_0$. If U does not depend on time explicitly, it is evidently sufficient to replace t with $(t - t_0)$ in all the above formulas. With this replacement, Eq. (27) becomes

$$\Psi(x,t) = \int a_k e^{i[kx - \omega(t - t_0)]}\,dk, \tag{2.41}$$

and the reciprocal transform (21) reads

$$a_k = \frac{1}{2\pi}\int\Psi(x,t_0)\,e^{-ikx}\,dx. \tag{2.42}$$

If we want to express these two formulas with one relation, i.e. plug Eq. (42) into Eq. (41), we should give the integration variable x some other name, e.g., $x_0$. The result is

$$\Psi(x,t) = \frac{1}{2\pi}\int dk\int dx_0\,\Psi(x_0,t_0)\,e^{i[k(x - x_0) - \omega(t - t_0)]}. \tag{2.43}$$

Changing the order of integration, this expression may be rewritten in the following general form:

$$\Psi(x,t) = \int G(x,t;x_0,t_0)\,\Psi(x_0,t_0)\,dx_0, \tag{2.44}$$

where function G, usually called the kernel in mathematics, is called the propagator in quantum mechanics.⁹ According to Eq. (43), in our particular case of a free particle the propagator is equal to

$$G(x,t;x_0,t_0) = \frac{1}{2\pi}\int e^{i[k(x - x_0) - \omega(t - t_0)]}\,dk. \tag{2.45}$$

9 Its standard notation by letter G stems from the fact that the propagator is essentially the spatial Green's function of the corresponding wave equation. It is very similar to the Green's functions of other ordinary and partial differential equations describing various physical systems - see, e.g., CM Sec. 4.1 and/or EM Sec. 2.7 and 7.3.



The physical sense of the propagator may be understood by considering the following special initial conditions:¹⁰

$$\Psi(x_0,t_0) = \delta(x_0 - x'), \tag{2.46}$$

where x' is a certain point within the domain of the particle's motion. In this particular case, Eq. (44) evidently gives

$$\Psi(x,t) = G(x,t;x',t_0). \tag{2.47}$$

Hence the propagator, considered as a function of x and t only, is just the solution of the linear differential equation with δ-functional initial conditions. Eq. (41) may be understood as a mathematical expression of the linear superposition principle in the momentum (i.e., reciprocal) space domain. Eq. (44) expresses the same principle in the direct space domain: the system's "response" $\Psi(x,t)$ to an arbitrary initial condition $\Psi(x_0,t_0)$ is just a sum of its responses to its thin spatial "slices". The propagator $G(x,t;x_0,t_0)$ represents the weight of each slice in the final sum.

Calculating integral (45), one should remember that ω is not a constant but a function of k, given by the dispersion relation for the particular waves. In particular, for the de Broglie waves

$$G(x,t;x_0,t_0) = \frac{1}{2\pi}\int\exp\left\{i\left[k(x - x_0) - \frac{\hbar k^2}{2m}(t - t_0)\right]\right\}dk. \tag{2.48}$$

This is a Gaussian integral again, and may be readily calculated just as it was done (twice) above, by completing the exponent to the full square. The result is

$$G(x,t;x_0,t_0) = \left(\frac{m}{2\pi i\hbar(t - t_0)}\right)^{1/2}\exp\left\{-\frac{m(x - x_0)^2}{2i\hbar(t - t_0)}\right\}. \tag{2.49}$$



Please note the following features of this complex function (plotted in Fig. 2):

Fig. 2.2. Real (solid line) and imaginary (dashed line) parts of the 1D free particle's propagator, plotted in normalized units.

10 Note that this initial condition is not equivalent to a δ-functional initial probability density (2).
(i) It depends only on the differences $(x - x_0)$ and $(t - t_0)$. This is natural, because the free-particle propagation problem is uniform (translation-invariant) both in space and time.

(ii) The function's shape does not depend on its arguments; they just rescale the same function. If its snapshot (Fig. 2) is plotted as a function of un-normalized x, it just becomes broader and lower with time. It is curious that the spatial broadening scales as $(t - t_0)^{1/2}$, just as at classical diffusion. This results from a deep analogy between quantum mechanics and classical statistics, to be discussed further in Chapter 7.

(iii) In accordance with the uncertainty relation, the ultimately compressed wave packet (46) has an infinite width of momentum distribution. The quasi-sinusoidal tails of the free-particle propagator, clearly visible in Fig. 2, result from the free propagation of the fastest (highest-momentum) components of that distribution, in both directions from the packet center.

In the following sections, we will mostly focus on the spatial distribution of stationary, monochromatic wavefunctions. For unconfined motion, these may be interpreted as wave packets of very large spatial width δx. We will only rarely come back to the wave packet discussion. Our excuse is the linear superposition principle, i.e. our conceptual ability to restore the general solution from that of monochromatic waves of all possible energies. However, the reader should not forget that, as the above discussion has illustrated, mathematically this restoration is not always trivial.
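As a consistency check of Eqs. (44) and (49), one may propagate the Gaussian packet (16) with the free-particle propagator by direct numerical integration over $x_0$. The width of the result should then agree with Eq. (39b). The sketch below assumes units with ħ = m = 1 and takes δx = 1, $k_0 = 0$, and t = 2, so the expected width is $\sqrt{2}$. The function names, integration window, and grid steps are illustrative assumptions. The square root in Eq. (49) is taken on the principal branch, `cmath.sqrt`, which gives the usual factor $e^{-i\pi/4}$.

```python
import cmath
import math

HBAR, M = 1.0, 1.0  # units with hbar = m = 1 (an assumption of this sketch)

def free_propagator(x, t, x0, t0):
    """Free-particle propagator, Eq. (2.49); requires t > t0."""
    tau = t - t0
    prefactor = cmath.sqrt(M / (2j * math.pi * HBAR * tau))
    return prefactor * cmath.exp(-M * (x - x0) ** 2 / (2j * HBAR * tau))

def propagate(psi0, xs, t, x0_max=8.0, n=801):
    """Psi(x,t) = integral G(x,t;x0,0) psi0(x0) dx0, Eq. (2.44), as a Riemann sum."""
    h = 2 * x0_max / (n - 1)
    samples = [(-x0_max + i * h, psi0(-x0_max + i * h)) for i in range(n)]
    return [h * sum(free_propagator(x, t, x0, 0.0) * p for x0, p in samples) for x in xs]

def rms_width(xs, psis):
    """r.m.s. width of |psi|^2 sampled on the uniform grid xs."""
    w = [abs(p) ** 2 for p in psis]
    mean = sum(x * wi for x, wi in zip(xs, w)) / sum(w)
    return math.sqrt(sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / sum(w))

# Gaussian packet (2.16) with delta_x = 1 and k0 = 0, propagated to t = 2
psi0 = lambda x: (2 * math.pi) ** -0.25 * math.exp(-x * x / 4)
xs = [-12.0 + 0.1 * j for j in range(241)]
psi_t = propagate(psi0, xs, t=2.0)
print(rms_width(xs, psi_t), math.sqrt(1 + (HBAR * 2.0 / (2 * M)) ** 2))  # both ~1.414
```

The norm $\sum|\Psi|^2\Delta x$ of the propagated packet also stays close to 1, as it should for unitary evolution.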



2.3. Particle motion in simple potential profiles 

Now, let us proceed to the cases in which the potential energy U(x,t) is not identically equal to zero. The easiest case is that of a spatially-uniform but time-dependent potential: U = U(t). Indeed, the corresponding Schrodinger equation (1.25) with the Hamiltonian

$$\hat H = \frac{\hat p^2}{2m} + U(t) = -\frac{\hbar^2}{2m}\nabla^2 + U(t), \tag{2.50}$$

allows a variable separation similar to that performed in Sec. 1.5. The only difference is that the time-dependent probability amplitude a(t) obeys an equation of motion that is slightly more general than Eq. (1.59):

$$i\hbar\frac{da}{dt} = [E + U(t)]\,a, \tag{2.51}$$

whose solution may be expressed as an evident generalization of Eq. (1.61):

$$a(t) = a(0)\,e^{-i\omega t + i\varphi(t)}, \quad\text{with } \omega = \frac{E}{\hbar} \text{ and } \frac{d\varphi}{dt} = -\frac{U(t)}{\hbar}. \tag{2.52}$$

Looking at the basic relations (1.22) and (1.23) of wave mechanics, this additional phase factor seems to affect neither the particle probability distribution nor any observable (including energy, if it is referred to the instant value of U). The phase increment φ associated with U(t) would then be just a mathematical artifact. This is certainly true for a single particle. However, the situation changes as soon as we recall that the Universe consists of more than one of them.
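Eq. (52) can be checked by integrating Eq. (51) numerically. The sketch below uses the classical fourth-order Runge-Kutta scheme in units with ħ = 1. The values E = 1 and U(t) = 0.5 sin t are hypothetical examples, chosen because $\int_0^tU\,dt' = 0.5(1 - \cos t)$ is known in closed form. The helper name `evolve_amplitude` is also illustrative. The numerical amplitude should keep |a| = 1 and reproduce the phase $-Et/\hbar + \varphi(t)$.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

def evolve_amplitude(a0, E, U, t_end, n=20000):
    """Integrate i*hbar*da/dt = [E + U(t)] a, Eq. (2.51), with classical 4th-order Runge-Kutta."""
    h = t_end / n
    f = lambda t, a: -1j * (E + U(t)) * a / HBAR
    a, t = complex(a0), 0.0
    for _ in range(n):
        k1 = f(t, a)
        k2 = f(t + h / 2, a + h / 2 * k1)
        k3 = f(t + h / 2, a + h / 2 * k2)
        k4 = f(t + h, a + h * k3)
        a += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return a

# Hypothetical example: E = 1 and U(t) = 0.5 sin t, so integral_0^t U dt' = 0.5 (1 - cos t)
E, t_end = 1.0, 10.0
U = lambda t: 0.5 * math.sin(t)
phi = -0.5 * (1 - math.cos(t_end)) / HBAR  # phase increment from Eq. (2.52)
exact = cmath.exp(-1j * E * t_end / HBAR + 1j * phi)
numeric = evolve_amplitude(1.0, E, U, t_end)
print(abs(numeric), abs(numeric - exact))  # |a| stays 1; the difference is tiny
```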

For example, consider two similar, independent particles, each in the same (say, ground) eigenstate, but with potential energies (and hence eigenenergies $E_{1,2}$) different by a constant $\Delta U \equiv U_1 - U_2$. Then the difference $\varphi \equiv \varphi_1 - \varphi_2$ of their wavefunction phases evolves in time as

$$\frac{d\varphi}{dt} = -\frac{\Delta U}{\hbar}. \tag{2.53}$$

If the particles are in different worlds (or at least in different laboratories :-), this evolution is unobservable. However, it should be intuitively clear that a very weak coupling of a certain detector to each particle may allow it to observe phase φ, while keeping the particle dynamics virtually unperturbed, i.e. Eq. (53) intact.

Perhaps the most dramatic demonstration of this phenomenon is the Josephson effect in superconductors.¹¹ Experimentally, the easiest way to observe the effect is the following. Two bulk superconductor samples are connected with a weak, short electric contact (called either the weak link or the Josephson junction) and biased with a constant (dc) voltage V, typically in a few-microvolt range - see Fig. 3.




Fig. 2.3. Josephson effect in a weak link 
between two bulk superconductor electrodes. 



Superconductivity may be explained by a specific coupling between conduction electrons that leads, at low temperatures, to the formation of the so-called Cooper pairs. Such pairs, each consisting of two electrons with opposite spins and momenta, behave as Bose particles and form a coherent Bose-Einstein condensate.¹² Most properties of such a condensate may be described by a single wavefunction. It evolves in time as that of a free particle with the effective potential energy $U = q\phi = -2e\phi$, where ϕ is the electrochemical potential¹³ and q = -2e is the total charge of the Cooper pair. As a result, for the situation shown in Fig. 3, Eq. (53) takes the form

$$\frac{d\varphi}{dt} = \frac{2e}{\hbar}V, \tag{2.54}$$

where $V \equiv \phi_1 - \phi_2$ is the applied voltage. B. Josephson predicted that, in the particular case when the weak link is a tunnel junction, the electric current I of Cooper pairs through it should have a simple form:¹⁴

$$I = I_c\sin\varphi, \tag{2.55}$$



11 It was predicted theoretically by B. Josephson (then a graduate student!) in 1962 and observed experimentally 
in less than a year. More recently, analogs of this effect were also observed in superfluid helium and atomic Bose- 
Einstein condensates. 

12 See, e.g., SM Sec. 3.4. 

13 For more on this notion see, e.g. SM Sec. 6.4. 

14 Later, Eq. (55) has been shown to be valid for other weak link types as well, though deviations from it have also been found. These deviations, however, do not affect the fundamental 2π-periodicity of function I(φ) - see, e.g., EM Sec. 6.4. As a result, no deviations from the fundamental relations (56)-(57) have been found (yet :-).
where $I_c$ is some constant (scaling as the weak link strength). Combining Eqs. (54) and (55), we see that if the applied voltage is constant in time, the current oscillates with the so-called Josephson frequency

$$f_J = \frac{\omega_J}{2\pi}, \quad\text{where } \omega_J \equiv \frac{2e}{\hbar}V, \tag{2.56}$$

as high as ~484 MHz per each microvolt of applied dc voltage. This effect is now well documented, though a direct detection of the Josephson radiation is tricky. It is much easier to observe the phase locking (synchronization)¹⁵ of the radiation by an external microwave signal, which results in the formation of nearly flat dc current steps at dc voltages

$$V_n = n\frac{\hbar\omega}{2e}, \tag{2.57}$$

where ω is the external signal frequency and n is an integer.¹⁶ This effect is now being used in highly accurate standards of dc voltage.¹⁷
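The numbers quoted above follow from Eqs. (56) and (57) with the exact SI values of e and h. The sketch below uses those values. The helper names are illustrative. It rewrites Eq. (57) through the signal frequency $f = \omega/2\pi$, so that $V_n = nhf/2e$. It reproduces the ~484 MHz per microvolt figure. For an assumed 10 GHz external signal, it gives the first step at about 20.7 μV.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C (exact in the 2019 SI)
H_PLANCK = 6.62607015e-34   # Planck constant, J*s (exact in the 2019 SI)

def josephson_frequency(voltage):
    """f_J = omega_J / 2pi = 2eV/h, Eq. (2.56); voltage in volts, result in Hz."""
    return 2 * E_CHARGE * voltage / H_PLANCK

def step_voltage(n, f):
    """V_n = n*hbar*omega/2e = n*h*f/2e, Eq. (2.57), for signal frequency f = omega/2pi in Hz."""
    return n * H_PLANCK * f / (2 * E_CHARGE)

print(josephson_frequency(1e-6) / 1e6)  # MHz per microvolt, ~483.6
print(step_voltage(1, 10e9) * 1e6)      # first step for a 10 GHz signal, in microvolts (~20.7)
```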

Now, let us move on to a discussion of the opposite case, when a 1D particle moves in various potential profiles U(x) that are constant in time. Conceptually, the simplest of such profiles is a potential step - see Fig. 4.



Fig. 2.4. Classical 1D motion in a potential profile U(x). (Figure labels: energy E; classically accessible region; classically forbidden region; classical turning point.)



As I am sure the reader knows, in classical mechanics, if a particle is incident on such a step (in Fig. 4, from the left), its kinetic energy p²/2m cannot be negative, so that it can only travel through the classically accessible region where its (conserved) full energy,

$$E = \frac{p^2}{2m} + U(x), \qquad (2.58)$$

is larger than the local value U(x). Let the initial velocity v = p/m be positive, i.e. directed toward the step. Before it has reached the classical turning point x_c, defined by the equation

$$U(x_c) = E, \qquad (2.59)$$



15 See, e.g., CM Sec. 4.4.

16 If ω is not too high, this effect may be adequately described by combining Eqs. (54)-(55). Let me leave this task for the reader.

17 The most precise proof that the Josephson frequency-to-voltage ratio f_J/V does not depend on the superconducting material (to at least 15 decimal places!) has been carried out by the group led by J. Lukens here at Stony Brook - see J.-S. Tsai et al., Phys. Rev. Lett. 51, 316 (1983).
the kinetic energy p²/2m never turns to zero, so that the particle continues to move in the initial direction. On the other hand, the particle cannot penetrate into the classically forbidden region x > x_c, because its kinetic energy would be negative there. At the point x = x_c, the particle's velocity changes sign, i.e. it is reflected back from the classical turning point.

In order to see what wave mechanics says about this situation, let us start from the simplest, sharp potential step shown with bold black lines in Fig. 5:

$$U(x) = U_0\,\theta(x) = \begin{cases} 0, & \text{at } x < 0, \\ U_0, & \text{at } 0 < x. \end{cases} \qquad (2.60)$$

For this choice, and any energy within the interval 0 < E < U₀, the classical turning point is x_c = 0.



Fig. 2.5. Reflection of a monochromatic wave from a potential step U₀ > E. (This particular wavefunction's shape is for U₀ = 5E.) The wavefunction is plotted with the same schematic vertical offset by E as those in Fig. 1.7.



Let us represent an incident particle with a wave packet so long that the spread δk ~ 1/δx of its wave number spectrum, and hence the energy uncertainty δE = ħδω = ħ(dω/dk)δk, is negligible in comparison with its average value E < U₀, as well as with (U₀ − E). In this case, E may be considered a given constant, the time dependence of the solution is given by Eq. (1.61), and we can limit ourselves to the solution of the 1D version of the stationary Schrödinger equation (1.63), in this case

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + U(x)\psi = E\psi, \qquad (2.61)$$

for the spatial part ψ(x) of the wavefunction. 18

At x < 0, i.e. at U = 0, the equation is reduced to the Helmholtz equation (1.75), and may be satisfied with two traveling waves, proportional to exp{+ikx} and exp{−ikx}, respectively, with k satisfying the dispersion equation (1.30):

$$k^2 = \frac{2mE}{\hbar^2}. \qquad (2.62)$$

Thus the general solution of Eq. (61) in this region may be presented as



18 Note that this is not an eigenproblem like the one we have solved in Sec. 1.4 for a quantum well. Indeed, now the energy E is considered fixed - e.g., by the initial conditions that launch a long wave packet upon the potential step, from the left.
$$\psi_-(x) = Ae^{+ikx} + Be^{-ikx}. \qquad (2.63)$$

The second term in the right-hand side evidently describes an (infinitely long) wave packet traveling to the left, which represents the particle's reflection from the potential step. If B = −A, this solution is reduced to Eq. (1.76) for the potential well with infinitely high walls, but as we will see in a minute, for our current case of a finite step height U₀, the relation between the coefficients B and A may be different.

To show this, let us solve Eq. (61) for x > 0, where U = U₀ > E. In this region the equation may be rewritten as

$$\frac{d^2\psi_+}{dx^2} = \kappa^2\psi_+, \qquad (2.64)$$

where κ is a real constant defined by a relation similar to Eq. (62):

$$\kappa^2 = \frac{2m(U_0 - E)}{\hbar^2}. \qquad (2.65)$$

The general solution of Eq. (64) is the sum of exp{+κx} and exp{−κx}, with arbitrary coefficients. However, the wavefunction should be finite at x → ∞, so only the latter exponent is acceptable:



$$\psi_+(x) = Ce^{-\kappa x}. \qquad (2.66)$$



This penetration of the wavefunction into the classically forbidden region, and hence a finite probability to find the particle there, is one of the most fascinating predictions of quantum mechanics, and has been repeatedly observed in experiment, e.g., via tunneling experiments - see below. From Eq. (66), it is evident that the constant κ, defined by Eq. (65), may be interpreted as the reciprocal penetration depth. Even for the lightest particles this depth is usually very small. Indeed, for E ≪ U₀ that equation yields

$$\delta \equiv \frac{1}{\kappa}\bigg|_{E\to 0} = \frac{\hbar}{(2mU_0)^{1/2}}. \qquad (2.67)$$



For example, for a conduction electron in a typical metal that runs, at its surface, into a sharp potential step U₀ whose height equals the metal's workfunction W ≈ 5 eV (see the discussion of the photoelectric effect in Sec. 1.1), δ is close to 0.1 nm, i.e. close to a typical size of an atom. For heavier elementary particles (e.g., protons) the penetration depth is correspondingly smaller, and for macroscopic bodies it is hardly measurable.

Returning to our problem, we still should find the coefficients A, B, and C from the boundary conditions at x = 0. Since E is a finite constant, and U(x) is a finite function, Eq. (61) says that d²ψ/dx² should be finite as well. This means that the first derivative should be continuous:

$$\lim_{\varepsilon\to 0}\left[\frac{d\psi}{dx}\bigg|_{x=+\varepsilon} - \frac{d\psi}{dx}\bigg|_{x=-\varepsilon}\right] = \lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}\frac{d^2\psi}{dx^2}\,dx = \frac{2m}{\hbar^2}\lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}[U(x) - E]\,\psi\,dx = 0. \qquad (2.68)$$

Repeating such a calculation for the function ψ(x) itself, we see that it also should be continuous at all points, including x = 0, so that

$$\psi_-(0) = \psi_+(0), \qquad \frac{d\psi_-}{dx}(0) = \frac{d\psi_+}{dx}(0). \qquad (2.69)$$

Plugging solutions (63) and (66) into these two boundary conditions, we get a system of two linear equations

$$A + B = C, \qquad ikA - ikB = -\kappa C, \qquad (2.70)$$

whose (elementary) solution enables us to express B and C via A:

$$B = A\frac{k - i\kappa}{k + i\kappa}, \qquad C = A\frac{2k}{k + i\kappa}. \qquad (2.71)$$

We immediately see that the numerator and denominator in the first of these formulas have equal moduli, so that |B| = |A|. This means that, as we could expect, a particle with energy E < U₀ is totally reflected from the step. As a result, at x < 0 our solution (63) may be presented by a standing wave

$$\psi_- = 2iAe^{i\theta}\sin(kx - \theta), \quad \text{with} \quad \theta = \arctan\frac{k}{\kappa}. \qquad (2.72)$$

Notice that the shift Δx = θ/k = [arctan(k/κ)]/k of the standing wave to the right, due to the partial penetration of the wavefunction under the potential step, is commensurate with, but generally not equal to, δ = 1/κ. Figure 5 shows the full behavior of the wavefunction for a particular case E = U₀/5, at which k/κ = [E/(U₀ − E)]^{1/2} = 1/2.

According to Eq. (65), as the particle's energy E is increased to approach U₀, the penetration depth 1/κ diverges. This raises an important issue: what happens at E > U₀, i.e. if there is no classically forbidden region in the problem? Again, in classical mechanics the incident particle would continue to move to the right, though with a reduced velocity, corresponding to the new kinetic energy E − U₀, so there would be no reflection. In quantum mechanics, however, the situation is different. In order to analyze it, it is not necessary to re-solve the whole problem; it is sufficient to note that all our calculations, and hence Eqs. (71), are still valid if we take 19

$$\kappa = -ik', \quad \text{with} \quad k'^2 = \frac{2m(E - U_0)}{\hbar^2} > 0. \qquad (2.73)$$

With this replacement, Eq. (71) becomes 20

$$B = A\frac{k - k'}{k + k'}, \qquad C = A\frac{2k}{k + k'}. \qquad (2.74)$$

The most important result of this change is that now the reflection is not complete: |B| < |A|. In order to evaluate this effect quantitatively, it is more appropriate to use not the B/A or C/A ratios, but rather that



19 Our earlier discarding of the particular solution exp{κx}, now becoming exp{−ik'x}, is still valid, but now on different grounds: this term would describe a wave packet incident on the potential step from the right, and this is not the problem under our consideration.

20 These formulas are completely similar to those for the partial reflection of classical waves from a sharp interface between two uniform media, at normal incidence (see, e.g., CM Sec. 5.4 and EM Sec. 7.4), with the effective impedance Z of de Broglie waves proportional to their wave number k.
of the probability currents (5) corresponding to traveling waves with amplitudes C and A, in the corresponding regions (respectively, x > 0 and x < 0):

$$T \equiv \frac{I_C}{I_A} = \frac{k'|C|^2}{k|A|^2} = \frac{4kk'}{(k + k')^2} = \frac{4[E(E - U_0)]^{1/2}}{[E^{1/2} + (E - U_0)^{1/2}]^2}. \qquad (2.75)$$

(T so defined is called the transparency of the inhomogeneity, in our current case of the potential step.)
The result given by Eq. (75) is plotted in Fig. 6a. Notice its most important features:

(i) At U₀ = 0, the transparency is full, T = 1 - naturally, since there is no step at all.

(ii) At U₀ → E, the transparency tends to zero - giving a proper connection with the case E < U₀.

(iii) We can use result (75) even for U₀ < 0, i.e. for the step-down (or "cliff") profile - see Fig. 6b. Very counter-intuitively, the particle is (partly) reflected even from such a cliff, and the transmission diminishes (rather slowly) at U₀ → −∞.
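Features (i)-(iii) can be checked by evaluating Eq. (75) directly. The Python sketch below uses units ħ = 1 and m = 1/2 for convenience. It also computes the reflectance R = |B/A|² from Eq. (74) to confirm that T + R = 1, i.e. that the probability current is conserved. The function names are hypothetical.

```python
import math

# Units (an assumption of this sketch): hbar = 1, m = 1/2, so k^2 = E and k'^2 = E - U0.


def step_transparency(E, U0):
    """T = 4kk'/(k + k')^2, Eq. (75); valid for U0 < E, including U0 < 0 (a "cliff")."""
    if not U0 < E:
        raise ValueError("Eq. (75) assumes U0 < E")
    k, k_prime = math.sqrt(E), math.sqrt(E - U0)
    return 4 * k * k_prime / (k + k_prime) ** 2


def step_reflectance(E, U0):
    """R = |B/A|^2 with B/A = (k - k')/(k + k'), Eq. (74)."""
    k, k_prime = math.sqrt(E), math.sqrt(E - U0)
    return ((k - k_prime) / (k + k_prime)) ** 2


for u in (-100.0, -3.0, 0.0, 0.5, 0.9, 0.99):  # values of U0/E, with E = 1
    T, R = step_transparency(1.0, u), step_reflectance(1.0, u)
    print(f"U0/E = {u:7.2f}:  T = {T:.4f},  T + R = {T + R:.12f}")
```

For example, for the cliff with U₀ = −3E we have k' = 2k, and Eq. (75) gives T = 8/9. In other words, one particle in nine is reflected.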




E>0 
U = 0 



A ► 


c ► 













(b) 



Fig. 2.6. (a) Transparency of a potential step with U 0 < E 
as a function of its height, according to Eq. (75), and (b) 
the potential profile at U 0 < 0. 



UJE 



The most important conceptual conclusion of our analysis is that the quantum particle is partly reflected from a potential step with U₀ < E, in the sense that there is a nonvanishing probability T < 1 to find it passed over the step, while there is also a probability (1 − T) to have it reflected.

The same property is exhibited, for any relation between E and U₀, by another simple potential profile U(x), the famous tunnel barrier. Figure 7 shows its simple, "rectangular" version:

$$U(x) = \begin{cases} 0, & \text{for } x < -d/2, \\ U_0, & \text{for } -d/2 < x < +d/2, \\ 0, & \text{for } +d/2 < x. \end{cases} \qquad (2.76)$$



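Fig. 2.7. Rectangular tunnel barrier. (Figure labels: U = 0 outside and U = U₀ inside the barrier, which extends from −d/2 to +d/2; wave amplitudes A and B on the left, C and D under the barrier, and F on the right.)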
In order to analyze this problem, it is sufficient to look for the solution to the Schrödinger equation in the form (63) at x < −d/2. At x > +d/2, i.e., behind the barrier, we may use the arguments presented above (no wave packet source on the right!) to keep just one traveling wave,

$$\psi_+(x) = Fe^{ikx}. \qquad (2.77)$$

However, under the barrier, i.e. at −d/2 < x < +d/2, we should generally keep both exponential terms,

$$\psi_b(x) = Ce^{-\kappa x} + De^{+\kappa x}, \qquad (2.78)$$

because our previous argument, used in the potential step problem's solution, is no longer valid. (Here k and κ are still defined, respectively, by Eqs. (62) and (65).) In order to find the relation between the coefficients A, B, C, D, and F, we need to plug the solutions into boundary conditions similar to Eqs. (69), but now at two boundary points, x = ±d/2.

Solving the resulting system of 4 linear equations for five amplitudes (A, B, C, D, and F), we can readily calculate four ratios B/A, C/A, etc., in particular,



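$$\frac{F}{A} = \frac{\exp\{-ikd\}}{\cosh\kappa d + \dfrac{i}{2}\left(\dfrac{\kappa}{k} - \dfrac{k}{\kappa}\right)\sinh\kappa d}, \qquad (2.79a)$$

and hence the barrier's transparency

$$T = \left|\frac{F}{A}\right|^2 = \left[\cosh^2\kappa d + \left(\frac{\kappa^2 - k^2}{2k\kappa}\right)^2\sinh^2\kappa d\right]^{-1}. \qquad (2.79b)$$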



Figure 8a shows the transparency as a function of particle energy E, for several characteristic values of the barrier thickness d, or rather of the ratio d/δ, where δ is defined by Eq. (67).
Fig. 2.8. Transparency of the rectangular tunnel barrier as a function of the particle's energy E.
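Equations (79) can be cross-checked by solving the four boundary-condition equations numerically. The Python sketch below works in units ħ = 1 and m = 1/2, so that δ = U₀^{−1/2}; the helper names are hypothetical. It computes T both from Eq. (79a) and from a direct linear solve for B, C, D, and F at A = 1. A complex κ (from `cmath.sqrt`) lets the same code cover E > U₀, where the oscillations of T appear. The single point E = U₀, where κ = 0 and the two under-barrier solutions coincide, has to be avoided.

```python
import cmath

# Units (an assumption of this sketch): hbar = 1, m = 1/2, so k^2 = E, kappa^2 = U0 - E,
# and delta = 1/sqrt(U0), Eq. (67). cmath.sqrt returns an imaginary kappa for E > U0.


def transparency_formula(E, U0, d):
    """T = |F/A|^2 from Eq. (79a); for real kappa this is Eq. (79b)."""
    k, kappa = cmath.sqrt(E), cmath.sqrt(U0 - E)
    denom = cmath.cosh(kappa * d) + 0.5j * (kappa / k - k / kappa) * cmath.sinh(kappa * d)
    return 1.0 / abs(denom) ** 2


def solve_linear(M, rhs):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(rhs)
    a = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def transparency_direct(E, U0, d):
    """Match psi and dpsi/dx at x = -d/2 and x = +d/2 for Eqs. (63), (77), (78), with A = 1."""
    k, kappa = cmath.sqrt(E), cmath.sqrt(U0 - E)
    ex = cmath.exp
    xl, xr = -d / 2, d / 2
    # Unknowns, in order: B, C, D, F
    M = [
        [ex(-1j * k * xl), -ex(-kappa * xl), -ex(kappa * xl), 0],
        [-1j * k * ex(-1j * k * xl), kappa * ex(-kappa * xl), -kappa * ex(kappa * xl), 0],
        [0, ex(-kappa * xr), ex(kappa * xr), -ex(1j * k * xr)],
        [0, -kappa * ex(-kappa * xr), kappa * ex(kappa * xr), -1j * k * ex(1j * k * xr)],
    ]
    rhs = [-ex(1j * k * xl), -1j * k * ex(1j * k * xl), 0, 0]
    B, C, D, F = solve_linear(M, rhs)
    return abs(F) ** 2  # the same k on both sides, so T = |F/A|^2


for E in (0.2, 0.5, 0.8, 1.5, 2.5):  # U0 = 1, d = 3 delta
    print(E, transparency_formula(E, 1.0, 3.0), transparency_direct(E, 1.0, 3.0))
```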
The plots show that for a thin barrier (d < δ) the transparency grows gradually with the particle's energy. This growth is natural, because the penetration constant κ decreases with the growth of E, i.e., the wavefunction penetrates more and more into the barrier, so that more and more of it is "picked up" at the second interface (x = +d/2) and transferred into the wave F exp{ikx} propagating behind the barrier. As Eq. (79b) shows, for thick barriers (d ≫ δ), this dependence is dominated by an exponent,

$$T \approx \left(\frac{4k\kappa}{k^2 + \kappa^2}\right)^2 e^{-2\kappa d}, \qquad (2.80)$$

which may be clearly seen as straight segments in the semi-log plots (Fig. 8b) of T as a function of the combination (1 − E/U₀)^{1/2}, which is proportional to κ - see Eq. (65).
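The approach of Eq. (79b) to its thick-barrier asymptote (80) can be quantified with a few lines of Python. The sketch uses units ħ = 1, m = 1/2, and U₀ = 1, so that d is measured in units of δ. It illustrates the trend and is not a reproduction of Fig. 8.

```python
import math

# Units (an assumption of this sketch): hbar = 1, m = 1/2, U0 = 1, so delta = 1 and d is in units of delta.


def t_exact(E, U0, d):
    """Eq. (79b) for 0 < E < U0."""
    k, kappa = math.sqrt(E), math.sqrt(U0 - E)
    a = (kappa ** 2 - k ** 2) / (2 * k * kappa)
    return 1.0 / (math.cosh(kappa * d) ** 2 + (a * math.sinh(kappa * d)) ** 2)


def t_thick(E, U0, d):
    """Thick-barrier asymptote, Eq. (80)."""
    k, kappa = math.sqrt(E), math.sqrt(U0 - E)
    return (4 * k * kappa / (k ** 2 + kappa ** 2)) ** 2 * math.exp(-2 * kappa * d)


for d in (1.0, 3.0, 10.0, 30.0):
    ratios = [t_exact(E, 1.0, d) / t_thick(E, 1.0, d) for E in (0.1, 0.5, 0.9)]
    print(f"d/delta = {d:5.1f}: T(79b)/T(80) =", [f"{r:.6f}" for r in ratios])
```

The ratio tends to 1 as d/δ grows. The convergence is fastest at low energies, where κ is largest.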

Equation (80) also clearly shows the exponential dependence of the barrier transparency on its thickness at d ≫ δ. This dependence is the most important factor for various applications of quantum-mechanical tunneling - from the field emission 21 of electrons to scanning tunneling microscopy. 22 Also noted should be the substantial negative implications of the effect for modern electronic engineering, most importantly the limit it imposes on the scaling down of field-effect transistors in semiconductor integrated circuits (and hence on the circuit density increase according to the well-known Moore's law), due to the increase of tunneling both through the gate oxide and along the transistor's channel. 23

Another interesting effect visible in Fig. 8a (for the case d = 0.3δ) is the oscillation of T at E > U₀. This is our first glimpse of one more interesting quantum effect: resonant tunneling. I will discuss this effect in detail in Sec. 5 below.



2.4. The WKB approximation 

Before moving on to exploring more complex potentials, let us see whether the results discussed in the previous section hold in the opposite limit of so-called soft, gradual potential profiles, like the one sketched in Fig. 4. (The quantitative conditions of the "softness" will be derived below.) The most efficient analytical tool in this limit is the WKB (or "quasiclassical") approximation developed by H. Jeffreys, G. Wentzel, A. Kramers, and L. Brillouin in 1926-27.

In order to derive its 1D version, let us rewrite the Schrödinger equation (61) as

$$\frac{d^2\psi}{dx^2} + k^2(x)\psi = 0, \qquad (2.81)$$

where the local value of the wave number k(x) is defined similarly to Eq. (73),

$$k^2(x) \equiv \frac{2m}{\hbar^2}[E - U(x)], \qquad (2.82)$$

but now it may be a function of x. We already know that for k(x) = const, the fundamental solutions of this equation have the form A exp{+ikx} and B exp{−ikx}. Any of them may be presented in a simple form



21 See, e.g., G. Fursey, Field Emission in Vacuum Microelectronics, Kluwer, New York, 2005.

22 See, e.g., G. Binnig and H. Rohrer, Helv. Phys. Acta 55, 726 (1982).

23 See, e.g., V. Sverdlov et al., IEEE Trans. on Electron Devices 50, 1926 (2003).
$$\psi(x) = e^{i\Phi(x)}, \qquad (2.83)$$

where Φ(x) is a complex function, in this simplest case equal to either (kx − i ln A) or (−kx − i ln B). This is why we may try to use Eq. (83) to look for the solution of Eq. (81) even in the general case, k(x) ≠ const. Differentiating Eq. (83) twice, we get

$$\frac{d\psi}{dx} = i\frac{d\Phi}{dx}e^{i\Phi}, \qquad \frac{d^2\psi}{dx^2} = \left[i\frac{d^2\Phi}{dx^2} - \left(\frac{d\Phi}{dx}\right)^2\right]e^{i\Phi}. \qquad (2.84)$$



Plugging the last expression into Eq. (81) and requiring the factor before exp{iΦ(x)} to vanish, we get

$$i\frac{d^2\Phi}{dx^2} - \left(\frac{d\Phi}{dx}\right)^2 + k^2(x) = 0. \qquad (2.85)$$

This is still an exact, general result. At first sight, it looks worse than the initial equation (81), because Eq. (85) is nonlinear. However, it is more ready for simplification in the limit when the potential profile is very smooth, dU/dx → 0. Indeed, we know that for a uniform potential, Φ'' = 0.



Hence, in the "0th" approximation, Φ(x) → Φ₀(x), we may try to keep that result, so that Eq. (85) yields

$$\left(\frac{d\Phi_0}{dx}\right)^2 = k^2(x). \qquad (2.86a)$$

Just as in the uniform case, this equation has two roots,

$$\frac{d\Phi_0}{dx} = \pm k(x), \qquad (2.86b)$$

so that its general solution is

$$\psi_0(x) = A\exp\left\{+i\int^x k(x')\,dx'\right\} + B\exp\left\{-i\int^x k(x')\,dx'\right\}, \qquad (2.87)$$

where the choice of the lower limit of integration affects only the constants A and B. The physical sense of this result is simple: it is a sum of forward- and back-propagating waves, with the coordinate-dependent local wave number k(x) that self-adjusts to the potential profile.

Let me emphasize the nontrivial nature of this approximation. 24 First, any attempt to address the problem with a standard perturbation approach (say, ψ = ψ₀ + ψ₁ + ..., with ψ_n proportional to the nth power of some small parameter, 25 in this case scaling as d²U/dx²) would fail for most potentials, because even a slight but persisting deviation of U(x) from a constant leads to a gradual accumulation of the phase Φ₀, impossible to describe by any small perturbation of ψ. Second, the dropping of the term d²Φ/dx² in Eq. (85) is not too easy to justify. Indeed, since we are committed to the "soft potential limit" dU/dx → 0, we should be ready to assume the characteristic length a of the spatial variation of Φ to be large, and neglect



24 Philosophically, this space-domain method is very close to the time-domain rotating wave approximation (RWA) used, for example, in the classical and quantum theory of oscillations - see, e.g., CM Secs. 4.2-4.5, and Secs. 6.5, 7.6, 7.7, 9.2, and 9.4 of this course.

25 Such perturbation theories will be discussed in Chapter 6.
the terms that are the smallest ones in the limit a → ∞. However, the first two terms in Eq. (85) are apparently of the same order in a, namely O(a⁻²); why have we neglected just one of them?

The price we have paid for such a "sloppy" treatment is high: Eq. (87) does not satisfy the fundamental property of the Schrödinger equation, the probability current conservation. Indeed, since Eq. (81) describes the fixed-energy (stationary) spatial part of the general Schrödinger equation, its probability density w = ψψ* should not depend on time. Hence, according to Eq. (6), we should have I(x) = const. However, this is not true for each component of Eq. (87); for example, for the forward-propagating component of its right-hand side, Eq. (5) yields

$$I_0(x) = \frac{\hbar}{m}|A|^2k(x), \qquad (2.88)$$

evidently not a constant if k(x) ≠ const.



The brilliance of the WKB theory is that the problem may be fixed without revising the 0th approximation. Indeed, let us explore the next, 1st approximation instead:

$$\Phi(x) \to \Phi_{\rm WKB}(x) \equiv \Phi_0(x) + \Phi_1(x), \qquad (2.89)$$

where Φ₀ still obeys Eq. (86), while Φ₁ describes a small correction to the 0th approximation, in the following sense: 26

$$\left|\frac{d\Phi_1}{dx}\right| \ll \left|\frac{d\Phi_0}{dx}\right| = k(x). \qquad (2.90)$$



Plugging Eq. (89) into Eq. (85), with the account of the definition (86), we get

$$i\left(\frac{d^2\Phi_0}{dx^2} + \frac{d^2\Phi_1}{dx^2}\right) - \frac{d\Phi_1}{dx}\left(2\frac{d\Phi_0}{dx} + \frac{d\Phi_1}{dx}\right) = 0. \qquad (2.91)$$

Using condition (90), we may neglect d²Φ₁/dx² in comparison with d²Φ₀/dx² in the first parenthesis, and dΦ₁/dx in comparison with 2dΦ₀/dx in the second parenthesis. As a result, we get the following approximate result:



$$\frac{d\Phi_1}{dx} = \frac{i}{2}\frac{d^2\Phi_0/dx^2}{d\Phi_0/dx} = \frac{i}{2}\frac{d}{dx}\left[\ln\frac{d\Phi_0}{dx}\right] = \frac{i}{2}\frac{d}{dx}[\ln k(x)] \equiv i\frac{d}{dx}\left[\ln k^{1/2}(x)\right], \qquad (2.92)$$

$$i\Phi_{\rm WKB} = i\Phi_0 + i\Phi_1 = \pm i\int^x k(x')\,dx' + \ln\frac{1}{k^{1/2}(x)}, \qquad (2.93)$$

$$\psi_{\rm WKB}(x) = \frac{a}{k^{1/2}(x)}\exp\left\{i\int^x k(x')\,dx'\right\} + \frac{b}{k^{1/2}(x)}\exp\left\{-i\int^x k(x')\,dx'\right\}, \quad \text{for } k^2(x) > 0. \qquad (2.94)$$

(Again, the lower integration limit is arbitrary, but its choice may be incorporated into the complex constants a and b.) This modification of the 0th approximation (87) overcomes the problem of current continuity; for example, for the forward-propagating wave, Eq. (5) gives



26 For definiteness, I will use the freedom given by Eq. (82) to define k(x) as the positive square root of its right-hand side.
$$I_{\rm WKB}(x) = \frac{\hbar}{m}|a|^2 = \text{const}. \qquad (2.95)$$

Physically, the factor k^{1/2} in the denominator of the WKB wavefunction's pre-exponent is easy to understand. The smaller the local group velocity (34) of the wave packet, v_gr(x) = ħk(x)/m, the "easier" (more probable) it should be to find the particle within a certain interval dx. This is exactly the result that WKB gives: dW/dx = w(x) = ψψ* ∝ 1/k(x) ∝ 1/v_gr.

Another value of the 1st approximation is a clarification of WKB theory's validity condition: it is given by Eq. (90). Plugging into this relation the first form of Eq. (92), and estimating |Φ₀''| as |Φ₀'|/a, where a is the spatial scale of a substantial change of |Φ₀'| = k(x), we can rewrite the condition as

$$ka \gg 1. \qquad (2.96)$$

In plain English, this means that the region where U(x), and hence k(x), change substantially should contain many de Broglie wavelengths λ = 2π/k.
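The key WKB claims are the constant current (95) and the amplitude scaling |ψ|² ∝ 1/k(x). Both can be tested by integrating Eq. (81) numerically. The sketch below assumes a hypothetical smooth profile in which k(x) grows from 0.5 to 1.5 over a length of order a = 20, so that ka ≫ 1 in the sense of Eq. (96). It launches a pure forward WKB wave at the left end and integrates with a fixed-step fourth-order Runge-Kutta scheme. That integrator is a choice of this sketch, not of the text.

```python
import math

# Hypothetical smooth profile (an assumption of this sketch): the local wave number k(x)
# rises from 0.5 to 1.5 over a length ~A_SCALE, so that k*a is roughly 10-30 >> 1, Eq. (96).
A_SCALE = 20.0


def k_of_x(x):
    return 1.0 + 0.5 * math.tanh(x / A_SCALE)


def dk_dx(x):
    return 0.5 / (A_SCALE * math.cosh(x / A_SCALE) ** 2)


def integrate(x0, x1, psi, dpsi, h=0.01):
    """Classical RK4 integration of psi'' = -k(x)^2 psi, Eq. (81), from x0 to x1."""
    def rhs(x, y, dy):
        return dy, -k_of_x(x) ** 2 * y

    x = x0
    for _ in range(int(round((x1 - x0) / h))):
        a1 = rhs(x, psi, dpsi)
        a2 = rhs(x + h / 2, psi + h / 2 * a1[0], dpsi + h / 2 * a1[1])
        a3 = rhs(x + h / 2, psi + h / 2 * a2[0], dpsi + h / 2 * a2[1])
        a4 = rhs(x + h, psi + h * a3[0], dpsi + h * a3[1])
        psi += h / 6 * (a1[0] + 2 * a2[0] + 2 * a3[0] + a4[0])
        dpsi += h / 6 * (a1[1] + 2 * a2[1] + 2 * a3[1] + a4[1])
        x += h
    return psi, dpsi


x0, x1 = -100.0, 100.0
# Launch a purely forward-propagating WKB wave, Eq. (94) with a = 1, b = 0 (phase origin at x0).
psi0 = complex(1 / math.sqrt(k_of_x(x0)))
dpsi0 = (1j * k_of_x(x0) - dk_dx(x0) / (2 * k_of_x(x0))) * psi0
psi1, dpsi1 = integrate(x0, x1, psi0, dpsi0)

# Probability current in units hbar/m = 1: I = Im(psi* dpsi/dx). WKB, Eq. (95), predicts I = |a|^2 = 1,
# so |psi|^2 should scale as 1/k(x); the 0th approximation (87) would instead keep |psi|^2 constant.
current0 = (psi0.conjugate() * dpsi0).imag
current1 = (psi1.conjugate() * dpsi1).imag
print(f"I(x0) = {current0:.6f}, I(x1) = {current1:.6f}")
print(f"|psi|^2 k at x1: {abs(psi1) ** 2 * k_of_x(x1):.6f}")
```

Reducing A_SCALE to the order of the local wavelength 2π/k violates Eq. (96). In that case the sketch should show a noticeable reflected wave, and |ψ|²k at the right end then no longer stays close to 1.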

So far I have implied that k²(x) ∝ E − U(x) is positive, i.e. the particle moves in the classically accessible region. Now let us extend the WKB approximation to the situation where the difference E − U(x) may change sign, for example to the reflection problem sketched in Fig. 4. Just as we did for the sharp potential step, we first need to find the appropriate solution for the classically forbidden region, in this case x > x_c. For that, there is no need to redo our calculations, because they are still valid if we, just as in the sharp step problem, take k(x) = iκ(x), where

$$\kappa^2(x) \equiv \frac{2m}{\hbar^2}[U(x) - E] > 0, \quad \text{for } x > x_c, \qquad (2.97)$$

and keep just one of the two possible solutions (with κ > 0), in analogy with Eq. (66). The result is

$$\psi_{\rm WKB}(x) = \frac{c}{\kappa^{1/2}(x)}\exp\left\{-\int^x\kappa(x')\,dx'\right\}, \quad \text{for } k^2 < 0, \text{ i.e. } \kappa^2 > 0, \qquad (2.98)$$

with the lower limit at some point with κ² > 0 as well. This is a really wonderful formula! It describes the quantum-mechanical penetration of the particle into the classically forbidden region, and provides a natural generalization of Eq. (66) - leaving intact, of course, our estimates of the depth δ ~ 1/κ of such penetration.

Now we have to do what we have done for the sharp-step problem in Sec. 3: use the boundary conditions at the interface point x = x_c to relate the constants a, b, and c. However, now this operation is a tad more complex, because both WKB functions (94) and (98) diverge, albeit weakly, at the classical turning point, where both k(x) and κ(x) tend to zero. This connection problem may, however, be solved in the following way. 27 Let us use the commitment of potential "softness", assuming that it allows us to keep just two leading terms in the Taylor expansion of the function U(x) at point x_c:

$$U(x) \approx U(x_c) + \frac{dU}{dx}\bigg|_{x=x_c}(x - x_c) = E + \frac{dU}{dx}\bigg|_{x=x_c}(x - x_c). \qquad (2.99)$$



27 An alternative way to solve the connection problem, without involving the Airy functions but using an analytical extension of WKB formulas to the plane of complex argument, may be found, e.g., in Sec. 47 of the textbook by L. Landau and E. Lifshitz, Quantum Mechanics, Non-Relativistic Theory, 3rd ed., Pergamon, 1977.
Using this truncated expansion, and introducing a dimensionless variable for the coordinate's deviation from the classical turning point,

$$\zeta \equiv \frac{x - x_c}{x_0}, \quad \text{with} \quad x_0 \equiv \left[\frac{\hbar^2}{2m\,(dU/dx)|_{x=x_c}}\right]^{1/3}, \qquad (2.100)$$

we reduce the Schrödinger equation (61) to the simple Airy equation

$$\frac{d^2\psi}{d\zeta^2} - \zeta\psi = 0. \qquad (2.101)$$

As for all linear, ordinary differential equations of the second order, the general solution of Eq. (101) may be presented as a linear combination of two fundamental solutions, in this case called the Airy functions Ai(ζ) and Bi(ζ), shown in Fig. 9a.
Fig. 2.9. (a) Airy functions Ai and Bi, and (b) the WKB approximation for function Ai(ζ).



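The latter function diverges at ζ → +∞, and thus is not suitable for our current problem (Fig. 4), while the former function has the following asymptotic behaviors at |ζ| ≫ 1: 28

$$\text{Ai}(\zeta) \to \frac{1}{\pi^{1/2}|\zeta|^{1/4}} \times \begin{cases} \dfrac{1}{2}\exp\left\{-\dfrac{2}{3}\zeta^{3/2}\right\}, & \text{for } \zeta \to +\infty, \\[2mm] \sin\left\{\dfrac{2}{3}(-\zeta)^{3/2} + \dfrac{\pi}{4}\right\}, & \text{for } \zeta \to -\infty. \end{cases} \qquad (2.102)$$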
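The accuracy of the asymptotic formulas (102) at moderate |ζ| can be checked using only the Maclaurin series of Ai(ζ). The series coefficients follow from Eq. (101) by matching powers of ζ. In this Python sketch, Ai(0) and Ai'(0) are standard tabulated values. The series is trusted only for |ζ| ≲ 6, where double-precision cancellation is still harmless. Besides comparing values at ζ > 0, the sketch compares the exact zeros of Ai(ζ) with those of the sine in the second line of Eq. (102), which directly tests the phase π/4.

```python
import math

AI_0 = 0.3550280538878172    # Ai(0), standard tabulated value
DAI_0 = -0.2588194037928068  # Ai'(0), standard tabulated value


def airy_ai(z, n_terms=120):
    """Ai(z) from its Maclaurin series: Ai'' = z Ai gives c[n] = c[n-3] / ((n-1) n), with c[2] = 0.
    Adequate in double precision for |z| up to about 6."""
    c = [AI_0, DAI_0, 0.0]
    for n in range(3, n_terms):
        c.append(c[n - 3] / ((n - 1) * n))
    return sum(cn * z ** n for n, cn in enumerate(c))


def airy_ai_asymptotic(z):
    """Leading asymptotic forms of Eq. (102)."""
    if z > 0:
        return math.exp(-2 / 3 * z ** 1.5) / (2 * math.sqrt(math.pi) * z ** 0.25)
    return math.sin(2 / 3 * (-z) ** 1.5 + math.pi / 4) / (math.sqrt(math.pi) * (-z) ** 0.25)


def wkb_zero(s):
    """s-th zero of the second line of Eq. (102): (2/3)(-z)^(3/2) + pi/4 = s*pi."""
    return -(1.5 * math.pi * (s - 0.25)) ** (2 / 3)


def bisect(f, lo, hi, tol=1e-12):
    """Simple bisection for a single sign change of f inside [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for z in (2.0, 4.0, 6.0):
    print(f"z = {z}: Ai = {airy_ai(z):.6e}, Eq. (102) gives {airy_ai_asymptotic(z):.6e}")
for s in (1, 2, 3):
    est = wkb_zero(s)
    print(f"zero #{s}: series {bisect(airy_ai, est - 0.3, est + 0.3):.5f}, Eq. (102) {est:.5f}")
```

Even the first zero, near ζ ≈ −2.34, is reproduced to better than 1%, although |ζ| is not really large there.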



Now let us apply the WKB approximation to the Airy equation (101). Taking the classical turning point (ζ = 0) for the lower limit, for ζ > 0 we get (in dimensionless units)



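28 The following (exact!) integral formulas,

$$\text{Ai}(\zeta) = \frac{1}{\pi}\int_0^\infty\cos\left(\frac{\xi^3}{3} + \zeta\xi\right)d\xi, \qquad \text{Bi}(\zeta) = \frac{1}{\pi}\int_0^\infty\left[\exp\left\{-\frac{\xi^3}{3} + \zeta\xi\right\} + \sin\left(\frac{\xi^3}{3} + \zeta\xi\right)\right]d\xi,$$

are often convenient for practical calculation of the Airy functions at intermediate values of the argument, |ζ| ~ 1.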
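$$\kappa^2(\zeta) = \zeta, \qquad \int_0^\zeta\kappa(\zeta')\,d\zeta' = \int_0^\zeta\zeta'^{1/2}\,d\zeta' = \frac{2}{3}\zeta^{3/2}, \qquad (2.103)$$

i.e. exactly the exponent in the first line of Eq. (102). Making a similar calculation for ζ < 0, with the natural assumption |b| = |a| (full reflection from the potential step), we arrive at the following result:

$$\text{Ai}_{\rm WKB}(\zeta) = \frac{1}{|\zeta|^{1/4}} \times \begin{cases} c\exp\left\{-\dfrac{2}{3}\zeta^{3/2}\right\}, & \text{for } \zeta > 0, \\[2mm] a\sin\left\{\dfrac{2}{3}(-\zeta)^{3/2} + \varphi\right\}, & \text{for } \zeta < 0. \end{cases} \qquad (2.104)$$

This approximation differs from the exact solution at small values of ζ, i.e. close to the classical turning point - see Fig. 9b. However, at |ζ| ≫ 1, Eqs. (104) describe the Airy function exactly if

$$\varphi = \frac{\pi}{4} \quad \text{and} \quad c = \frac{a}{2}. \qquad (2.105)$$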



Hence we can use these connection formulas to express the relations between the coefficients a, b, and c of the general WKB solutions (94) and (98). In particular, the first of them yields b = −a exp{iπ/2}, so that Eq. (94) becomes

$$\psi_{\rm WKB}(x < x_c) = \frac{a}{k^{1/2}(x)}\left[\exp\left\{+i\int_{x_c}^x k(x')\,dx'\right\} - \exp\left\{-i\int_{x_c}^x k(x')\,dx' + i\frac{\pi}{2}\right\}\right]. \qquad (2.106)$$

This result may be also described by a simple mnemonic rule: reflecting from a "soft" potential step, the wavefunction acquires an additional phase shift Δφ = π/2, if compared with the reflection from a "hard" (vertical) potential wall located at x = x_c, for which, according to Eq. (1.76), we would have b = −a.

Let us quantify the condition of validity of the connection formulas (105) - in other words, the criterion of the step "softness". For that, within the region where the WKB approximation differs from the exact solution of the Airy equation (|ζ| ~ 1, i.e. |x − x_c| ~ x₀), the deviation from the linear approximation (99) of the potential profile should be relatively small. This deviation may be estimated using the next term of the Taylor expansion, (d²U/dx²)|_{x=x_c}(x − x_c)²/2. As a result, the softness condition may be expressed as |d²U/dx²|_{x=x_c} x₀ ≪ |dU/dx|_{x=x_c}. With the account of Eq. (100) for x₀, the condition becomes

$$\left|\frac{d^2U}{dx^2}\right| \ll \left(\frac{2m}{\hbar^2}\right)^{1/3}\left|\frac{dU}{dx}\right|^{4/3}. \qquad (2.107)$$



As an example of a very useful application of the WKB approximation, let us use it to calculate the energy spectrum of a 1D particle in a soft 1D quantum well (Fig. 10). As was discussed above, we may always consider the standing wave describing an eigenstate ψ_n (corresponding to the eigenenergy E_n) as a traveling wave going back and forth between the walls, being sequentially reflected by each of them. Let us apply the WKB approximation to such a traveling wave. First, according to Eq. (94), propagating from the left classical turning point x_L to the right point x_R, it acquires the phase change

$$\Delta\varphi_\to = \int_{x_L}^{x_R}k(x)\,dx. \qquad (2.108)$$
At the reflection from the soft wall at x_R, according to the connection formula (106), the wave acquires an additional shift π/2. Now, traveling back from x_R to x_L, the wave gets a shift similar to the one given by Eq. (108): Δφ_← = Δφ_→. Finally, at the reflection from x_L it gets one more π/2. Summing up all these contributions, we may write the self-consistency condition (that the wavefunction "catches its own tail with its teeth") in the form

$$\Delta\varphi_{\rm total} \equiv \Delta\varphi_\to + \frac{\pi}{2} + \Delta\varphi_\leftarrow + \frac{\pi}{2} = 2\int_{x_L}^{x_R}k(x)\,dx + \pi = 2\pi n, \quad \text{with } n = 1, 2, \ldots \qquad (2.109)$$

Rewriting this result in terms of the particle's momentum p(x) = ħk(x), we arrive at the famous 1D Bohr-Sommerfeld quantization rule

$$\oint_C p(x)\,dx = 2\pi\hbar\left(n - \frac{1}{2}\right), \qquad (2.110)$$

where the closed path C means the full period of classical motion. 29




Fig. 2.10. Quasiclassical treatment of eigenstates in a soft 1D potential well.



Let us see what this rule gives for the very important particular case of the quadratic potential profile of a harmonic oscillator of frequency ω₀. In this case,

$$U(x) = \frac{m}{2}\omega_0^2x^2, \qquad (2.111)$$

and the classical turning points are the roots of a simple equation

$$\frac{m}{2}\omega_0^2x_c^2 = E_n, \qquad (2.112)$$

so that x_R = x_c = (2E_n/m)^{1/2}/ω₀ > 0, x_L = −x_c < 0. Due to the potential's symmetry, the integration required by Eq. (110) is also simple:

$$\oint_C p(x)\,dx = 2\int_{x_L}^{x_R}\{2m[E_n - U(x)]\}^{1/2}dx = 2(2mE_n)^{1/2}\int_{-x_c}^{+x_c}\left(1 - \frac{x^2}{x_c^2}\right)^{1/2}dx = 2(2mE_n)^{1/2}x_c\frac{\pi}{2} = \frac{2\pi E_n}{\omega_0}, \qquad (2.113)$$



29 Note that for motion in more than one dimension, a closed classical trajectory may have no turning points. In this case, the constant 1/2 in the parentheses of Eq. (110), arising from the turns, should be dropped. The simplest example is the circular motion of the electron about the proton in Bohr's picture of the hydrogen atom, for which the modified quantization condition (110) takes the form (1.10) postulated by N. Bohr. (A similar relation for the radial motion is sometimes called the Sommerfeld-Wilson quantization rule.)
so that Eq. (110) is satisfied if

$$E_n = \hbar\omega_0\left(n - \frac{1}{2}\right) \equiv \hbar\omega_0\left(n' + \frac{1}{2}\right), \quad \text{with } n' = n - 1 = 0, 1, 2, \ldots \qquad (2.114)$$

In order to estimate the validity of this result, we have to check condition (96) at all points of the classically allowed region, and Eq. (107) at the turning points. A straightforward calculation shows that both conditions are satisfied for n ≫ 1. However, we will see below that Eq. (114) is actually exactly correct for all energy levels, thanks to special properties of the potential profile (111).
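The rule (110) can also be applied numerically to potentials for which the integral is not elementary. The Python sketch below assumes an even potential U(x) with U(0) = 0 that grows monotonically for x > 0. It finds the turning point by bisection and computes the loop integral with the substitution x = x_c sin θ, which makes the integrand smooth. Units ħ = m = ω₀ = 1 are used, in which Eq. (114) reads E_n = n − 1/2. The quartic potential is a hypothetical extra example, and all function names are illustrative.

```python
import math

HBAR = 1.0  # units (an assumption of this sketch): hbar = m = omega_0 = 1
MASS = 1.0


def loop_integral(U, E, n_nodes=400):
    """Closed-path integral of p(x) dx in Eq. (110), for an even U(x) increasing at x > 0 with U(0) = 0.
    The substitution x = x_c sin(theta) removes the square-root behavior at the turning points."""
    lo, hi = 0.0, 1.0
    while U(hi) < E:           # bracket the right turning point x_c
        hi *= 2.0
    for _ in range(100):       # bisection for U(x_c) = E
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if U(mid) < E else (lo, mid)
    xc = 0.5 * (lo + hi)
    h = math.pi / n_nodes
    total = 0.0
    for j in range(n_nodes):   # midpoint rule over theta in (-pi/2, pi/2)
        theta = -math.pi / 2 + (j + 0.5) * h
        p = math.sqrt(max(2 * MASS * (E - U(xc * math.sin(theta))), 0.0))
        total += p * xc * math.cos(theta) * h
    return 2.0 * total         # there and back: x_L -> x_R -> x_L


def wkb_level(U, n, e_max=1e3):
    """E_n solving Eq. (110): loop_integral(U, E) = 2*pi*hbar*(n - 1/2), n = 1, 2, ..."""
    target = 2 * math.pi * HBAR * (n - 0.5)
    lo, hi = 0.0, e_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if loop_integral(U, mid) < target else (lo, mid)
    return 0.5 * (lo + hi)


def harmonic(x):
    return 0.5 * MASS * x * x  # Eq. (111) with omega_0 = 1


def quartic(x):
    return x ** 4              # hypothetical anharmonic example


print([round(wkb_level(harmonic, n), 10) for n in range(1, 5)])  # Eq. (114): n - 1/2
print(wkb_level(quartic, 2) / wkb_level(quartic, 1), 3 ** (4 / 3))
```

For U = x⁴ the loop integral scales as E^{3/4}, so Eq. (110) gives E_n ∝ (n − 1/2)^{4/3}, and the printed ratio E₂/E₁ should match 3^{4/3} ≈ 4.33. Whether these WKB values are accurate for small n is a separate question, governed by conditions (96) and (107).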

Now, let us look at the second of the connection formulas (105), c = a/2. Again, it differs from the result (71) for a sharp potential step, which may be rewritten as

$$C = A\frac{2k}{k + i\kappa} = A\frac{2}{[1 + (\kappa/k)^2]^{1/2}}\exp\left\{-i\left(\frac{\pi}{2} - \theta\right)\right\}, \qquad (2.115)$$

(with θ defined by Eq. (72)) by both the modulus and the phase factor. (In the WKB approximation, the latter factor always equals π/4.) Hence, again, the WKB approximation's prediction is not exact for sharp potentials; nevertheless, it is broadly used for practical calculations. One of the most important of them is the transparency of an arbitrary but smooth potential barrier (Fig. 11).




Fig. 2.11. 1D potential barrier of an arbitrary (but smooth) shape.



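Here, just as in the case of a rectangular barrier, we need to take into consideration five particular "waves" (or rather fundamental solutions): 30

$$\psi_{\rm WKB}(x) = \begin{cases} \dfrac{a}{k^{1/2}(x)}\exp\left\{i\displaystyle\int^x k(x')\,dx'\right\} + \dfrac{b}{k^{1/2}(x)}\exp\left\{-i\displaystyle\int^x k(x')\,dx'\right\}, & \text{for } x < x_c, \\[3mm] \dfrac{c}{\kappa^{1/2}(x)}\exp\left\{-\displaystyle\int^x\kappa(x')\,dx'\right\} + \dfrac{d}{\kappa^{1/2}(x)}\exp\left\{\displaystyle\int^x\kappa(x')\,dx'\right\}, & \text{for } x_c < x < x_c', \\[3mm] \dfrac{f}{k^{1/2}(x)}\exp\left\{i\displaystyle\int^x k(x')\,dx'\right\}, & \text{for } x_c' < x, \end{cases} \qquad (2.116)$$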



where the lower limits of the integrals are arbitrary (each within the corresponding range of x). Since on the right of the left classical turning point we have two exponents rather than one, and on the right of the second


30 Sorry, but the same letter, d, is used here for the barrier thickness (defined in this case as the classically forbidden region length, x_c' − x_c), and the constant in one of the wave amplitudes - see Eq. (116). Let me hope that the difference between these uses is absolutely evident from the context.
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point, one traveling waves rather than two, the connection formulas (105) have to be generalized, using 
asymptotic formulas not only for Ai(^), but also for the second Airy function, Bi(^). The analysis, 
absolutely similar to that carried out above (though naturally a bit more bulky), 31 gives a remarkably 
simple result: 
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' *; 
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(2.117) 



Soft 
tunnel 
barrier's 
transparency 



with no pre-exponential factor. This formula is broadly used in applied quantum mechanics, despite the 
approximate character of its pre-exponential coefficient for insufficiently soft barriers that do not satisfy 
Eq. (107). For example, Eq. (80) shows that for a thick rectangular barrier with k= k, i.e. Uq = 2E, the 
WKB approximation (117) underestimates Thy a factor of 4. However, on the logarithmic scale of Fig. 
8b, such factor, about half an order of magnitude, still looks as a small correction. 
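The factor-of-4 statement is easy to check numerically. The minimal sketch below assumes that Eq. (79) has the standard rectangular-barrier form T = [1 + ((k² + κ²)/2kκ)² sinh²κd]^{-1} (for E < U_0), uses illustrative units ħ = m = 1, and compares it with the WKB value exp{-2κd} that Eq. (117) gives for a flat barrier top; the function names are hypothetical.

```python
import math


def rect_barrier_exact(energy, u0, d, hbar=1.0, m=1.0):
    """Exact transparency of a rectangular barrier for E < U0 (assumed form of Eq. (79))."""
    k = math.sqrt(2.0 * m * energy) / hbar
    kappa = math.sqrt(2.0 * m * (u0 - energy)) / hbar
    factor = (k**2 + kappa**2) / (2.0 * k * kappa)
    return 1.0 / (1.0 + (factor * math.sinh(kappa * d))**2)


def rect_barrier_wkb(energy, u0, d, hbar=1.0, m=1.0):
    """WKB estimate (2.117): exp(-2 * integral of kappa dx), with kappa constant inside the barrier."""
    kappa = math.sqrt(2.0 * m * (u0 - energy)) / hbar
    return math.exp(-2.0 * kappa * d)


# kappa = k, i.e. U0 = 2E: the exact-to-WKB ratio should approach 4 for thick barriers
for d in (0.5, 1.0, 2.0, 5.0, 10.0):
    ratio = rect_barrier_exact(1.0, 2.0, d) / rect_barrier_wkb(1.0, 2.0, d)
    print(f"kappa*d = {math.sqrt(2.0) * d:6.2f}   exact/WKB = {ratio:.6f}")
```

For κ = k the ratio equals 4/(1 + e^{-2κd})², so it reaches 4 only asymptotically, which is why the barrier has to be thick.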

Notice that when E approaches the barrier top U_max (Fig. 11), points x_c and x_c' merge, so that, according to Eq. (117), T → 1, i.e. the particle reflection vanishes at E = U_max. However, this conclusion is incorrect even for smooth barriers, where one could naively expect the WKB approximation to work perfectly. Indeed, near the point x = x_m where the potential reaches its maximum (i.e. U(x_m) = U_max), we may always approximate a smooth function U(x) by an inverted parabola,



$$U(x) \approx U_{\max} - \frac{m\omega_0^2 (x - x_m)^2}{2}. \qquad (2.118)$$



Calculating dU/dx and d²U/dx² of this function and plugging them into condition (107), we see that the WKB approximation is only valid if |U_max - E| ≫ ħω_0. An exact analysis 32 of tunneling through barrier (118) gives the following Kemble formula:

$$\mathscr{T} = \frac{1}{1 + \exp\{-2\pi(E - U_{\max})/\hbar\omega_0\}}, \qquad (2.119)$$

valid for any sign of the difference (E - U_max). This formula describes a gradual approach of T to 1, i.e. a gradual reduction of reflection as the particle's energy increases, with T = 1/2 (rather than 1) at E = U_max.
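A small sketch comparing the Kemble formula (119) with the WKB result (117) for the inverted parabola (118). It assumes units ħ = m = ω_0 = 1 and evaluates the WKB exponent by a plain midpoint-rule integration between the turning points. For this profile the exponent works out to 2π(U_max - E)/ħω_0, so the two formulas should agree only when U_max - E ≫ ħω_0. Names are illustrative.

```python
import math

# Illustrative units (an assumption): hbar = m = omega_0 = 1, energies in units of hbar*omega_0
HBAR, M, OMEGA0 = 1.0, 1.0, 1.0


def kemble(e_minus_umax):
    """Kemble formula (2.119); valid for either sign of E - U_max."""
    return 1.0 / (1.0 + math.exp(-2.0 * math.pi * e_minus_umax / (HBAR * OMEGA0)))


def wkb_parabolic(depth, n=100_000):
    """WKB transparency (2.117) for the inverted parabola (2.118), with depth = U_max - E > 0.

    The integral of kappa between the turning points +-y0 is evaluated by the midpoint rule.
    """
    y0 = math.sqrt(2.0 * depth / (M * OMEGA0**2))
    h = 2.0 * y0 / n
    action = 0.0
    for i in range(n):
        y = -y0 + (i + 0.5) * h
        u_minus_e = depth - 0.5 * M * OMEGA0**2 * y**2   # U(x) - E under the barrier
        action += math.sqrt(2.0 * M * u_minus_e) / HBAR * h
    return math.exp(-2.0 * action)


for depth in (0.25, 1.0, 3.0):
    print(f"U_max - E = {depth}: Kemble {kemble(-depth):.3e}, WKB {wkb_parabolic(depth):.3e}")
print("At the barrier top, the Kemble formula gives", kemble(0.0))
```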

Now the last remark of this section: our discussions of the propagator and the WKB approximation open a direct path toward an alternative formulation of quantum mechanics, based on the Feynman path integral, but I will postpone its discussion until a more compact ("bra-ket") notation has been introduced in Chapter 4.



31 It may be found, for example, in Sec. 7.4 of the textbook by E. Merzbacher, Quantum Mechanics, 3rd ed., Wiley, 1998.

32 It was carried out by E. Kemble in 1935. Notice that mathematically the Kemble formula is similar to the Fermi distribution in statistical physics, with the effective temperature T_ef = ħω_0/2πk_B. This similarity has some interesting implications for the statistics of Fermi gas tunneling.

2.5. Transfer matrix, resonant tunneling, and metastable states

Let us now explore motion in more complex potential profiles. The piecewise-constant and smooth-potential models of U(x) are not too convenient here, because they both require "stitching" local solutions at each classical turning point, which may lead to very cumbersome calculations. However, we may get a very good insight into the physics of such profiles by approximating them with a set of Dirac delta-functions. For that, let us have a look at what our old result (79) gives in the limit of a very thin and high rectangular barrier, d ≪ δ, E ≪ U_0 (giving k ≪ κ ≪ 1/d):



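$$\mathscr{T} = \frac{1}{|1 + i\alpha|^2} = \frac{1}{1 + \alpha^2}, \qquad (2.120)$$

where the parameter α is defined as

$$\alpha \equiv \frac{\kappa^2 d}{2k} \approx \frac{m}{\hbar^2 k}\,U_0 d. \qquad (2.121)$$

The last product, U_0 d, is just the "area"

$$W \equiv \int_{U(x) > E} U(x)\,dx \qquad (2.122)$$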



of the barrier. This fact implies that the very simple result (120) for the transparency may be correct for 
a barrier of any shape, provided that it is sufficiently thin and high. 
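The claim that only the barrier "area" W matters for thin, high barriers can be illustrated by shrinking a rectangular barrier at fixed U_0 d = W and watching its exact transparency approach Eq. (120) with α = mW/ħ²k. The sketch again assumes the standard form of Eq. (79) for E < U_0 and units ħ = m = 1; the function names are illustrative.

```python
import math


def rect_barrier_exact(energy, u0, d, hbar=1.0, m=1.0):
    """Exact transparency of a rectangular barrier for E < U0 (assumed standard form of Eq. (79))."""
    k = math.sqrt(2.0 * m * energy) / hbar
    kappa = math.sqrt(2.0 * m * (u0 - energy)) / hbar
    factor = (k**2 + kappa**2) / (2.0 * k * kappa)
    return 1.0 / (1.0 + (factor * math.sinh(kappa * d))**2)


def thin_barrier_limit(energy, area, hbar=1.0, m=1.0):
    """Eq. (2.120) with alpha = m*W/(hbar^2 k), W being the barrier 'area' (2.122)."""
    k = math.sqrt(2.0 * m * energy) / hbar
    alpha = m * area / (hbar**2 * k)
    return 1.0 / (1.0 + alpha**2)


# Keep the area W = U0*d fixed while making the barrier thinner and higher
energy, area = 0.5, 1.0
for d in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"d = {d:.0e}: exact {rect_barrier_exact(energy, area / d, d):.6f}, "
          f"delta limit {thin_barrier_limit(energy, area):.6f}")
```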

Indeed, let us consider the tunneling problem for a very thin barrier with κd, kd ≪ 1 (Fig. 12), approximating it by Dirac's δ-function:

$$U(x) = W\delta(x). \qquad (2.123)$$



Fig. 2.12. Delta-functional tunnel barrier U(x) = Wδ(x); A, B, and F denote the incident, reflected, and transmitted wave amplitudes.



We already know the solutions at all points but x = 0 - see Eqs. (63) and (77) - so we only need to analyze the boundary conditions at that point to find the coefficients A, B, and F - or rather the ratios B/A and F/A. However, due to the special character of the δ-function, we should be careful here. Indeed, instead of Eq. (68) we now get



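$$\left.\frac{d\psi}{dx}\right|_{x=+0} - \left.\frac{d\psi}{dx}\right|_{x=-0} = \lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}\frac{d^2\psi}{dx^2}\,dx = \lim_{\varepsilon\to 0}\frac{2m}{\hbar^2}\int_{-\varepsilon}^{+\varepsilon}\left[U(x) - E\right]\psi\,dx = \frac{2mW}{\hbar^2}\,\psi(0). \qquad (2.124)$$

On the other hand, the wavefunction itself is still continuous:

$$\psi(+0) - \psi(-0) = \lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}\frac{d\psi}{dx}\,dx = 0. \qquad (2.125)$$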

Using these boundary conditions, we readily get the following system of two linear equations,

$$A + B = F, \qquad ikF - (ikA - ikB) = \frac{2mW}{\hbar^2}F, \qquad (2.126)$$

whose solution yields

$$\frac{B}{A} = -\frac{i\alpha}{1 + i\alpha}, \qquad \frac{F}{A} = \frac{1}{1 + i\alpha}, \qquad \text{where } \alpha \equiv \frac{mW}{\hbar^2 k}. \qquad (2.127)$$



For the barrier transparency T = |F/A|², this result again gives Eq. (120). That formula may be recast to give a simple expression (valid only for E ≪ U_max) for the transmission coefficient,

$$\mathscr{T} = \frac{1}{1 + \alpha^2} = \frac{E}{E + E_0}, \qquad \text{where } E_0 \equiv \frac{mW^2}{2\hbar^2}, \qquad (2.128)$$

which shows that as the energy becomes larger than the parameter E_0, the barrier's transparency approaches unity.
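As a cross-check of Eqs. (126)-(128), the sketch below solves the two boundary-condition equations (126) numerically by Cramer's rule (with A = 1), compares the result with Eq. (127), and verifies both probability conservation, |B|² + |F|² = |A|², and the form T = E/(E + E_0). Units ħ = m = 1 and the values of k and W are arbitrary assumptions.

```python
def delta_barrier_amplitudes(k, w, hbar=1.0, m=1.0):
    """Solve Eqs. (2.126) for B and F, with the incident amplitude A = 1.

    Unknowns ordered as (B, F); with g = 2mW/hbar^2 the system reads
        B - F = -1                   (from A + B = F)
        ik*B + (ik - g)*F = ik       (from ikF - (ikA - ikB) = gF)
    """
    g = 2.0 * m * w / hbar**2
    a11, a12, b1 = 1.0, -1.0, -1.0
    a21, a22, b2 = 1j * k, 1j * k - g, 1j * k
    det = a11 * a22 - a12 * a21            # Cramer's rule for a 2x2 system
    b = (b1 * a22 - a12 * b2) / det
    f = (a11 * b2 - a21 * b1) / det
    return b, f


k, w = 1.3, 0.7                            # arbitrary values, hbar = m = 1
b, f = delta_barrier_amplitudes(k, w)
alpha = w / k                              # alpha = mW/(hbar^2 k), Eq. (2.127)
energy, e0 = k**2 / 2, w**2 / 2            # E = (hbar k)^2/2m and E_0 = mW^2/2hbar^2
print("B/A:", b, "vs", -1j * alpha / (1 + 1j * alpha))
print("F/A:", f, "vs", 1 / (1 + 1j * alpha))
print("T = |F/A|^2:", abs(f)**2, "vs E/(E + E_0):", energy / (energy + e0))
```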

However, the most important application of Eqs. (126) is for deriving the transparency of more complex potential profiles. For that, let us first introduce the very general notions of the scattering and transfer matrices, for now in the 1D case. Consider an arbitrary but finite-length potential "bump" (more formally called a scatterer), localized somewhere between points x_1 and x_2, on the flat potential background, say U = 0 (Fig. 13). We know that the general solution with a certain energy E outside this interval is a set of two sinusoidal waves. Let us present them in the form

$$\psi_j = A_j e^{ik(x - x_j)} + B_j e^{-ik(x - x_j)}, \qquad (2.129)$$



where (for now) j = 1 or 2, and (ħk)²/2m = E. Note that each of the wave pairs (129) has, in this notation, its own reference point x_j, because this is very convenient for the calculations which follow.



Fig. 2.13. A single 1D scatterer.



As we have already discussed, if the wave/particle is incident from the left, the linear Schrodinger equation within the scatterer range (x_1 < x < x_2) can provide only linear expressions of the transmitted (A_2) and reflected (B_1) wave amplitudes via the incident wave amplitude A_1:

$$A_2 = S_{21}A_1, \qquad B_1 = S_{11}A_1, \qquad (2.130)$$



where S_11 and S_21 are certain (generally, complex) coefficients. In this case, B_2 = 0. Alternatively, if a wave with amplitude B_2 is incident from the right, it also may induce a transmitted wave (B_1) and a reflected wave (A_2), with amplitudes

$$B_1 = S_{12}B_2, \qquad A_2 = S_{22}B_2, \qquad (2.131)$$



where the coefficients S_22 and S_12 are generally different from S_11 and S_21. Now we can use the linear superposition principle to argue that if waves A_1 and B_2 are simultaneously incident on the scatterer (say, because wave B_2 has been partly reflected back by some other scatterer located at x > x_2), the resulting scattered wave amplitudes A_2 and B_1 are just the sums of their values for separate incident waves:



$$B_1 = S_{11}A_1 + S_{12}B_2, \qquad A_2 = S_{21}A_1 + S_{22}B_2. \qquad (2.132)$$



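These linear relations may be conveniently presented by the so-called scattering matrix (frequently called just the "S-matrix"):

$$\begin{pmatrix} B_1 \\ A_2 \end{pmatrix} = \mathrm{S}\begin{pmatrix} A_1 \\ B_2 \end{pmatrix}, \qquad \mathrm{S} \equiv \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}. \qquad (2.133)$$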



Scattering matrices, duly generalized, are an important tool for the analysis of wave scattering in more than one dimension; for 1D problems, however, another matrix is more convenient to present the same linear relations (132). Indeed, let us solve this system for A_2 and B_2. The result is



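$$A_2 = T_{11}A_1 + T_{12}B_1, \quad B_2 = T_{21}A_1 + T_{22}B_1, \qquad \text{i.e.} \qquad \begin{pmatrix} A_2 \\ B_2 \end{pmatrix} = \mathrm{T}\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \qquad (2.134)$$

where T is the transfer matrix with elements

$$T_{11} = S_{21} - \frac{S_{11}S_{22}}{S_{12}}, \qquad T_{12} = \frac{S_{22}}{S_{12}}, \qquad T_{21} = -\frac{S_{11}}{S_{12}}, \qquad T_{22} = \frac{1}{S_{12}}. \qquad (2.135)$$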



One can wonder whether the matrices S and T obey any universal properties that would be valid for an arbitrary (but time-independent) scatterer. Such universal relations may be readily found from the probability current conservation and the time-reversal symmetry of the Schrodinger equation. Let me leave finding these relations for the reader's exercise. The results show, in particular, that the scattering matrix may be rewritten in the following form:

$$\mathrm{S} = e^{i\theta}\begin{pmatrix} r e^{i\varphi} & t \\ t & -r e^{-i\varphi} \end{pmatrix}, \qquad (2.136a)$$

where the 4 real parameters r, t, θ, and φ satisfy just one universal relation:

$$r^2 + t^2 = 1 \qquad (2.136b)$$

(so that only 3 of the parameters are independent). As a result of this symmetry, T_11 may also be presented in a simpler form, similar to T_22: T_11 = exp{iθ}/t = 1/S_12* = 1/S_21*. The last form allows a ready expression of the scatterer's transparency via just one coefficient of the transfer matrix:

$$\mathscr{T} \equiv \left|\frac{A_2}{A_1}\right|^2_{B_2 = 0} = |S_{21}|^2 = |T_{11}|^{-2}. \qquad (2.137)$$
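A minimal sketch of the conversion (135) from the scattering matrix to the transfer matrix. Applied to the delta-barrier S-matrix given by Eqs. (142a,b), it should reproduce the matrix (143); applied to an S-matrix of the form (136), it should give T_11 = e^{iθ}/t = 1/S_12* and T = |S_21|² = |T_11|^{-2}. Matrices are stored as nested tuples of Python complex numbers; the helper names and parameter values are hypothetical.

```python
import cmath
import math


def s_to_t(s):
    """Transfer matrix from a scattering matrix s = ((S11, S12), (S21, S22)), Eq. (2.135)."""
    (s11, s12), (s21, s22) = s
    return ((s21 - s11 * s22 / s12, s22 / s12),
            (-s11 / s12, 1.0 / s12))


def delta_s_matrix(alpha):
    """S-matrix of a delta-functional barrier, Eqs. (2.142a,b)."""
    refl = -1j * alpha / (1 + 1j * alpha)
    trans = 1.0 / (1 + 1j * alpha)
    return ((refl, trans), (trans, refl))


def parametrized_s_matrix(r, theta, phi):
    """S-matrix in the form (2.136a), with t = sqrt(1 - r^2) from Eq. (2.136b)."""
    t = math.sqrt(1.0 - r**2)
    ph = cmath.exp(1j * theta)
    return ((ph * r * cmath.exp(1j * phi), ph * t),
            (ph * t, -ph * r * cmath.exp(-1j * phi)))


print(s_to_t(delta_s_matrix(2.0)))                 # compare with Eq. (2.143) at alpha = 2
s = parametrized_s_matrix(0.6, 0.3, 1.1)           # here t = 0.8
t_mat = s_to_t(s)
print(t_mat[0][0], cmath.exp(0.3j) / 0.8)          # T11 = exp(i*theta)/t
print(1 / abs(t_mat[0][0])**2, abs(s[1][0])**2)    # both equal t^2, Eq. (2.137)
```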



In our current context, the most important property of 1D transfer matrices is that in order to find the total transfer matrix T of a system consisting of several (say, N) sequential arbitrary scatterers (Fig. 14), it is sufficient to multiply their matrices. Indeed, extending the definition (134) to other points x_j (j = 1, 2, ..., N + 1), we can write
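$$\begin{pmatrix} A_2 \\ B_2 \end{pmatrix} = \mathrm{T}_1\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \qquad \begin{pmatrix} A_3 \\ B_3 \end{pmatrix} = \mathrm{T}_2\begin{pmatrix} A_2 \\ B_2 \end{pmatrix} = \mathrm{T}_2\mathrm{T}_1\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \qquad (2.138)$$

etc. (where the matrix indices indicate the scatterers' order on axis x), so that

$$\begin{pmatrix} A_{N+1} \\ B_{N+1} \end{pmatrix} = \mathrm{T}_N\mathrm{T}_{N-1}\cdots\mathrm{T}_1\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}. \qquad (2.139)$$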



Fig. 2.14. A sequence of several 1D scatterers, with wave amplitudes A_j, B_j defined at points x_1, ..., x_{N+1}.



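But we can also define the total transfer matrix similarly to Eq. (134), i.e. as

$$\begin{pmatrix} A_{N+1} \\ B_{N+1} \end{pmatrix} = \mathrm{T}\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \qquad (2.140)$$

so that finally

$$\mathrm{T} = \mathrm{T}_N\mathrm{T}_{N-1}\cdots\mathrm{T}_1. \qquad (2.141)$$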



This formula is valid even if the flat-potential gaps between component scatterers vanish, so that it may be applied to a scatterer with an arbitrary profile U(x), by fragmenting its length into small segments Δx = x_{j+1} - x_j, and treating each fragment as a rectangular barrier of height (U_j)_ef = [U(x_{j+1}) + U(x_j)]/2 - see Fig. 15. Since very efficient numerical algorithms are readily available for fast multiplication of matrices (especially as small as 2×2), this approach is broadly used in practice for the computation of the transparency of tunnel barriers with complicated profiles U(x). (This is much more efficient than the direct numerical solution of the Schrodinger equation.)



Fig. 2.15. The transfer matrix approach to a long tunnel barrier of an arbitrary profile.
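The fragmentation idea can be sketched in a few lines. One deliberate simplification (an assumption, not the prescription of the text): instead of a rectangular-barrier matrix, each slice of width Δx is replaced by a delta barrier of the same area (U_j)_ef Δx, placed at the slice center between two free intervals Δx/2, so that only the matrices (143) and (146) and the product rule (141) are needed. By the argument around Eqs. (120)-(122), this is accurate for thin slices. Units ħ = m = 1 and all names are illustrative.

```python
import cmath
import math


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return ((p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]),
            (p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]))


def delta_t(alpha):
    """Transfer matrix of a delta-functional barrier, Eq. (2.143)."""
    return ((1 - 1j * alpha, -1j * alpha), (1j * alpha, 1 + 1j * alpha))


def gap_t(k, length):
    """Transfer matrix of a free space interval, Eq. (2.146)."""
    return ((cmath.exp(1j * k * length), 0j), (0j, cmath.exp(-1j * k * length)))


def profile_transparency(u, x_left, x_right, energy, n=2000, hbar=1.0, m=1.0):
    """Transparency |T11|^-2, Eq. (2.137), of a barrier U(x) confined to [x_left, x_right].

    Slice j is modeled as a delta barrier of area U_ef*dx, with U_ef = [U(x_j) + U(x_j+1)]/2,
    placed at the slice center between two free intervals dx/2; slices are chained by Eq. (2.141).
    """
    k = math.sqrt(2.0 * m * energy) / hbar
    dx = (x_right - x_left) / n
    half_gap = gap_t(k, dx / 2)
    total = ((1 + 0j, 0j), (0j, 1 + 0j))
    for j in range(n):
        u_ef = 0.5 * (u(x_left + j * dx) + u(x_left + (j + 1) * dx))
        alpha = m * u_ef * dx / (hbar**2 * k)      # alpha = mW/(hbar^2 k) with W = U_ef*dx
        slice_t = matmul(half_gap, matmul(delta_t(alpha), half_gap))
        total = matmul(slice_t, total)              # scatterers further right multiply from the left
    return 1.0 / abs(total[0][0])**2


def rect(x):
    """Rectangular barrier U_0 = 2 on 0 <= x <= 1 (illustrative test profile)."""
    return 2.0 if 0.0 <= x <= 1.0 else 0.0


# At E = 1 we have kappa = k, and the exact rectangular-barrier result is 1/cosh^2(kappa d)
print(profile_transparency(rect, 0.0, 1.0, 1.0), 1.0 / math.cosh(math.sqrt(2.0))**2)
```

The slice count n controls the accuracy; because each delta sits at the center of its slice, the discretization error for a smooth profile falls off roughly as Δx².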



In order to use this approach for several conceptually important systems, let us calculate the transfer matrices for a few elementary scatterers, starting from the delta-functional barrier located at x = 0. Taking x_1 = x_2 = 0, we can merely change the notation of wave amplitudes in Eq. (127) to get

$$S_{11} = -\frac{i\alpha}{1 + i\alpha}, \qquad S_{21} = \frac{1}{1 + i\alpha}. \qquad (2.142a)$$

An absolutely similar analysis of the wave incidence from the right yields

$$S_{22} = -\frac{i\alpha}{1 + i\alpha}, \qquad S_{12} = \frac{1}{1 + i\alpha}, \qquad (2.142b)$$

and using Eqs. (135), we get

$$\mathrm{T}_\alpha = \begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}. \qquad (2.143)$$



The next example may seem strange at first glance: what if there is no scatterer at all between points x_1 and x_2? If points x_1 and x_2 coincide, the answer is indeed trivial and can be obtained, e.g., from Eq. (143) by taking W = 0, i.e. α = 0:

$$\mathrm{T} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \equiv \mathrm{I}, \qquad (2.144)$$



- the so-called identity matrix. However, we are free to choose the reference points x_1 and x_2 participating in Eq. (129) as we wish. For example, what if x_2 - x_1 = a? Let us first take the forward-propagating wave alone: B_2 = 0 (and hence B_1 = 0); then

$$\psi_2 = \psi_1 = A_1 e^{ik(x - x_1)} \equiv A_1 e^{ik(x_2 - x_1)}e^{ik(x - x_2)}. \qquad (2.145)$$

Comparison of this expression with the definition (129) for j = 2 shows that A_2 = A_1 exp{ik(x_2 - x_1)} = A_1 exp{ika}, i.e. T_11 = exp{ika}. Repeating the calculation for the back-propagating wave, we see that T_22 = exp{-ika}, and since this "no-potential" scatterer (a space interval) provides no particle reflection, we finally get

$$\mathrm{T}_a = \begin{pmatrix} e^{ika} & 0 \\ 0 & e^{-ika} \end{pmatrix}, \qquad (2.146)$$

independently of the mutual position of points x_1 and x_2. At a = 0, we naturally recover the special case (144).

Now let us use these results to analyze the double-barrier system shown in Fig. 16. We could of course calculate its properties as before, writing down explicit expressions for all 5 traveling waves shown by arrows in Fig. 16, then using the boundary conditions (124) and (125) at each of the points x_1 and x_2 to get a system of 4 linear equations, and then solving it for 4 amplitude ratios.



Fig. 2.16. Double-barrier system, with delta-functional barriers Wδ(x - x_1) and Wδ(x - x_2). Dashed lines show (schematically) the position of metastable energy levels.
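However, the transfer matrix approach simplifies the calculations, because we may immediately use Eqs. (141), (143), and (146) to write

$$\mathrm{T} = \mathrm{T}_\alpha\mathrm{T}_a\mathrm{T}_\alpha = \begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}\begin{pmatrix} e^{ika} & 0 \\ 0 & e^{-ika} \end{pmatrix}\begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}. \qquad (2.147)$$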



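Let me hope that the reader remembers the "row by column" rule of the multiplication of square matrices; 33 using it for the two last matrices, we reduce Eq. (147) to

$$\mathrm{T} = \begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}\begin{pmatrix} (1 - i\alpha)e^{ika} & -i\alpha e^{ika} \\ i\alpha e^{-ika} & (1 + i\alpha)e^{-ika} \end{pmatrix}. \qquad (2.148)$$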



Now there is no need to calculate all elements of the full product T, because, according to Eq. (137), for the calculation of the barrier transparency T we need only one of its elements, T_11:

$$\mathscr{T} = \frac{1}{|T_{11}|^2} = \frac{1}{\left|\alpha^2 e^{-ika} + (1 - i\alpha)^2 e^{ika}\right|^2}. \qquad (2.149)$$



This result is similar to that following from Eq. (79) for E > U_0: the transparency is a π-periodic function of the product ka, reaching the maximum (T = 1) at some point of each period - see Fig. 17a.



Fig. 2.17. Resonant tunneling through a quantum well with delta-functional walls: (a) transparency as a function of ka, and (b) calculating the resonance's FWHM at α ≫ 1.



However, the new result is different in that for α ≫ 1, the resonance peaks of transparency are very narrow, reaching their maxima at ka ≈ k_n a = nπ, with n = 1, 2, ... The physics of this effect is immediately clear from the comparison of this result with our analysis of the simplest quantum well - see Fig. 1.7 and its discussion. At k ≈ k_n, the incident wave, which undergoes multiple sequential reflections from the semi-transparent walls of the well, forms a nearly standing wave, which at α ≫ 1 virtually coincides with one of the eigenfunctions of the well with infinite walls, with the standing wave amplitude much larger than that of the incident wave. As a result, the transmitted wave amplitude is proportionately increased. This is the famous effect of resonant tunneling, 34 in its mathematical description identical to the resonant transmission of light through an optical Fabry-Perot resonator formed by two parallel semi-transparent mirrors. 35

33 In the analytical form: (AB)_{jj'} = Σ_{j''=1}^{N} A_{jj''}B_{j''j'}, where N is the matrix size (in our current case, N = 2).

Probably, the most surprising feature of this system is the fact that its maximum transparency is perfect (T_max = 1) even at α → ∞, i.e. in the case of a very low transparency of each of the two component barriers. 36 Indeed, the denominator in Eq. (149) may be interpreted as the squared length of the difference between two vectors, one of length α², and another of length |(1 - iα)²| = 1 + α², with the angle θ = 2ka + const between them. At the resonance, the vectors are aligned, and the difference is smallest (equal to 1) - see Fig. 17b, so that T_max = 1.

We can use the same vector diagram to calculate the so-called FWHM, the common acronym for the Full Width [of the resonance curve at] Half-Maximum, i.e. the difference Δk = k_+ - k_- between the two points on the opposite slopes of the same resonance at which T = T_max/2 - see arrows in Fig. 17a. Let the vectors in Fig. 17b be slightly misaligned, by an angle θ ~ 1/α² ≪ 1, so that the length of the difference vector (of the order of α²θ ~ 1) is still much smaller than the length of each vector. In order to double its length squared, and hence reduce T by a factor of 2 in comparison with its maximum value 1, the arc, α²θ, between the vectors should also become equal to ±1, i.e. α²(2k_± a + const) = ±1. Subtracting these two equations from each other, we finally get

$$\Delta k \equiv k_+ - k_- = \frac{1}{\alpha^2 a} \ll k_n. \qquad (2.150)$$
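The resonance and its width can be checked numerically with the matrices of Eq. (147). The sketch below treats α as a constant (in reality α = mW/ħ²k varies slowly with k), scans ka over one period to find the peak, and bisects for the half-maximum points; Eq. (150) predicts a full width Δ(ka) ≈ 1/α². The vector picture of Fig. 17b also places the peak at ka = arctan α + π/2 (mod π), which the code checks as well. Function names are hypothetical.

```python
import cmath
import math


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return ((p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]),
            (p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]))


def double_barrier_transparency(ka, alpha):
    """|T11|^-2 for T = T_alpha T_a T_alpha, Eqs. (2.147)-(2.149); alpha is held constant."""
    t_alpha = ((1 - 1j * alpha, -1j * alpha), (1j * alpha, 1 + 1j * alpha))
    t_a = ((cmath.exp(1j * ka), 0j), (0j, cmath.exp(-1j * ka)))
    total = matmul(t_alpha, matmul(t_a, t_alpha))
    return 1.0 / abs(total[0][0])**2


def resonance_fwhm(alpha, n_scan=100_000):
    """Peak position, peak value, and full width at half maximum (in units of ka) for 0 < ka < pi."""
    ka_peak, t_peak = 0.0, 0.0
    for i in range(1, n_scan):
        ka = math.pi * i / n_scan
        t = double_barrier_transparency(ka, alpha)
        if t > t_peak:
            ka_peak, t_peak = ka, t

    def half_point(direction):
        # For alpha >> 1, T falls monotonically to well below T_peak/2 within this window
        lo, hi = ka_peak, ka_peak + direction * 0.1
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if double_barrier_transparency(mid, alpha) > t_peak / 2:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return ka_peak, t_peak, half_point(+1) - half_point(-1)


alpha = 10.0
ka_peak, t_peak, fwhm = resonance_fwhm(alpha)
print("peak at ka =", ka_peak, "; arctan(alpha) + pi/2 =", math.atan(alpha) + math.pi / 2)
print("T_max =", t_peak)
print("alpha^2 * FWHM(ka) =", alpha**2 * fwhm, "(Eq. (2.150) predicts about 1)")
```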

Now let us use the simple potential shown in Fig. 16 to discuss an issue of large conceptual importance. For that, consider what would happen if at some initial moment (say, t = 0) we placed a 1D quantum particle inside the double-barrier well with α ≫ 1, and left it there alone, without any incident wave. To simplify the analysis, let us prepare the initial state so that it coincides with the ground state of the infinite-wall well - see Eq. (1.76):

$$\Psi(x, 0) = \psi_1(x) = \left(\frac{2}{a}\right)^{1/2}\sin[k_1(x - x_1)], \qquad \text{where } k_1 = \frac{\pi}{a}. \qquad (2.151)$$

At α → ∞, this is an eigenstate of the system, and from our analysis in Sec. 1.5 we know its time evolution:

$$\Psi(x, t) = \psi_1(x)\,e^{-i\omega_1 t}, \qquad \text{with } \omega_1 = \frac{E_1}{\hbar} = \frac{\hbar k_1^2}{2m} = \frac{\pi^2\hbar}{2ma^2}, \qquad (2.152)$$

telling us that the particle remains in the well at all times with a constant probability W(t) = W(0) = 1.

34 In older literature, it is sometimes called the Ramsauer (or "Townsend", or "Ramsauer-Townsend") effect. However, it is currently more common to use that name(s) only for a similar 3D effect, especially at the scattering of low-energy electrons on rare gas atoms - this is how it was first observed, independently, by C. Ramsauer and J. Townsend in the early 1920s.

35 See also, e.g., EM Sec. 7.9.

36 The exact equality T_max = 1 is correct only if both component barriers are exactly equal.

However, if the parameter α is large but finite, the de Broglie wave should slowly "leak out" from the well, so that W(t) would slowly decrease. Let us consider this effect approximately, assuming that the slow leakage, with a characteristic time τ ≫ 1/ω_1, does not affect the instant wave distribution inside the well, besides the reduction of W. 37 Then we can generalize Eqs. (151) and (152) as follows:



$$\Psi(x, t) = \left(\frac{2W}{a}\right)^{1/2}\sin[k_1(x - x_1)]\,e^{-i\omega_1 t}, \qquad (2.153)$$



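making the probability of finding the particle in the well equal to W. This solution may be presented as a sum of two traveling waves:

$$\Psi(x, t) = \left[A e^{ik_1(x - x_1)} + B e^{-ik_1(x - x_1)}\right]e^{-i\omega_1 t}, \qquad (2.154)$$

with equal magnitudes of their amplitudes and probability currents:

$$|A| = |B| = \left(\frac{W}{2a}\right)^{1/2}, \qquad I_A = I_B = \frac{\hbar k_1}{m}|A|^2 = \frac{\hbar}{m}\,\frac{\pi}{a}\,\frac{W}{2a} = \frac{\pi\hbar W}{2ma^2}. \qquad (2.155)$$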



But we already know from Eq. (128) that at α ≫ 1 the delta-functional wall transparency T approximately equals 1/α², so that the wave carrying the current I_A, incident on the right wall from inside, induces an outgoing wave outside the well with the following probability current (Fig. 18):

$$I_R = \mathscr{T} I_A \approx \frac{1}{\alpha^2}I_A = \frac{1}{\alpha^2}\,\frac{\pi\hbar W}{2ma^2}. \qquad (2.156a)$$

Absolutely similarly,

$$I_L = -I_R. \qquad (2.156b)$$

Fig. 2.18. Metastable state's decay in the simple model of a 1D potential well with low-transparent walls - schematically.



Now we may combine the 1D version (6) of the probability conservation law for the well's interior,

$$\frac{dW}{dt} + I_R - I_L = 0, \qquad (2.157)$$

with Eqs. (156) to write

$$\frac{dW}{dt} = -\frac{1}{\alpha^2}\,\frac{\pi\hbar}{ma^2}\,W. \qquad (2.158)$$



37 This almost evident assumption finds its formal justification in the perturbation theory to be discussed in 
Chapter 6. 



This is just the standard differential equation,

$$\frac{dW}{dt} = -\frac{W}{\tau}, \qquad (2.159)$$

of the exponential decay, with the solution W(t) = W(0) exp{-t/τ}, where the constant τ, in our case equal to

$$\tau = \frac{ma^2}{\pi\hbar}\,\alpha^2, \qquad (2.160)$$

is called the metastable state's lifetime. Using expression (2.34) for the de Broglie waves' group velocity, which for our particular wave vector gives v_gr = ħk_1/m = πħ/ma, Eq. (160) may be rewritten as

$$\tau = \frac{\Delta t_A}{\mathscr{T}}, \qquad (2.161)$$

where in our case the attempt time Δt_A is equal to a/v_gr, and T = 1/α². Relation (161), which is valid for a large class of metastable systems, 38 may be interpreted in the following semi-classical way. The confined particle travels back and forth between the confining walls, with time intervals Δt_A between the moments of incidence, each time making an attempt to leak through the wall with a success probability T, so that the reduction of W per each incidence is ΔW = -WT.
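The semi-classical reading of Eq. (161) is easy to test: removing a fraction T of W at each of N wall incidences gives W = (1 - T)^N, which for T ≪ 1 approaches exp{-NT} = exp{-t/τ} with t = NΔt_A. The sketch also checks that Δt_A/T reproduces the lifetime (160). Units ħ = m = a = 1 and α = 30 are illustrative assumptions.

```python
import math


def survival_by_attempts(transparency, n_attempts):
    """W after n wall incidences, each removing the fraction T of W (Delta W = -W T)."""
    w = 1.0
    for _ in range(n_attempts):
        w -= w * transparency
    return w


hbar = m = a = 1.0                                 # illustrative units (assumption)
alpha = 30.0
t_wall = 1.0 / alpha**2                            # wall transparency at alpha >> 1
v_gr = math.pi * hbar / (m * a)                    # v_gr = hbar k_1/m with k_1 = pi/a
dt_attempt = a / v_gr                              # attempt time Delta t_A
tau_160 = m * a**2 * alpha**2 / (math.pi * hbar)   # Eq. (2.160)
tau_161 = dt_attempt / t_wall                      # Eq. (2.161)
print("tau from (2.160):", tau_160, " from (2.161):", tau_161)

for n in (100, 900, 2700):
    t = n * dt_attempt
    print(n, survival_by_attempts(t_wall, n), math.exp(-t / tau_160))
```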

Another important look at Eq. (160) may be taken by returning to the resonant tunneling problem and expressing the resonance width (150) in terms of the incident particle's energy:

$$\Delta E = \Delta\left(\frac{\hbar^2 k^2}{2m}\right) \approx \frac{\hbar^2 k_1}{m}\Delta k = \frac{\hbar^2}{m}\,\frac{\pi}{a}\,\frac{1}{\alpha^2 a} = \frac{\pi\hbar^2}{m\alpha^2 a^2}. \qquad (2.162)$$

Comparing Eqs. (160) and (162), we get a remarkably simple formula

$$\Delta E\cdot\tau = \hbar. \qquad (2.163)$$



This so-called energy-time uncertainty relation is certainly more general than our simple model; for example, it is valid for the lifetime and resonance tunneling width of any metastable state. This seems very natural, since due to the identification of energy with frequency, E = ħω, typical for quantum mechanics, Eq. (163) may be rewritten as Δω·τ = 1, and seems to follow directly from the Fourier transform in time, just as Heisenberg's uncertainty relation (1.35) follows from the Fourier transform in space. In some cases, these two relations are indeed interchangeable; for example, Eq. (24) for the Gaussian wave packet width may be rewritten as δE·Δt = ħ, where δE = ħ(dω/dk)δk = ħv_gr δk is the r.m.s. spread of energies of the monochromatic components of the packet, while Δt = δx/v_gr is the time scale of the packet's passage through a fixed observation point x.

38 Essentially the only requirement is to have the attempt time Δt_A much longer than the effective time (instanton time, see Sec. 5.3 below) of tunneling through the barrier. In the delta-functional approximation for the barrier, the latter time vanishes, so that this requirement is always fulfilled.

However, Eq. (163) is much less general than Heisenberg's uncertainty relation (1.35). Indeed, in nonrelativistic quantum mechanics, the Cartesian coordinates (say, x) of a particle, the components of its momentum (say, p_x), and its energy E are regular observables, represented by operators. In contrast, time is treated as a c-number argument, and is not represented by an operator, so that Eq. (163) cannot be derived under assumptions as general as those of Eq. (1.35). Thus the time-energy uncertainty relation should be applied with great caution. Unfortunately, not everybody is so careful. One can find, for example, wrong claims that due to this relation, the energy dissipated by any system performing an elementary (single-bit) calculation during a time interval Δt has to be larger than ħ/Δt. 39 Another incorrect statement is that the energy of a system cannot be measured, during time Δt, with an accuracy better than ħ/Δt. 40

Now let us use our simple model of the metastable state's decay for a preliminary discussion of one aspect of quantum measurements. Figure 18 shows (schematically) one of the traveling wave packets emitted by the quantum well after its initial state (152) had been prepared at t = 0. (A similar packet is emitted to the left.) At t ≫ τ, the well becomes essentially empty (W ≪ 1), and the whole probability distribution is localized in two clearly separated wave packets of equal amplitudes, moving away from the well with speed v_gr, each "carrying the particle away" with a probability of 50%. Now assume that an experiment has detected the particle on the left side of the well. Though the formalisms suitable for a quantitative analysis of the detection process will not be discussed until Sec. 7.7, due to the wide separation of the packets we may safely assume that the detection may be done without any actual physical effect on the counterpart wave packet. 41 But if we know that the particle has been found on the left, there is no chance to find it on the right.

If we attributed the wavefunction to all stages of this particular experiment, this situation might be rather confusing. Indeed, this would mean that the wavefunction within the right packet should instantly turn into zero - the so-called wave packet reduction - a process that cannot be described by either the Schrodinger equation or any other law of physics we know about. However, if (as was already discussed in Sec. 1.3) we attribute the wavefunction to a statistical ensemble of similar experiments, there is no paradox here at all. While the two-packet picture we have calculated (Fig. 18) describes the full initial ensemble (regardless of the particle detection results), the "reduced packet" picture (with no wave packet on the right of the well) describes only a sub-ensemble of experiments with the particle detected on the left side. As was discussed on completely classical examples in Sec. 1.3, for such a sub-ensemble the probability distribution, and hence the wavefunction, may be dramatically different.



2.6. Coupled quantum wells 

Let us now move on to tunneling through the more complex potential profile shown in Fig. 19: a sequence of (N - 1) similar quantum wells separated by N similar delta-functional tunnel barriers. According to Eq. (141), its transfer matrix is the following product of (N - 1) + N factors:

$$\mathrm{T} = \mathrm{T}_\alpha\mathrm{T}_a\mathrm{T}_\alpha\mathrm{T}_a\cdots\mathrm{T}_a\mathrm{T}_\alpha, \qquad (2.164)$$

with the component matrices given by Eqs. (143) and (146), and the barrier height parameter α defined by the last of Eqs. (127).



39 Here I would dare to refer the reader to my own old work K. Likharev, Int. J. Theor. Phys. 21, 311 (1982), which presented a constructive proof that at reversible computation (introduced in 1973 by C. Bennett) the energy dissipation may be lower than this apparent "quantum limit".

40 See, e.g., a detailed discussion of this issue in the monograph by V. Braginsky and F. Khalili, Quantum Measurement, Cambridge U. Press, 1992.

41 This argument is especially convincing if the particle detection time is much shorter than the time t_c = 2v_gr t/c, where c is the speed of light in vacuum, i.e. the maximum velocity of any information transfer ("signaling").
Fig. 2.19. Resonant tunneling through a system of N similar, equidistant barriers (spacing a), i.e. (N - 1) similar quantum wells.



Remarkably, this multiplication may be carried out analytically, 42 giving

$$\mathscr{T} = |T_{11}|^{-2} = \left[(\cos Nqa)^2 + \left(\frac{\sin ka - \alpha\cos ka}{\sin qa}\,\sin Nqa\right)^2\right]^{-1}, \qquad (2.165)$$

where q is a new parameter, with the wave number dimensionality, defined by the following relation:

$$\cos qa = \cos ka + \alpha\sin ka. \qquad (2.166)$$

For N = 1, Eqs. (165) and (166) immediately yield our old result (128), while for N = 2 they may be reduced to Eq. (149) - see Fig. 17a. Figure 20 shows their predictions for two larger numbers N, and several values of the parameter α.
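Equations (165) and (166) can be checked against a brute-force product (164). In the sketch below, cos Nqa and sin Nqa/sin qa are computed as the Chebyshev polynomials T_N(x) and U_{N-1}(x) of x = cos qa, which avoids complex q in the stop bands where |cos ka + α sin ka| > 1; this rewriting is an implementation choice, not part of the text. The tests also confirm a direct consequence of (165): T = 1 wherever sin Nqa = 0 while sin qa ≠ 0, which for N = 3 means cos qa = ±1/2. α is treated as a constant, and the names are hypothetical.

```python
import cmath
import math


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return ((p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]),
            (p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]))


def n_barrier_by_product(ka, alpha, n):
    """|T11|^-2 for the product (2.164) of n delta barriers and n - 1 gaps of length a."""
    t_alpha = ((1 - 1j * alpha, -1j * alpha), (1j * alpha, 1 + 1j * alpha))
    t_a = ((cmath.exp(1j * ka), 0j), (0j, cmath.exp(-1j * ka)))
    total = t_alpha
    for _ in range(n - 1):
        total = matmul(t_alpha, matmul(t_a, total))
    return 1.0 / abs(total[0][0])**2


def n_barrier_closed_form(ka, alpha, n):
    """Eqs. (2.165)-(2.166), with cos(Nqa) = T_N(x) and sin(Nqa)/sin(qa) = U_{N-1}(x), x = cos(qa)."""
    x = math.cos(ka) + alpha * math.sin(ka)        # cos(qa), Eq. (2.166); may exceed 1
    u_prev, u_curr = 0.0, 1.0                      # U_{-1}(x), U_0(x)
    for _ in range(n - 1):
        u_prev, u_curr = u_curr, 2.0 * x * u_curr - u_prev
    cheb_t = x * u_curr - u_prev                   # T_N(x) = x U_{N-1}(x) - U_{N-2}(x)
    coeff = math.sin(ka) - alpha * math.cos(ka)
    return 1.0 / (cheb_t**2 + (coeff * u_curr)**2)


for n in (2, 3, 10):
    print(n, n_barrier_by_product(1.0, 2.0, n), n_barrier_closed_form(1.0, 2.0, n))
```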



Fig. 2.20. Transparency of the system shown in Fig. 19 as a function of the product ka, for (a) N = 3 and (b) N = 10. Since the function T(ka) is π-periodic (just like for N = 2, see Fig. 17a), only one period is shown.



42 This formula will be easier to prove after we have discussed the properties of Pauli matrices in Chapter 4.

Let us start the discussion of the plots from the case N = 3, i.e. two coupled quantum wells. The comparison of Fig. 20a and Fig. 17a shows that the transmission patterns, and their dependence on the parameter α, are very similar, except that in the coupled wells each resonant tunneling peak splits into two, with the k-difference between them scaling as 1/α. In order to comprehend the physics of this important result, let us analyze an auxiliary system shown in Fig. 21: two similar quantum wells confined by infinitely high potential walls at x = ±a, and coupled via a transparent, short tunnel barrier at x = 0.



Fig. 2.21. Two lowest eigenfunctions and eigenenergies of a system of two coupled quantum wells - schematically.



The barrier may again, for calculation simplicity, be approximated by a delta-function:

$$U(x) = \begin{cases} +\infty, & \text{for } |x| > a, \\ W\delta(x), & \text{for } |x| < a. \end{cases} \qquad (2.167)$$

We already know that the standing-wave eigenfunctions ψ_n of the Schrodinger equation in regions with U(x) = 0, in our current case the segments -a < x < 0 and 0 < x < +a, may always be presented as linear superpositions of sin kx and cos kx. In order to immediately satisfy the boundary conditions ψ = 0 at x = ±a, we can take these solutions in the form

$$\psi_n(x) = \begin{cases} C_-\sin k(x + a), & \text{for } -a < x < 0, \\ C_+\sin k(x - a), & \text{for } 0 < x < +a. \end{cases} \qquad (2.168)$$

What remains is to satisfy the boundary conditions at x = 0. Plugging Eqs. (167) and (168) into Eqs. (124) and (125), we get the following system of two linear equations:

$$k(C_+ - C_-)\cos ka = \frac{2mW}{\hbar^2}\,C_-\sin ka, \qquad (2.169)$$

$$C_-\sin ka = -C_+\sin ka. \qquad (2.170)$$

The system has two types of solutions, with the two lowest-energy eigenfunctions sketched in Fig. 21:

(i) Antisymmetric solutions (which will be marked with index A),

$$(C_+)_A = (C_-)_A, \quad \text{i.e.} \quad \psi_A = C_A\sin k_A x, \qquad (2.171)$$

with eigenvalues independent of W,

$$\sin k_A a = 0, \quad \text{i.e.} \quad k_A a = k_n a = \pi n, \quad n = 1, 2, \ldots \qquad (2.172)$$

Notice that these values of k, and hence the eigenenergies of these antisymmetric states,

$$E_A = \frac{\hbar^2 k_A^2}{2m} = \frac{\pi^2\hbar^2 n^2}{2ma^2}, \qquad (2.173)$$

coincide with those of the simple quantum well of width a - see Fig. 1.7 and its discussion.

(ii) Symmetric solutions (index S):

$$(C_+)_S = -(C_-)_S, \quad \text{i.e.} \quad \psi_S = C_S\sin k_S(|x| - a), \qquad (2.174)$$

with Eq. (169) giving the following characteristic equation for the constant k_S (with α = mW/ħ²k_S, cf. Eq. (127)):

$$\tan k_S a = -\frac{1}{\alpha}. \qquad (2.175)$$



Figure 22 shows the graphical solution of this equation for three values of parameter α, i.e. for various strengths of the quantum well coupling. For each solution, k_S a is confined within the interval

$$\pi\left(n - \tfrac{1}{2}\right) < k_S a < \pi n, \qquad (2.176)$$

so that the antisymmetric and symmetric states alternate on the scale of k (and hence of the energy), with the difference k_A − k_S, for each pair of adjacent states, smaller than π/2a for any value of α. The physics of the splitting between the eigenenergies of the symmetric and antisymmetric states is very simple: it is the change of the kinetic energy of the particle due to different types of quantum confinement (see Fig. 21). In each antisymmetric mode, ψ_n(0) = ψ_n(±a) = 0, i.e. the wavefunction is essentially confined within a segment of length a; as a result, its energy (173) does not depend on the barrier height. On the contrary, in the symmetric mode, which does engage the potential barrier, the wavefunction effectively spreads into the counterpart well. As a result, it changes more slowly in space, and hence its kinetic energy is also lower than that of the adjacent antisymmetric mode.
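The graphical solution of Eq. (175) in Fig. 22 can also be done numerically. The sketch below uses Python with only the standard library, and its function names are hypothetical. It finds k_S a by bisection inside the interval (176) in two ways. The first holds α fixed, as in Fig. 22. The second restores the k-dependence α = β/(ka), where β ≡ mWa/ħ² is the k-independent parameter introduced later in Eq. (197). For fixed α the root is also available in closed form, k_S a = πn − arctan(1/α), since tan(πn − θ) = −tan θ. This gives a convenient check.

```python
import math


def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def ks_a_fixed_alpha(n, alpha):
    """Root of tan(ka) = -1/alpha inside (pi(n - 1/2), pi n), alpha held constant
    as in Fig. 2.22; rewritten as alpha*sin(x) + cos(x) = 0 to avoid tan's poles."""
    return bisect(lambda x: alpha * math.sin(x) + math.cos(x),
                  math.pi * (n - 0.5), math.pi * n)


def ks_a_fixed_beta(n, beta):
    """Same equation with alpha = beta/(ka) and beta = mWa/hbar^2 held fixed:
    tan(x) = -x/beta, rewritten as beta*sin(x) + x*cos(x) = 0."""
    return bisect(lambda x: beta * math.sin(x) + x * math.cos(x),
                  math.pi * (n - 0.5), math.pi * n)


for alpha in (0.3, 1.0, 10.0):
    for n in (1, 2, 3):
        ks = ks_a_fixed_alpha(n, alpha)
        print(f"alpha={alpha:4.1f}  n={n}:  k_S a = {ks:.6f},  k_A a = {math.pi * n:.6f}")
```

Rewriting tan x = −1/α as α sin x + cos x = 0 keeps the bracketed function continuous on the whole interval (176), so plain bisection is safe.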




Fig. 2.22. Graphical solution of the characteristic equation (175) for the eigenvalue of ka in the symmetric mode, for 3 values of parameter α, considering it independent of ka. The dashed line shows approximation (178). (Horizontal axis: ka/π; one of the curves is labeled α = 0.3.)



By the way, this problem may serve as a toy model of the strongest (and most important) type of atom cohesion: the covalent (or "chemical") bonding in molecules, liquids, and solids. The classical example of such bonding is that of hydrogen atoms in an H₂ molecule.⁴³ Each of the two electrons of this system⁴⁴ reduces its kinetic energy very substantially by spreading its wavefunction around both nuclei (protons), rather than being confined near one of them, as it would be in a single atom. As a result, the bonding is very strong: in chemical units, 429 kJ/mol, i.e. about 4.45 eV per molecule.⁴⁵ Somewhat counterintuitively, this energy is larger than that of the strongest classical (ionic) bonding, due to electron transfer between atoms, leading to the Coulomb attraction of the resulting ions. (For example, the atomic cohesion in the NaCl molecule is just 3.28 eV.)

43 Historically, the development of the fully quantum theory of H₂ bonding by W. Heitler and F. London in 1927 was the breakthrough decisive for the acceptance of then-emerging quantum mechanics by chemists.

In the limit α → 0 (no partition between the wells), k_S a → π(n − 1/2), i.e. the eigenstates approach the shape and energy of the symmetric states of a quantum well of width 2a. In the opposite limit α ≫ 1, k_S a → πn, and in the vicinity of each such point we may approximate tan k_S a with (k_S a − πn) (see the dashed line in Fig. 22). As a result, the characteristic equation (175) is reduced to

$$k_S a \approx \pi n - \frac{1}{\alpha}, \qquad (2.177)$$

so that the splitting between the wave numbers and eigenenergies of the adjacent symmetric and antisymmetric states is small:

$$k_A - k_S \approx \frac{1}{\alpha a} \ll k_n, \qquad 2\delta_n \equiv E_A - E_S \approx \frac{dE_n}{dk_n}\,(k_A - k_S) = \frac{\pi n\hbar^2}{ma^2}\,\frac{1}{\alpha} = \frac{2E_n}{\pi n\,\alpha}. \qquad (2.178)$$

(By construction, this result is valid only if α ≫ 1, i.e. δ_n ≪ E_A ≈ E_S.)
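Since tan(πn − θ) = −tan θ, for constant α Eq. (175) has the exact root k_S a = πn − arctan(1/α). This makes it easy to check how quickly the asymptotic result (178) becomes accurate. The minimal sketch below uses hypothetical helper names. It also relies on E_A = E_n and E ∝ k², so that (E_A − E_S)/E_n = 1 − (k_S/k_A)².

```python
import math


def splitting_exact(n, alpha):
    """(E_A - E_S)/E_n for constant alpha, from the exact root
    k_S a = pi*n - atan(1/alpha) of tan(k_S a) = -1/alpha."""
    ks_a = math.pi * n - math.atan(1.0 / alpha)
    return 1.0 - (ks_a / (math.pi * n)) ** 2


def splitting_approx(n, alpha):
    """Eq. (2.178): 2*delta_n/E_n ~ 2/(pi*n*alpha), valid for alpha >> 1."""
    return 2.0 / (math.pi * n * alpha)


for alpha in (3.0, 10.0, 100.0, 1000.0):
    ex, ap = splitting_exact(1, alpha), splitting_approx(1, alpha)
    print(f"alpha={alpha:7.1f}: exact={ex:.6e}  approx={ap:.6e}  rel.err={abs(ex - ap) / ex:.2e}")
```

The relative error shrinks roughly as 1/α, consistent with the statement that (178) holds only for α ≫ 1.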

Let us analyze the properties of the system in this limit in much more detail. First, the results will help us to develop the important tight-binding approximation in the band theory. Second, the weakly coupled quantum wells will be our first example of the very important two-level (or "spin-1/2-like") systems. Let us focus on one pair of symmetric and antisymmetric states, corresponding to virtually the same E_n. According to Eqs. (171) and (174), in the limit α → ∞, the system's eigenfunctions may be approximately represented as follows:

$$\psi_S(x) \approx \frac{1}{\sqrt 2}\left[\psi_R(x) + \psi_L(x)\right], \qquad \psi_A(x) \approx \frac{1}{\sqrt 2}\left[\psi_R(x) - \psi_L(x)\right], \qquad (2.179)$$

where ψ_R and ψ_L are the normalized n-th eigenfunctions of the completely isolated wells:

$$\psi_R(x) = \begin{cases} 0, & \text{for } -a < x < 0,\\ (2/a)^{1/2}\sin k_n x, & \text{for } 0 < x < +a,\end{cases} \qquad \psi_L(x) = \begin{cases} -(2/a)^{1/2}\sin k_n x, & \text{for } -a < x < 0,\\ 0, & \text{for } 0 < x < +a.\end{cases} \qquad (2.180)$$

Let us perform the following conceptually important thought experiment: place the particle, at t = 0, into one of the localized states, say ψ_R(x), and leave the system alone to evolve. Solving Eqs. (179) for ψ_R, we may present the initial state as a linear superposition of the eigenfunctions:

$$\Psi(x,0) = \psi_R(x) = \frac{1}{\sqrt 2}\left[\psi_S(x) + \psi_A(x)\right]. \qquad (2.181)$$

Now, according to the general solution (1.67) of the time-dependent Schrödinger equation, the time dynamics may be obtained by just multiplying each eigenfunction by the corresponding factor (1.61):



44 Due to the opposite spins, the Pauli principle allows them to be in the same orbital ground state - see Chapter 8. 

45 Unit reminder: 1 kJ/mol ≈ 0.0104 eV per molecule.



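$$\Psi(x,t) = \frac{1}{\sqrt 2}\left[\psi_S(x)\exp\left\{-i\frac{E_S}{\hbar}t\right\} + \psi_A(x)\exp\left\{-i\frac{E_A}{\hbar}t\right\}\right]. \qquad (2.182)$$

Now, introducing the following natural notation,

$$E_n \equiv \frac{E_A + E_S}{2}, \qquad \delta_n \equiv \frac{E_A - E_S}{2}, \qquad (2.183)$$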



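and using Eqs. (179), this expression may be rewritten as

$$\Psi(x,t) = \frac{1}{\sqrt 2}\left[\psi_S(x)\exp\left\{i\frac{\delta_n}{\hbar}t\right\} + \psi_A(x)\exp\left\{-i\frac{\delta_n}{\hbar}t\right\}\right]\exp\left\{-i\frac{E_n}{\hbar}t\right\} = \left[\psi_R(x)\cos\frac{\delta_n}{\hbar}t + i\,\psi_L(x)\sin\frac{\delta_n}{\hbar}t\right]\exp\left\{-i\frac{E_n}{\hbar}t\right\}. \qquad (2.184)$$

This result implies, in particular, that the probabilities W_R and W_L to find the particle, respectively, in the right and left wells change with time as

$$W_R = \cos^2\frac{\delta_n}{\hbar}t, \qquad W_L = \sin^2\frac{\delta_n}{\hbar}t, \qquad (2.185)$$

mercifully leaving the total probability constant: W_R + W_L = 1. (If our calculation had not passed this sanity check, we would be in big trouble.)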

This is the famous effect of periodic quantum oscillations, with frequency ω_n = 2δ_n/ħ = (E_A − E_S)/ħ, of the particle between two similar quantum wells, due to their coupling via tunneling through the barrier separating them. The physics of this effect is straightforward. Just as in the single-well problem discussed in Sec. 5, the particle initially placed into a certain quantum well tries to escape from it via tunneling through the semi-transparent wall. However, in our current situation (Fig. 21) the particle can only escape into the adjacent well. After tunneling into that second well, the particle tries to escape from it, and hence comes back, etc., just like a classical 1D oscillator initially deflected from its equilibrium position.
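The two-well dynamics (181)-(185) can be reproduced numerically without any wavefunction grid. It suffices to track the two amplitudes of ψ_R and ψ_L. The sketch below assumes units with ħ = 1 and hypothetical level values E_S = 0.95 and E_A = 1.05. It builds Ψ(t) from the eigenstate phase factors and compares the result with W_R = cos²(δ_n t/ħ), including the total-probability sanity check.

```python
import cmath
import math


def well_probabilities(t, E_S, E_A, hbar=1.0):
    """W_R(t), W_L(t) for a particle placed into the right well at t = 0.

    psi_R = (psi_S + psi_A)/sqrt(2); each eigenstate acquires its own phase
    factor exp(-iEt/hbar); the result is re-expanded in psi_R and psi_L."""
    phase_S = cmath.exp(-1j * E_S * t / hbar)
    phase_A = cmath.exp(-1j * E_A * t / hbar)
    c_R = 0.5 * (phase_S + phase_A)   # amplitude of psi_R(x)
    c_L = 0.5 * (phase_S - phase_A)   # amplitude of psi_L(x)
    return abs(c_R) ** 2, abs(c_L) ** 2


E_S, E_A = 0.95, 1.05            # hypothetical levels, units with hbar = 1
delta = (E_A - E_S) / 2          # delta_n of Eq. (2.183)
for t in (0.0, 5.0, 10.0, math.pi / (2 * delta), math.pi / delta):
    W_R, W_L = well_probabilities(t, E_S, E_A)
    print(f"t={t:7.3f}  W_R={W_R:.6f}  cos^2={math.cos(delta * t) ** 2:.6f}  sum={W_R + W_L:.6f}")
```

At t = πħ/2δ_n the particle is fully in the left well, and at t = πħ/δ_n it is back in the right one. This agrees with the oscillation period 2π/ω_n = πħ/δ_n.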

Maybe the most surprising feature of this effect is its relatively high frequency. According to Eq. (178), the time period of the quantum oscillations,

$$\Delta t = \frac{2\pi}{\omega_n} = \frac{2\pi\hbar}{E_A - E_S} = \frac{2ma^2}{n\hbar}\,\alpha, \qquad\text{for } \alpha \gg 1, \qquad (2.186)$$

is a factor of α/2π ≫ 1 shorter than the lifetime τ (160) of the metastable state of the particle in a similar but single quantum well limited by delta-functional walls with similar parameter α. This is a very counterintuitive result indeed: the speed of particle tunneling into a similar adjacent well is much higher than that of tunneling through a similar barrier into free space!

To see whether this result is an artifact of the delta-functional model of the tunnel barrier, let us calculate the splitting 2δ_n for a system of two similar, symmetric, soft quantum wells formed by a smooth potential profile U(x) = U(−x); see Fig. 23.

Fig. 2.23. Weak coupling between two similar, soft quantum wells.



If the barrier transparency is low, the quasi-localized wavefunctions ψ_R(x) and ψ_L(x) = ψ_R(−x) and their eigenenergies may be found approximately by solving the Schrödinger equation in one of the wells, neglecting tunneling through the barrier. However, finding δ_n requires a little bit more care. Let us write the stationary Schrödinger equations for the symmetric and antisymmetric solutions in the form

$$(E_S - U)\,\psi_S = -\frac{\hbar^2}{2m}\frac{d^2\psi_S}{dx^2}, \qquad (E_A - U)\,\psi_A = -\frac{\hbar^2}{2m}\frac{d^2\psi_A}{dx^2}, \qquad (2.187)$$

then multiply the former equation by ψ_A and the latter one by ψ_S, subtract them from each other, and integrate the result from 0 to ∞:



$$(E_A - E_S)\int_0^\infty \psi_S\psi_A\,dx = \frac{\hbar^2}{2m}\int_0^\infty\left(\psi_A\frac{d^2\psi_S}{dx^2} - \psi_S\frac{d^2\psi_A}{dx^2}\right)dx. \qquad (2.188)$$

If U(x), and hence d²ψ_{A,S}/dx², are finite for all x,⁴⁶ we may integrate the right-hand side by parts to get



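$$(E_A - E_S)\int_0^\infty \psi_S\psi_A\,dx = \frac{\hbar^2}{2m}\left[\psi_A\frac{d\psi_S}{dx} - \psi_S\frac{d\psi_A}{dx}\right]_0^\infty. \qquad (2.189)$$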



So far, this is an exact equation. For weakly coupled wells, we can do more. In this case, the left-hand side may be approximated as (E_A − E_S)/2 ≡ δ_n. The reason is that the integral is dominated by the vicinity of point a, where the second terms in each of Eqs. (179) are negligible, and the integral equals 1/2 due to the proper normalization of function ψ_R(x). In the right-hand side, the substitution at x = ∞ vanishes (due to the wavefunction decay in the classically forbidden region), and so does the first term at x = 0, because for the antisymmetric solution ψ_A(0) = 0. As a result, we get

$$\delta_n = \frac{\hbar^2}{2m}\,\psi_S(0)\frac{d\psi_A}{dx}(0) = \frac{\hbar^2}{m}\,\psi_R(0)\frac{d\psi_R}{dx}(0) = -\frac{\hbar^2}{m}\,\psi_L(0)\frac{d\psi_L}{dx}(0). \qquad (2.190)$$



It is straightforward to show that, within the limits of validity of the WKB approximation, Eq. (190) may be reduced to

$$\delta_n = \frac{\hbar}{\mathcal{T}_a}\exp\left\{-\int_{x_c}^{x_c'}\kappa(x')\,dx'\right\}, \qquad (2.191)$$

46 Since this condition is not satisfied for potential (167), one should not be surprised that the resulting Eq. (189) is invalid for our initial problem, giving δ_n twice as large as the correct expression (178).
where 𝒯_a is the time period of classical motion of the particle inside one of the wells, function κ(x) is defined by Eq. (97), and x_c and x_c' are the classical turning points limiting the potential barrier at the level E_n of the particle's energy (see Fig. 23). Comparing this result with Eq. (117), we can notice a parallel with the case of the delta-functional barriers. The transmission coefficient T of a tunnel barrier (and hence the reciprocal lifetime of a metastable state in a potential well separated by such a barrier from a continuum) scales as the square of the WKB exponent participating in Eq. (191). As a result, the period of quantum oscillations between the wells is much smaller than the lifetime. We will return to the discussion of this result, in a more general form, in Chapter 5.
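Evaluating Eq. (191) in practice reduces to one quadrature under the barrier. The sketch below assumes the standard WKB form δ_n = (ħ/𝒯_a)exp{−∫κ(x')dx'}, taken between the turning points, with κ(x) = [2m(U(x) − E)]^{1/2}/ħ. The double-well profile U(x) = U_0(x²/b² − 1)², the parameter values, and the harmonic estimates of the lowest level and of 𝒯_a are all illustrative assumptions. The WKB formula is at best rough for the lowest level.

```python
import math


def wkb_exponent(U, E, x_c, x_c2, m=1.0, hbar=1.0, n_steps=20000):
    """Integral of kappa(x) = sqrt(2m[U(x) - E])/hbar between the turning
    points x_c < x_c2 (midpoint rule; the integrand vanishes at both ends)."""
    h = (x_c2 - x_c) / n_steps
    total = 0.0
    for i in range(n_steps):
        x = x_c + (i + 0.5) * h
        total += math.sqrt(max(0.0, 2.0 * m * (U(x) - E))) / hbar
    return total * h


def wkb_coupling(U, E, x_c, x_c2, period, m=1.0, hbar=1.0):
    """delta_n ~ (hbar/T_a) * exp(-integral of kappa), the assumed form of
    Eq. (2.191); `period` is the classical period T_a inside one well."""
    return hbar / period * math.exp(-wkb_exponent(U, E, x_c, x_c2, m, hbar))


# Hypothetical double well U(x) = U0*(x^2/b^2 - 1)^2, units m = hbar = 1.
U0, b = 8.0, 2.0
U = lambda x: U0 * (x ** 2 / b ** 2 - 1.0) ** 2
omega = math.sqrt(8.0 * U0 / b ** 2)          # small-oscillation frequency near x = +-b
E0 = 0.5 * omega                               # crude harmonic estimate of the lowest level
x_t = b * math.sqrt(1.0 - math.sqrt(E0 / U0))  # inner turning points are -x_t and +x_t
print("exponent:", wkb_exponent(U, E0, -x_t, x_t))
print("delta estimate:", wkb_coupling(U, E0, -x_t, x_t, period=2 * math.pi / omega))
```

The midpoint rule is chosen because it never evaluates the integrand exactly at the turning points, where κ vanishes with an infinite slope.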

Returning for a second to Fig. 20a, we may now readily interpret the results for tunneling through the double quantum well. Each pair of resonance peaks of transparency corresponds to the alignment of the incident particle's energy with the pair of energy levels E_S, E_A of the symmetric and antisymmetric states of the system.



2.7. 1D band theory

Let us now return to Eqs. (165) and (166) describing the resonant tunneling, and discuss their predictions for larger N; see, for example, Fig. 20b. We see that an increase of N increases the number of resonant peaks per period to (N − 1). At N → ∞ the peaks merge into the so-called allowed energy bands (frequently called just the "energy bands") of relatively high transparency. These are separated from similar bands in the adjacent periods of function T(ka) by energy gaps,⁴⁷ where T → 0. Notice the following important features of the pattern:

(i) at N → ∞, the band/gap edges become sharp for any α, and tend to fixed positions (determined by α but independent of N);

(ii) the larger the interwell coupling (α → 0), the broader the allowed energy bands and the narrower the gaps between them.

Our discussion of resonant tunneling in the previous section gives us an evident clue for a semi-quantitative interpretation of this pattern. If (N − 1) quantum wells are weakly coupled by tunneling through the barriers separating them, the system's energy spectrum consists of groups of (N − 1) energy levels. Each level corresponds to an eigenfunction that is a set of similar local functions in each well, but with certain phase shifts Δφ between them. It is natural to expect that the situation is similar to that of 2 coupled wells (N − 1 = 2). At the upper level, Δφ = π (thus providing the highest quantum confinement), with ka → πn at α → ∞, while at the lowest level all Δφ = 0, providing the loosest confinement.⁴⁸ However, what about Δφ for other levels?

47 In solid-state (especially semiconductor) physics and electronics, the term bandgaps is more common.

48 This expectation is implicitly confirmed by Fig. 20: at α ≫ 1, the highest resonance peak in each group tends to ka = πn, and the lowest peak also tends to a position independent of N (though dependent on α).

Answers to all these questions are easy to get in the most important limit N → ∞, i.e. for periodic structures. These are, in particular, good 1D approximations for solid-state crystals, whose samples may feature more than 10¹⁰ similar atoms or molecules in each direction of the crystal lattice. It is almost self-evident that at N → ∞, due to the translational invariance of U(x),

$$U(x + a) = U(x), \qquad (2.192)$$
the phase shift Δφ between local wavefunctions in all adjacent quantum wells should be the same for each period of the system, i.e.

$$\psi(x + a) = \psi(x)\,e^{i\Delta\varphi} \qquad (2.193a)$$

for all x. (A reasonably fair classical image of Δφ is the geometric angle between similar objects, e.g., similar paper clips, attached at equal distances to a long, uniform rubber band. If the band's ends are twisted, the twist is equally distributed between the structure's periods, representing the constancy of Δφ.⁴⁹)

Equation (193a) is the (1D version of the) much-celebrated Bloch theorem.⁵⁰ Mathematical rigor aside,⁵¹ it is a virtually evident fact. The particle's probability density w(x) = ψ*(x)ψ(x) has to be periodic in this a-periodic system, and it may be so only if Δφ is constant. For what follows, it is more convenient to present the real number Δφ in the form qa (there is no loss of generality here, because parameter q may depend on a as well as on other parameters of the system), so that the Bloch theorem takes the form

$$\psi(x + a) = \psi(x)\,e^{iqa}. \qquad (2.193b)$$



The physical sense of parameter q will be discussed in detail below. For now, just note that according to Eq. (193b), adding 2π/a to q yields the same wavefunction; hence all observables have to be (2π/a)-periodic functions of q.⁵²

Now let us use the Bloch theorem to find the eigenfunctions and eigenenergies for a particular, and probably the simplest, periodic function U(x): an infinite set of similar quantum wells separated by delta-functional tunnel barriers (Fig. 24).

Fig. 2.24. The simplest periodic potential: an infinite set of similar, equidistant, delta-functional tunnel barriers, spaced by distance a.



49 I am ashamed to confess that, due to the lack of time, this was virtually the only "lecture demonstration" in my 
QM courses. 

50 Named after F. Bloch who applied this concept to wave mechanics in 1929, i.e. very soon after its formulation. 
Admittedly, in mathematics, an equivalent statement, usually called the Floquet theorem, has been known since at 
least 1883. 

51 I will address this rigor in two steps. Later in this section, we will see that the function obeying Eq. (193) is indeed a solution of the Schrödinger equation. However, to save time and space, it will be better for us to postpone until Chapter 4 the proof that any eigenfunction of the equation with periodic boundary conditions obeys the Bloch theorem. As a partial reward for the delay, that proof will be valid for an arbitrary spatial dimensionality.

52 The product ħq, which has the dimensionality of momentum, is called either the quasi-momentum or (especially in solid-state physics) the "crystal momentum" of the particle.
To start, consider two points separated by distance a: one of them, x_j, just left of the position of one of the barriers, and another one, x_{j+1}, just left of the following barrier. Near each of these points, the eigenfunction may be presented as a linear superposition of two simple waves exp{±ikx}. The amplitudes of these components at the two points should be related by a 2×2 transfer matrix T of the potential fragment separating them. According to Eq. (141), this matrix may be found as the product of the matrix (146) of one interval a and the matrix (143) of one barrier:

$$\begin{pmatrix}A_{j+1}\\ B_{j+1}\end{pmatrix} = \mathrm{T}_a\mathrm{T}_\alpha\begin{pmatrix}A_j\\ B_j\end{pmatrix}, \qquad \mathrm{T}_a\mathrm{T}_\alpha = \begin{pmatrix}e^{ika} & 0\\ 0 & e^{-ika}\end{pmatrix}\begin{pmatrix}1 - i\alpha & -i\alpha\\ i\alpha & 1 + i\alpha\end{pmatrix}. \qquad (2.194)$$



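However, according to the Bloch theorem (193b), the component amplitudes should also be related as

$$\begin{pmatrix}A_{j+1}\\ B_{j+1}\end{pmatrix} = e^{iqa}\begin{pmatrix}A_j\\ B_j\end{pmatrix} \equiv \begin{pmatrix}e^{iqa} & 0\\ 0 & e^{iqa}\end{pmatrix}\begin{pmatrix}A_j\\ B_j\end{pmatrix}. \qquad (2.195)$$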



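The condition of self-consistency of these two equations leads to the following characteristic equation:

$$\det\left[\begin{pmatrix}e^{ika} & 0\\ 0 & e^{-ika}\end{pmatrix}\begin{pmatrix}1 - i\alpha & -i\alpha\\ i\alpha & 1 + i\alpha\end{pmatrix} - \begin{pmatrix}e^{iqa} & 0\\ 0 & e^{iqa}\end{pmatrix}\right] = 0. \qquad (2.196)$$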
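The characteristic equation can be checked numerically before any algebra. The sketch below takes the free-interval matrix as diag(e^{ika}, e^{−ika}) and the delta-barrier matrix as [[1 − iα, −iα], [iα, 1 + iα]], both mapping amplitudes from left to right. This is an assumption about the forms of Eqs. (146) and (143). Because det(T_a T_α) = 1, the equation det(T − e^{iqa}I) = 0 reduces to cos qa = (1/2)Tr(T_a T_α). The code verifies that this trace equals cos ka + α sin ka, and that a q taken from this relation indeed zeroes the determinant. The helper names are hypothetical.

```python
import cmath
import math


def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def period_matrix(ka, alpha):
    """T_a T_alpha: one delta barrier followed by one free interval of length a."""
    T_a = [[cmath.exp(1j * ka), 0], [0, cmath.exp(-1j * ka)]]
    T_alpha = [[1 - 1j * alpha, -1j * alpha], [1j * alpha, 1 + 1j * alpha]]
    return matmul2(T_a, T_alpha)


def bloch_residual(ka, alpha, qa):
    """det(T_a T_alpha - e^{iqa} I), i.e. the left-hand side of the characteristic equation."""
    T = period_matrix(ka, alpha)
    lam = cmath.exp(1j * qa)
    return (T[0][0] - lam) * (T[1][1] - lam) - T[0][1] * T[1][0]


ka, alpha = 2.0, 0.4
cos_qa = math.cos(ka) + alpha * math.sin(ka)    # here |cos qa| <= 1: an allowed band
qa = math.acos(cos_qa)
print(abs(bloch_residual(ka, alpha, qa)))        # zero up to rounding error
```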



In Sec. 5, we have already calculated the matrix product participating in this equation (see Eq. (148)). Using it, we see that Eq. (196) is reduced to the same simple Eq. (166), cos qa = cos ka + α sin ka. This equation has already jumped at us from the solution of the different (resonant tunneling) problem. Let us explore that simple result in detail. First of all, the right-hand side of Eq. (166) is a sinusoidal function of ka, with amplitude (1 + α²)^{1/2} (see Fig. 25), while its left-hand side is a sinusoidal function of qa with amplitude 1.

Fig. 2.25. Graphical solution of the characteristic equation (166) for a fixed value of parameter α. The ranges of ka that yield |cos qa| ≤ 1 correspond to the allowed energy bands, while those with |cos qa| > 1 correspond to the gaps between them.



As a result, within each period Δ(ka) = 2π, the characteristic equation does not have a real solution for q inside two intervals of ka, and hence inside two intervals of energy E = ħ²k²/2m. (These intervals are exactly the energy gaps mentioned above, while the complementary intervals of ka and E, where a real q exists, are the allowed energy bands.) In contrast, parameter q can take any real value, so it is more convenient to plot the eigenenergy E = ħ²k²/2m as a function of q (or, even more conveniently, of qa) rather than of ka.⁵³ While doing that, we need to recall that parameter α, defined by the last of Eqs. (127), depends on the wave vector k as well. Hence if we vary q (and hence k), it is better to characterize the structure by a different, k-independent dimensionless parameter, for example

$$\beta \equiv (ka)\,\alpha \equiv \frac{mWa}{\hbar^2}, \qquad (2.197)$$

so that Eq. (166) becomes

$$\cos qa = \cos ka + \frac{\beta}{ka}\sin ka. \qquad (2.198)$$



Figure 26 shows the plots of E and k, following from Eq. (198), for a particular, moderate value of parameter β. The band structure of the energy spectrum is apparent. Another evident feature is the 2π-periodicity of the pattern in qa, which we have already predicted from the general Bloch theorem arguments. (Due to this periodicity, the complete band/gap pattern may be studied on just one interval −π ≤ qa ≤ +π, called the 1st Brillouin zone; this is the so-called reduced zone picture. For some applications, however, it is more convenient to use the extended zone picture with −∞ < qa < +∞; see, e.g., the next section.)
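The band/gap pattern of Fig. 26 can be extracted from Eq. (198) with a simple scan. The sketch below marks the values of ka where |cos ka + (β/ka) sin ka| ≤ 1 and refines the band edges by bisection. It uses the Python standard library only, and the grid step and function names are arbitrary choices. With energies in units of E_0 ≡ ħ²/2ma², one has E/E_0 = (ka)².

```python
import math


def rhs198(ka, beta):
    """Right-hand side of Eq. (2.198): cos(ka) + (beta/ka) sin(ka)."""
    return math.cos(ka) + beta * math.sin(ka) / ka


def allowed_bands(beta, ka_max, step=1e-3):
    """(ka_bottom, ka_top) of each allowed band (|rhs198| <= 1) below ka_max.
    Edges are bracketed on a uniform grid and refined by bisection on |rhs| - 1."""
    g = lambda x: abs(rhs198(x, beta)) - 1.0

    def refine(lo, hi):
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if (g(mid) > 0) == (g(lo) > 0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    edges, x = [], step
    inside = g(x) <= 0
    while x + step < ka_max:
        x_next = x + step
        now_inside = g(x_next) <= 0
        if now_inside != inside:
            edges.append(refine(x, x_next))
            inside = now_inside
        x = x_next
    return list(zip(edges[0::2], edges[1::2]))


beta = 3.0                                   # the value used in Fig. 2.26
for n, (lo, hi) in enumerate(allowed_bands(beta, 3 * math.pi + 0.1), start=1):
    print(f"band {n}: ka in [{lo:.4f}, {hi:.4f}],  E/E_0 in [{lo**2:.3f}, {hi**2:.3f}]")
```

For β > 0 the upper edge of the n-th band falls exactly at ka = πn, where cos ka = ±1 and sin ka = 0. This is a handy consistency check of the scan.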
Fig. 2.26. (a) The "real" momentum k of a particle in the periodic delta-functional potential profile shown in Fig. 24, and (b) its energy E = ħ²k²/2m (in units of E_0 ≡ ħ²/2ma²), as functions of the quasi-momentum q, for a particular value (β = 3) of the dimensionless potential parameter β ≡ (ka)α = mWa/ħ². Arrows in the lower right corner of panel (b) illustrate the definition of the energy band (ΔE_n) and energy gap (Δ_n) widths.

53 Perhaps a more important reason for taking q as the argument is that for motion in a general potential U(x), the particle's momentum ħk is not a constant of motion, while (according to the Bloch theorem) the quasi-momentum ħq is.
However, maybe the most surprising fact, clearly visible in Fig. 26, is that there is an infinite number of energy bands, with different energies E_n(q) for the same value of q. Mathematically, this is evident from Eq. (198) (see also Fig. 25). Indeed, for each value of qa there are two solutions ka to this equation on each period Δ(ka) = 2π (see also panel (a) in Fig. 26). Each of these solutions gives a different value of the particle's energy E = ħ²k²/2m. A continuous set of similar solutions for various qa forms a particular energy band.
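The statement that each qa yields one ka per allowed band, i.e. two per period Δ(ka) = 2π, can be checked directly. The sketch below uses hypothetical helper names and β = 3 as in Fig. 26. It finds all roots of Eq. (198) below a chosen cutoff for a fixed qa. Each root gives one of the energies E_n(q) = (ka)²E_0.

```python
import math


def rhs198(ka, beta):
    """Right-hand side of Eq. (2.198): cos(ka) + (beta/ka) sin(ka)."""
    return math.cos(ka) + beta * math.sin(ka) / ka


def ka_solutions(qa, beta, ka_max, step=1e-3):
    """All roots ka < ka_max of cos(qa) = rhs198(ka, beta), one per allowed band.
    Roots are bracketed on a uniform grid and refined by bisection."""
    h = lambda x: rhs198(x, beta) - math.cos(qa)
    roots, x = [], step
    while x + step < ka_max:
        lo, hi = x, x + step
        if (h(lo) > 0) != (h(hi) > 0):
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if (h(mid) > 0) == (h(lo) > 0):
                    lo = mid
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
        x += step
    return roots


beta, qa = 3.0, math.pi / 3
ks = ka_solutions(qa, beta, 4 * math.pi + 0.01)
print("ka roots:", [round(k, 4) for k in ks])
print("E/E_0   :", [round(k * k, 3) for k in ks])   # one energy per band at this q
```

Since the right-hand side depends on q only through cos qa, the roots for qa and −qa (or qa + 2π) coincide. This is the numerical face of the 2π-periodicity discussed above.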

Since the band theory is one of the most vital results of quantum mechanics, it is important to understand the physics of these different solutions, and hence of the whole band picture. For that, let us explore analytically two opposite limits of the potential strength. An important advantage of this approach is that both analyses may be carried out for an arbitrary periodic potential U(x), rather than only for the simplest model shown in Fig. 24.

(i) Tight-binding approximation. This approximation is sound when eigenenergy E_n is much lower than the height of the potential barriers separating the potential minima (serving as quantum wells); see Fig. 27. As should be clear from our discussion in Sec. 6, the wavefunction is mostly localized in the classically allowed regions at points x_j of the potential energy minima (see the dashed lines in Fig. 27). Essentially the only role of coupling between these quantum well states (via tunneling through the separating barriers) is to establish certain phase shifts Δφ = qa between the pairs of adjacent quasi-localized wavefunction "lumps" u(x − x_j) and u(x − x_{j+1}).

Fig. 2.27. Tight-binding approximation (schematically). (Labels in the figure: U(x), u_n(x − x_{j+1}).)



To describe this effect quantitatively, let us first return to the problem of two coupled wells considered in Sec. 6, and recast result (184) as

$$\Psi(x,t) = \left[a_R(t)\,\psi_R(x) + a_L(t)\,\psi_L(x)\right]\exp\left\{-i\frac{E_n}{\hbar}t\right\}, \qquad (2.199)$$

where functions a_R and a_L oscillate sinusoidally in time:

$$a_R(t) = \cos\frac{\delta_n}{\hbar}t, \qquad a_L(t) = i\sin\frac{\delta_n}{\hbar}t. \qquad (2.200)$$

This evolution satisfies the following system of two equations, whose structure resembles Eq. (1.59):

$$i\hbar\dot a_R = -\delta_n a_L, \qquad i\hbar\dot a_L = -\delta_n a_R. \qquad (2.201)$$
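Equations (201) can also be integrated numerically, which is a useful check before generalizing them to many wells. The minimal sketch below uses a hand-written 4th-order Runge-Kutta step (an arbitrary choice of integrator), units with ħ = 1, and hypothetical values δ_n = 0.3 and t = 5. Starting from a_R = 1 and a_L = 0, the result should reproduce Eq. (200).

```python
import math


def derivs(a, delta, hbar=1.0):
    """Eq. (2.201) solved for the time derivatives:
    da_R/dt = (i delta/hbar) a_L,  da_L/dt = (i delta/hbar) a_R."""
    a_R, a_L = a
    return (1j * delta / hbar * a_L, 1j * delta / hbar * a_R)


def rk4_step(a, dt, delta, hbar=1.0):
    """One classical 4th-order Runge-Kutta step for the pair (a_R, a_L)."""
    k1 = derivs(a, delta, hbar)
    k2 = derivs(tuple(x + 0.5 * dt * k for x, k in zip(a, k1)), delta, hbar)
    k3 = derivs(tuple(x + 0.5 * dt * k for x, k in zip(a, k2)), delta, hbar)
    k4 = derivs(tuple(x + dt * k for x, k in zip(a, k3)), delta, hbar)
    return tuple(x + dt / 6.0 * (p + 2 * q + 2 * r + s)
                 for x, p, q, r, s in zip(a, k1, k2, k3, k4))


def evolve(delta, t_end, n_steps=2000, hbar=1.0):
    """Integrate from a_R(0) = 1, a_L(0) = 0 (particle in the right well)."""
    a, dt = (1.0 + 0j, 0j), t_end / n_steps
    for _ in range(n_steps):
        a = rk4_step(a, dt, delta, hbar)
    return a


delta, t = 0.3, 5.0
a_R, a_L = evolve(delta, t)
print(a_R, math.cos(delta * t))          # compare with a_R = cos(delta t/hbar)
print(a_L, 1j * math.sin(delta * t))     # compare with a_L = i sin(delta t/hbar)
```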

Later in the course (in Chapter 6) we will prove that such equations are indeed valid, in the tight-binding approximation, for any system of two coupled quantum wells. These equations may be readily generalized to the case of many similar coupled wells. In this case, instead of Eq. (199), we evidently should write

$$\Psi(x,t) = \exp\left\{-i\frac{E_n}{\hbar}t\right\}\sum_j a_j(t)\,u_n(x - x_j), \qquad (2.202)$$

where E_n are the eigenenergies and u_n the eigenfunctions of each isolated well. In the tight-binding limit, only the adjacent wells are coupled, so that instead of Eq. (201) we should write an infinite system of similar equations

$$i\hbar\dot a_j = -\delta_n a_{j-1} - \delta_n a_{j+1}, \qquad (2.203)$$

for each well number j, where parameters δ_n describe the coupling between two adjacent quantum wells. Repeating the calculation outlined at the end of Sec. 6 for our new situation, we get a result essentially similar to the last form of Eq. (190):

(2.204)



where x_0 is the distance between the well bottom and the middle of the tunnel barrier on its right (see Fig. 27). The only substantial new feature of this expression in comparison with Eq. (190) is that the sign of δ_n alternates with the level number n: δ_1 > 0, δ_2 < 0, δ_3 > 0, etc. Indeed, the number of "wiggles" (formally, zeros) of eigenfunctions u_n(x) of any potential well increases with n (see, e.g., Fig. 1.7).⁵⁴ As a result, the relative sign of the exponential tails of the functions, sneaking under the left and right barriers limiting the well, also alternates with n.

The infinite system of ordinary differential equations (203) allows one to explore a large range of important problems, such as the spread of a wavefunction initially localized in one well. However, our main task now is to find its stationary states, i.e. the solutions proportional to exp{−i(ε_n/ħ)t}, where ε_n is a still unknown, q-dependent addition to the background energy E_n of the n-th level. In order to satisfy the Bloch theorem (193) as well, such a solution should have the form

$$a_j(t) = a\exp\left\{iqx_j - i\frac{\varepsilon_n}{\hbar}t + \text{const}\right\}, \qquad (2.205)$$

where a is a constant. Plugging this solution into Eq. (203) and canceling the common exponent, we get

$$E = E_n + \varepsilon_n = E_n - \delta_n\left(e^{iqa} + e^{-iqa}\right) = E_n - 2\delta_n\cos qa, \qquad (2.206)$$

so that in this approximation, the energy band width ΔE_n (see Fig. 26b) equals 4|δ_n|.
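A quick numerical check of Eq. (206) is possible with a finite chain. The sketch below closes the chain into a ring of N wells. This periodic boundary condition is an assumption that only makes the set of allowed q discrete, qa = 2πk/N. The code applies the right-hand side of the stationary version of Eq. (203) to the Bloch amplitudes a_j = e^{iqx_j} and compares the result with E_n − 2δ_n cos qa. The values of N, E_n, and δ_n are hypothetical.

```python
import cmath
import math


def apply_tb(a, E_n, delta):
    """Stationary form of Eq. (2.203) on a ring of N wells:
    (H a)_j = E_n a_j - delta (a_{j-1} + a_{j+1}), indices taken modulo N."""
    N = len(a)
    return [E_n * a[j] - delta * (a[j - 1] + a[(j + 1) % N]) for j in range(N)]


def bloch_amplitudes(N, k_index):
    """a_j = exp(i q x_j), x_j = j a; periodicity on the ring needs qa = 2 pi k_index / N."""
    qa = 2 * math.pi * k_index / N
    return qa, [cmath.exp(1j * qa * j) for j in range(N)]


N, E_n, delta = 8, 1.0, 0.05     # hypothetical numbers
energies = []
for k in range(N):
    qa, a = bloch_amplitudes(N, k)
    Ha = apply_tb(a, E_n, delta)
    E = (Ha[0] / a[0]).real       # the same ratio holds for every j
    energies.append(E)
    print(f"qa = {qa:6.3f}:  E = {E:.6f},  E_n - 2 delta cos qa = {E_n - 2 * delta * math.cos(qa):.6f}")
print("band width:", max(energies) - min(energies), " vs 4|delta| =", 4 * abs(delta))
```

Python's negative indexing (a[-1] is the last element) implements the ring closure for j = 0 without a special case.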



Relation (206), whose validity is restricted to |δ_n| ≪ E_n, describes the particular lowest energy bands plotted in Fig. 26b reasonably well. (For larger β, the agreement would be even better.) So, this calculation explains what the energy bands really are. In the tight-binding limit, they are best interpreted as the isolated well's energy levels E_n, broadened into bands by the interwell interaction. Also, this result gives a clear proof that the energy band extremes correspond to qa = 2πl and qa = 2π(l + 1/2), with integer l. Finally, the sign alternation of the coupling coefficient δ_n (204) with number n explains why the energy maxima of one band are aligned, on the qa axis, with the energy minima of the adjacent bands.



54 Below, we will see several other examples of this behavior. This alternation rule is also in accordance with the 
Bohr-Sommerfeld quantization condition 
(ii) Weak potential limit. Surprisingly, the energy band structure is also compatible with a completely different physical picture, which can be developed in the opposite limit. Let the energy E be so high that the periodic potential U(x) may be treated as a small perturbation. Naively, we would then have the parabolic dispersion relation between the particle's energy and momentum. However, if we plot the energy as a function of q rather than k, we need to add 2πl/a, with an arbitrary integer l, to the argument. Let us show this by expanding all variables into spatial Fourier series. For the periodic potential energy U(x), such an expansion is straightforward:⁵⁵



$$U(x) = \sum_{l''} U_{l''}\exp\left\{-i\frac{2\pi}{a}l''x\right\}, \qquad (2.207)$$

where the summation is over all integers l'', from −∞ to +∞. However, for the wavefunction we should show due respect to the Bloch theorem (193). To understand how to proceed, let us define another function



$$u(x) \equiv \psi(x)\,e^{-iqx}, \qquad (2.208)$$

and study its periodicity:

$$u(x + a) = \psi(x + a)\,e^{-iq(x + a)} = \psi(x)\,e^{iqa}e^{-iq(x + a)} = \psi(x)\,e^{-iqx} = u(x). \qquad (2.209)$$



We see that the new function is a-periodic, and hence we can use Eqs. (208)-(209) to rewrite the Bloch theorem as

$$\psi(x) = u(x)\,e^{iqx}, \quad\text{with}\quad u(x + a) = u(x). \qquad (2.210)$$

Now it is safe to expand the periodic function u(x) exactly as U(x):

$$u(x) = \sum_{l'} u_{l'}\exp\left\{-i\frac{2\pi}{a}l'x\right\}, \qquad (2.211)$$



so that, according to the Bloch theorem,

$$\psi(x) = e^{iqx}\sum_{l'} u_{l'}\exp\left\{-i\frac{2\pi}{a}l'x\right\} = \sum_{l'} u_{l'}\exp\left\{i\left(q - \frac{2\pi l'}{a}\right)x\right\}. \qquad (2.212)$$



The only nontrivial part of plugging this expression into the stationary Schrödinger equation (61) is the calculation of the product term, using expansions (207) and (211):

$$U(x)\,\psi = \sum_{l',\,l''} U_{l''}u_{l'}\exp\left\{i\left[q - \frac{2\pi}{a}(l' + l'')\right]x\right\}. \qquad (2.213)$$

At fixed l', we may change the summation over l'' to that over l ≡ l' + l'' (so that l'' = l − l'), and write:

$$U(x)\,\psi = \sum_{l}\exp\left\{i\left(q - \frac{2\pi}{a}l\right)x\right\}\sum_{l'} U_{l-l'}u_{l'}. \qquad (2.214)$$

The benefits of my unusual choice of the summation index (l' instead of, say, l) will be clear in a few lines.
Now plugging Eqs. (212) (with index l' now replaced by l) and (214) into the stationary Schrödinger equation (61), and requiring the coefficients of each spatial exponent to match, we get an infinite system of linear equations for u_l:⁵⁶

$$\sum_{l'} U_{l-l'}u_{l'} = \left[E - \frac{\hbar^2}{2m}\left(q - \frac{2\pi l}{a}\right)^2\right]u_l. \qquad (2.215)$$



So far, this system is an equivalent alternative to the initial Schrödinger equation. By the way, it is very efficient for fast numerical calculations, for virtually any potential strength, though in systems with tight binding it may require taking into account a large number of harmonics U_l. In the weak potential limit, i.e. if all the Fourier coefficients U_n are small,⁵⁷ we can complete all the calculations analytically.⁵⁸ Indeed, in the so-called 0th approximation we can ignore all U_n. Then, in order to have at least one u_l different from 0, Eq. (215) requires that

$$E = E_l \equiv \frac{\hbar^2}{2m}\left(q - \frac{2\pi l}{a}\right)^2 \qquad (2.216)$$



(u_l itself should be obtained from the normalization condition). This result means that the dispersion relation E(q) has an infinite number of similar quadratic branches numbered by integer l; see Fig. 28.

Fig. 2.28. 1D band picture in the weak potential case (Δ_n ≪ E^{(n)}). Shading shows the 1st Brillouin zone. (Horizontal axis: qa/2π; one of the branches is labeled l = 2.)

On any branch, the eigenfunction has just one Fourier coefficient, i.e. presents a monochromatic traveling wave

$$\psi_l \to u_l\,e^{ikx} = u_l\exp\left\{i\left(q - \frac{2\pi l}{a}\right)x\right\}. \qquad (2.217)$$
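In the 0th approximation, the reduced-zone picture of Fig. 28 is just the set of parabolas (216) evaluated inside the 1st Brillouin zone. The sketch below uses units ħ = m = a = 1 by default and hypothetical function names. It tabulates the lowest branches and checks that two branches l and l' meet at qa = π(l + l') with energy E^{(n)} = π²ħ²n²/2ma², where n = l − l'.

```python
import math


def free_branches(qa, l_values, hbar=1.0, m=1.0, a=1.0):
    """0th approximation: E_l(q) = (hbar^2/2m)(q - 2 pi l/a)^2 for each integer l."""
    q = qa / a
    return {l: hbar ** 2 / (2 * m) * (q - 2 * math.pi * l / a) ** 2 for l in l_values}


def E_crossing(n, hbar=1.0, m=1.0, a=1.0):
    """Branch-crossing energy E^(n) = pi^2 hbar^2 n^2 / (2 m a^2)."""
    return (math.pi * hbar * n / a) ** 2 / (2 * m)


# Reduced-zone picture: at each qa in the 1st Brillouin zone, list the lowest branches.
for qa in (-math.pi, -math.pi / 2, 0.0, math.pi / 2, math.pi):
    E = sorted(free_branches(qa, range(-3, 4)).values())[:4]
    print(f"qa = {qa:6.3f}: " + ", ".join(f"{e:7.3f}" for e in E))
```

The pairwise degeneracies printed at qa = 0 and qa = ±π are exactly the crossing points where a weak periodic potential opens the gaps.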



56 Note that we have essentially proved that the Bloch wavefunction (210) is indeed a solution of Eq. (61), provided that the quasi-momentum q is selected so as to make the system of linear equations (215) compatible, i.e. is a solution of its characteristic equation (see, e.g., Eq. (223) below).

57 Besides the constant potential U_0 that, as we know from Sec. 2, may be included into the energy in a trivial way, so that we may take U_0 = 0.

58 This method is so powerful that its multi-dimensional version is not much more complex than the 1D version described here; see, e.g., Sec. 3.2 in the classical textbook by J. M. Ziman, Principles of the Theory of Solids, 2nd ed., Cambridge U. Press, 1979.
This fact allows us to rewrite Eq. (215) in a more transparent form

$$\sum_{l'} U_{l-l'}u_{l'} = (E - E_l)\,u_l, \qquad (2.218)$$

that may be formally solved for u_l:

$$u_l = \frac{1}{E - E_l}\sum_{l' \neq l} U_{l-l'}u_{l'}. \qquad (2.219)$$

If the Fourier coefficients U_n are nonvanishing but small, this formula shows that the wavefunctions do acquire other Fourier components (besides the main one, with the index corresponding to the branch number). However, these additions are all small, except in narrow regions near the points E_l = E_{l'} where two branches (216) of the dispersion relation E(q), with some specific numbers l and l', cross. This happens when

$$q - \frac{2\pi l}{a} = -\left(q - \frac{2\pi l'}{a}\right), \qquad (2.220)$$



i.e. at q ≈ q_m ≡ πm/a (with integer m = l + l'),⁵⁹ corresponding to

$$E_l \approx E_{l'} \approx \frac{\hbar^2}{2ma^2}\left[\pi(l + l') - 2\pi l\right]^2 = \frac{\pi^2\hbar^2}{2ma^2}\,n^2 \equiv E^{(n)}, \qquad (2.221)$$

with integer n = l − l'. (Equation (221) shows that index n is just the number of the branch crossing on the energy scale; see Fig. 28.) In such a region, E has to be close to both E_l and E_{l'}. Hence the denominator in just one of the infinite number of terms in Eq. (219) is very small, making that term substantial despite the smallness of U_n. So we can take into account only one term in each of the sums (written for l and l'):



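$$U_n u_{l'} = (E - E_l)\,u_l, \qquad U_{-n}u_l = (E - E_{l'})\,u_{l'}. \qquad (2.222)$$

Taking into account that for any real function U(x) the Fourier coefficients in series (207) have to be related as U_{−n} = U_n*, Eq. (222) yields the following simple characteristic equation

$$\begin{vmatrix} E - E_l & -U_n\\ -U_n^* & E - E_{l'}\end{vmatrix} = 0, \qquad (2.223)$$

with solution

$$E_\pm = E_{\text{ave}} \pm\left[\left(\frac{E_l - E_{l'}}{2}\right)^2 + U_nU_n^*\right]^{1/2}, \qquad\text{where}\quad E_{\text{ave}} \equiv \frac{E_l + E_{l'}}{2}. \qquad (2.224)$$

According to Eq. (216), close to the branch crossing point q_m = π(l + l')/a, the fraction participating in this result may be approximated as⁶⁰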



59 Let me hope that the difference between this new integer and the particle's mass, both called m, is absolutely clear from the context.

60 Physically, |γ|/ħ = ħ(π|n|/a)/m ≡ ħk^{(n)}/m is just the velocity of a free classical particle with energy E^{(n)}.
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E,-E„ 



« yq, with y = 



dE 



xTi 2 n 2E (n) 



and q = q - q m , (2.225) 



2 



dq 



q 



in 



ma 



mi 



while the parameters $E_{ave} = (E_l + E_{l'})/2 \approx E^{(n)}$ and $U_nU_n^* = |U_n|^2$ do not depend on $\tilde q$, i.e. on the distance from the central point $q_m$. This is why Eq. (224) may be plotted as the famous level anticrossing diagram (Fig. 29), with the energy gap width $\Delta_n$ equal to $2|U_n|$, i.e. just double the magnitude of the $n$-th Fourier harmonic of the periodic potential $U(x)$. Such anticrossings are also clearly visible in Fig. 28, which shows the results of the exact solution of Eq. (198) for $\beta = 0.5$.⁶¹
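The two-level result of Eqs. (223) and (224) is easy to check numerically. The sketch below is only an illustration: it assumes units $\hbar = m = a = 1$, the first crossing ($n = 1$), an arbitrary weak harmonic `U_N`, and it sets $E_{ave} \approx E^{(n)}$ with the linear branch splitting of Eq. (225). The function name `anticrossing_levels` is hypothetical.

```python
import math


def anticrossing_levels(e_l, e_lp, u_n):
    """Return (E_minus, E_plus), the roots of the 2x2 determinant (2.223).

    e_l, e_lp -- unperturbed branch energies E_l and E_l' at the same q
    u_n       -- (possibly complex) Fourier harmonic U_n of U(x)
    """
    e_ave = 0.5 * (e_l + e_lp)
    half_split = math.sqrt((0.5 * (e_l - e_lp)) ** 2 + abs(u_n) ** 2)
    return e_ave - half_split, e_ave + half_split


# Illustrative parameters, units hbar = m = a = 1 (assumed for this sketch)
N_CROSSING = 1
U_N = 0.05 + 0.02j                        # weak harmonic, |U_n| << E^(n)
E_N = math.pi ** 2 * N_CROSSING ** 2 / 2  # E^(n), Eq. (2.221)
GAMMA = 2 * E_N / (math.pi * N_CROSSING)  # gamma, Eq. (2.225)

for q_tilde in (-0.5, -0.1, 0.0, 0.1, 0.5):
    # linear splitting of the branches near q_m, with E_ave ~ E^(n)
    e_l, e_lp = E_N + GAMMA * q_tilde, E_N - GAMMA * q_tilde
    e_minus, e_plus = anticrossing_levels(e_l, e_lp, U_N)
    print(f"q~ = {q_tilde:+.1f}: E- = {e_minus:.4f}, E+ = {e_plus:.4f}")

# the gap at the crossing point itself equals 2|U_n|
lo, hi = anticrossing_levels(E_N, E_N, U_N)
print(hi - lo, 2 * abs(U_N))
```

Far from $q_m$ the two levels approach the unperturbed branches, while at $\tilde q = 0$ they are separated by exactly $2|U_n|$.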



We will run into the anticrossing diagram again and again in this course, notably in the discussion of spin. Such a diagram characterizes any quantum system with two weakly interacting eigenstates with close energies. It is also repeatedly met in classical mechanics, for example in the calculation of eigenfrequencies of coupled oscillators.⁶²˒⁶³ In our current case of the weak potential limit, the diagram describes the weak interaction of two sinusoidal de Broglie waves (216), with oppositely directed wave vectors $\pm k^{(n)}$, via the $(l - l')$-th (i.e. $n$-th) Fourier harmonic of the potential profile $U(x)$. This effect also exists in classical wave theory, where it is known as Bragg reflection; it describes, for example, the 1D case of wave reflection by a crystal lattice (Fig. 1.5) in the limit of weak interaction between the incident particles and the lattice.

Returning for the last time to our initial result, the band structure for the delta-functional $U(x)$ (Fig. 24) shown in Fig. 26, we may wonder how general it is, taking into account the peculiar properties of the delta-function approximation. A partial answer may be obtained from the band structure for two more realistic and relatively simple periodic functions $U(x)$: the sinusoidal potential (Fig. 30a) and the rectangular Kronig-Penney potential shown in Fig. 30b.

For the sinusoidal potential (Fig. 30a), with $U(x) = U_1\cos(2\pi x/a)$, the stationary Schrödinger equation (61) takes the form



61 From that figure, it is also clear that in the weak potential limit, the width $\Delta E_n$ of the $n$-th energy band is just $\approx E^{(n)} - E^{(n-1)}$; see Eq. (221). Note that this is exactly the distance between adjacent energy levels of the simplest 1D quantum well of infinite depth; cf. Eq. (1.77).

62 See, e.g., CM Sec. 5.1 and in particular Fig. 5.2. 

63 Actually, we could have obtained this diagram earlier in this section, for the system of two weakly coupled quantum wells (Fig. 23), if we had assumed the wells to be slightly dissimilar.




Fig. 2.29. Level anticrossing diagram. 
$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + U_1\cos\frac{2\pi x}{a}\,\psi = E\psi. \qquad (2.226)$$

By the introduction of dimensionless variables

$$\xi \equiv \frac{\pi x}{a}, \qquad \alpha \equiv \frac{E}{E^{(1)}}, \qquad \beta \equiv \frac{U_1}{2E^{(1)}}, \qquad (2.227)$$

where $E^{(1)}$ is defined by Eq. (221), Eq. (226) may be reduced to the canonical form of the well-known Mathieu equation⁶⁴

$$\frac{d^2\psi}{d\xi^2} + (\alpha - 2\beta\cos 2\xi)\,\psi = 0. \qquad (2.228)$$



[Figure: $U(x)$ versus $x$ in both panels; panel (b) marks the barrier width $d$ and the period $a$.]

Fig. 2.30. Two simple periodic potential profiles: (a) the sinusoidal ("Mathieu") potential and (b) the Kronig-Penney potential.



Figure 31 shows the so-called characteristic curves of the Mathieu equation, i.e. the relations between the parameters $\alpha$ and $\beta$ that correspond to the energy band edges, separating the bands from the adjacent gaps. (Such curves may be readily calculated numerically, for example, using Eqs. (215) with the band-edge values $qa = 0$ and $qa = \pi$.) In such "phase plane" plots, the detailed information about the energy dependence on the quasi-momentum is lost, but we already know from Fig. 26 that this dependence is not too eventful. The most remarkable feature of these plots is the fast (exponential) disappearance of the allowed energy bands at $2\beta > \alpha$ (in Fig. 31, above the red dashed line), i.e. at $E < U_1$. This may be readily explained by our tight-binding approximation result (206): as soon as the eigenenergy drops significantly below the potential maximum $U_{max} = U_1$ (see Fig. 30a), quantum states in adjacent potential wells are connected only by tunneling through the separating potential barriers, with exponentially small amplitudes $\delta_n$; see Eq. (204).

On the other hand, the characteristic curves below the dashed line, i.e. at $2\beta < \alpha$, correspond to virtually free motion of the particle with energy $E$ above $U_{max} = U_1$. Naturally, in this region the energy bands rapidly expand while the gaps virtually disappear. This could be expected from the weak potential limit analysis (see Fig. 28 and its discussion); however, based on that analysis one could expect that the



64 This equation, first studied in the 1860s by E. Mathieu in the context of a rather practical problem of vibrating 
elliptical drumheads (!), has many other important applications in physics and engineering, notably including the 
parametric excitation of oscillations - see, e.g., CM Sec. 4.5. 
energy gaps $\Delta_n \approx 2|U_n|$ would disappear more gradually. The fast decline of the gaps at $U_1 \to 0$ (i.e. $\beta \to 0$) in the Mathieu equation is an artifact of the sinusoidal potential $U(x)$, which has no Fourier harmonics $U_n$above the first one. (In order to calculate the correct asymptotic behavior $\Delta_n \propto \beta^n$ at $\beta \to 0$, one needs to go beyond the first approximation we have used in the weak potential limit analysis.)
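Whether a given point $(\alpha, \beta)$ lies in a band or a gap can be checked with a short transfer-matrix (Floquet) calculation. This is a sketch under stated assumptions, not the method used to produce Fig. 31. It integrates Eq. (228) over one period, $\xi \in [0, \pi]$, with a plain fourth-order Runge-Kutta scheme. It then uses the Bloch-theorem relation $\cos qa = \frac{1}{2}\operatorname{Tr} M$, where $M$ is the one-period transfer (monodromy) matrix. The function names and the step count are illustrative choices.

```python
import math


def half_trace(alpha, beta, steps=4000):
    """Return (1/2) Tr M for the Mathieu equation (2.228) over one period.

    The equation psi'' = -(alpha - 2 beta cos 2xi) psi is integrated over
    xi in [0, pi] (one period a of U(x)) with classical RK4 steps.  By the
    Bloch theorem, cos(qa) = (1/2) Tr M, so |half_trace| <= 1 inside a band.
    """
    h = math.pi / steps

    def rhs(xi, psi, dpsi):
        return dpsi, -(alpha - 2.0 * beta * math.cos(2.0 * xi)) * psi

    def propagate(psi, dpsi):
        xi = 0.0
        for _ in range(steps):
            k1 = rhs(xi, psi, dpsi)
            k2 = rhs(xi + h / 2, psi + h / 2 * k1[0], dpsi + h / 2 * k1[1])
            k3 = rhs(xi + h / 2, psi + h / 2 * k2[0], dpsi + h / 2 * k2[1])
            k4 = rhs(xi + h, psi + h * k3[0], dpsi + h * k3[1])
            psi += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            dpsi += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            xi += h
        return psi, dpsi

    psi_1, _ = propagate(1.0, 0.0)   # solution with psi(0) = 1, psi'(0) = 0
    _, dpsi_2 = propagate(0.0, 1.0)  # solution with psi(0) = 0, psi'(0) = 1
    return 0.5 * (psi_1 + dpsi_2)    # half the trace of the monodromy matrix


def in_band(alpha, beta):
    """True if the dimensionless energy alpha lies in an allowed band."""
    return abs(half_trace(alpha, beta)) <= 1.0


for alpha in (-0.5, 0.5, 1.0, 3.0):
    print(alpha, round(half_trace(alpha, 0.2), 4), in_band(alpha, 0.2))
```

For example, at $\beta = 0.2$ the value $\alpha = 1$ falls into the first gap, where $\frac{1}{2}\operatorname{Tr} M < -1$. The value $\alpha = 0.5$ lies inside the lowest band. Scanning $\alpha$ for the points where $\frac{1}{2}\operatorname{Tr} M = \pm 1$ at each $\beta$ traces out characteristic curves like those in Fig. 31.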




Fig. 2.31. Characteristic curves of the Mathieu equation. In application to the band theory, dotted regions correspond to the energy gaps, while regions between them, to energy bands. The red dashed line corresponds to condition $\alpha = 2\beta$, i.e. $E = U_1 = U_{max}$, separating the regions of tunneling and over-barrier motion. Figure adapted from http://www.enm.bris.ac.uk/teaching/ .



If one wants to study the details of the transition between the two limits in the 1D band theory without the artifacts of the delta-functional model shown in Fig. 24 (with an infinite number of harmonics $U_n$ independent of $n$) and of the Mathieu equation (with all $U_n = 0$ for $n \neq \pm 1$), the standard way is to examine the Kronig-Penney potential shown in Fig. 30b. For this potential, the characteristic equation may be readily derived using our rectangular barrier analysis in Sec. 3. For the case $E < U_0$, the result is the following natural generalization of Eq. (166):

$$\cos qa = \cosh\kappa d\,\cos k(a-d) + \frac{1}{2}\left(\frac{\kappa}{k} - \frac{k}{\kappa}\right)\sinh\kappa d\,\sin k(a-d), \qquad (2.229)$$



where the parameters $k$ and $\kappa$ are defined, as functions of $E$ and $U_0$, by Eqs. (62) and (65). In the opposite case, $E > U_0$, one can use the same formula with the replacement (73). Plots $E(q)$ described by these formulas⁶⁵ are very similar to those shown in Figs. 26b and 28 above. In order to see some difference, one needs to plot the characteristic curves $U_0(E)$. This may be done by taking $qa = 0$ and $qa = \pi$ (i.e. $\cos qa = \pm 1$) in Eq. (229), and solving the resulting transcendental equation for $U_0$ numerically. The curves are generally similar to those shown in Fig. 31, but, in accordance with Eq. (224), exhibit a more gradual decrease of the energy gaps:



$$\Delta_n \to 2|U_n| \propto \frac{U_0}{n} \quad\text{at}\quad E \approx E^{(n)} \gg U_0. \qquad (2.230)$$
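A minimal sketch of Eq. (229) follows, in units $\hbar = m = 1$ (an assumption for illustration). For $E > U_0$ it substitutes $\kappa \to ik'$, with $k' = [2m(E - U_0)]^{1/2}/\hbar$, which is how we read the replacement (73) here. The crude energy scan `allowed_intervals` is a hypothetical helper, not a production band solver. Points with $|\text{rhs}| \le 1$ belong to allowed bands. Solving $\text{rhs} = \pm 1$ for $U_0$ at fixed $E$ (e.g. by bisection) would give the characteristic curves $U_0(E)$.

```python
import math

HBAR = 1.0  # units hbar = m = 1 (an assumption of this sketch)
MASS = 1.0


def kp_rhs(energy, u0, a, d):
    """Right-hand side of Eq. (2.229); equals cos(qa) in allowed bands.

    Barriers of height u0 and width d repeat with period a (Fig. 2.30b);
    energy must be positive.  For energy > u0, kappa -> i k' is substituted,
    turning cosh/sinh into cos/sin.
    """
    k = math.sqrt(2 * MASS * energy) / HBAR
    w = a - d  # width of the barrier-free part of each period
    if energy < u0:
        kap = math.sqrt(2 * MASS * (u0 - energy)) / HBAR
        return (math.cosh(kap * d) * math.cos(k * w)
                + 0.5 * (kap / k - k / kap) * math.sinh(kap * d) * math.sin(k * w))
    if energy > u0:
        kp = math.sqrt(2 * MASS * (energy - u0)) / HBAR
        return (math.cos(kp * d) * math.cos(k * w)
                - 0.5 * (kp / k + k / kp) * math.sin(kp * d) * math.sin(k * w))
    # energy == u0: the common limit of both expressions
    return math.cos(k * w) - 0.5 * k * d * math.sin(k * w)


def allowed_intervals(u0, a, d, e_max, n=4000):
    """Crude scan of (0, e_max] for energy intervals with |rhs| <= 1."""
    bands, start, prev = [], None, None
    for i in range(1, n + 1):
        e = e_max * i / n
        ok = abs(kp_rhs(e, u0, a, d)) <= 1.0
        if ok and start is None:
            start = e
        elif not ok and start is not None:
            bands.append((start, prev))
            start = None
        prev = e
    if start is not None:
        bands.append((start, e_max))
    return bands


for lo, hi in allowed_intervals(u0=5.0, a=1.0, d=0.2, e_max=60.0):
    print(f"band: {lo:7.3f} .. {hi:7.3f}")
```

Two limiting checks are built into the formula: at $d = 0$ it reduces to the free-particle relation $\cos qa = \cos ka$, and the $E < U_0$ and $E > U_0$ branches join continuously at $E = U_0$.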



To conclude this section, let me address the effect of a periodic potential on the number of eigenstates in 1D systems of large but finite length $l \gg a, k^{-1}$. Surprisingly, the Bloch theorem makes the analysis of this problem elementary for an arbitrary $U(x)$. Indeed, let us assume that $l$ is comprised of



65 Such plots, for several particular values of parameters, may be found, for example, in Figs. 8.11-8.13 of E. Merzbacher's textbook cited above.
an integer number of periods $a$, and that its ends are described by similar boundary conditions; both assumptions are evidently inconsequential for $l \gg a$ (such as a 1-cm-scale crystal with $\sim 10^8$ atoms along each direction). Then, according to Eq. (210), the boundary conditions impose on the quasi-momentum $q$ exactly the same quantization condition as we had for $k$ for free 1D motion. Hence, instead of Eq. (1.94) we can write

$$q_n = \frac{2\pi}{l}\,n, \qquad (2.231)$$

with the corresponding change of the summation rule:

$$\sum_n f(q_n) \to \frac{l}{2\pi}\int f(q)\,dq. \qquad (2.232)$$

Hence, the density of states in 1D $q$-space, $dN/dq = l/2\pi$, does not depend on the potential profile at all! Note, however, that the profile does affect the density of states on the energy axis, $dN/dE$. As an extreme example, at the bottom and at the top of each energy band we have $dE/dq \to 0$, and hence

$$\frac{dN}{dE} = \frac{dN/dq}{dE/dq} = \frac{l}{2\pi}\left(\frac{dE}{dq}\right)^{-1} \to \infty. \qquad (2.233)$$



This divergence of the density of states (which survives in higher spatial dimensionalities as well) has important implications for the operation of several electronic and optical devices, in particular semiconductor lasers.
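The counting argument can be illustrated directly. The sketch assumes cyclic boundary conditions, so that Eq. (231) gives exactly $N = l/a$ equally spaced quasi-momenta in the 1st Brillouin zone. It also uses a hypothetical tight-binding band $E(q) = -2\delta\cos qa$, which is the form assumed here for Eq. (206), with its constant offset dropped. Binning the energies of the allowed states shows the pile-up of states near the band edges predicted by Eq. (233).

```python
import math


def allowed_q(n_periods, a=1.0):
    """Quasi-momenta q_n = 2 pi n / l inside the 1st Brillouin zone, l = n_periods * a.

    Assumes cyclic boundary conditions, as in the reading of Eq. (2.231)
    adopted here; exactly n_periods values result.
    """
    length = n_periods * a
    n_min = -(n_periods // 2)
    return [2 * math.pi * n / length for n in range(n_min, n_min + n_periods)]


def state_histogram(n_periods, delta=1.0, a=1.0, bins=10):
    """Count states per energy bin for a hypothetical band E(q) = -2 delta cos(qa)."""
    e_min, e_max = -2 * delta, 2 * delta
    counts = [0] * bins
    for q in allowed_q(n_periods, a):
        energy = -2 * delta * math.cos(q * a)
        idx = min(int((energy - e_min) / (e_max - e_min) * bins), bins - 1)
        counts[idx] += 1
    return counts


print(state_histogram(1000))  # largest counts in the first and last bins
```

The number of states per band equals the number of periods whatever the band shape, while their distribution over energy is set by $dE/dq$.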



2.8. Effective mass and the Bloch oscillations 

The band structure of the energy spectrum has profound implications not only for the density of states, but also for the dynamics of particles in periodic potentials. In order to see that, let us consider the simplest case: the motion of a wave packet consisting of Bloch functions (210), all in the same (say, $n$-th) energy band. Similarly to Eq. (27) for a free particle, we can describe such a packet as

$$\Psi(x,t) = \int a_q\,u_q(x)\,e^{i[qx - \omega(q)t]}\,dq, \qquad (2.234)$$

where the $a$-periodic functions $u(x)$, defined by Eq. (208), are now indexed to emphasize their dependence on the quasi-momentum, and $\omega(q) = E_n(q)/\hbar$ is the function of $q$ describing the shape of the corresponding energy band; see, e.g., Fig. 26b or Fig. 28. If the packet is narrow, i.e. the width $\delta q$ of the distribution $a_q$ is much smaller than all the characteristic scales of the dispersion relation $\omega(q)$, in particular $\pi/a$, we may simplify Eq. (234) exactly as we have done in Sec. 2 for a free particle, despite the presence of the factors $u_q(x)$ under the integral. In the linear approximation of the Taylor expansion, we again get Eq. (32), but now with⁶⁶

$$v_{gr} = \frac{d\omega}{dq}\bigg|_{q=q_0}, \qquad v_{ph} = \frac{\omega}{q}\bigg|_{q=q_0}, \qquad (2.235)$$



66 A generalization of this expression to the case of essential interband transitions is not difficult using the Heisenberg picture of quantum mechanics (which will be discussed in Chapter 4 of this course); see, e.g., Sec. 55 in E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics, Part 2, Pergamon, 1980.



Chapter 2 



Page 55 of 72 



Essential Graduate Physics 



QM: Quantum Mechanics 



where $q_0$ is the central point of the quasi-momentum distribution. Despite the formal similarity with Eq. (33) for the free particle, this result is much more eventful; for example, as evident from the dispersion relation's topology (see Figs. 26b, 28), the group velocity vanishes not only at $q = 0$, but at all values of $q$ that are multiples of $\pi/a$, i.e. at the bottom and at the top of each energy band. At these points, the packet's envelope does not move in either direction, though it may keep spreading.⁶⁷

Even more fascinating phenomena take place if a particle in the periodic potential is subjected to an additional external force $F(t)$. (For electrons in a crystal lattice, this may be, for example, the Lorentz force of an applied electric and/or magnetic field.) Let the force be relatively weak, so that the product $Fa$ (i.e. the scale of the energy increment from the additional force per lattice period) is much smaller than the relevant energy scales of the dispersion relation $E(q)$; see Fig. 26b:

$$Fa \ll \Delta E_n,\ \Delta_n. \qquad (2.236)$$

This relation allows one to neglect the force-induced interband transitions, so that the wave packet (234) includes the Bloch eigenfunctions belonging to only one (initial) energy band at all times. For the time evolution of its center $q_0$, theory yields⁶⁸ an extremely simple equation of motion



$$\dot q_0 = \frac{1}{\hbar}F(t). \qquad (2.237)$$



This equation is physically very transparent: it is essentially the 2nd Newton law for the time evolution of the quasi-momentum $\hbar q$ under the effect of the additional force $F(t)$ only, excluding the periodic force $-dU(x)/dx$ of the background potential $U(x)$. This is very natural, because $\hbar q$ is essentially the particle's momentum averaged over the potential's period, and the effect of the periodic force drops out at such averaging.

Despite the simplicity of Eq. (237), the results of its solution may be highly nontrivial. First, let us use Eqs. (235) and (237) to find the instant group acceleration of the particle (i.e. the acceleration of its wave packet's envelope):

$$\frac{dv_{gr}}{dt} = \frac{d}{dt}\frac{d\omega(q_0)}{dq_0} = \frac{d}{dq_0}\frac{d\omega(q_0)}{dq_0}\,\frac{dq_0}{dt} = \frac{d^2\omega(q_0)}{dq_0^2}\,\frac{dq_0}{dt} = \frac{1}{\hbar}\frac{d^2\omega}{dq_0^2}\,F(t). \qquad (2.238)$$




This means that the second derivative of the dispersion relation plays the role of the effective reciprocal mass of the particle:

$$\frac{1}{m_{ef}} = \frac{1}{\hbar}\frac{d^2\omega}{dq_0^2} \equiv \frac{1}{\hbar^2}\frac{d^2E_n}{dq_0^2}. \qquad (2.239)$$

For the particular case of a free particle, described by Eq. (216), this expression reduces to the original (and constant) mass $m$, but generally the effective mass depends on the wave packet's momentum. According to Eq. (239), at the bottom of any energy band $m_{ef}$ is always positive, but it depends on the strength of the particle's interaction with the periodic potential. In particular, according to Eq. (206), in the tight-binding limit the effective mass is very large:



67 For a Gaussian packet, the spreading is described by Eq. (39), with the replacement $k \to q$; it is curious that at the inflection points with $d^2\omega/dq^2 = 0$ (which are present in each energy band) the packet does not spread.

68 The proof of Eq. (237) is not difficult, but becomes more compact in the bra-ket formalism, to be discussed in 
Chapters 4 and 5. This is why I recommend the proof to the reader as an exercise after reading those two chapters. 
$$|m_{ef}|_{q = 0,\,\pm\pi/a} = \frac{\hbar^2}{2|\delta_n|a^2} = \frac{m}{\pi^2}\,\frac{E^{(1)}}{|\delta_n|} \gg m. \qquad (2.240)$$

On the contrary, in the weak potential limit the effective mass is close to $m$ at most points of each energy band, but at the edges of the (narrow) bandgaps it is much smaller. Indeed, expanding Eq. (224) in the Taylor series near the point $q = q_m$, we get



$$E_\pm - E_{ave} \approx \pm|U_n| \pm \frac{1}{2|U_n|}\left(\frac{dE_l}{dq}\right)^2_{q=q_m}\tilde q^2 \equiv \pm|U_n| \pm \frac{\gamma^2}{2|U_n|}\,\tilde q^2, \qquad (2.241)$$



where $\gamma$ and $\tilde q$ are defined by Eq. (225), so that

$$|m_{ef}|_{q=q_m} = \frac{\hbar^2|U_n|}{\gamma^2} = m\,\frac{|U_n|}{2E^{(n)}} \ll m. \qquad (2.242)$$
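Eq. (239) can be evaluated by finite differences in both limits discussed above. This sketch uses units $\hbar = m = a = 1$, illustrative parameter values, and the tight-binding form $E = -2\delta\cos qa$, all of which are assumptions. The numerical masses should reproduce Eq. (240) and, up to corrections of relative order $|U_n|/E^{(n)}$, Eq. (242).

```python
import math

HBAR = MASS = A = 1.0  # units hbar = m = a = 1 (assumed for illustration)
DELTA = 0.01           # hypothetical tunneling amplitude, delta << E^(1)
U_N = 0.05             # hypothetical harmonic |U_1| << E^(1)
E1 = math.pi ** 2 * HBAR ** 2 / (2 * MASS * A ** 2)  # E^(1), Eq. (2.221)


def effective_mass(energy_of_q, q0, h=1e-4):
    """m_ef = hbar^2 / (d^2E/dq^2) at q0, Eq. (2.239), via a central difference."""
    d2e = (energy_of_q(q0 + h) - 2 * energy_of_q(q0) + energy_of_q(q0 - h)) / h ** 2
    return HBAR ** 2 / d2e


def tight_binding(q):
    """Assumed tight-binding band E(q) = -2 delta cos(qa), constant offset dropped."""
    return -2 * DELTA * math.cos(q * A)


def weak_potential(sign):
    """E_+ (sign = +1) or E_- (sign = -1) of Eq. (2.224) near q_m = pi/a, n = 1."""
    def energy(q):
        e_l = HBAR ** 2 * q ** 2 / (2 * MASS)                       # branch l = 0
        e_lp = HBAR ** 2 * (q - 2 * math.pi / A) ** 2 / (2 * MASS)  # branch l' = 1
        half = math.sqrt((0.5 * (e_l - e_lp)) ** 2 + U_N ** 2)
        return 0.5 * (e_l + e_lp) + sign * half
    return energy


print("tight binding, q = 0:   ", effective_mass(tight_binding, 0.0),
      "vs Eq. (2.240):", HBAR ** 2 / (2 * DELTA * A ** 2))
print("tight binding, q = pi/a:", effective_mass(tight_binding, math.pi / A))
print("weak potential, E+ at q_m:", effective_mass(weak_potential(+1), math.pi / A),
      "vs Eq. (2.242):", MASS * U_N / (2 * E1))
print("weak potential, E- at q_m:", effective_mass(weak_potential(-1), math.pi / A))
```

The sign of $m_{ef}$ at the top of each band, negative in both models, anticipates the discussion that follows.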

The effective mass effects in real solids may be very significant. For example, the charge carriers in the ubiquitous field-effect transistors of silicon integrated circuits have $m_{ef} \approx 0.19\,m_e$ and $m_{ef} \approx 0.98\,m_e$ (for motion in different directions relative to the crystal axes) in the lowest normally-empty energy band (traditionally called the conduction band), while the holes of the lower, normally-filled valence band have effective masses between roughly $0.16\,m_e$ and $0.5\,m_e$. In some semiconducting compounds the conduction-band electron mass may be even smaller, down to $0.0145\,m_e$ in InSb!

However, the absolute value of the effective mass is not the most surprising effect. A more shocking corollary of Eq. (239) is that at the top of each energy band the effective mass is negative; please revisit Figs. 26, 28, and 29. This means that the particle (or, more strictly, its wave packet's envelope) is accelerated in the direction opposite to the force. This is exactly what electronic engineers working with electrons in semiconductors call holes, characterizing them by a positive mass and a positive charge. If the particle does not leave a close vicinity of the energy band's top (say, due to scattering effects), such a flip of signs does not lead to an error, because the Lorentz force is proportional to the electron's charge ($q = -e$), so that the particle's acceleration $a_{gr}$ is proportional to the ratio $q/m_{ef}$.⁶⁹

However, for some phenomena the usual image of a hole as a particle with $q > 0$ and $m_{ef} > 0$ is unacceptable. For example, let us form a narrow wave packet at the bottom of the lowest energy band,⁷⁰ and then exert on it a constant force $F > 0$, say, due to a constant external electric field directed along the $x$-axis. According to Eq. (237), this would lead to a linear growth of $q_0$ in time, so that in the quasi-momentum space the packet's center would slide, with constant speed, along the $q$-axis; see Fig. 32a. Close to the energy band's bottom, this motion would correspond to a positive effective mass (possibly somewhat larger than the genuine particle's mass $m$), and hence to an acceleration close to that of a free particle.

However, as soon as $q_0$ reaches the inflection point where $d^2E_1/dq^2 = 0$, the effective mass, and hence the acceleration (238), change sign to negative, i.e. the packet starts to slow down (in the direct space



69 The language in which the hole has a positive charge and mass has an additional convenience for states at the top of the valence band, whose single-particle states are normally filled. Then the simplest, single-particle excitation of this multi-particle ground state may be created by giving one electron enough energy to lift it to a reference (e.g., Fermi-energy) level $E_F$ that, by the definition of the valence band, is higher than all values $E_-(q)$. Then it is natural to ascribe to the excitation a positive mass $m_{ef}$, because the energy $\Delta E = E_F - E_-(q)$ necessary for the excitation grows with the deviation of $q$ from $q_m$.

70 Intuition tells us (and statistical physics duly confirms :-) that this may be readily done, for example, by weakly coupling the system to a low-temperature environment and letting it relax to the lowest possible energy.
$x$) while still moving ahead in the quasi-momentum space. Finally, at the energy band's top the particle stops at a certain $x_{max}$, while continuing to move in the $q$-space.




[Figure label: $x_{max} = \Delta E_1/F$.]

Fig. 2.32. The Bloch oscillations (red lines) and the Landau-Zener tunneling (blue arrows) within: (a) the time-domain picture, and (b) the energy-domain picture. On panel (b), the tilted gray strips show the allowed energy bands, and the bold red lines, the Wannier-Stark ladder.



Now we have two alternative ways to look at the further time evolution of the wave packet. From the extended zone picture (which is the simplest for this analysis; see Fig. 32a),⁷¹ we may say that the particle crosses the 1st Brillouin zone boundary and keeps moving forward in $q$, i.e. down the lowest energy band. According to Eq. (235), this region (up to the next inflection point) corresponds to a negative group velocity. After $q_0$ has reached the next minimum of the energy band at $qa = 2\pi$, the whole process repeats (again, and again).

These are the famous Bloch oscillations, an effect that was predicted (by the same F. Bloch) as early as 1929, but evaded experimental observation until the 1980s; see below. Their time period may be readily found from Eq. (237):

$$\Delta t_B = \frac{\Delta q}{dq_0/dt} = \frac{2\pi/a}{F/\hbar} = \frac{2\pi\hbar}{Fa}, \qquad (2.243)$$

so that the Bloch oscillation frequency is

$$\omega_B \equiv \frac{2\pi}{\Delta t_B} = \frac{Fa}{\hbar}. \qquad (2.244)$$



The direct-space motion of the wave packet's center $x_0(t)$ during the Bloch oscillation process may be analyzed by integrating Eq. (235) over some time interval $\Delta t$:



71 This phenomenon may be also discussed from the point of view of the reduced zone picture, but then it 
requires the introduction of instant jumps between the Brillouin zone boundary points (see the dashed red line in 
Fig. 32) that correspond to physically equivalent states of the particle. Evidently, this language is more artificial. 
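$$\Delta x_0 = \int_t^{t+\Delta t} v_{gr}\,dt' = \int_t^{t+\Delta t}\frac{d\omega(q_0)}{dq_0}\,dt' = \frac{\hbar}{F}\int_{q_0(t)}^{q_0(t+\Delta t)}\frac{d\omega(q_0)}{dq_0}\,dq_0 = \frac{\hbar}{F}\left[\omega(q_0(t+\Delta t)) - \omega(q_0(t))\right]. \qquad (2.245)$$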



If the interval $\Delta t$ is equal to the Bloch oscillation period $\Delta t_B$ (243), the initial and final values of $E(q_0) = \hbar\omega(q_0)$ are equal, giving $\Delta x_0 = 0$: at the end of the period, the wave packet returns to its initial position. However, if we carry out this integration only from the smallest to the largest value of $\omega(q_0)$, i.e. between the points where the group velocity vanishes, we get the oscillation swing



$$\Delta x_{max} = \frac{\hbar}{F}\left(\omega_{max} - \omega_{min}\right) = \frac{\Delta E_n}{F}. \qquad (2.246)$$
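The semiclassical picture of Eqs. (235) and (237) can be integrated numerically. This sketch makes several assumptions: a single tight-binding band $E(q) = -2\delta\cos qa$ (so that $\Delta E_n = 4\delta$), a constant force, units $\hbar = a = 1$, and illustrative parameter values. Like the text, it neglects interband (Landau-Zener) transitions. After one period (243), the packet's center should return to its starting point, having swung by the distance given by Eq. (246).

```python
import math

HBAR = 1.0    # units hbar = a = 1 (assumed for illustration)
A = 1.0
DELTA = 0.25  # hypothetical band parameter: band width Delta E_n = 4 * DELTA
FORCE = 0.02  # weak force: F a = 0.02 << 4 * DELTA, cf. Eq. (2.236)


def group_velocity(q):
    """v_gr = (1/hbar) dE/dq, Eq. (2.235), for the assumed band E(q) = -2 DELTA cos(qa)."""
    return 2 * DELTA * A * math.sin(q * A) / HBAR


def simulate(t_end, steps=20000, q0=0.0, x0=0.0):
    """Integrate dq/dt = F/hbar (Eq. 2.237) and dx/dt = v_gr(q) (Eq. 2.235)."""
    dt = t_end / steps
    dq = FORCE / HBAR  # dq/dt is constant for a constant force
    q, x, xs = q0, x0, [x0]
    for _ in range(steps):
        k1 = group_velocity(q)
        k2 = group_velocity(q + dt / 2 * dq)
        k4 = group_velocity(q + dt * dq)
        x += dt / 6 * (k1 + 4 * k2 + k4)  # RK4 step; here k3 == k2 exactly
        q += dt * dq
        xs.append(x)
    return q, xs


T_BLOCH = 2 * math.pi * HBAR / (FORCE * A)  # Eq. (2.243)
q_end, xs = simulate(T_BLOCH)
print("q advance in units of 2 pi / a:", q_end * A / (2 * math.pi))
print("displacement after one period:", xs[-1] - xs[0])
print("swing:", max(xs) - min(xs), "vs Delta E_n / F =", 4 * DELTA / FORCE)
```

Because $dq_0/dt$ is constant here, the RK4 step reduces to Simpson's rule for the $x$-integral; in this model the exact trajectory is $x_0(t) = [E(q_0(t)) - E(q_0(0))]/F$.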



This simple result may be interpreted using an alternative energy diagram (Fig. 32b) based on the following arguments. The additional force $F$ may be described not only via the 2nd Newton law version (237), but, alternatively, by its contribution $U_F = -Fx$ to the total ("Gibbs"⁷²) potential energy

$$U_\Sigma(x) = U(x) - Fx \qquad (2.247)$$

of the system. The direct solution of the Schrödinger equation (61) with such a potential may be hard to find, but if the force is weak in the sense of Eq. (236), as we are assuming now, one can argue that our quantum-mechanical treatment including the periodic potential $U(x)$ should still be correct if the second term in Eq. (247) is considered constant on the wave packet width scale $\delta x$, but dependent on the position $x_0$ of the packet's center. In this approximation, the total energy of the wave packet may be found as

$$E_\Sigma = E(q_0) - Fx_0. \qquad (2.248)$$



In a plot of such energy as a function of $x_0$ (Fig. 32b), the information on the energy's dependence on $q_0$ is lost, but we already know it is rather uneventful and well characterized by the positions of the band-gap edges on the energy axis.⁷³ In this representation, the Bloch oscillations of a relatively wide ($\delta x \gg a$) wave packet should keep the full energy $E_\Sigma$ constant, i.e. follow a horizontal line in Fig. 32b, limited by the classical turning points corresponding to the bottom and the top of the allowed energy band. The distance $\Delta x_{max}$ between these points is evidently given by Eq. (246).

Besides this second look at the oscillation swing result, the total energy diagram shown in Fig. 32b enables one more remarkable result. Let a wave packet be so narrow in the momentum space ($\delta q \to 0$) that $1/\delta q \gg \Delta x_{max}$; then the horizontal line segment in Fig. 32b represents the spatial extension of an eigenfunction of the Schrödinger equation with potential (247). But this equation is evidently invariant with respect to the following simultaneous translation in coordinate and energy:

$$x \to x + a, \qquad E \to E - Fa. \qquad (2.249)$$



This means that the equation is satisfied by an infinite set of similar solutions, each corresponding to one of the horizontal red lines shown in Fig. 32b. This is the famous Wannier-Stark ladder, with the step height

$$\Delta E_S = Fa. \qquad (2.250)$$
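The ladder (250) can be seen in a simple tight-binding model. Take a chain of sites spaced by $a$, with a hypothetical nearest-neighbor tunneling amplitude $\delta$, plus the tilt $-Fx_j = -Faj$ of Eq. (247). This is an illustrative model, not a solution of the full Schrödinger equation with potential (247). Its Hamiltonian matrix is symmetric tridiagonal, so the eigenvalues can be found with the standard Sturm-sequence bisection method, which needs only the Python standard library. Away from the chain's ends, consecutive eigenvalues should differ by $Fa$.

```python
def sturm_count(diag, off, x):
    """Number of eigenvalues below x of a real symmetric tridiagonal matrix.

    Counts the negative pivots of the LDL^T factorization of (T - x I)
    (Sylvester's law of inertia).
    """
    count, pivot = 0, 1.0
    for i, d in enumerate(diag):
        pivot = d - x - (off[i - 1] ** 2 / pivot if i > 0 else 0.0)
        if pivot == 0.0:
            pivot = 1e-30  # nudge an exactly zero pivot
        if pivot < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, k, tol=1e-13):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection on the Sturm count."""
    radius = 2 * max(abs(e) for e in off)
    lo, hi = min(diag) - radius, max(diag) + radius  # Gershgorin bounds
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical chain: N sites x_j = j a, hopping -DELTA, tilt -F a j (Eq. 2.247)
N, DELTA, FA = 60, 1.0, 0.5
diag = [-FA * j for j in range(N)]
off = [-DELTA] * (N - 1)
levels = [kth_eigenvalue(diag, off, k) for k in range(N)]
steps = [levels[k + 1] - levels[k] for k in range(20, 40)]
print(min(steps), max(steps))  # both very close to Fa = 0.5, cf. Eq. (2.250)
```

Only the levels near the chain's ends deviate from the ladder, because there the translation symmetry (249) is broken.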



72 See, e.g., CM Sec. 1.5. 

73 In semiconductor device physics and engineering, such plots are called the band edge diagrams, and are the 
virtually unavoidable components of any discussion or publication. 
The importance of this alternative representation of the Bloch oscillations is due to the following fact. In most experimental realizations, the power of radiation at frequency (244) that may be extracted from the oscillations by their electromagnetic coupling to an external detector is very low, so that their direct detection presents a hard problem.⁷⁴ However, let us apply to a Bloch oscillator an additional rf field at frequency $\omega \approx \omega_B$. As these frequencies are brought close together, the external signal should synchronize ("phase-lock") the Bloch oscillations,⁷⁵ resulting in certain observable changes, for example, a resonant absorption of the external radiation. Now let us notice that Eqs. (244) and (250) yield the following remarkable relation:

$$\Delta E_S = \hbar\omega_B. \qquad (2.251)$$

This means that the resonant phenomena at $\omega \approx \omega_B$ allow an alternative (but equivalent) interpretation, as the result of rf-induced transitions⁷⁶ between the steps of the Wannier-Stark ladder! (Such occasions, when two very different languages may be used to interpret the same phenomenon, are among the most beautiful features of physics.)

This effect was used for the first experimental confirmation of the Bloch oscillation theory. For this purpose, the natural periodic structures, solid-state crystals, are inconvenient because of their very small period $a \sim 10^{-10}$ m. Indeed, according to Eq. (244), such structures require very high forces $F$ (and hence high electric fields $\mathcal{E} = F/e$) to bring $\omega_B$ into an experimentally convenient range. This problem has been overcome by fabricating artificial periodic structures (superlattices) of certain semiconductor compounds, such as Ga$_{1-x}$Al$_x$As with various degrees $x$ of gallium-to-aluminum atom replacement, whose layers may be grown over each other epitaxially, i.e. with very few crystal structure violations. These superlattices, with periods $a \sim 10$ nm, have allowed a clear observation of resonant effects at $\omega \approx \omega_B$, and hence the measurement of the Bloch oscillation frequency, in particular its proportionality to the applied dc electric field, predicted by Eq. (244).⁷⁷
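To put numbers on this argument, Eq. (244) may be rewritten as $f_B = \omega_B/2\pi = e\mathcal{E}a/h$. The sketch below uses SI units. It compares the fields needed to reach the same Bloch frequency in a natural crystal and in a superlattice; the target frequency of 1 THz is an arbitrary illustrative choice.

```python
E_CHARGE = 1.602176634e-19  # C (exact in the 2019 SI)
H_PLANCK = 6.62607015e-34   # J s (exact in the 2019 SI)


def bloch_frequency_hz(field_v_per_m, period_m):
    """f_B = omega_B / 2 pi = e * field * a / h, from Eq. (2.244) with F = e * field."""
    return E_CHARGE * field_v_per_m * period_m / H_PLANCK


def field_for_frequency(f_hz, period_m):
    """Electric field (V/m) needed to reach the Bloch frequency f_hz."""
    return H_PLANCK * f_hz / (E_CHARGE * period_m)


for label, period in (("crystal, a = 1e-10 m", 1e-10), ("superlattice, a = 1e-8 m", 1e-8)):
    print(f"{label}: {field_for_frequency(1e12, period):.2e} V/m for f_B = 1 THz")
```

Since $f_B \propto \mathcal{E}a$, the hundredfold larger period of the superlattice reduces the required field a hundredfold.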

Very soon after this observation, the Bloch oscillations were observed in small Josephson junctions.⁷⁸ Since this experiment involved two important conceptual issues, let me discuss it in a little more detail. As was discussed in Sec. 2.3, the Josephson junction dynamics may be reasonably well described by the two simple equations (54) and (55). They may be combined to calculate the work of an external voltage source during a Josephson phase change between arbitrary initial ($\varphi_{ini}$) and final ($\varphi_{fin}$) values, as the integral of its power $IV$ over the time interval $\Delta t$ of the change:

$$\text{work} = \int_{\Delta t} IV\,dt = \int_{\Delta t}(I_c\sin\varphi)\,\frac{\hbar}{2e}\frac{d\varphi}{dt}\,dt = \frac{\hbar I_c}{2e}\int_{\varphi_{ini}}^{\varphi_{fin}}\sin\varphi\,d\varphi = -\frac{\hbar I_c}{2e}\left(\cos\varphi_{fin} - \cos\varphi_{ini}\right). \qquad (2.252)$$

We see that the work depends only on the initial and final values of $\varphi$ (but not on the law of phase evolution in time), i.e. it may be presented as the difference $U(\varphi_{fin}) - U(\varphi_{ini})$, where the function



74 In systems with many independent particles (such as semiconductors), the detection problem is exacerbated by the phase incoherence of the Bloch oscillations performed by each particle. This drawback is absent in atomic Bose-Einstein condensates, whose Bloch oscillations (in a periodic potential created by standing optical waves) were eventually observed by M. Ben Dahan et al., Phys. Rev. Lett. 76, 4508 (1996).

75 A simple analysis of phase locking of a classical oscillator may be found, e.g., in CM Sec. 4.4.

76 A qualitative theory of such transitions will be discussed in Sec. 6.6 and then in Chapter 7.

77 E. E. Mendez et al., Phys. Rev. Lett. 60, 2426 (1988).

78 L. S. Kuzmin and D. Haviland, Phys. Rev. Lett. 67, 2890 (1991). 
$$U(\varphi) = -E_J\cos\varphi + \text{const}, \qquad\text{with}\quad E_J = \frac{\hbar I_c}{2e}, \qquad (2.250)$$

may be interpreted as the potential energy of the junction (if we consider the Josephson phase as a generalized coordinate). This energy apart, the Josephson junction, as a system of two close, nearly isolated superconductors, has a certain capacitance $C$ and the associated electrostatic energy $E_C = CV^2/2$. Using Eq. (54) again, we may present it as



$$E_C = \frac{C}{2}\left(\frac{\hbar}{2e}\right)^2\left(\frac{d\varphi}{dt}\right)^2. \qquad (2.251)$$

This means that from the point of view of the phase $\varphi$ as a generalized coordinate, $E_C$ should be considered the kinetic energy of the system, whose dependence on the generalized velocity $d\varphi/dt$ is similar to that of a 1D mechanical particle, with an effective mass⁷⁹

$$m_{ef} = C\left(\frac{\hbar}{2e}\right)^2. \qquad (2.252)$$

Hence the total energy of the junction, $E_C + U(\varphi)$, is formally similar to that of a 1D nonrelativistic particle in a sinusoidal potential with the $\varphi$-axis period $a_J = 2\pi$.
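This mapping onto the Mathieu problem can be made quantitative. The sketch below is an illustration, not part of the original analysis. It substitutes $m \to m_{ef}$ and $a \to a_J = 2\pi$ into Eq. (221) to get the energy scale $E^{(1)}$, and then takes $|U_1| = E_J$ in Eq. (227) to get $\beta$; a shift of $\varphi$ by $\pi$ only flips the sign of $U_1$. The junction parameters are hypothetical. Algebraically, this substitution gives $E^{(1)} = e^2/2C$, and the code checks that identity.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C


def junction_band_parameters(i_c, cap):
    """Return (E_J, E^(1), beta) for a junction with critical current i_c and capacitance cap.

    E_J = hbar I_c / 2e and m_ef = C (hbar / 2e)^2 follow the text; E^(1) is
    Eq. (2.221) with m -> m_ef and a -> a_J = 2 pi, and beta = |U_1| / 2E^(1)
    with |U_1| = E_J, as in Eq. (2.227).
    """
    e_j = HBAR * i_c / (2 * E_CHARGE)
    m_ef = cap * (HBAR / (2 * E_CHARGE)) ** 2
    a_j = 2 * math.pi
    e1 = math.pi ** 2 * HBAR ** 2 / (2 * m_ef * a_j ** 2)
    return e_j, e1, e_j / (2 * e1)


# Hypothetical junction: I_c = 10 nA, C = 1 fF
e_j, e1, beta = junction_band_parameters(10e-9, 1e-15)
print(f"E_J = {e_j:.3e} J, E^(1) = {e1:.3e} J "
      f"(e^2/2C = {E_CHARGE ** 2 / 2e-15:.3e} J), beta = {beta:.3f}")
```

The same algebra gives $\beta = E_JC/e^2$, so the position of a given junction on the characteristic curves of Fig. 31 is set by the ratio of its Josephson and single-electron charging energies.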

However, before applying the results of the 1D band theory to this system, we have to resolve one paradox (which was the subject of a lively discussion just about 30 years ago). When developing the band theory, we imply that the particle's translation by the period $a$ is (in principle) measurable, i.e. the particle positions $x$ and $(x + a)$ are distinguishable; otherwise Eq. (193) with $q \neq 0$ would not make much sense. For a mechanical particle this assumption is very plausible, but less so for a Josephson junction. Indeed, if we change $\varphi$ by $a_J = 2\pi$ by changing the phase of one of the superconductors, say $\varphi_1$ (Fig. 3), by $2\pi$, then its wavefunction becomes $|\psi_1|\exp\{i(\varphi_1 + 2\pi)\} = |\psi_1|\exp\{i\varphi_1\}$, and it is not immediately clear whether these two states may be distinguished. In order to resolve this contradiction, it is sufficient to have a look at Eq. (54). It shows that if $\varphi$ changes in time by $2\pi$ (say, by a fast ramp-up), the voltage $V$ across the junction exhibits a pulse with "area"



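$$\int V\,dt = \frac{\hbar}{2e}\int\frac{d\varphi}{dt}\,dt = \frac{\hbar}{2e}\int d\varphi = \frac{\hbar}{2e}\,2\pi = \frac{\pi\hbar}{e} \approx 2\times 10^{-15}\ \text{V}\cdot\text{s}. \qquad (2.253)$$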



Such single-flux-quantum (SFQ) pulses⁸⁰ not only can be measured experimentally, but have even been used for signaling and ultrafast (sub-THz) computation, to the best of my knowledge still holding the absolute records for the highest speed and the smallest energy consumption in computation.⁸¹

Hence, the $2\pi$-shifts of the phase $\varphi$ are measurable, and in the absence of dissipation the Josephson junction dynamics is indeed similar to that of a 1D particle in a periodic (sinusoidal) potential, and its energy spectrum forms energy bands and gaps described by the Mathieu equation; see Fig. 31. Experimentally, the easiest way to verify this picture is to measure the corresponding Bloch oscillations



79 Of course, the dimensionality of $m_{ef}$ so defined is different from kg.

80 This term originated from the fact that the right-hand side of Eq. (253) equals the single quantum unit ($\Phi_0$) of the magnetic flux in superconductors; see Sec. 3.1 below.

81 See, e.g., P. Bunyk et al., Int. J. on High Speed Electronics and Systems 11, 257 (2001). 
induced by an external current $I_{ex}(t)$. In order to find the frequency of these oscillations, it is sufficient to replace Eq. (237), which expresses the 2nd Newton law averaged over the period $a$ of the potential $U(x)$, with the charge balance equation

$$\frac{dQ}{dt} = I_{ex}(t), \qquad (2.254)$$



where $Q$ is the "quasi-charge",⁸² i.e. the electric charge of the capacitor averaged over the period $2\pi$ of the periodic potential $U(\varphi)$. (Notice that at such averaging, the current (55) is averaged out from the equation, so that it affects the phenomena "only" via its contribution to the energy band structure.)

Since the Josephson-junction analog of the genuine wave number $k = m(dx/dt)/\hbar$ of a particle is

$$k_J = \frac{m_{ef}}{\hbar}\frac{d\varphi}{dt} = \frac{C}{2e}\,\frac{\hbar}{2e}\frac{d\varphi}{dt} = \frac{CV}{2e}, \qquad (2.255)$$

and $CV$ is the genuine charge on the capacitor, the analog of $q$ (the quasi-momentum divided by $\hbar$) may be obtained just by the replacement of that product with the quasi-charge $Q$:

$$q_J = \frac{Q}{2e}. \qquad (2.256)$$

Comparing this expression with Eq. (254), we see that $q_J$ obeys the following equation of motion:

$$\frac{dq_J}{dt} = \frac{I_{ex}(t)}{2e}, \qquad (2.257)$$

so that the role of the force $F$ is now played by $F_J = \hbar I_{ex}/2e$. Hence if $I_{ex}(t) = \text{const} \equiv I$, we can use Eq. (244) with that replacement, and also $a \to a_J = 2\pi$, to get

$$\omega_B = \frac{F_Ja_J}{\hbar} = \frac{\pi I}{e}. \qquad (2.258)$$



This very simple result has the following physical sense.⁸³ In the quantum operation mode, the junction is recharged by the external current, following Eq. (256), until its electric charge reaches $e$ (i.e. $q_Ja_J = (Q/2e)2\pi$ reaches $\pi$; see Fig. 32a); then one Cooper pair passes through the junction, changing its charge to $e - 2e = -e$, with the same charging energy (251), a process analogous to crossing the border of the 1st Brillouin zone; then the process repeats again and again. It is remarkable that Eq. (258), describing the frequency of such a quantum property of the Josephson phase $\varphi$ as its Bloch oscillations, does not include the Planck constant, while Eq. (56), describing the classical motion of $\varphi$, does.
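In frequency units, Eq. (258) reads $f_B = \omega_B/2\pi = I/2e$. The sketch below uses SI units, and the current value is an arbitrary example. It follows the same two replacements, $F \to F_J$ and $a \to a_J$, used in the text, so $\hbar$ cancels just as noted above.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C


def josephson_bloch_frequency_hz(current_a):
    """Bloch frequency f_B = omega_B / 2 pi of a junction biased by a dc current."""
    force_j = HBAR * current_a / (2 * E_CHARGE)  # F_J = hbar I / 2e
    a_j = 2 * math.pi                            # period along the phase axis
    omega_b = force_j * a_j / HBAR               # Eq. (2.244) with F -> F_J, a -> a_J
    return omega_b / (2 * math.pi)               # hbar cancels: f_B = I / 2e


print(f"{josephson_bloch_frequency_hz(1e-9):.4e} Hz for I = 1 nA")
```

One period corresponds to the transfer of exactly one Cooper pair, i.e. of charge $2e$, through the junction.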

In this context, one may wonder which of these two types of oscillations a dc-biased Josephson junction would generate. For a dissipation-free junction, the answer is: the Bloch oscillations (258), with frequency proportional to the dc current. However, any practical junction has some energy losses that may be (approximately) described by a certain Ohmic conductance $G$ connected in parallel to the





82 Eq. (254) tells us that quasi-charge Q has the simple physical sense of the external electric charge being 
inserted into the junction by the external current 7 ex - just like the physical sense of quasi-momentum fiq of a 
mechanical particle, according to Eq. (237), is the contribution to particle's momentum by the external force F. 

83 D. V. Averin et al, Sov. Phys. - JETP 61, 407 (1985). 
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junction. Very luckily for Dr. Josephson and his Nobel Prize, it is much easier to fabricate and test 
junctions with G » 1/ Rq, where Rq is the so-called quantum unit of resistance 

Quantum 
unit of 
resistance 

the fundamental constant that jumps out at analysis of several other effects as well - see, e.g., Sec. 3.2. 
As will be discussed in Chapter 7, such high energy losses provide what is called dephasing - the 
suppression of the quantum coherence between different quantum states of the system - in our current 
case, between the wavefunctions u((p - 2/g) localized at different minima of the periodic potential U((p), 
and thus make the dynamics of the Josephson phase q> virtually classical, obeying equations (54) and 
(55). As we have seen in Sec. 2, dc biasing of such a junction leads to Josephson oscillations with 
frequency (56) proportional to the applied dc voltage. 



R, 



7ih 
2e> 



6.45 kQ. 



(2.259) 
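As a quick check of the numerical value in Eq. (259), the sketch below evaluates R_Q from the SI values of ħ and e (the constant names are ours) and confirms that πħ/2e² is the same as h/4e².

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s (CODATA 2018)
E_CHARGE = 1.602176634e-19  # elementary charge, C (exact SI value)

# Quantum unit of resistance, Eq. (2.259)
R_Q = math.pi * HBAR / (2 * E_CHARGE**2)
# The same quantity written via h = 2*pi*hbar, i.e. h / (4 e^2)
R_Q_via_h = (2 * math.pi * HBAR) / (4 * E_CHARGE**2)

print(f"R_Q = {R_Q:.1f} Ohm, h/4e^2 = {R_Q_via_h:.1f} Ohm")
```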



2.9. Landau-Zener tunneling 

All the Bloch oscillation discussion in the last section was based on the premise that the particle
stays within one (say, the lowest) energy band. However, just a single look at Fig. 32 shows that this
assumption becomes unrealistic if the energy gap separating this band from the next one becomes very
small, Δ_1 → 0. Indeed, in the weak potential approximation, which is adequate in this limit, at |U_1| → 0,
the two dispersion curve branches (216) cross without any interaction, so that if our particle (the wave
packet) is driven to approach that point, it should continue to move up in energy - see the dashed blue
arrow in Fig. 32a. Similarly, in the "energy-domain" presentation shown in Fig. 32b, it is intuitively
clear that at Δ_1 → 0, the particle residing at one of the steps of the Wannier-Stark ladder should be able to
somehow overcome the vanishing spatial gap Δx_0 = Δ_1/F and to leak into the next band - see the
horizontal dashed blue arrow on that panel.

This process, called the Landau-Zener (or "interband", or "band-to-band") tunneling,84 is indeed
possible. In order to analyze it, let us first take F = 0, and consider what happens if a quantum particle
described by an x-long (i.e. q-narrow) wave packet is incident from free space upon a periodic
structure of a large but finite length l >> a. If the packet's energy E is within one of the energy bands, it
may evidently propagate through the structure (though it may be partly reflected from its front end). The
corresponding quasi-momentum may be found by solving the dispersion relation for q; for example, in
the weak-potential limit, Eq. (224), which is valid near the gap, yields

$$q = q_n + \tilde q, \qquad \tilde q = \pm\frac{1}{\gamma}\left[\varepsilon^2 - |U_n|^2\right]^{1/2}, \qquad \text{where } \varepsilon \equiv E - E^{(n)}, \qquad (2.260)$$

and γ is given by the second of Eqs. (225).

Now, if the energy E corresponds to one of the energy gaps Δ_n, the propagation is impossible, so that
the packet is completely reflected back. However, our analysis of the potential step problem in Sec. 3
implies that the wavefunction would still have an exponential tail protruding into the periodic structure
and decaying on some length δ - see Eq. (67). Indeed, a review of the calculation leading to Eq. (260)
shows that its result remains valid within the gap as well, if the quasi-momentum is understood as a purely
imaginary number:

$$\tilde q \to \pm i\kappa, \qquad \text{where } \kappa = \frac{1}{\gamma}\left[|U_n|^2 - \varepsilon^2\right]^{1/2}, \qquad \text{for } \varepsilon^2 < |U_n|^2. \qquad (2.261)$$

84 It was predicted independently by L. D. Landau, Phys. Z. Sowjetunion 2, 46 (1932) and C. Zener, Proc. R. Soc.
London A 137, 696 (1932).



With such a contribution, the Bloch solution (193b) indeed describes an exponential decay of the
wavefunction at length δ = 1/κ.

Now returning to the effects of a weak force F in the energy-domain approach, presented by Eq.
(248) and illustrated in Fig. 32b, we may recast Eq. (261) as

$$\kappa \to \kappa(x) = \frac{1}{\gamma}\left[|U_n|^2 - (Fx)^2\right]^{1/2}, \qquad (2.262)$$



where x is the particle's (i.e. the wave packet center's) deviation from the mid-gap point. Thus the gap has
created a potential barrier of a finite width Δx_0 = 2|U_n|/F, through which the wave packet may tunnel
with a finite probability. As we already know, in the WKB approximation (in our case requiring κΔx_0
>> 1) this probability is just the tunnel barrier's transparency T, which may be calculated from Eq.
(117):

$$-\ln T = 2\int_{\kappa^2(x)>0}\kappa(x)\,dx = \frac{2}{\gamma}\int_{-x_c}^{+x_c}\left[|U_n|^2 - (Fx)^2\right]^{1/2}dx = \frac{4|U_n|x_c}{\gamma}\int_0^1\left(1 - \xi^2\right)^{1/2}d\xi, \qquad (2.263)$$

where ±x_c = ±Δx_0/2 = ±|U_n|/F are the classical turning points. Working out this simple integral (which
may be viewed as a quarter of the unit circle's area, and hence equal to π/4), we get

$$T = \exp\left\{-\frac{\pi|U_n|^2}{\gamma F}\right\}. \qquad (2.264)$$



This famous result was obtained by Landau and Zener in a more complex way, whose advantage
is a constructive proof that Eq. (264) is valid for an arbitrary relation between γF and |U_n|², i.e. for arbitrary T,
while our simple derivation was limited to the WKB approximation, i.e. to T << 1.85
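The WKB integral in Eq. (263) can be checked by direct quadrature. The sketch below assumes arbitrary parameter values in consistent units (they are not from the text), evaluates 2∫κ(x)dx with κ(x) from Eq. (262) by the midpoint rule, and compares the result with the exponent π|U_n|²/γF of Eq. (264).

```python
import math


def wkb_exponent(U_n, gamma, F, n=20000):
    """Return -ln T = 2 * integral of kappa(x) dx between the turning points,
    with kappa(x) = sqrt(|U_n|^2 - (F x)^2) / gamma, Eq. (2.262),
    evaluated by the midpoint rule."""
    x_c = abs(U_n) / F               # classical turning points at x = +/- x_c
    h = 2 * x_c / n
    total = 0.0
    for i in range(n):
        x = -x_c + (i + 0.5) * h     # midpoints stay strictly inside the barrier
        total += math.sqrt(U_n**2 - (F * x) ** 2) / gamma
    return 2 * total * h


# Arbitrary illustrative parameters (consistent units, not from the text).
U_n, gamma, F = 1.0, 2.0, 0.5
numeric = wkb_exponent(U_n, gamma, F)
analytic = math.pi * U_n**2 / (gamma * F)   # exponent of Eq. (2.264)
print(numeric, analytic, math.exp(-numeric))
```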

Returning to Eqs. (225) and (237), we can rewrite the product γF participating in Eq. (264) as

$$\gamma F = \frac{1}{2}\left.\frac{d(E_+ - E_-)}{d\tilde q}\right|_{U_n \to 0}\hbar\frac{d\tilde q}{dt} = \frac{\hbar}{2}\frac{d(E_+ - E_-)}{dt} \equiv \frac{\hbar u}{2}, \qquad (2.265)$$



where u has the meaning of the "speed" of the energy level crossing in the absence of the gap. Hence,
Eq. (264) may be presented in the form

$$T = \exp\left\{-\frac{2\pi|U_n|^2}{\hbar u}\right\}, \qquad (2.266)$$

which is more physically transparent.86 Indeed, the fraction 2|U_n|/u = Δ_n/u gives the time scale Δt of the
energy's crossing of the gap region, and according to the Fourier transform, its reciprocal, ω_max ~ 1/Δt,
gives the upper cutoff of frequencies involved in the Bloch oscillation process. Hence Eq. (266) means
that

$$-\ln T \sim \frac{\Delta_n}{\hbar\omega_{\max}}. \qquad (2.267)$$

85 Note that Eq. (264) is still limited to the hyperbolic dispersion relation, i.e. (in the band theory) to the weak
potential limit. In the opposite, tight-binding limit, the interband tunneling may be treated as an excitation of the
upper band states by sinusoidal Bloch oscillations, and is completely suppressed at ħω_B < Δ_1.

This formula allows us to interpret the Landau-Zener tunneling as the system's excitation across the
energy gap Δ_n by the maximum energy quantum ħω_max available from the Bloch oscillation process.
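Since Eq. (266) is claimed to hold for arbitrary T, it may be tested beyond the WKB limit by integrating a two-level model numerically. The sketch below is our own illustration, not the method of Landau and Zener. It assumes ħ = 1, two uncoupled ("diabatic") levels E_± = ±ut/2 crossing at t = 0 with the relative speed u, and a constant coupling U playing the role of |U_n|. The probability of staying on the initial diabatic level is identified with the interband tunneling probability T. Each time step applies the exact propagator of the 2×2 Hamiltonian frozen at the step midpoint.

```python
import math


def lz_numeric(U, u, t_max=100.0, dt=0.01):
    """Probability of remaining on the initial diabatic level of the model
    H(t) = (u t / 2) sigma_z + U sigma_x (hbar = 1), integrated from
    t = -t_max to t = +t_max."""
    c1, c2 = 1.0 + 0j, 0.0 + 0j                    # start on diabatic level 1
    for k in range(int(round(2 * t_max / dt))):
        a = 0.5 * u * (-t_max + (k + 0.5) * dt)    # diagonal term at step midpoint
        w = math.hypot(a, U)                       # norm of the frozen Hamiltonian
        cs, sn = math.cos(w * dt), math.sin(w * dt)
        # exp(-i H dt) = cos(w dt) I - i sin(w dt) H / w  (exact for 2x2 H)
        c1, c2 = ((cs - 1j * sn * a / w) * c1 - 1j * sn * U / w * c2,
                  -1j * sn * U / w * c1 + (cs + 1j * sn * a / w) * c2)
    return abs(c1) ** 2


u = 1.0
for U in (0.1, 0.3, 0.5):
    analytic = math.exp(-2 * math.pi * U**2 / u)   # Eq. (2.266) with hbar = 1
    print(f"U = {U}: numeric T = {lz_numeric(U, u):.3f}, Eq. (266): {analytic:.3f}")
```

The residual differences (at the percent level or below) come mostly from the finite integration interval, where the coupling still slightly mixes the diabatic states.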

The interband tunneling is an important ingredient of several physical phenomena and even some
practical devices, for example the tunneling (or "Esaki") diodes. This simple device is just a junction of
two semiconductor electrodes, one of which is so strongly n-doped by electron donors that the additional
electrons form a degenerate Fermi gas at the bottom of the conduction band. Similarly, the opposite
electrode is p-doped so strongly that the Fermi level of electrons in the valence band is lowered below
the band edge (Fig. 33).




Fig. 2.33. Tunneling diode: (a) the band edge diagram of the device at zero bias; (b) the same diagram at
a modest positive bias eV ~ Δ/2; and (c) the I-V curve (schematically). Dashed lines show the Fermi level
positions.



At thermal equilibrium, and in the absence of external voltage bias, the Fermi levels self-align,87
leading to the build-up of the contact potential difference φ/e, with φ somewhat larger than the energy
bandgap Δ - see Fig. 33a. This potential difference creates an internal electric field that tilts the energy
bands (just as the external field did in Fig. 32b), and leads to the formation of the so-called depletion
layer, in which the Fermi level is located within the energy gap, and hence there are no charge carriers
ready to move. In usual p-n junctions, this layer is broad and prevents any current at applied voltages V
lower than ~Δ/e. In contrast, in a tunneling diode the depletion layer is so thin (below ~10 nm) that the
interband tunneling is possible and provides a substantial Ohmic current at small applied voltages - see
Fig. 33c.

However, at a substantial positive bias, eV ~ Δ/2, the conduction band becomes aligned with the
middle of the gap in the p-doped electrode, and electrons cannot tunnel there. Similarly, there are no
electrons in the n-doped semiconductor to tunnel into the available states just above the Fermi level in
the p-doped electrode - see Fig. 33b. As a result, the current drops significantly, to grow again only when
eV exceeds ~Δ and allows the electron motion through the junction within each energy band. Thus the tunnel
junction's I-V curve has a part with negative differential resistance (dV/dI < 0). This effect may be used
for the amplification of analog signals, including self-excitation of electrical oscillators (i.e. rf signal
generation),88 and signal swing restoration in digital electronics.

86 In Chapter 6, Eq. (266) will be derived using a different method based on the Golden Rule of quantum
mechanics.

87 See, e.g., SM Secs. 1.5 and 6.4.



2.10. Harmonic oscillator: A brute force approach 

To complete our review of 1D systems, we have to consider the famous harmonic oscillator, i.e.
a 1D particle moving in the quadratic-parabolic potential (111). This is just a smooth quantum well
providing "soft" confinement, whose discrete spectrum we have already found in the WKB
approximation - see Eq. (114). Let us try to solve the same problem exactly - not because there is
anything conceptually interesting in it (there is not :-), but because of its enormous importance for
applications. For that, let us write the stationary Schrödinger equation for potential (111):

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + \frac{m\omega_0^2x^2}{2}\psi = E\psi. \qquad (2.268)$$



From the solution of Exercise Problem 1.5, the reader already knows89 one of the eigenfunctions of this
equation,

$$\psi_0 = C_0\exp\left\{-\frac{m\omega_0x^2}{2\hbar}\right\}, \qquad (2.269)$$

and the corresponding eigenenergy

$$E_0 = \frac{\hbar\omega_0}{2}. \qquad (2.270)$$



Expression (269) shows that the characteristic scale of the wavefunction's spatial spread is equal to

$$x_0 \equiv \left(\frac{\hbar}{m\omega_0}\right)^{1/2}. \qquad (2.271)$$

Due to the importance of this scale, let us give its crude estimates for several typical systems:



(i) Electrons in solids and fluids: m ≈ 10^-30 kg, ω_0 ~ 10^15 s^-1, giving x_0 ~ 0.3 nm, comparable
with the inter-atomic distances a. As a result, classical mechanics is not valid at all for the analysis of their
motion.

(ii) Atoms in solids: m ≈ 10^-24 to 10^-26 kg, ω_0 ~ 10^13 s^-1, giving x_0 ~ 0.01 to 0.1 nm, i.e. from a few
percent to a few tens of percent of a. Because of that, methods based on classical mechanics (e.g., molecular
dynamics) are approximately valid for the analysis of atomic motion, though they may miss some fine effects
of the motion of lighter atoms - e.g., quantum tunneling of hydrogen atoms through energy barriers of the
potential profile created by their neighbors.

88 See, e.g., CM Sec. 4.4.

89 If not yet, I am inviting him or her to check this fact now by the direct substitution of solution (269) into the
differential equation (268), simultaneously proving Eq. (270).

Chapter 2

Page 66 of 72

Essential Graduate Physics

QM: Quantum Mechanics

(iii) LIGO90 probe masses: m ~ 10^2 kg, ω_0 ~ 10^2 s^-1, giving x_0 ~ 10^-19 m. After two decades of
work (and hundreds of millions of NSF dollars :-), this experiment is still struggling with seismic noise
at the level of a few 10^-17 m, and is presently being upgraded to the next version called "Advanced
LIGO", with the goal of decreasing the noise by only an order of magnitude. Thus the prospects of
observing quantum-mechanical effects, much heralded at the initial planning of these instruments, still
do not look very realistic.
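The estimates in items (i) and (iii) follow directly from Eq. (271); a minimal sketch, using the same order-of-magnitude inputs quoted above:

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s


def x0(mass_kg, omega0):
    """Characteristic spatial scale x_0 = (hbar / (m omega_0))^(1/2), Eq. (2.271)."""
    return math.sqrt(HBAR / (mass_kg * omega0))


# Order-of-magnitude inputs quoted in items (i) and (iii) above.
print(f"electron in a solid: x0 ~ {x0(1e-30, 1e15) * 1e9:.2f} nm")
print(f"LIGO probe mass:     x0 ~ {x0(1e2, 1e2):.1e} m")
```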

Returning to the Schrödinger equation (268), let us recast it into a dimensionless form by
introducing the dimensionless variable ξ = x/x_0. This gives

$$-\frac{d^2\psi}{d\xi^2} + \xi^2\psi = \varepsilon\psi, \qquad (2.272)$$

where ε ≡ 2E/ħω_0 = E/E_0. In this notation, the ground state wavefunction is proportional to exp{-ξ²/2},
so let us look for the solutions to Eq. (272) in the form

$$\psi = C\exp\left\{-\frac{\xi^2}{2}\right\}H(\xi), \qquad (2.273)$$

where H(ξ) is a new function. With this substitution, Eq. (272) yields

$$\frac{d^2H}{d\xi^2} - 2\xi\frac{dH}{d\xi} + (\varepsilon - 1)H = 0. \qquad (2.274)$$



It is evident that H = const and ε = 1 is one of its solutions, describing the eigenstate (269) with
energy (270), but what are the other eigenstates and eigenvalues? This equation was studied in
detail in the mid-1800s by C. Hermite, who showed that all eigenvalues are given by the equation

$$\varepsilon_n - 1 = 2n, \qquad \text{with } n = 0, 1, 2, \ldots, \qquad (2.275)$$

so that our WKB result (114) is indeed exact for any n, and Eqs. (269) and (270) describe the ground
state of the oscillator. The eigenfunction corresponding to the eigenvalue ε_n is a polynomial (now called the
Hermite polynomial) of degree n, which may be most conveniently calculated using the following explicit
formula:

$$H_n = (-1)^n\exp\{\xi^2\}\frac{d^n}{d\xi^n}\exp\{-\xi^2\}. \qquad (2.276)$$

It is easy to use this formula to calculate several lowest-degree polynomials - see Fig. 34a:

$$H_0 = 1, \quad H_1 = 2\xi, \quad H_2 = 4\xi^2 - 2, \quad H_3 = 8\xi^3 - 12\xi, \quad H_4 = 16\xi^4 - 48\xi^2 + 12, \ \ldots \qquad (2.277)$$

The most important properties of the polynomials are as follows: 

(i) their "parity" (symmetry-antisymmetry) alternates with number n, 

(ii) H_n(ξ) crosses the ξ-axis exactly n times (has n zeros), and



90 LIGO = Laser Interferometer Gravitational-Wave Observatories, see online at http://www.ligo.caltech.edu/ .

(iii) the polynomials are mutually orthogonal in the following sense:

$$\int_{-\infty}^{+\infty}H_n(\xi)H_{n'}(\xi)\exp\{-\xi^2\}\,d\xi = \pi^{1/2}2^nn!\,\delta_{n,n'}. \qquad (2.278)$$
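Formula (276) is easy to automate. The sketch below (helper names are ours) represents each polynomial by its list of coefficients, applies Eq. (276) through the identity d/dξ[P(ξ)exp{-ξ²}] = [P'(ξ) - 2ξP(ξ)]exp{-ξ²}, and then checks the orthogonality relation (278) with a trapezoidal quadrature on a finite interval, which we assume to be adequate because of the Gaussian weight.

```python
import math


def hermite_rodrigues(n):
    """Coefficients [c_0, c_1, ..., c_n] of H_n(xi), following Eq. (2.276).

    Writing d^k/dxi^k exp(-xi^2) = P_k(xi) exp(-xi^2), each derivative gives
    P_{k+1} = P_k' - 2 xi P_k; then H_n = (-1)^n P_n."""
    p = [1]                                                   # P_0 = 1
    for _ in range(n):
        deriv = [j * p[j] for j in range(1, len(p))] + [0, 0]  # P', padded
        xi_p = [0] + p                                        # xi * P
        p = [a - 2 * b for a, b in zip(deriv, xi_p)]
    return [(-1) ** n * c for c in p]


def poly_eval(coeffs, xi):
    return sum(c * xi**j for j, c in enumerate(coeffs))


def overlap(n, m, half_width=10.0, steps=2000):
    """Trapezoidal estimate of the integral in Eq. (2.278) over [-L, L]."""
    hn, hm = hermite_rodrigues(n), hermite_rodrigues(m)
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        xi = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * poly_eval(hn, xi) * poly_eval(hm, xi) * math.exp(-xi * xi)
    return total * h


print(hermite_rodrigues(4))   # [12, 0, -48, 0, 16], cf. Eq. (2.277)
for n in range(4):
    exact = math.sqrt(math.pi) * 2**n * math.factorial(n)
    print(n, overlap(n, n) / exact, overlap(n, n + 1))
```

The printed ratios should be close to 1, and the off-diagonal overlaps close to 0.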



Using Eq. (273) to translate this result to functions ψ_n(x), we get the following orthonormal
eigenfunctions of the harmonic oscillator (Fig. 34b):91

$$\psi_n(x) = \frac{1}{\left(2^nn!\,\pi^{1/2}x_0\right)^{1/2}}\exp\left\{-\frac{x^2}{2x_0^2}\right\}H_n\!\left(\frac{x}{x_0}\right). \qquad (2.279)$$

Fig. 2.34. (a) A few lowest Hermite polynomials and (b) the corresponding eigenenergies (dashed lines) and
eigenfunctions (solid lines) of the harmonic oscillator. The black dashed line shows the potential profile U(x),
drawn on the same scale as the energies E_n, so that the line crossings with the energy levels correspond to the
classical turning points.
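A numerical sanity check of Eq. (279): the sketch below works in units x_0 = 1 and builds H_n from the standard three-term recurrence H_{k+1} = 2ξH_k - 2kH_{k-1} (a known property of Hermite polynomials, not derived in the text). It then verifies that each ψ_n is normalized and satisfies Eq. (272) with ε_n = 2n + 1, up to the finite-difference error.

```python
import math


def hermite(n, xi):
    """H_n(xi) via the standard recurrence H_{k+1} = 2 xi H_k - 2k H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * xi
    for k in range(1, n):
        h_prev, h = h, 2.0 * xi * h - 2.0 * k * h_prev
    return h


def psi(n, xi):
    """Eq. (2.279) in units x_0 = 1."""
    norm = (2**n * math.factorial(n) * math.sqrt(math.pi)) ** -0.5
    return norm * math.exp(-xi * xi / 2) * hermite(n, xi)


def schrodinger_residual(n, xi, h=1e-3):
    """-psi'' + xi^2 psi - eps_n psi with eps_n = 2n + 1, cf. Eqs. (2.272), (2.275)."""
    d2 = (psi(n, xi + h) - 2 * psi(n, xi) + psi(n, xi - h)) / h**2
    return -d2 + (xi * xi - (2 * n + 1)) * psi(n, xi)


def norm_integral(n, half_width=10.0, steps=2000):
    """Approximate integral of psi_n^2 over [-L, L] (tails are negligible)."""
    h = 2 * half_width / steps
    return h * sum(psi(n, -half_width + i * h) ** 2 for i in range(steps + 1))


for n in range(5):
    worst = max(abs(schrodinger_residual(n, -3 + 0.1 * i)) for i in range(61))
    print(n, round(norm_integral(n), 10), f"{worst:.1e}")
```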



Besides its own importance, this is a typical example of eigenstates of a particle confined in a soft-wall
quantum well. It is very instructive to compare them with the eigenstates of the rectangular quantum
well, with its ultimately-hard walls - see Eq. (1.76) and Fig. 1.7. Let us list their similar features:

(i) Wavefunctions oscillate in the classically-allowed regions with E_n > U(x), while
dropping exponentially beyond the boundaries of that region.

(ii) The symmetric and antisymmetric wavefunctions alternate as the index n is increased.

Here are the major features specific to the soft confinement:

(i) The spatial spread of the wavefunction grows with n, following the gradual increase of
the classically allowed region.

(ii) Correspondingly, E_n exhibits a slower growth than the E_n ∝ n² law given by Eq.
(1.77), because of the gradual reduction of quantum confinement, which moderates the growth of kinetic
energy.

91 These stationary states of the harmonic oscillator are sometimes called its Fock states, to distinguish them from
other fundamental solutions (such as Glauber states) which will be discussed in Sec. 5.5 and beyond.

Unfortunately, this brute force approach to the harmonic oscillator problem is not too appealing
intellectually. First, the proof of Eq. (276) is rather longish. More importantly, it is hard to use Eq. (279)
for the calculation of the so-called matrix elements of the system - as we will see in Chapter 4, virtually the
only numbers important for applications. Finally, it is also almost evident that there should be some
straightforward math leading to a formula as simple as Eq. (114) for E_n. This is why in Sec. 5.4, I will
describe a much more efficient, operator-based approach to this problem.
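To illustrate the point about matrix elements, the sketch below computes the coordinate matrix elements x_nm = ∫ψ_n x ψ_m dx by brute-force quadrature of the wavefunctions (279), in units x_0 = 1. The normalized recurrence used to build ψ_n is equivalent to Eq. (279) combined with the standard Hermite recurrence. The comparison value x_{n,n+1} = x_0[(n + 1)/2]^{1/2} is the known result of the operator approach, quoted here only as a reference.

```python
import math


def oscillator_states(xi, n_max):
    """psi_0 ... psi_{n_max} of Eq. (2.279) at point xi (units x_0 = 1), built with
    the standard recurrence for normalized Hermite functions:
    psi_{n+1} = sqrt(2/(n+1)) xi psi_n - sqrt(n/(n+1)) psi_{n-1}."""
    psi = [math.pi ** -0.25 * math.exp(-xi * xi / 2)]
    if n_max >= 1:
        psi.append(math.sqrt(2.0) * xi * psi[0])
    for n in range(1, n_max):
        psi.append(math.sqrt(2.0 / (n + 1)) * xi * psi[n]
                   - math.sqrt(n / (n + 1)) * psi[n - 1])
    return psi


def x_matrix(n_max, half_width=12.0, steps=4000):
    """x_nm = integral of psi_n x psi_m (units x_0 = 1) by a simple grid sum."""
    h = 2 * half_width / steps
    x = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for i in range(steps + 1):
        xi = -half_width + i * h
        psi = oscillator_states(xi, n_max)
        for n in range(n_max + 1):
            for m in range(n_max + 1):
                x[n][m] += h * psi[n] * xi * psi[m]
    return x


X = x_matrix(4)
for n in range(4):
    print(n, round(X[n][n + 1], 6), round(math.sqrt((n + 1) / 2), 6))
```

Even this small table requires a double loop over a fine grid, which illustrates why an analytical, operator-based route is preferable.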



2.11. Exercise problems 

2.1. Use Eq. (5) to calculate the electric conductance of a narrow, uniform conducting channel
between two bulk conductors, in the low-voltage and low-temperature limit, neglecting electron
interaction and scattering inside the channel.

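2.2. (i) Calculate the probability current I(x,t) carried by the 1D Gaussian wave packet described
by Eq. (37), with d²ω/dk² = ħ/m:

$$\Psi(x,t) = C\exp\left\{-\frac{(x - v_0t)^2}{4\lambda(t)} + i\left(k_0x - \frac{\hbar k_0^2}{2m}t\right)\right\}, \qquad \text{where } v_0 = \frac{\hbar k_0}{m} \ \text{ and } \ \lambda(t) = (\delta x)^2 + i\frac{\hbar t}{2m}.$$

(ii) Calculate the integral

$$\int_{-\infty}^{+\infty}I(x,t)\,dx,$$

and discuss its time dependence (if any).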



2.3. Express the 1D propagator, as defined by Eq. (44), via eigenfunctions and eigenenergies of a
particle moving in an arbitrary stationary potential U(x). (For notational simplicity, assume that the
energy spectrum of the system is discrete.)
2.4. Analyze the effect of phase locking of Josephson oscillations on the dc current flowing
through the junction, assuming that an external microwave source applies a fixed sinusoidal ac voltage,

$$V(t) - \bar V = A\cos\omega t,$$

to a junction with the sinusoidal current-phase relation (55), using Eq. (54) for the evolution of the phase φ.

2.5. Calculate the transmission coefficient T as a function of particle energy E for the rectangular
potential barrier,

$$U(x) = \begin{cases} 0, & \text{for } x < -d/2,\\ U_0, & \text{for } -d/2 < x < +d/2,\\ 0, & \text{for } d/2 < x, \end{cases}$$

for the case E > U_0. Analyze and interpret the results, taking into account that U_0 may be either positive
or negative. (In the latter case, we are speaking about the particle's passage over a rectangular potential well
of finite depth.)

2.6. Use the quasi-classical (WKB) approximation to find the energy spectrum of the triangular
quantum well92

$$U(x) = \begin{cases} +\infty, & \text{for } x < 0,\\ Fx, & \text{for } x > 0, \end{cases} \qquad \text{with } F > 0.$$

Compare the WKB positions of the lowest 3 energy levels, and also of the 10th level, with the exact
solution of the same problem.93

2.7. Use the quasi-classical (WKB) approximation to calculate the transparency T as a function of
particle energy E, for the potential barrier in the form of the inverted harmonic potential

Analyze the result; in particular, compare it with the exact Kemble formula (119).

2.8. Prove that the symmetry of the scattering matrix elements describing an arbitrary time-
independent scatterer allows one to present it in the form (136a), with the additional restriction (136b).

2.9. For a deep and narrow 1D quantum well, modeled by a delta-function,

$$U(x) = -W\delta(x), \qquad \text{with } W > 0, \qquad (*)$$

find the "localized" eigenfunction(s) ψ_n (with |ψ_n(x)| → 0 at |x| → ∞), and the corresponding value(s) of
the energy E_n.

92 For F = mg, this is just the famous bouncing ball problem.

93 The necessary values of the first zeros of the Airy function may be found in one of many math handbooks, for
example, in Table 10.13 of the collection edited by Abramowitz and Stegun - see MA 16(i).



2.10. Analyze the localized eigenfunction(s) and the characteristic equation(s) for eigenenergies
of a 1D particle in the following two-well potential

$$U(x) = -W\left[\delta\left(x + \frac{a}{2}\right) + \delta\left(x - \frac{a}{2}\right)\right], \qquad \text{with } W > 0.$$

Explore asymptotic behaviors of the eigenenergies in the limits of very strong and very weak potential,
and find the number of localized states as a function of the distance a.



2.11. Calculate the transmission coefficient of the following 1D scatterer:

$$U(x) = W_1\delta(x) + W_2\delta(x - a),$$

as a function of the particle's energy, and find its maximum value. (Each of W_1,2 may be either positive or
negative.) Does T change if the well/barrier positions are swapped?

2.12. At t = 0, a 1D particle of mass m was placed into the (metastable) ground state of the
"pocket" of the potential profile

$$U(x) = Ax^3 - Fx,$$

with AF > 0. Use the WKB approximation to find the state's lifetime. Estimate the result's accuracy.

2.13. Prove Eq. (191), starting from Eq. (190).



2.14. Calculate the whole transfer matrix of the rectangular tunnel barrier, specified by Eq. (76)
of the lecture notes, for particle energies both below and above U_0.



2.15. Use the results of the previous problem to calculate the transfer matrix of one period of the periodic
Kronig-Penney potential shown in Fig. 30b (reproduced here). Verify your result by considering the limit of a
very short and high tunnel barrier, and comparing it with the results for the δ-functional barrier, derived in class.

[Figure: one period of the Kronig-Penney potential U(x): a barrier of height U_0 and width d, period a]



2.16. Using the results of the previous problem, derive the characteristic equations for particle
motion in the periodic Kronig-Penney potential, for both E < U_0 and E > U_0. Try to bring the equations
to a form similar to that obtained in Sec. 2.5 for the delta-functional barriers - see Eq. (166). Use the
equations to formulate the conditions of applicability of the tight-binding and weak potential
approximations, in terms of the initial parameters of the problem (U_0, d, a, and the particle's mass m)
and the particle's energy E.

2.17. Find and analyze the characteristic equation for eigenvalues for a particle in a rectangular
well of a finite depth:



In particular, find the number of localized states as a function of the well's width a.

2.18. For the same Kronig-Penney potential, use the tight-binding approximation to calculate the
widths of the allowed energy bands. Compare the results with those of Problem 2.16 (in the corresponding
limit).

2.19. For the same Kronig-Penney potential, use the weak potential limit formulas to calculate
the energy gap widths. Again, compare the results with those of Problem 2.16 (in the corresponding limit).

2.20. A 1D harmonic oscillator (with mass m and frequency ω_0) had been in its ground state;
then an additional force F was suddenly applied (and retained constant in time). Find the probability of
the oscillator staying in its ground state.

2.21. Prove the following formula for the propagator of the 1D harmonic oscillator:

$$G(x,t;x_0,t_0) = \left(\frac{m\omega_0}{2\pi i\hbar\sin[\omega_0(t - t_0)]}\right)^{1/2}\exp\left\{\frac{im\omega_0\left[(x^2 + x_0^2)\cos[\omega_0(t - t_0)] - 2xx_0\right]}{2\hbar\sin[\omega_0(t - t_0)]}\right\}.$$

Discuss the relation between this formula and the propagator of a free 1D particle.
Chapter 3. Higher Dimensionality Effects

The coverage of multi-dimensional problems of wave mechanics in this course is minimal: it is limited
to a few phenomena (such as the AB effect and Landau levels) that cannot take place in one dimension
due to topological reasons, and a few key 3D problems (such as the Born approximation in scattering
theory and the Bohr atom) whose solutions are necessary for numerous applications.



3.1. Quantum interference and the AB effect 

In the past two chapters, we have already discussed some effects of the de Broglie wave
interference. For example, standing waves inside a quantum well, or even on the top of a tunnel barrier,
may be considered as a result of the interference of the incident and reflected waves. However, there are some remarkable
new effects made possible by the spatial separation of such traveling waves, and such separation
requires a higher (either 2D or 3D) dimensionality. A good example of such separation is provided by
the Young-type experiment (Fig. 1) in which particles are passed through two narrow holes (or slits) in
an otherwise opaque partition.




[Fig. 3.1: partition with 2 slits]



If the particles emitted by the source do not interact (which is always true if the emission rate is
sufficiently low), the average rate of particle counting by the detector is proportional to the probability
density w(r, t) = Ψ(r, t)Ψ*(r, t) to find a single particle at the detector's location r, where Ψ(r, t) is the
solution of the single-particle Schrödinger equation (1.25). Let us describe this experiment for the case
when the particles may be represented by monochromatic waves of energy E (e.g., very r-long wave
packets), so that the wavefunction may be taken in the form given by Eqs. (1.56) and (1.61): Ψ(r, t) =
ψ(r)exp{-iEt/ħ}. In this case, in the free-space parts of the system, ψ(r) satisfies the stationary
Schrödinger equation (1.60) with the Hamiltonian (1.27a):

$$-\frac{\hbar^2}{2m}\nabla^2\psi = E\psi. \qquad (3.1a)$$

With the standard definition k = (2mE)^{1/2}/ħ, it may be rewritten as the 3D Helmholtz equation

$$\nabla^2\psi + k^2\psi = 0 \qquad (3.1b)$$

- an evident 3D generalization of Eqs. (1.75) or (2.81).

The opaque parts of the partition may be well described as classically forbidden regions, so if
their size scale a is much larger than the wavefunction penetration depth δ (2.67), we can use on their
surface S the same boundary conditions as for the quantum barrier of infinite height:

$$\psi\big|_S = 0. \qquad (3.2)$$

Equations (1) and (2) formulate the standard boundary problem of the theory of propagation of
scalar waves of any nature. For an arbitrary geometry, such a problem does not have a simple analytical
solution. However, for a conceptual discussion of interference we may use certain natural assumptions that
will allow us to find its particular, approximate solution.

First, let us discuss wave emission, into free space, by a small-size source located at the origin.
Naturally, the emitted wave should be spherically-symmetric: $\psi(\mathbf r) = \psi(r)$. Using the well-known
expression for the Laplace operator in spherical coordinates,1 we then reduce Eq. (1) to an ordinary
differential equation

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\psi}{dr}\right) + k^2\psi = 0. \qquad (3.3)$$



Let us introduce a new function, f(r) ≡ rψ(r). Plugging the reciprocal relation ψ = f/r into Eq. (3), we see
that it is reduced to the 1D wave equation,

$$\frac{d^2f}{dr^2} + k^2f = 0, \qquad (3.4)$$

whose solutions were discussed in detail in Sec. 2.2. For a fixed k, the general solution of Eq. (4) is

$$f = f_+e^{ikr} + f_-e^{-ikr}, \qquad (3.5)$$

so that the full wavefunction is

$$\psi(r) = \frac{f_+}{r}e^{ikr} + \frac{f_-}{r}e^{-ikr}, \qquad \Psi(r,t) = \frac{f_+}{r}e^{i(kr - \omega t)} + \frac{f_-}{r}e^{-i(kr + \omega t)}, \qquad \text{with } \omega = \frac{E}{\hbar} = \frac{\hbar k^2}{2m}. \qquad (3.6)$$
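A quick finite-difference check (our own illustration, with an arbitrary wave number and sample radii) that the outgoing term of Eq. (6) satisfies the radial equation (3), and that the product 4πr²|ψ|², which is proportional to the full probability current discussed below, does not depend on r:

```python
import cmath
import math


def psi_out(r, k, f_plus=1.0):
    """Outgoing spherical wave f_+ exp(ikr) / r, the first term of Eq. (3.6)."""
    return f_plus * cmath.exp(1j * k * r) / r


def radial_lhs(r, k, h=1e-3):
    """(1/r^2) d/dr (r^2 dpsi/dr) + k^2 psi from Eq. (3.3), evaluated with nested
    central differences; it should vanish up to O(h^2) errors."""
    def r2_dpsi(rr):
        return rr**2 * (psi_out(rr + h / 2, k) - psi_out(rr - h / 2, k)) / h
    return (r2_dpsi(r + h / 2) - r2_dpsi(r - h / 2)) / (h * r**2) + k**2 * psi_out(r, k)


k = 2.0   # arbitrary illustrative wave number
for r in (1.0, 3.0, 10.0):
    flux = 4 * math.pi * r**2 * abs(psi_out(r, k)) ** 2
    print(f"r = {r:5.1f}: |residual| = {abs(radial_lhs(r, k)):.1e}, "
          f"4*pi*r^2*|psi|^2 = {flux:.6f}")
```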

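If the source is located at a point r' ≠ 0, the obvious generalization of Eq. (6) is

$$\Psi(\mathbf r,t) = \frac{f_+}{R}e^{i(kR - \omega t)} + \frac{f_-}{R}e^{-i(kR + \omega t)}, \qquad \text{with } R \equiv |\mathbf R|, \quad \mathbf R \equiv \mathbf r - \mathbf r'. \qquad (3.7)$$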

The first term of this solution describes a spherically-symmetric wave propagating from the
source outward, while the second one describes a wave converging onto the source point r' from large distances.
Though the latter solution is possible in some very special circumstances (say, when the outgoing wave
is reflected back from a spherical shell), for our problem only the outgoing waves are relevant, so that
we may keep only the first term (proportional to f_+) in Eq. (7). Note that the factor R in the denominator
(that was absent in the 1D geometry) has a simple physical sense: it provides the independence of the full
probability current I = 4πR²j(R), with j(R) ∝ kΨΨ* ∝ 1/R², of the distance R between the observation
point and the source.



1 See, e.g., MA Eq. (10.9). 
Now let us assume that the partition's geometry is not too complicated - for example, it is planar
as shown in Fig. 1 - and consider the region of the particle detector location far behind the partition (at z
>> 1/k), and at a relatively small angle to its normal: |x| << z. Then it should be physically clear that the
spherical waves (7) emitted by each point inside the slits cannot be perturbed too much by the opaque
parts of the partition, and their only role is the restriction of the set of such emitting points to the area of
the slits. Hence, an approximate solution of the boundary problem is given by the following Huygens
principle: the wave behind the partition looks as if it were the sum of contributions (7) of point sources
located in the slits, with each source's strength f_+ proportional to the amplitude of the wave arriving at
this pseudo-source from the real source - see Fig. 1. This principle finds its confirmation in strict wave
theory, which shows2 that with our assumptions, the solution of the boundary problem (1)-(2) may be
presented as the following Kirchhoff integral:

$$\psi(\mathbf r) = c\int_{\text{slits}}\frac{\psi(\mathbf r')}{R}e^{ikR}\,d^2r', \qquad \text{with } c = \frac{k}{2\pi i}. \qquad (3.8)$$



If the source is also far from the partition, its wave front is almost parallel to the slit plane, and if
the slits are not too broad, we can take ψ(r') constant (ψ_1,2) at each slit, so that Eq. (8) is reduced to

$$\psi(\mathbf r) = a''_1\exp\{ikl''_1\} + a''_2\exp\{ikl''_2\}, \qquad \text{with } a''_{1,2} = c\,\psi_{1,2}\frac{A_{1,2}}{l''_{1,2}}, \qquad (3.9)$$

where A_1,2 are the slit areas. The wavefunctions at the slits may be calculated approximately3 by applying the
same Eq. (7) to the space before the slits: ψ_1,2 ≈ (f_+/l'_1,2)exp{ikl'_1,2}. As a result, Eq. (9) may be
rewritten as

$$\psi(\mathbf r) = a_1\exp\{ikl_1\} + a_2\exp\{ikl_2\}, \qquad \text{with } l_{1,2} \equiv l'_{1,2} + l''_{1,2}, \quad a_{1,2} = \frac{c f_+A_{1,2}}{l'_{1,2}\,l''_{1,2}}. \qquad (3.10)$$

(As Fig. 1 shows, each of l_1,2 is the length of the full classical path of the particle from the source,
through the corresponding slit, and further to the observation point r.)

According to Eq. (10), the resulting rate of particle counting is proportional to 



Quantum 
interference: 
the pattern 
and phase 
shift 



+ 2 



coscp i2 . 



where 



(3.11) 



(3.12) 



is the difference of the total wave phase accumulations along each of two alternative trajectories. The 
last expression may be evidently generalized as 



2 For a proof of Eq. (8), see, e.g., EM Sec. 8.5. 

3 A possible (and reasonable) concern about the application of Eq. (7) to the field in the slits is that it ignores the 
effect of opaque parts of the partition. However, as we know from Chapter 2, the main role of the classically 
forbidden region is providing the reflection of the incident wave towards its source (i.e. to the left in Fig. 1). As a 
result, the contribution of this reflection to the field inside the slits is insignificant is A^ 2 » A 2 , and even in the 
opposite case provides just some rescaling of the probability amplitudes a which is unimportant for our 
conceptual discussion. 
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(3.13) 



with integration along the virtually closed contour C (see the dashed line in Fig. 1), i.e. from point 1, in 
the positive (i.e. counterclockwise) direction to point 2. (From our experience with the ID WKB 
approximation we may expect such generalization to be valid even if k changes, sufficiently slowly, 
along the paths.) 

Our result (11) shows that the counting rate oscillates as a function of the difference $(l_2 - l_1)$, which
in turn changes with the detector's position, giving the famous interference pattern, with an amplitude
proportional to the product $|a_1a_2|$, and hence vanishing if either of the slits is closed. For a wave theory,
this is a well-known result, 4 but for particle physics, it was (and still is :-) rather shocking. Indeed, our
analysis pertains to a very low particle emission/detection rate, so that there is no other way to interpret
it than as the particle's interference with itself, or rather the interference of the parts of its
wavefunction passing through each of the two slits.
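Eq. (11) can be checked numerically in a few lines. The sketch below is a minimal illustration, not part of the original notes. It assumes, hypothetically, that each path contributes an amplitude $a_j = c_j\exp\{ikl_j\}$ with real $c_j$, so that $\varphi_{12} = k(l_2 - l_1)$. It then compares the direct value $|a_1 + a_2|^2$ with the form (11). The helper names `amplitudes`, `counting_rate`, and `counting_rate_eq11` are illustrative only.

```python
import cmath
import math

def counting_rate(a1: complex, a2: complex) -> float:
    """w = |a1 + a2|^2: the detector counting rate, up to a constant factor."""
    return abs(a1 + a2) ** 2

def counting_rate_eq11(a1: complex, a2: complex) -> float:
    """The same quantity in the form of Eq. (11): |a1|^2 + |a2|^2 + 2|a1||a2| cos(phi_12)."""
    phi_12 = cmath.phase(a2) - cmath.phase(a1)
    return abs(a1) ** 2 + abs(a2) ** 2 + 2 * abs(a1) * abs(a2) * math.cos(phi_12)

def amplitudes(k, l1, l2, c1=1.0, c2=1.0):
    """Hypothetical path amplitudes a_j = c_j exp(i k l_j), so that phi_12 = k (l2 - l1)."""
    return c1 * cmath.exp(1j * k * l1), c2 * cmath.exp(1j * k * l2)

k = 2 * math.pi  # wavelength = 1 in arbitrary units
for dl in (0.0, 0.25, 0.5, 0.75, 1.0):  # path difference l2 - l1
    a1, a2 = amplitudes(k, 10.0, 10.0 + dl)
    print(f"l2 - l1 = {dl:4.2f}:  w = {counting_rate(a1, a2):.3f}")
```

With equal amplitudes, the printed rate swings between 4 and 0 as $l_2 - l_1$ changes by half a wavelength. Setting `c2=0.0`, which corresponds to closing slit 2, removes the oscillating term, as stated above.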

Let us now discuss a very interesting effect of the magnetic field on quantum interference. In
order to make the discussion simpler, let us consider an alternative version of the two-slit experiment, in
which each alternative path is confined to a narrow channel by partial quantum confinement - see
Fig. 2. (In this arrangement, moving the particle detector without changing the channels' geometry, and
hence the local values of k, may be more problematic in experimental practice, so let us consider its
position r fixed.)



Fig. 3.2. The AB effect. (Sketch labels: two channels; region with $\mathscr{B} \neq 0$; $w = w(\mathscr{B})$.)



In this case, because of the effect of the walls providing the path confinement, we cannot use
expressions (10) for the amplitudes $a_{1,2}$. However, from the discussions in Sec. 1.6 and Sec. 2.2, it should
be clear that the first of expressions (10) remains valid, though possibly with a value of k specific to
each channel.

The benefit of this geometry is that we can now apply a magnetic field $\mathscr{B}$, perpendicular to the
plane of particle motion, that would pierce contour C but would not touch the particle propagation
channels. In classical physics, the magnetic field's effect on a particle with electric charge q is described by
the Lorentz force 5

$$\mathbf{F}_\mathscr{B} = q\mathbf{v}\times\boldsymbol{\mathscr{B}}, \tag{3.14}$$



4 See, e.g., a detailed discussion in EM Sec. 8.4. 

5 See, e.g., Sec. 5.1. Note that Eq. (14), as well as all other formulas of this course, is in SI units; in Gaussian
units, all terms which include either $\mathscr{B}$ or A should be divided by c, the speed of light in free space.
where $\mathscr{B}$ is the field value at the point of the particle's location, so that for the experiment shown in Fig.
2, $\mathbf{F}_\mathscr{B} = 0$, and the field would not affect the particle motion at all. In quantum mechanics, this is not so,
and the field does affect the probability density w, even if $\mathscr{B} = 0$ at all points where the wavefunction
$\psi(\mathbf{r})$ is not equal to zero.

In order to describe this surprising effect, let us first develop a general framework for accounting for the
effects of electromagnetic fields on a quantum particle, which will also give us some important by-product
results. In order to do that, we need to calculate the Hamiltonian operator of a charged particle
in the field. For an electrostatic field, this hardly presents any problem. Indeed, from classical
electrodynamics we know that such a field may be presented as minus the gradient of its electrostatic potential $\phi$,

$$\boldsymbol{\mathscr{E}} = -\nabla\phi(\mathbf{r}), \tag{3.15}$$

so that the force exerted by the field on a particle with electric charge q,

$$\mathbf{F}_\mathscr{E} = q\boldsymbol{\mathscr{E}}, \tag{3.16}$$

may be described by adding the potential energy of the field,

$$U(\mathbf{r}) = q\phi(\mathbf{r}), \tag{3.17}$$

to other (possible) components of the full potential energy of the particle. As we have already discussed,
such a function of coordinates may be included in the Hamiltonian operator just by adding it to the
kinetic energy operator (1.27).

However, the magnetic field's effect is peculiar: since its Lorentz force (14) cannot do any work on
the particle,

$$dW_\mathscr{B} = \mathbf{F}_\mathscr{B}\cdot d\mathbf{r} = \mathbf{F}_\mathscr{B}\cdot\mathbf{v}\,dt = q(\mathbf{v}\times\boldsymbol{\mathscr{B}})\cdot\mathbf{v}\,dt = 0, \tag{3.18}$$

the field cannot be presented by any potential energy, so it may not be immediately clear how to account
for it in the Hamiltonian. Help comes from the analytical-mechanics approach to classical
electrodynamics: 6 in the nonrelativistic limit, the Hamiltonian function of a particle in an electromagnetic
field looks superficially like that in an electrostatic field only:

$$H = \frac{mv^2}{2} + U = \frac{p^2}{2m} + q\phi; \tag{3.19}$$

however, the momentum $\mathbf{p} = m\mathbf{v}$ that participates in this expression is now the difference

$$\mathbf{p} = \mathbf{P} - q\mathbf{A}. \tag{3.20}$$

Here A is the vector-potential that may be defined by the well-known relations for the electric and
magnetic fields:

$$\boldsymbol{\mathscr{E}} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \qquad \boldsymbol{\mathscr{B}} = \nabla\times\mathbf{A}, \tag{3.21}$$

while P is the canonical momentum whose Cartesian components may be calculated (in classics) from
the Lagrangian function, 7 using the standard formula of analytical mechanics,



6 See, e.g., EM Sec. 9.7. 

7 Just for reader's reference, the classical Lagrangian corresponding to Hamiltonian (19) is 
$$P_j \equiv \frac{\partial L}{\partial v_j}. \tag{3.22}$$



To emphasize the difference between the two momenta, $\mathbf{p} = m\mathbf{v}$ is frequently called the
kinematic momentum (or "mv-momentum"). The distinction between p and $\mathbf{P} = \mathbf{p} + q\mathbf{A}$ becomes even
more clear if we notice that the vector-potential is not gauge-invariant: according to the second of Eqs. (21),
at the so-called gauge transformation

$$\mathbf{A} \to \mathbf{A} + \nabla\chi, \tag{3.23}$$

with an arbitrary single-valued scalar gauge function $\chi = \chi(\mathbf{r}, t)$, the magnetic field does not change.
Moreover, according to the first of Eqs. (21), if we make the simultaneous replacement

$$\phi \to \phi - \frac{\partial\chi}{\partial t}, \tag{3.24}$$

the gauge transformation does not affect the electric field either. With that, the gauge function does not
change the classical particle's equation of motion, and hence the velocity v and momentum p. Hence,
the kinematic momentum is gauge-invariant, while P is not, because it changes by $q\nabla\chi$.
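The invariance of the fields under Eqs. (23)-(24), and the accompanying shift of the canonical momentum, can be verified with finite differences. In this sketch the potentials A and $\phi$ and the gauge function $\chi$ are arbitrary hypothetical choices; any smooth functions would do. The helpers `partial` and `fields` are illustrative, and the dimensionless values of m, q, and v are likewise assumptions.

```python
import math

H = 1e-5  # finite-difference step

def partial(f, i, x, t):
    """Central difference of a scalar f(x, t) over x[i] (i = 0, 1, 2) or over t (i = 3)."""
    if i == 3:
        return (f(x, t + H) - f(x, t - H)) / (2 * H)
    xp, xm = list(x), list(x)
    xp[i] += H
    xm[i] -= H
    return (f(xp, t) - f(xm, t)) / (2 * H)

def fields(A, phi, x, t):
    """E = -grad(phi) - dA/dt and B = curl(A), Eq. (21), evaluated numerically at (x, t)."""
    Ac = [lambda y, s, j=j: A(y, s)[j] for j in range(3)]  # Cartesian components of A
    E = [-partial(phi, i, x, t) - partial(Ac[i], 3, x, t) for i in range(3)]
    B = [partial(Ac[2], 1, x, t) - partial(Ac[1], 2, x, t),
         partial(Ac[0], 2, x, t) - partial(Ac[2], 0, x, t),
         partial(Ac[1], 0, x, t) - partial(Ac[0], 1, x, t)]
    return E, B

# Hypothetical potentials, chosen only for illustration (they give B_z = 1).
def A(x, t):
    return [-0.5 * x[1], 0.5 * x[0] + t * x[2], math.sin(x[0]) * t]

def phi(x, t):
    return x[0] * x[1] + math.cos(t)

# Hypothetical gauge function chi = sin(x*y) + z*t**2, with its derivatives written analytically.
def grad_chi(x, t):
    c = math.cos(x[0] * x[1])
    return [x[1] * c, x[0] * c, t ** 2]

def dchi_dt(x, t):
    return 2 * x[2] * t

def A_new(x, t):    # Eq. (23)
    return [a + g for a, g in zip(A(x, t), grad_chi(x, t))]

def phi_new(x, t):  # Eq. (24)
    return phi(x, t) - dchi_dt(x, t)

point, t0 = [0.3, -1.2, 0.7], 0.4
E1, B1 = fields(A, phi, point, t0)
E2, B2 = fields(A_new, phi_new, point, t0)
print("max field change:", max(abs(a - b) for a, b in zip(E1 + B1, E2 + B2)))

# Canonical momentum P = m v + q A, Eq. (20): at the same velocity it shifts by q grad(chi).
m, q, v = 1.0, -1.0, [0.2, 0.0, 1.0]
P1 = [m * vi + q * ai for vi, ai in zip(v, A(point, t0))]
P2 = [m * vi + q * ai for vi, ai in zip(v, A_new(point, t0))]
print("P shift:", [p2 - p1 for p1, p2 in zip(P1, P2)])
```

The printed field change is at the level of finite-difference noise, while the shift of P is $q\nabla\chi$, nonzero in general.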

Now the standard way of transfer to quantum mechanics is to treat the canonical rather than the
kinematic momentum according to the correspondence postulate discussed in Sec. 1.2. This means that in
the coordinate representation, the operator of this variable is given by Eq. (1.26): 8

$$\hat{\mathbf{P}} = -i\hbar\nabla. \tag{3.25}$$

Hence the Hamiltonian operator corresponding to the classical function (19) is

$$\hat H = \frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2 + q\phi, \tag{3.26}$$

so that the Schrodinger equation of a particle moving in an electromagnetic field (but otherwise free) is

$$i\hbar\frac{\partial\Psi}{\partial t} = \frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2\Psi + q\phi\Psi. \tag{3.27}$$


We may now repeat all the calculations of Sec. 1.4 for the case $\mathbf{A} \neq 0$, and readily get the
following generalized expression for the probability current density:

$$\mathbf{j} = \frac{1}{2m}\left[\Psi^*\left(-i\hbar\nabla - q\mathbf{A}\right)\Psi + \text{c.c.}\right] = \frac{\hbar}{m}|\Psi|^2\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right). \tag{3.28}$$

$$L = \frac{mv^2}{2} + q\mathbf{v}\cdot\mathbf{A} - q\phi$$

- see EM Sec. 9.7. Note that this function includes A within a term that cannot be interpreted as either the purely
kinetic energy (as the first term) or the purely potential energy (as the last term with the minus sign).
8 The validity of this choice is clear from the fact that if the kinetic momentum was described by this differential 
operator, the Hamiltonian operator corresponding to the classical Hamiltonian function (19) would not include the 
magnetic field at all, and hence solutions of the corresponding Schrodinger equation could not satisfy the 
correspondence principle. 
We see that the current density is gauge-invariant (as required for any observable) only if the
wavefunction's phase $\varphi$ changes as

$$\varphi \to \varphi + \frac{q}{\hbar}\chi. \tag{3.29}$$

This may be a point of concern: since quantum interference is described by the spatial dependence of
the phase $\varphi$, can the observed interference pattern depend on the gauge function choice (which would not
make sense)? Fortunately, this is not true, because the spatial phase difference between the two interfering
paths, participating in Eq. (11), is gauge-transformed as

$$\varphi_{12} \to \varphi_{12} + \frac{q}{\hbar}(\chi_2 - \chi_1). \tag{3.30}$$

But $\chi$ has to be a single-valued function of coordinates, hence in the limit when points 1 and 2 coincide,
$\chi_1 = \chi_2$, so that $\varphi_{12}$ (and hence the interference pattern) is gauge-invariant.
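A one-dimensional numerical check of Eqs. (28)-(29) may help here. The sketch assumes dimensionless units ħ = m = q = 1 and arbitrary hypothetical choices of the wavefunction, A(x), and $\chi(x)$. It computes j from the first form of Eq. (28) by finite differences and compares it with the last form. It then verifies that j is unchanged when $A \to A + d\chi/dx$ is accompanied by the phase change (29), but not without it.

```python
import cmath
import math

HBAR, M, Q = 1.0, 1.0, 1.0  # dimensionless units (an assumption of this sketch)
DX = 1e-5                   # finite-difference step

def current(psi, A, x):
    """j = (1/2m)[psi* (-i hbar d/dx - q A) psi + c.c.]: the first form of Eq. (28), in 1D."""
    dpsi = (psi(x + DX) - psi(x - DX)) / (2 * DX)
    term = psi(x).conjugate() * (-1j * HBAR * dpsi - Q * A(x) * psi(x))
    return (term + term.conjugate()).real / (2 * M)

# Hypothetical wavefunction psi = sqrt(w) exp(i varphi), and vector potential A(x)
def w(x): return math.exp(-x * x)
def varphi(x): return 0.8 * x + 0.3 * math.sin(2 * x)
def dvarphi(x): return 0.8 + 0.6 * math.cos(2 * x)
def psi(x): return math.sqrt(w(x)) * cmath.exp(1j * varphi(x))
def A(x): return 0.5 + 0.2 * x

# Hypothetical single-valued gauge function chi(x) and its derivative
def chi(x): return math.cos(x) + x ** 3
def dchi(x): return -math.sin(x) + 3 * x ** 2

def A_new(x): return A(x) + dchi(x)                                 # Eq. (23) in 1D
def psi_new(x): return psi(x) * cmath.exp(1j * Q * chi(x) / HBAR)   # Eq. (29)

for x in (-1.0, 0.0, 0.7):
    j_first = current(psi, A, x)
    j_last = HBAR / M * w(x) * (dvarphi(x) - Q * A(x) / HBAR)  # last form of Eq. (28)
    j_gauged = current(psi_new, A_new, x)
    print(f"x = {x:4.1f}: {j_first:.8f} {j_last:.8f} {j_gauged:.8f}")
```

All three columns agree to finite-difference accuracy. Calling `current(psi, A_new, x)` instead, which changes A without changing the phase, gives a different j.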

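However, the phase difference $\varphi_{12}$ may be affected by the magnetic field, even if the field is localized outside
the channels in which the particle propagates. Indeed, in this case the field cannot affect the particle's
velocity and hence the current density j:

$$\mathbf{j}(\mathbf{r})\big|_{\mathscr{B}\neq0} = \mathbf{j}(\mathbf{r})\big|_{\mathscr{B}=0}, \tag{3.31}$$

so that the last form of Eq. (28) yields

$$\nabla\varphi(\mathbf{r})\big|_{\mathscr{B}\neq0} = \nabla\varphi(\mathbf{r})\big|_{\mathscr{B}=0} + \frac{q}{\hbar}\mathbf{A}. \tag{3.32}$$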

Integrating this equation along contour C (Fig. 2), for the phase difference between points 1 and 2 we
get

$$\varphi_{12}\big|_{\mathscr{B}\neq0} = \varphi_{12}\big|_{\mathscr{B}=0} + \frac{q}{\hbar}\oint_C \mathbf{A}\cdot d\mathbf{r}, \tag{3.33}$$

where the integral should be taken along the same virtually closed contour C as before (in Fig. 2, from
point 1, counterclockwise along the dashed line to point 2). But from classical electrodynamics we
know 9 that as points 1 and 2 are overlapped, i.e. contour C becomes closed, the last integral is just the
magnetic flux $\Phi = \int \mathscr{B}_n\,d^2r$ through any smooth surface limited by contour C, so that Eq. (33) may be
presented as

$$\varphi_{12}\big|_{\mathscr{B}\neq0} = \varphi_{12}\big|_{\mathscr{B}=0} + \frac{q}{\hbar}\Phi. \tag{3.34a}$$



In terms of the interference pattern, this means a shift of interference fringes, proportional to the 
magnetic flux (Fig. 3). This phenomenon is usually called the "Aharonov-Bohm" (or just the AB) 
effect. 10 For particles with a single elementary charge, q = ±e, this result is frequently presented as 



9 See, e.g., EM Sec. 5.3. 

10 I personally prefer the latter, less personable name, because the effect was actually predicted by W. Ehrenberg
and R. Siday in 1949, and merely rediscovered by Y. Aharonov and D. Bohm in 1959. To be fair to Aharonov and
Bohm, it was their work that triggered a wave of interest in the phenomenon, resulting in its first experimental
$$\varphi_{12}\big|_{\mathscr{B}\neq0} = \varphi_{12}\big|_{\mathscr{B}=0} \pm 2\pi\frac{\Phi}{\Phi'_0}, \tag{3.34b}$$

where the fundamental constant $\Phi'_0 \equiv 2\pi\hbar/e = h/e \approx 4.14\times10^{-15}$ Wb has the meaning of the flux necessary to
change $\varphi_{12}$ by $2\pi$, i.e. to shift the interference pattern (11) by one period, and is called the normal magnetic
flux quantum, for reasons we will soon discuss.
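The key step from Eq. (33) to Eq. (34) is that the contour integral of A depends only on the enclosed flux, even where $\mathscr{B} = 0$ all along the path. The sketch below illustrates this for an idealized, infinitely thin flux tube. This is an assumption of the sketch; outside such a tube, $\mathbf{A} = \mathbf{n}_\varphi\Phi/2\pi\rho$ and $\mathscr{B} = 0$. The loop shapes are arbitrary, and the SI values of h and e are the exact ones.

```python
import math

H_PLANCK = 6.62607015e-34    # J s (exact SI value)
E_CHARGE = 1.602176634e-19   # C (exact SI value)
HBAR = H_PLANCK / (2 * math.pi)
PHI0_NORMAL = H_PLANCK / E_CHARGE  # normal flux quantum Phi'_0 = 2 pi hbar / e

def A_tube(x, y, flux):
    """Vector potential outside an idealized infinitely thin flux tube along z (B = 0 for rho > 0)."""
    r2 = x * x + y * y
    return -flux * y / (2 * math.pi * r2), flux * x / (2 * math.pi * r2)

def loop_integral(path, flux, n=20000):
    """Contour integral of A . dr along a closed parametric path(s), 0 <= s <= 1 (midpoint rule)."""
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        ax, ay = A_tube((x0 + x1) / 2, (y0 + y1) / 2, flux)
        total += ax * (x1 - x0) + ay * (y1 - y0)
    return total

def ellipse(cx, cy, a, b):
    """Counterclockwise ellipse centered at (cx, cy) with semi-axes a and b."""
    return lambda s: (cx + a * math.cos(2 * math.pi * s), cy + b * math.sin(2 * math.pi * s))

flux = 0.5 * PHI0_NORMAL
for name, path in [("encloses the tube", ellipse(0.1, -0.2, 1.0, 0.6)),
                   ("misses the tube", ellipse(3.0, 0.0, 1.0, 1.0))]:
    circulation = loop_integral(path, flux)
    extra_phase = E_CHARGE * circulation / HBAR  # (q/hbar) times the loop integral, Eq. (33), q = e
    print(f"{name}: circulation/flux = {circulation / flux:.6f}, "
          f"phase shift/(2 pi) = {extra_phase / (2 * math.pi):.6f}")
print(f"Phi'_0 = h/e = {PHI0_NORMAL:.4e} Wb")
```

Any loop around the tube gives the full flux, so that $\Phi = \Phi'_0/2$ shifts the phase by half a period. A loop that does not enclose the tube gives zero, although A is nonzero everywhere on it.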




Fig. 3.3. Typical results of a two-paths interference experiment by A. Tonomura et al., Phys. Rev. 
Lett. 56, 792 (1986), showing the AB effect for electrons well shielded from the applied magnetic 
field. In this particular experimental geometry, the AB effect produces a relative shift of the 
interference patterns inside and outside the dark ring, (a) $\Phi = \Phi'_0/2$, (b) $\Phi = \Phi'_0$. © AIP.



The AB effect may be "almost explained" classically, in terms of Faraday's electromagnetic
induction. Indeed, a change $\Delta\Phi$ of magnetic flux in time causes a vortex-like electric field $\Delta\boldsymbol{\mathscr{E}}$ around it.
That field is not restricted to the magnetic field's location, i.e. may reach the particle's trajectories. The
field's magnitude (or rather its integral along contour C) may be readily calculated by integration of
the first of Eqs. (21):

$$\Delta V \equiv \oint_C \Delta\boldsymbol{\mathscr{E}}\cdot d\mathbf{r} = -\frac{d\Phi}{dt}. \tag{3.35}$$

I hope that in this expression the reader readily recognizes the integral ("undergraduate") form of
Faraday's induction law. Now let us assume that the variable separation described in Sec. 1.5 may be
applied to the end points 1 and 2 of the particle's alternative trajectories as two independent systems, 11 and
that the magnetic flux change by a certain amount $\Delta\Phi$ does not change the spatial parts $\psi_{1,2}$ of the
wavefunctions of these systems. Then change (35) leads to the change of the potential energy difference $\Delta U
= q\Delta V$ between the two points, and repeating the arguments that were used in Sec. 2.3 at the discussion
of the Josephson effect, we may rewrite Eq. (2.53) as

$$\frac{d\varphi_{12}}{dt} = -\frac{\Delta U}{\hbar} = -\frac{q}{\hbar}\Delta V = \frac{q}{\hbar}\frac{d\Phi}{dt}. \tag{3.36}$$

Integrating this relation over the time of magnetic field's change, we get 



observation by R. Chambers in 1960 and several other groups soon after that. Later, the experiments were 
improved to provide a virtually perfect separation between electron trajectories and the applied magnetic field, 
using ferromagnetic cores and/or superconducting shielding - as in the work whose results are shown in Fig. 3. 
11 This assumption may seem a bit of a stretch, but the resulting relation (37) may be indeed proven for a rather
realistic model, though that would take more time and space than I can afford.
$$\Delta\varphi_{12} = \frac{q}{\hbar}\Delta\Phi, \tag{3.37}$$

- superficially, the same result as given by Eq. (34).
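The total phase shift (37) does not depend on how fast the flux is changed, which can be seen by integrating Eq. (36) numerically. In this sketch q/ħ = 1 in arbitrary units, and the smooth ramp shape of $\Phi(t)$ is a hypothetical choice.

```python
import math

Q_OVER_HBAR = 1.0  # q/hbar set to 1: dimensionless units assumed for this sketch
DELTA_FLUX = 2.5   # total flux change Delta Phi, same units

def flux_rate(t, T):
    """dPhi/dt for a hypothetical ramp Phi(t) = DELTA_FLUX*(t/T - sin(2 pi t/T)/(2 pi)), 0 <= t <= T."""
    return DELTA_FLUX / T * (1.0 - math.cos(2 * math.pi * t / T))

def phase_change(T, steps=10000):
    """Integrate d(phi_12)/dt = (q/hbar) dPhi/dt, Eq. (36), over the ramp by the trapezoidal rule."""
    dt = T / steps
    total = sum(0.5 * (flux_rate(i * dt, T) + flux_rate((i + 1) * dt, T)) * dt for i in range(steps))
    return Q_OVER_HBAR * total

for T in (0.01, 1.0, 100.0):  # very fast, moderate, and slow ramps
    print(f"ramp time {T:7.2f}: phase change {phase_change(T):.9f}")
```

Every ramp duration yields the same value, about $(q/\hbar)\Delta\Phi = 2.5$, as Eq. (37) states.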

However, this interpretation of the AB effect is limited. Indeed, it requires the particle to be in
the system (on the way from the source to the detector) during the flux change, i.e. when the induced
electric field $\boldsymbol{\mathscr{E}}$ may affect its dynamics. On the contrary, Eq. (34) predicts that the interference pattern
would shift even if the field change has been made when there is no particle in the system, and hence the
field could not be felt by it. Experiment confirms the latter conclusion. Hence, there is something in
the space where a particle propagates (i.e., outside of the magnetic field region), which transfers
information about even the static magnetic field to the particle. The standard interpretation of this
surprising fact is as follows: the vector-potential A is not just a convenient mathematical tool, but a
physical reality (just as its electric counterpart $\phi$), despite the large freedom of choice we have in
prescribing specific spatial and temporal dependences of these potentials without affecting any
observable - see Eqs. (23)-(24).

Let me briefly discuss the very interesting form the AB effect takes in superconductivity. In this
case, our results require two changes. The first one is simple: since superconductivity may be interpreted
as the Bose-Einstein condensate of Cooper pairs with electric charge q = 2e, $\Phi'_0$ has to be replaced by
the so-called superconducting flux quantum 12

$$\Phi_0 \equiv \frac{2\pi\hbar}{2e} = \frac{h}{2e} \approx 2.07\times10^{-15}\ \text{Wb} = 2.07\times10^{-7}\ \text{Gs}\cdot\text{cm}^2. \tag{3.38}$$

Second, since the pairs are Bose particles and are all condensed in the same quantum state,
described by the same wavefunction, the total electric current density, proportional to the probability
current density j, may be extremely large - in real superconducting materials, up to $\sim10^{12}$ A/m$^2$. In these
conditions, one cannot neglect the contribution of that current to the magnetic field and hence its flux
$\Phi$, which (according to the Lenz rule of the Faraday induction law) tries to compensate changes in the
external flux. In order to see possible results of this contribution, let us consider a closed
superconducting loop (Fig. 4).




Due to the Meissner effect (which is just another version of the flux self-compensation), current 
and magnetic field penetrate inside the superconductor by only a small distance (called the London 



12 One more bad, though common, term - a wire can (super)conduct, but a quantum hardly can!



Chapter 3 



Page 9 of 52 



Essential Graduate Physics 



QM: Quantum Mechanics 



penetration depth) $\delta_L \sim 10^{-7}$ m. 13 If the loop is made of a superconducting wire that is considerably
thicker than $\delta_L$, we can draw a contour deep inside the wire, at which the current density is negligible.
According to Eq. (28), everywhere at the contour,

$$\nabla\varphi - \frac{q}{\hbar}\mathbf{A} = 0. \tag{3.39}$$

Integrating this equation along the contour as before (from point 1 to the virtually coinciding point 2),
we need to have the phase difference $\varphi_{12} = 2\pi n$, because the wavefunction $\psi \propto \exp\{i\varphi\}$ in the initial
and final points 1 and 2 should be "essentially" the same, i.e. produce the same observables. As a result,
we get

$$\Phi = n\Phi_0. \tag{3.40}$$

This is the famous flux quantization effect, 14 which justifies the term "magnetic flux quantum" for the
constant $\Phi_0$ given by Eq. (38).
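The numerical values in Eq. (38), and the quantized fluxes (40), follow directly from the exact SI values of h and e. Here is a minimal sketch; the helper names are illustrative.

```python
import math

H_PLANCK = 6.62607015e-34   # J s (exact SI value)
E_CHARGE = 1.602176634e-19  # C (exact SI value)
HBAR = H_PLANCK / (2 * math.pi)
WB_TO_G_CM2 = 1e8           # 1 Wb = 1e8 G cm^2 (maxwell)

phi0_normal = 2 * math.pi * HBAR / E_CHARGE   # Phi'_0, single charge e
phi0_sc = 2 * math.pi * HBAR / (2 * E_CHARGE) # Phi_0, Cooper-pair charge 2e, Eq. (38)

def flux_from_winding(n, q):
    """Integrating Eq. (39) around a closed contour: 2*pi*n = (q/hbar)*Phi, so Phi = 2*pi*n*hbar/q."""
    return 2 * math.pi * n * HBAR / q

def allowed_fluxes(n_max):
    """Flux values n*Phi_0, n = -n_max..n_max, permitted by Eq. (40)."""
    return [n * phi0_sc for n in range(-n_max, n_max + 1)]

print(f"Phi_0 = {phi0_sc:.4e} Wb = {phi0_sc * WB_TO_G_CM2:.4e} G cm^2")
print(f"Phi'_0 / Phi_0 = {phi0_normal / phi0_sc:.6f}")
```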

Here I have to mention in passing very interesting effects of "partial flux quantization", that arise
when a superconducting loop is closed by a Josephson junction, forming the so-called Superconducting
QUantum Interference Device - "SQUID". Such devices are used, in particular, for supersensitive
magnetometry and ultrafast, low-power computing. 15




3.2. Landau levels and quantum Hall effect 

In the last section, we have used the Schrodinger equation (27) for the analysis of static magnetic
field effects in "almost-1D", circular geometries shown in Figs. 1, 2, and 4. However, this equation
describes very interesting effects in higher dimensions as well, especially in the 2D case. Let us consider
a uniform 2D quantum well (say, parallel to the [x, y] plane), with strong quantum confinement in the
perpendicular direction z. According to the discussion in Sec. 1.6, energy-relaxed particles will always
reside in the lowest energy subband, with constant quantization energy $(E_z)_1$. Adding this shift to the well's
flat floor U(x, y) = const, and taking the resulting constant energy as the reference, for the 2D motion of
the particle in the well, we reduce Eq. (27) to a similar equation, but with the Laplace operator acting
only in directions x and y:

$$-\frac{\hbar^2}{2m}\left(\mathbf{n}_x\frac{\partial}{\partial x} + \mathbf{n}_y\frac{\partial}{\partial y} - i\frac{q}{\hbar}\mathbf{A}\right)^2\psi = E\psi. \tag{3.41}$$



Let us find its solutions for the simplest case when the applied static magnetic field is uniform
and perpendicular to the plane:

$$\boldsymbol{\mathscr{B}} = \mathscr{B}\mathbf{n}_z. \tag{3.42}$$



13 For more detail, see EM Sec. 6.3. 

14 It was predicted in 1949 by F. London and experimentally discovered (independently and virtually
simultaneously) in 1961 by two experimental groups: B. Deaver and W. Fairbank, and R. Doll and M. Näbauer.

15 A brief review of these effects, and recommendations for further reading may be found in EM Sec. 
6.4. 
According to the second of Eqs. (21), this imposes the following restriction on the choice of the vector-potential:

$$\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y} = \mathscr{B}, \tag{3.43}$$

but the gauge transformations still give us a lot of freedom in its choice. The "natural" axially-symmetric
form, $\mathbf{A} = \mathbf{n}_\varphi\mathscr{B}\rho/2$, where $\rho = (x^2 + y^2)^{1/2}$ is the distance from some z-axis, leads to
cumbersome math. In 1928, L. Landau (then just 20 years old!) realized that the energy spectrum of Eq.
(41) may be obtained by making a very simple choice

$$A_x = 0, \qquad A_y = \mathscr{B}(x - x_0), \tag{3.44}$$

which evidently satisfies Eq. (43), though it ignores the physical equivalence of the x and y directions.
Now, expanding the eigenfunction into the Fourier integral in direction y:



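$$\psi(x, y) = \int \chi_k(x)\,e^{ik(y - y_0)}\,dk, \tag{3.45}$$

we see that for each component of this integral, Eq. (41) yields a specific equation

$$-\frac{\hbar^2}{2m}\left[\mathbf{n}_x\frac{d}{dx} + \mathbf{n}_y\,i\left(k - \frac{q\mathscr{B}}{\hbar}(x - x_0)\right)\right]^2\chi_k = E\chi_k. \tag{3.46}$$

Since the vectors inside the square brackets are mutually perpendicular, its square has no cross terms, so
that Eq. (46) may be rewritten as

$$-\frac{\hbar^2}{2m}\frac{d^2\chi_k}{dx^2} + \frac{\hbar^2}{2m}\left(\frac{q\mathscr{B}}{\hbar}\tilde{x}\right)^2\chi_k = E\chi_k, \qquad \text{where } \tilde{x} \equiv x - \left(\frac{\hbar k}{q\mathscr{B}} + x_0\right). \tag{3.47}$$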



But this 1D Schrodinger equation is identical to Eq. (2.268) of the 1D harmonic oscillator, with the
replacement

$$\omega_0 \to \omega_c \equiv \frac{q\mathscr{B}}{m}. \tag{3.48}$$

In this expression, it is easy to recognize the classical cyclotron frequency of particle motion in the
magnetic field. (It may be readily obtained using the 2nd Newton law for a circular orbit of radius r,

$$m\frac{v^2}{r} = F_\mathscr{B} = qv\mathscr{B}, \tag{3.49}$$



and noting that the resulting ratio $v/r = q\mathscr{B}/m$ is just the radius-independent angular velocity $\omega_c$ of
particle rotation.) Hence, the energy spectrum for each Fourier component of integral (45) is the same:

$$E_n = \hbar\omega_c\left(n + \frac{1}{2}\right), \qquad n = 0, 1, 2, \ldots, \tag{3.50}$$

and does not depend on either $x_0$, or $y_0$, or k.
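The independence of the spectrum (50) of k can also be checked by brute force: discretize Eq. (47) on a grid and find its lowest eigenvalues. The sketch below does this under several stated assumptions:

- units ħ = m = qℬ = 1, so that $\omega_c = 1$;
- a finite box with zero boundary conditions;
- a second-order finite-difference Laplacian;
- eigenvalues of the resulting tridiagonal matrix found by Sturm-sequence bisection, a standard technique chosen here only to avoid external libraries.

The parameter `center` stands for $\hbar k/q\mathscr{B} + x_0$.

```python
def hamiltonian_diagonals(center, x_min=-12.0, x_max=12.0, n=800):
    """Finite-difference form of Eq. (47) in units hbar = m = qB = 1 (omega_c = 1).

    Returns the diagonal and the constant off-diagonal element of the tridiagonal matrix
    -(1/2) d^2/dx^2 + (1/2)(x - center)^2 with zero boundary conditions.
    """
    h = (x_max - x_min) / (n + 1)
    xs = [x_min + (i + 1) * h for i in range(n)]
    diag = [1.0 / h ** 2 + 0.5 * (x - center) ** 2 for x in xs]
    off = -0.5 / h ** 2
    return diag, off

def count_below(diag, off, lam):
    """Sturm count: number of eigenvalues of the symmetric tridiagonal matrix below lam."""
    count, d = 0, 1.0
    for i, a in enumerate(diag):
        d = a - lam - (off * off / d if i > 0 else 0.0)
        if d == 0.0:
            d = 1e-300  # avoid division by zero at the next step
        if d < 0:
            count += 1
    return count

def eigenvalue(diag, off, index, lo=0.0, hi=50.0, tol=1e-10):
    """The index-th eigenvalue (index = 0, 1, ...) by bisection on the Sturm count."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > index:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for center in (0.0, 2.0):  # two different (hypothetical) values of hbar*k/(qB) + x_0
    d, o = hamiltonian_diagonals(center)
    print(center, [round(eigenvalue(d, o, n), 4) for n in range(4)])
```

Both rows come out close to 0.5, 1.5, 2.5, 3.5, i.e. $\hbar\omega_c(n + 1/2)$. The small deviations come from the grid, not from the shift of the well center.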

This is an example of a highly degenerate system: for each eigenvalue $E_n$, there are many
different eigenfunctions that differ by the position $\{x_0, y_0\}$ of their center, and by the rate k of







their phase change along axis y. They may be used to assemble a large variety of linear combinations,
including 2D wave packets whose centers move along classical circular orbits with some radius r
determined by initial conditions. Note, however, that such a radius cannot be smaller than the so-called
Landau radius,

$$r_L \equiv \left(\frac{\hbar}{q\mathscr{B}}\right)^{1/2}, \tag{3.51}$$



which characterizes the minimum radius of the wave packet itself, and results from Eq. (2.271) after
replacement (48). This radius is remarkably independent of the particle's mass, and may be interpreted in the
following way: the scale $\mathscr{B}A_{\min}$ of the applied magnetic field's flux through the effective area $A_{\min} \equiv 2\pi r_L^2$
of the smallest wave packet is just one normal flux quantum $\Phi'_0 = 2\pi\hbar/q$.

A detailed analysis of such wave packets (for which we would not have time in this course)
shows that the magnetic field does not change the average density $dN_2/dE$ of different 2D states on the
energy scale, but just "assembles" them on the Landau levels (see Fig. 5a), so that the number of states
on each Landau level (per unit area) is



$$n_L \equiv \frac{1}{A}\frac{dN_2}{dE}\Delta E = \frac{m}{2\pi\hbar^2}\hbar\omega_c = \frac{q\mathscr{B}}{2\pi\hbar}. \tag{3.52}$$

This expression may again be interpreted in terms of magnetic flux quanta: $n_L\Phi'_0 = \mathscr{B}$, i.e. there is one
particular state on each Landau level per each flux quantum.




Fig. 3.5. (a) "Condensation" of 
2D states on Landau levels, and 
(b) filling the levels by external 
electrons at the quantum Hall 
effect. 
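To get a feeling for the scales in Eqs. (51)-(52), here is a short calculation for a particle with charge e. The field values of 1 T and 10 T are illustrative assumptions. The sketch also confirms numerically that $\mathscr{B}\cdot2\pi r_L^2 = h/e$ and $n_L\Phi'_0 = \mathscr{B}$.

```python
import math

H_PLANCK = 6.62607015e-34   # J s (exact SI value)
E_CHARGE = 1.602176634e-19  # C (exact SI value)
HBAR = H_PLANCK / (2 * math.pi)

def landau_radius(B, q=E_CHARGE):
    """r_L = (hbar / qB)^(1/2), Eq. (51); the particle mass does not enter."""
    return math.sqrt(HBAR / (q * B))

def states_per_area(B, q=E_CHARGE):
    """n_L = qB / (2 pi hbar), Eq. (52): states per unit area on each Landau level."""
    return q * B / (2 * math.pi * HBAR)

for B in (1.0, 10.0):  # tesla; illustrative values
    rL = landau_radius(B)
    nL = states_per_area(B)
    flux_ratio = B * 2 * math.pi * rL ** 2 / (H_PLANCK / E_CHARGE)
    print(f"B = {B:4.1f} T: r_L = {rL * 1e9:.2f} nm, n_L = {nL:.3e} m^-2, "
          f"B*2*pi*r_L^2/(h/e) = {flux_ratio:.6f}")
```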



The most famous application of the Landau levels concept is the explanation of the quantum
Hall effect 16 . Generally, the Hall effect 17 is observed in the geometry sketched in Fig. 6, where electric
current I is passed through a thin rectangular conducting sample (frequently called the Hall bar) placed
into a magnetic field $\mathscr{B}$ perpendicular to the sample plane. The classical analysis of the effect is based on
the notion of the Lorentz force (14). This force deviates charge carriers (say, electrons) from their
straight motion from one external electrode to another, bending them toward the isolated edges of the bar (in
Fig. 6, parallel to axis x). Here the carriers accumulate, generating a gradually increasing electric field $\mathscr{E}$,
until its force (16) exactly balances the Lorentz force (14):



16 It was first observed in 1980 by K. von Klitzing and coworkers. 

17 Discovered in 1879 by E. Hall. 
$$q\mathscr{E}_y = qv_x\mathscr{B}_z, \tag{3.53}$$

where $v_x$ is the drift velocity of the electrons along the bar (Fig. 6), providing the sustained balance
condition $\mathscr{E}_y/v_x = \mathscr{B}_z$ at each point of the 2D sample.




Fig. 3.6. Hall effect geometry. Darker 
bars show external (3D) electrodes. 



With $n_2$ carriers per unit area, in a sample of width W, this condition yields the following
classical expression for the so-called Hall resistance $R_H$:

$$R_H \equiv \frac{V_H}{I} = \frac{\mathscr{E}_yW}{qn_2v_xW} = \frac{\mathscr{B}_z}{qn_2}. \tag{3.54}$$



This formula is broadly used in practice for the measurement of the carrier density $n_2$, and (in
semiconductors) the carrier type - negative electrons or positive holes.

However, in experiments with high-quality (low-defect) 2D quantum wells at very low, sub-kelvin
temperatures 18 and high magnetic fields, the linear growth of $R_H$ with $\mathscr{B}$, described by Eq. (54), is
interrupted by virtually horizontal plateaus (Fig. 7) with constant values



$$R_H = \frac{R_K}{i}, \tag{3.55}$$

where i (only in this context, following tradition!) is an integer, and the value

$$R_K \approx 25.812807557\ \text{k}\Omega \tag{3.56}$$



is reproduced with extremely high accuracy (~$10^{-9}$) from experiment to experiment and from sample to
sample. Such stability is a rare exception in solid state physics, where most results are noticeably
dependent on the particular material and particular sample under study.

Let us apply the Landau level picture. The 2D sample is typically in weak contact with 3D
electrodes whose conduction electrons form a Fermi sea with a certain Fermi energy $E_F$, so that at low
temperatures all states with $E < E_F$ are filled with electrons - see Fig. 5b. As $\mathscr{B}$ is increased, the spacing
$\hbar\omega_c$ between the Landau levels increases, so that fewer and fewer of these levels are below $E_F$ and are
filled, and within broad ranges of field variation, the number i of filled levels is constant. (In Fig. 5b, i =
2.) So, plugging $n_2 = in_L$ and $q = \pm e$ into Eq. (54), we get

$$R_H = \frac{\mathscr{B}}{e\,i\,n_L} = \frac{1}{i}\,\frac{2\pi\hbar}{e^2}, \tag{3.57}$$


18 Recently, the quantum Hall effect was observed at room temperature in graphene (a virtually perfect 2D sheet
of carbon atoms, see Sec. 4 below) - see K. S. Novoselov et al., Science 315, 1379 (2007).
i.e. exactly the experimental result (55), with

$$R_K \equiv \frac{2\pi\hbar}{e^2} = \frac{h}{e^2} = 4\,\frac{\pi\hbar}{2e^2}. \tag{3.58}$$



This constant, exactly 4 times the quantum unit of resistance $R_Q$ given by Eq. (2.259), is in excellent
agreement with the experimental value (56), and is sometimes called the von Klitzing constant.
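The oversimplified picture above can be turned into a toy calculation: count the Landau levels below a fixed $E_F$, set $n_2 = in_L$, and evaluate Eq. (54). The sketch rests on several assumptions:

- spinless carriers with charge e;
- zero temperature;
- hypothetical GaAs-like parameters: effective mass 0.067 $m_e$ and $E_F$ = 10 meV.

It ignores all the factors listed below, so it reproduces only the plateau values, not the shape of real curves.

```python
import math

H_PLANCK = 6.62607015e-34   # J s (exact SI value)
E_CHARGE = 1.602176634e-19  # C (exact SI value)
HBAR = H_PLANCK / (2 * math.pi)
R_K = H_PLANCK / E_CHARGE ** 2  # Eq. (58), about 25812.807 ohm

def filled_levels(B, fermi_energy, mass):
    """Number i of Landau levels E_n = hbar*omega_c*(n + 1/2) lying below E_F (T = 0, spin ignored)."""
    hw = HBAR * E_CHARGE * B / mass
    return max(0, math.floor(fermi_energy / hw + 0.5))

def hall_resistance(B, fermi_energy, mass):
    """R_H = B / (e n_2) with n_2 = i * n_L, Eqs. (54) and (57); None if no level is filled."""
    i = filled_levels(B, fermi_energy, mass)
    if i == 0:
        return None
    n_L = E_CHARGE * B / (2 * math.pi * HBAR)
    return B / (E_CHARGE * i * n_L)

# Hypothetical parameters, for illustration only
M_EFF = 0.067 * 9.1093837015e-31  # kg
E_F = 10e-3 * E_CHARGE            # 10 meV in joules
for B in (2.0, 3.0, 4.0, 6.0, 8.0, 12.0):  # tesla
    R = hall_resistance(B, E_F, M_EFF)
    print(B, filled_levels(B, E_F, M_EFF), None if R is None else round(R / R_K, 9))
```

Within each range of $\mathscr{B}$ where i stays constant, $R_H/R_K = 1/i$ exactly, regardless of the mass and $E_F$ chosen.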




Fig. 3.7. Typical record of the quantum 
Hall effect. The lower trace (with sharp 
peaks) shows the longitudinal component, 
$V_x/I_x$, of the resistance tensor. (Adapted
from www.prequark.org/Prequark.htm .) 



However, this oversimplified explanation of the quantum Hall effect does not take into account 
several important factors, including: 

(i) the role of nonuniformity of the quantum well bottom potential U(x, y), and of the localized 
states this nonuniformity produces, and the surprisingly small effect of these factors on the extraordinary 
accuracy of Eq. (55); 19 and 

(ii) the mutual Coulomb interaction of the electrons, which in high-quality samples leads to the
formation of $R_H$ plateaus with not only integer, but also fractional values of i (1/3, 2/5, 3/7, etc.). 20

Unfortunately, a thorough discussion of these interesting features is well beyond the framework 
of this course. 21 



3.3. Scattering and diffraction 

The second class of quantum effects that become richer in multi-dimensional space is
typically referred to as either diffraction or scattering - depending on the context. (Diffraction is
essentially interference, but of waves emitted by several, or many, coherent sources.) Just as the two
slits in the Young-type experiment (Fig. 1), these sources are most frequently the elementary re-emitters
of some initial wave from a single source. More generally, such re-emitting is called scattering; this term
is also applied to particles - even if their quantum properties may be ignored. 22

19 The explanation of this paradox may be obtained in terms of the so-called quantum edge channels - the quasi-
1D regions of width (51), along the lines where the Landau levels cross the Fermi surface. Particle motion along
these channels, which is responsible for electron transfer, is effectively one-dimensional and thus cannot be
affected by modest nonuniformities of the potential distribution U(x, y).

20 This fractional quantum Hall effect was discovered in 1982 by D. Tsui, H. Stormer, and A. Gossard. In
contrast, the effect described by Eq. (55) with integer i (Fig. 7) is now called the integer quantum Hall effect.

21 For a comprehensive discussion of these effects I can recommend, e.g., either the monograph by D. Yoshioka,
The Quantum Hall Effect, Springer, 1998, or the review by D. Yennie, Rev. Mod. Phys. 59, 781 (1987).

Figure 8 shows the general scattering situation. Most commonly, the detector of scattered
particles (in the quantum case, read de Broglie waves) is located at a large distance $r \gg a$ from the
scatterer. 23 In this case, the main observable independent of r is the flux (number of particles per unit
time) of particles scattered in a certain direction, i.e. their flux per unit solid angle. Since such flux is
proportional to the incident flux of particles per unit area, the ability of the scatterer to re-emit in a
particular direction may be characterized by the ratio of these two fluxes. This ratio has the
dimensionality of area per unit solid angle, and is called the differential cross-section of the scatterer:

$$\frac{d\sigma}{d\Omega} \equiv \frac{\text{flux of scattered particles per unit solid angle}}{\text{flux of incident particles per unit area}}. \tag{3.58}$$



Fig. 3.8. 3D scattering (schematically). (Sketch labels: incident particles; scatterer of size a; detector of scattered particles at distance $r \gg a$.)



Such name and notation stem from the fact that the integral of $d\sigma/d\Omega$ over all scattering angles,

$$\sigma \equiv \oint_{4\pi}\frac{d\sigma}{d\Omega}\,d\Omega = \frac{\text{total flux of scattered particles}}{\text{incident flux per unit area}}, \tag{3.59}$$



(also with the dimensionality of area), has a simple interpretation as the full cross-section of scattering.
For the simplest case when a macroscopic solid object scatters all classical particles hitting its surface,
but does not affect the particles flying by it, $\sigma$ is just the geometrical cross-section of the object, as
visible from the direction of incoming particles.
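As a simple check of definition (59), take the classical hard sphere of radius a, for which specular reflection gives the impact parameter $b = a\cos(\theta/2)$. The sketch computes $d\sigma/d\Omega = (b/\sin\theta)|db/d\theta|$ and integrates it over all directions. That relation is the standard classical one for an axially symmetric scatterer, assumed here rather than derived in this section. The result should be the geometric cross-section $\pi a^2$; the value a = 1 is arbitrary.

```python
import math

A_RADIUS = 1.0  # sphere radius a (arbitrary units)

def impact_parameter(theta):
    """Classical hard sphere: specular reflection gives b = a cos(theta/2)."""
    return A_RADIUS * math.cos(theta / 2)

def dsigma_domega(theta, h=1e-6):
    """Classical differential cross-section (b / sin theta)|db/dtheta| for an axially symmetric scatterer."""
    db = (impact_parameter(theta + h) - impact_parameter(theta - h)) / (2 * h)
    return impact_parameter(theta) / math.sin(theta) * abs(db)

def total_cross_section(n=2000):
    """sigma = integral of dsigma/dOmega over all directions, Eq. (59), midpoint rule in theta."""
    dtheta = math.pi / n
    return sum(dsigma_domega((i + 0.5) * dtheta) * 2 * math.pi * math.sin((i + 0.5) * dtheta) * dtheta
               for i in range(n))

print(dsigma_domega(1.0), A_RADIUS ** 2 / 4)           # isotropic: a^2/4 at every angle
print(total_cross_section(), math.pi * A_RADIUS ** 2)  # equals the geometric cross-section
```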

In classical mechanics, 24 we first calculate the particle scattering angle as a function of the 
impact parameter b, and then average the result over all values of b, considered random. In this sense 
the calculations in wave mechanics are simpler, because a parallel beam of incident particles of fixed 
energy E may be fairly presented by a plane de Broglie wave 



$$\psi_0(\mathbf{r}) = a_0\exp\{i\mathbf{k}_0\cdot\mathbf{r}\}, \tag{3.60}$$



22 See, e.g., CM Sec. 3.7. 

23 In optics, this limit is called Fraunhofer diffraction - see, e.g., EM Secs. 8.6 and 8.8.

24 For example, in the simplest task of derivation of the so-called Rutherford formula for scattering of a charged 
nonrelativistic particle by a point fixed charge, due to their Coulomb interaction - see, e.g., CM Sec. 3.7. 
with the free-space wave number $k_0 = (2mE)^{1/2}/\hbar$ and constant probability current density (1.49):

$$\mathbf{j}_0 = |a_0|^2\frac{\hbar}{m}\mathbf{k}_0. \tag{3.61}$$

This current density is exactly the flux of incident particles per unit area that is used in the denominator
of definition (58), so the "only" remaining thing to do is to calculate the numerator of that fraction.

To do this, let us write the Schrodinger equation for the scattering problem (now in the whole
space including the scatterer) in the form

$$(E - \hat H_0)\psi = U(\mathbf{r})\psi, \tag{3.62}$$

where

$$\hat H_0 \equiv -\frac{\hbar^2}{2m}\nabla^2, \quad\text{and}\quad E = \frac{p_0^2}{2m} = \frac{\hbar^2k_0^2}{2m}, \tag{3.63}$$

while the potential energy $U(\mathbf{r})$ describes the effect of the scatterer. Looking for the solution of Eq. (62) in the
natural form

$$\psi = \psi_0 + \psi_s, \tag{3.64}$$

where $\psi_0$ is the incident wave (60), and $\psi_s$ has the sense of the scattered wave, and taking into account
that the former wave satisfies the free-space equation

$$\hat H_0\psi_0 = E\psi_0, \tag{3.65}$$

we may reduce Eq. (62) to

$$(E - \hat H_0)\psi_s = U(\mathbf{r})(\psi_0 + \psi_s). \tag{3.66}$$

The most straightforward (and common) simplification of this problem is possible if the
scattering potential $U(\mathbf{r})$ is in some sense weak. (We will derive the exact condition of this smallness
below.) Then since at $U(\mathbf{r}) = 0$ the scattered wave $\psi_s$ disappears, we may expect that at small but
nonvanishing $U(\mathbf{r})$, the main part of $\psi_s$ is proportional to its scale $U_0$. Then all terms in Eq. (66) are
proportional to $U_0$, besides the product $U\psi_s$, which is proportional to $U_0^2$. Hence, in the first
approximation in $U_0$, that term may be ignored, and Eq. (66) reduces to the famous equation of the Born
approximation: 25

$$(E - \hat H_0)\psi_s = U(\mathbf{r})\psi_0. \tag{3.67a}$$

This simplification changes the situation drastically, because the linear superposition principle
allows finding an explicit solution of this equation (in the form of an integral) for an arbitrary function
$U(\mathbf{r})$. Indeed, after rewriting Eq. (67a) as



25 Named after M. Born, who was the first one to apply this approximation in quantum mechanics. However, the 
basic idea of this approach had been developed much earlier (in 1881) by Lord Rayleigh in the context of 
electromagnetic wave scattering - see, e.g., EM Sec. 8.3. Note that the contents of that section repeat much of
our current discussion - regrettably but unavoidably so, because the Born approximation is a centerpiece of 
scattering theory for both electromagnetic and de Broglie waves. 
$$(\nabla^2 + k^2)\psi_s = \frac{2m}{\hbar^2}U(\mathbf{r})\psi_0(\mathbf{r}), \tag{3.67b}$$

we may notice that $\psi_s$ is just the response of a linear system to a certain "excitation" (represented by the
right-hand part) that is fixed, i.e. does not depend on the solution. Hence we can present $\psi_s$ as a sum of
responses to elementary excitations from elementary volumes $d^3r'$:

$$\psi_s(\mathbf{r}) = \int \frac{2m}{\hbar^2}U(\mathbf{r}')\psi_0(\mathbf{r}')\,G(\mathbf{r}, \mathbf{r}')\,d^3r'. \tag{3.68}$$

Here $G(\mathbf{r}, \mathbf{r}')$ is the spatial Green's function, defined as such an elementary response of the free-space
Schrodinger equation to a point excitation, i.e. the solution of the following equation 26



$$\left(\nabla^2 + k^2\right)G = \delta(\mathbf r - \mathbf r'). \qquad (3.69)$$



But we already know the physically-relevant spherically-symmetric solution of this equation - see Eq. 
(7) and its discussion: 



$$G(\mathbf r,\mathbf r') = \frac{f_+}{R}\,e^{ikR}, \qquad \text{where } R \equiv |\mathbf r - \mathbf r'|, \qquad (3.70)$$



so that we just need to calculate the coefficient $f_+$ in Eq. (70). This can be done in several ways, for example by noticing that at $R \ll k^{-1}$, the second term in Eq. (69) is negligible, and the equation is reduced to the well-known Poisson equation with a delta-functional right-hand part, which describes, for example, the electrostatic potential generated by a point electric charge. Either recalling the Coulomb law, or applying the Gauss theorem,27 we readily get the asymptote



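$$G \approx -\frac{1}{4\pi R}, \qquad \text{at } kR \ll 1, \qquad (3.71)$$

which is compatible with Eq. (70) only if $f_+ = -1/4\pi$, i.e. if

$$G(\mathbf r,\mathbf r') = -\frac{1}{4\pi R}\,e^{ikR}. \qquad (3.72)$$

Plugging this result into Eq. (68), we get the final solution of Eq. (67):

$$\psi_s(\mathbf r) = -\frac{m}{2\pi\hbar^2}\int U(\mathbf r')\,\psi_0(\mathbf r')\,\frac{e^{ikR}}{R}\,d^3r'. \qquad (3.73)$$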



Note that if the function $U(\mathbf r)$ is smooth, the singularity of the denominator is integrable (i.e. not dangerous); indeed, the contribution of a sphere of radius $\xi \to 0$, centered at the point $\mathbf r' = \mathbf r$ (where $R = 0$), scales as

$$\int_{R<\xi}\frac{d^3r'}{R} = \int_0^{\xi}\frac{4\pi R^2\,dR}{R} = 4\pi\int_0^{\xi}R\,dR = 2\pi\xi^2 \to 0. \qquad (3.74)$$
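The normalization $f_+ = -1/4\pi$ may also be checked numerically. The Python sketch below is an illustration added to these notes (not part of the original derivation; all function names are hypothetical). In dimensionless units, it evaluates Eq. (72), checks that it satisfies the homogeneous form of Eq. (69) at $R > 0$, and computes the flux $4\pi R^2\,dG/dR$ of $\nabla G$ through a small sphere. By the Gauss theorem, this flux should approach the weight 1 of the delta function as $kR \to 0$.

```python
import cmath
import math

def green(R, k):
    """Free-space Green's function G = -exp(ikR) / (4 pi R), Eq. (3.72), for R > 0."""
    return -cmath.exp(1j * k * R) / (4 * math.pi * R)

def helmholtz_residual(R, k, h=1e-4):
    """(nabla^2 + k^2) G at a point R > 0, using nabla^2 G = (1/R) d^2(R G)/dR^2,
    valid for spherically symmetric functions; d^2/dR^2 by central finite differences."""
    f = lambda r: r * green(r, k)
    d2 = (f(R + h) - 2 * f(R) + f(R - h)) / h**2
    return d2 / R + k**2 * green(R, k)

def gauss_flux(R, k, h=1e-7):
    """Flux of grad G through a sphere of radius R, i.e. 4 pi R^2 dG/dR.
    By the Gauss theorem applied to Eq. (3.69), it should tend to 1 as kR -> 0."""
    dG = (green(R + h, k) - green(R - h, k)) / (2 * h)
    return 4 * math.pi * R**2 * dG

if __name__ == "__main__":
    k = 2.0
    for R in (0.5, 1.5, 3.0):
        print(f"R = {R}: |(nabla^2 + k^2) G| = {abs(helmholtz_residual(R, k)):.1e}")
    for R in (1e-3, 1e-2, 1e-1, 1.0):
        print(f"kR = {k * R:g}: flux = {gauss_flux(R, k):.6f}")
```

Analytically, the flux equals $e^{ikR}(1 - ikR)$. At $kR = 10^{-3}$ it therefore differs from 1 only by about $(kR)^2/2$. At $kR \sim 1$ the deviation is of order unity, because there the $k^2$ term of Eq. (69) is no longer negligible.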



26 Please notice both the similarity and the difference between this Green's function and the propagator discussed in Sec. 2.1. In both cases, we use the linear superposition principle to solve wave equations. However, Eq. (68) gives the solution of the inhomogeneous equation (67), while Eq. (2.44) does that for a homogeneous Schrödinger equation, in which the wave sources are presented by initial conditions rather than by the equation's right-hand part.

27 See, e.g., EM Sec. 1.2. 
Actually, Eq. (73) gives us more than we wanted: it evaluates the scattered wave at any point, including those within the scattering object, while our goal was to find the wave far from the scatterer - please revisit Fig. 8 if needed. However, before going to that limit, we can use the general formula to find a quantitative criterion of the Born approximation's validity. Indeed, let us estimate the magnitude of the right-hand part of this equation, for a scatterer of linear size $\sim a$ and potential magnitude scale $U_0$, in two limits:

(i) If $ka \ll 1$, then inside the scatterer (i.e., at distances $r' \sim a$), both $\psi_0 \propto \exp\{ikr\}$ and the second exponent under the integral change slowly, so that a crude estimate of the solution is

$$|\psi_s| \sim \frac{m}{\hbar^2}\,U_0\,|\psi_0|\,a^2. \qquad (3.75)$$



(ii) In the opposite limit $ka \gg 1$, the integration along one of the dimensions (that of the wave propagation) is cut off at distances of the order of the de Broglie wavelength $k^{-1}$, so that the integral is correspondingly smaller:

$$|\psi_s| \sim \frac{m}{\hbar^2}\,U_0\,|\psi_0|\,\frac{a}{k}. \qquad (3.76)$$

Since the reduction of Eq. (66) to Eq. (67) requires $|\psi_s| \ll |\psi_0|$ everywhere within the scatterer, we may now formulate the condition of this requirement as

$$U_0 \ll \frac{\hbar^2}{ma^2}\,\max[ka, 1]. \qquad (3.77)$$

In the first factor of the right-hand part, we may readily recognize the scale of the kinetic (quantum-confinement) energy $E_a$ of the particle inside a quantum well of size $\sim a$. So, the Born approximation is valid essentially if the potential energy of the particle's interaction with the scatterer is much smaller than $E_a$. Note, however, that estimates (75) and (76) are not valid in special situations when the effects of scattering accumulate in some direction. This is frequently the case for small scattering angles in extended objects (when $ka \gg 1$ but $ka\theta < 1$), and especially in 1D (or quasi-1D) scatterers oriented along the incident particle beam.
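Condition (77) is an order-of-magnitude estimate rather than a sharp criterion. Still, it may be convenient to evaluate it for specific parameters. Below is a minimal sketch, assuming units with $\hbar = m = 1$; the helper name `born_parameter` is hypothetical. As stressed above, it ignores the accumulation effects possible at small angles in extended scatterers.

```python
def born_parameter(U0, a, k, m=1.0, hbar=1.0):
    """Return U0 / [(hbar^2 / m a^2) * max(ka, 1)], cf. Eq. (3.77).

    The Born approximation may be expected to work when this ratio is << 1;
    being based on the crude estimates (3.75)-(3.76), it is only an order-of-magnitude guide.
    """
    E_a = hbar**2 / (m * a**2)  # quantum-confinement energy scale for a well of size ~a
    return U0 / (E_a * max(k * a, 1.0))

if __name__ == "__main__":
    # For a fixed potential, the criterion becomes easier to satisfy as ka grows.
    for ka in (0.1, 1.0, 10.0, 100.0):
        print(f"ka = {ka:6.1f}:  U0 / (E_a max[ka, 1]) = {born_parameter(0.5, 1.0, ka):.4f}")
```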

Now let us proceed to large distances $r \gg r' \sim a$, and simplify Eq. (73) using an approximation similar to the dipole expansion in electrodynamics.28 In the denominator $R$, we may simply ignore $r'$ in comparison with $r$. The exponent, however, requires more care: even if $r' \sim a \ll r$, the product $kr' \sim ka$ may still be larger than 1. In the first approximation in $r'$, we can take (Fig. 9a):






Fig. 3.9. (a) Dipole expansion in the Born approximation and (b) definitions of vector q and angles χ and θ.



28 See, e.g., EM Sec. 8.2. 
$$R \equiv |\mathbf r - \mathbf r'| \approx r - \mathbf n_r\cdot\mathbf r', \qquad (3.78)$$



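and since the directions of vectors $\mathbf k$ and $\mathbf r$ coincide, i.e. $\mathbf k = k\mathbf n_r$,

$$kR \approx kr - \mathbf k\cdot\mathbf r', \qquad e^{ikR} \approx e^{ikr}\,e^{-i\mathbf k\cdot\mathbf r'}. \qquad (3.79)$$

With this replacement, and the incident wave in form (60), the Born approximation yields

$$\psi_s(\mathbf r) = -\frac{m}{2\pi\hbar^2}\,a_0\,\frac{e^{ikr}}{r}\int U(\mathbf r')\,e^{i(\mathbf k_0 - \mathbf k)\cdot\mathbf r'}\,d^3r'. \qquad (3.80)$$

This relation may be presented in the general form29

$$\psi_s = a_0\,f(\mathbf k,\mathbf k_0)\,\frac{e^{ikr}}{r}, \qquad (3.81)$$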



where $f(\mathbf k,\mathbf k_0)$ is called the scattering function.30 Its physical sense becomes clear from the calculation of the corresponding probability current density $\mathbf j_s$. For that, we generally need to use Eq. (1.47) with the gradient operator having all spherical-coordinate components.31 However, at $kr \gg 1$ the main contribution to $\nabla\psi_s$, proportional to $k \gg 1/r$, comes from the factor $\exp\{ikr\}$, which changes fast in the common direction of vectors $\mathbf r$ and $\mathbf k$, so that



$$\nabla\psi_s \approx \mathbf n_r\frac{\partial\psi_s}{\partial r} \approx i\mathbf k\,\psi_s, \qquad \text{at } kr \gg 1, \qquad (3.82)$$



so that Eq. (1.47) yields

$$\mathbf j_s(\mathbf r) \approx \frac{\hbar\mathbf k}{m}\,|a_0|^2\,\frac{|f(\mathbf k,\mathbf k_0)|^2}{r^2}. \qquad (3.83)$$



Since this vector is parallel to $\mathbf k$, and hence to $\mathbf r$, the flux in the numerator of Eq. (58), i.e. the probability current per unit solid angle, is just $r^2 j_s$. Hence, the differential cross-section is simply



$$\frac{d\sigma}{d\Omega} = \frac{j_s r^2}{j_0} = |f(\mathbf k,\mathbf k_0)|^2, \qquad (3.84)$$

and the total cross-section is

$$\sigma = \oint |f(\mathbf k,\mathbf k_0)|^2\,d\Omega, \qquad (3.85)$$



so that the scattering function $f(\mathbf k,\mathbf k_0)$ gives us everything we need (and in fact more, because this function also contains information about the phase of the scattered wave).



29 It is easy to prove that this form is the asymptotic form of any solution $\psi_s$ of the scattering problem (even beyond the Born approximation) at sufficiently large distances $r \gg a, k^{-1}$.

30 Note that the function $f$ has the dimension of length, and does not account for the incident wave. This is why a dimensionless function, $S = 1 + 2ikf$, is sometimes used instead. This function $S$ is called the scattering matrix, because it may be considered as a natural generalization of the 1D matrix $S$, defined by Eq. (2.133), to higher dimensionality.

31 See, e.g., MA Eq. (10.8). 
According to Eq. (80), in the Born approximation the scattering function may be presented as the Born integral

$$f(\mathbf k,\mathbf k_0) = -\frac{m}{2\pi\hbar^2}\int U(\mathbf r)\,e^{-i\mathbf q\cdot\mathbf r}\,d^3r, \qquad (3.86)$$



where for notational simplicity I have replaced $\mathbf r'$ with $\mathbf r$, and also introduced the scattering vector

$$\mathbf q \equiv \mathbf k - \mathbf k_0, \qquad (3.87)$$

of length $q = 2k\sin(\theta/2)$, where $\theta$ is the scattering angle between vectors $\mathbf k$ and $\mathbf k_0$ - see Fig. 9b. For the differential cross-section, Eq. (86) yields

$$\frac{d\sigma}{d\Omega} = \left(\frac{m}{2\pi\hbar^2}\right)^2\left|\int U(\mathbf r)\,e^{-i\mathbf q\cdot\mathbf r}\,d^3r\right|^2, \qquad (3.88)$$

and the total cross-section may now be readily calculated from the first of Eqs. (59).32

This is the main result of this section. It may be further simplified for spherically-symmetric scatterers, with

$$U(\mathbf r) = U(r). \qquad (3.89)$$

Here, it is convenient to present the exponent in the Born integral as $\exp\{-iqr\cos\chi\}$, where $\chi$ is the angle between the integration vector $\mathbf r$ and the scattering vector $\mathbf q$ (rather than the incident wave vector $\mathbf k_0$!) - see Fig. 9b. Now, for fixed $\mathbf q$, we can take this vector's direction as the polar axis of a spherical coordinate system, and reduce Eq. (86) to a 1D integral:



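$$\begin{aligned}
f(\mathbf k,\mathbf k_0) &= -\frac{m}{2\pi\hbar^2}\int_0^{\infty}r^2dr\,U(r)\int_0^{2\pi}d\varphi\int_0^{\pi}\sin\chi\,d\chi\,\exp\{-iqr\cos\chi\} \\
&= -\frac{m}{2\pi\hbar^2}\int_0^{\infty}r^2dr\,U(r)\,2\pi\,\frac{2\sin qr}{qr} = -\frac{2m}{\hbar^2q}\int_0^{\infty}U(r)\sin(qr)\,r\,dr.
\end{aligned} \qquad (3.90)$$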



As a simple example, let us use the Born approximation to analyze scattering on the following spherically-symmetric potential:

$$U(\mathbf r) = U_0\exp\left\{-\frac{r^2}{2a^2}\right\}. \qquad (3.91)$$



In this particular case, it is better to resist the temptation to exploit the spherical symmetry by using Eq. (90), and to use the generic Eq. (88) instead, because the Born integral falls apart into a product of three similar Cartesian factors:

$$f(\mathbf k,\mathbf k_0) = -\frac{m}{2\pi\hbar^2}\,U_0\,I_xI_yI_z, \qquad (3.92)$$

with



32 Note that according to Eq. (88), in the Born approximation the scattering intensity does not depend on the sign 
of potential U, and also that scattering in a certain direction is completely determined by a specific Fourier 
harmonic of function U(r), namely by the harmonic with the wave vector equal to the scattering vector q. 
$$I_x = \int_{-\infty}^{+\infty}\exp\left\{-iq_xx - \frac{x^2}{2a^2}\right\}dx, \qquad (3.93)$$



and similar integrals for $I_y$ and $I_z$. From Chapter 2, we already know that Gaussian integrals like $I_x$ may be readily worked out by completing the exponent to a full square. In our current case, this gives

$$I_x = (2\pi)^{1/2}a\,\exp\left\{-\frac{q_x^2a^2}{2}\right\}, \quad \text{etc.}, \qquad (3.94)$$

so that, finally,



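$$\frac{d\sigma}{d\Omega} = \left(\frac{m}{2\pi\hbar^2}\,U_0\right)^2(2\pi)^3a^6\exp\{-q^2a^2\} = 2\pi a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2\exp\{-q^2a^2\}. \qquad (3.95)$$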
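The two routes to the Born amplitude for the Gaussian potential (91) must agree: the 1D radial integral (90), and the Cartesian factorization (92)-(94). The following sketch checks this numerically. It assumes $\hbar = m = 1$ and uses a simple composite Simpson rule, which is a generic choice, not something prescribed by the text. The function names are hypothetical.

```python
import math

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n is rounded up to an even number."""
    n += n % 2
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3

def f_born_radial(U, q, r_max, m=1.0, hbar=1.0):
    """Eq. (3.90): f = -(2m / hbar^2 q) * integral_0^inf U(r) sin(qr) r dr, for q > 0.
    The upper limit is truncated at r_max, where U(r) must already be negligible."""
    integral = simpson(lambda r: U(r) * math.sin(q * r) * r, 0.0, r_max)
    return -2 * m / (hbar**2 * q) * integral

def f_born_gaussian(U0, a, q, m=1.0, hbar=1.0):
    """Eqs. (3.92)-(3.94): f = -(m U0 / 2 pi hbar^2) (2 pi)^(3/2) a^3 exp(-q^2 a^2 / 2)."""
    return -m * U0 / (2 * math.pi * hbar**2) * (2 * math.pi)**1.5 * a**3 * math.exp(-(q * a)**2 / 2)

if __name__ == "__main__":
    U0, a = 0.3, 1.2
    gaussian = lambda r: U0 * math.exp(-r**2 / (2 * a**2))  # Eq. (3.91)
    for q in (0.5, 1.0, 2.0):
        f_num = f_born_radial(gaussian, q, r_max=12 * a)
        f_exact = f_born_gaussian(U0, a, q)
        # |f|^2 is the differential cross-section (3.84); compare with Eq. (3.95)
        dsdo = 2 * math.pi * a**2 * (U0 * a**2)**2 * math.exp(-(q * a)**2)
        print(q, f_num, f_exact, f_exact**2 - dsdo)
```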



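Now, the total cross-section $\sigma$ is an integral of $d\sigma/d\Omega$ over all directions of vector $\mathbf k$. Since in our case the scattering intensity does not depend on the azimuthal angle $\varphi$, the integration is reduced to that over the scattering angle $\theta$ (Fig. 9b):

$$\begin{aligned}
\sigma &= \oint\frac{d\sigma}{d\Omega}\,d\Omega = 2\pi\int_0^{\pi}\frac{d\sigma}{d\Omega}\,\sin\theta\,d\theta = 4\pi^2a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2\int_0^{\pi}\sin\theta\,d\theta\,\exp\left\{-\left(2ka\sin\frac{\theta}{2}\right)^2\right\} \\
&= 4\pi^2a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2\int_{\theta=0}^{\theta=\pi}\exp\left\{-2k^2a^2(1-\cos\theta)\right\}d(1-\cos\theta) = \frac{2\pi^2}{k^2}\left(\frac{mU_0a^2}{\hbar^2}\right)^2\left(1 - e^{-4k^2a^2}\right).
\end{aligned} \qquad (3.96)$$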



Let us analyze these formulas. In the low-energy limit, $ka \ll 1$ (and hence $qa \ll 1$ for any scattering angle), the scattered wave is virtually isotropic: $d\sigma/d\Omega \approx \text{const}$. This is a very typical feature of scattering by small objects, in any approximation. Notice that in this limit, the Born expression for $\sigma$,

$$\sigma \approx 8\pi^2a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2, \qquad (3.97)$$

is only valid if $\sigma$ is much smaller than the scale $\sim a^2$ of the physical cross-section of the scatterer.

In the opposite, high-energy limit $ka \gg 1$, the scattering is dominated by small angles $\theta \approx q/k \sim 1/ka \ll 1$:

$$\frac{d\sigma}{d\Omega} \approx 2\pi a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2\exp\{-k^2a^2\theta^2\}. \qquad (3.98)$$



This is, again, very typical for diffraction. Notice, however, that due to the smooth character of the Gaussian potential (91), the diffraction pattern exhibits no oscillations. Such oscillations of $d\sigma/d\Omega$ as a function of angle naturally appear for potentials with sharp borders - see, e.g., Problems 2 and 3.
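The angular integration leading to Eq. (96), and its low-energy limit (97), can be verified in a similar way. The idea is to integrate Eq. (95) numerically, with $q = 2k\sin(\theta/2)$. The sketch below makes the same assumptions as before ($\hbar = m = 1$, Simpson rule, hypothetical names). It uses `math.expm1` to evaluate $1 - e^{-4k^2a^2}$ accurately at small $ka$.

```python
import math

def dsigma_domega(theta, k, U0, a, m=1.0, hbar=1.0):
    """Born differential cross-section (3.95) for the Gaussian potential (3.91),
    with q = 2k sin(theta/2) from Eq. (3.87)."""
    q = 2 * k * math.sin(theta / 2)
    g = m * U0 * a**2 / hbar**2
    return 2 * math.pi * a**2 * g**2 * math.exp(-(q * a)**2)

def sigma_numeric(k, U0, a, n=4000):
    """sigma = 2 pi * integral_0^pi (dsigma/dOmega) sin(theta) dtheta, by the Simpson rule (n even)."""
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        theta = i * h
        total += weight * dsigma_domega(theta, k, U0, a) * math.sin(theta)
    return 2 * math.pi * total * h / 3

def sigma_closed(k, U0, a, m=1.0, hbar=1.0):
    """Closed-form result (3.96)."""
    g = m * U0 * a**2 / hbar**2
    return 2 * math.pi**2 / k**2 * g**2 * (-math.expm1(-4 * (k * a)**2))

if __name__ == "__main__":
    U0, a = 0.2, 1.0
    for k in (0.01, 0.3, 1.0, 3.0):
        print(k, sigma_numeric(k, U0, a), sigma_closed(k, U0, a))
    # Low-energy limit (3.97): sigma -> 8 pi^2 a^2 (m U0 a^2 / hbar^2)^2
    print(8 * math.pi**2 * a**2 * (U0 * a**2)**2)
```

At large $ka$, the result falls as $1/k^2$. This is consistent with the narrowing of the diffraction cone (98), whose solid angle is $\sim\theta^2 \sim (ka)^{-2}$.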

The Born approximation, while being very simple and used more often than any other scattering 
theory, is not without substantial shortcomings, as is clear from the following example. It is not too 
difficult to prove the following general optical theorem, valid for an arbitrary scatterer: 
$$\operatorname{Im} f(\mathbf k_0,\mathbf k_0) = \frac{k}{4\pi}\,\sigma. \qquad (3.99)$$



However, Eq. (86) shows that in the Born approximation, the function $f$ is purely real at $q = 0$ (i.e. at $\mathbf k = \mathbf k_0$), and hence cannot satisfy the optical theorem. Even more evidently, it cannot describe such a simple effect as a dark shadow ($\psi \approx 0$) cast by an opaque object (say, with $U_0 \gg E$).

There are several ways to improve the Born approximation while still keeping the general idea of an approximate treatment of $U$.

(i) Instead of the main assumption $\psi_s \propto U_0$, we can use a complete perturbation series:

$$\psi_s = \psi_1 + \psi_2 + \ldots, \qquad (3.100)$$

with $\psi_n \propto U_0^n$, and find the successive approximations $\psi_n$ one by one. In the 1st approximation we of course return to the Born formula, but already the 2nd approximation yields

$$\operatorname{Im} f_2(\mathbf k_0,\mathbf k_0) = \frac{k}{4\pi}\,\sigma_1, \qquad (3.101)$$

where $\sigma_1$ is the full cross-section calculated in the 1st approximation, so that the optical theorem (99) is "almost" satisfied.33

(ii) As was mentioned above, the Born approximation does not work very well for small-angle scattering by extended objects. This deficiency may be corrected by the so-called eikonal approximation (from the Greek word εἰκών, meaning "icon"). It replaces the plane-wave exponent $\exp\{ikx\}$ in the representation of the incident wave by a WKB-like exponent, though still in the first nonvanishing approximation in $U \to 0$:

$$e^{ikx} \to \exp\left\{i\int^x k(x')\,dx'\right\} \approx \exp\left\{i\int^x\left[k - \frac{m}{\hbar^2k}U(x')\right]dx'\right\} = e^{ikx}\exp\left\{-i\frac{m}{\hbar^2k}\int^x U(x')\,dx'\right\}. \qquad (3.102)$$

This approximation's results satisfy the optical theorem (99) already in the 1st approximation in $U$.
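The second step of Eq. (102) involves only one approximation: the expansion $k(x) = [2m(E - U(x))]^{1/2}/\hbar \approx k - mU(x)/\hbar^2k$, valid for $|U| \ll E$. The sketch below compares the exact WKB phase shift $\int[k(x') - k]\,dx'$ with its first-order estimate. Several details are assumptions made only for this example: the weak Gaussian barrier, the parameter values, the units $\hbar = m = 1$, and the function names.

```python
import math

def k_local(x, E, U, m=1.0, hbar=1.0):
    """Local wave number k(x) = [2m(E - U(x))]^(1/2) / hbar; requires E > U(x)."""
    return math.sqrt(2 * m * (E - U(x))) / hbar

def eikonal_phase(E, U, x0, x1, n=2000, m=1.0, hbar=1.0):
    """Phase acquired on [x0, x1] on top of the plane-wave phase k (x1 - x0).
    Returns (exact WKB value  int [k(x') - k] dx',
             first-order value -(m / hbar^2 k) int U(x') dx'), both by the midpoint rule."""
    k = math.sqrt(2 * m * E) / hbar
    h = (x1 - x0) / n
    xs = [x0 + (i + 0.5) * h for i in range(n)]
    exact = h * sum(k_local(x, E, U, m, hbar) - k for x in xs)
    first_order = -m / (hbar**2 * k) * h * sum(U(x) for x in xs)
    return exact, first_order

if __name__ == "__main__":
    E = 2.0
    for U0 in (0.01, 0.1, 0.5):
        barrier = lambda x, U0=U0: U0 * math.exp(-x**2)  # hypothetical weak Gaussian barrier
        print(U0, eikonal_phase(E, barrier, -6.0, 6.0))
```

The relative difference between the two values grows roughly as $U_0/E$, which illustrates why Eq. (102) is restricted to the first order in $U$.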



3.4. Energy bands in higher dimensions 

In Sec. 2.5, we have discussed the 1D band theory for potential profiles $U(x)$ that obey the periodicity condition (2.192). For what follows, let us notice that this condition may be rewritten as

$$U(x + X) = U(x), \qquad (3.103)$$



33 The construction of such a series may be facilitated by the following observation. If we retain $\psi_s$ in the right-hand part of Eq. (66), we may write a relation formally similar to Eq. (68) for the full wavefunction $\psi = \psi_0 + \psi_s$:

$$\psi(\mathbf r) = \psi_0(\mathbf r) + \frac{2m}{\hbar^2}\int U(\mathbf r')\,\psi(\mathbf r')\,G(\mathbf r,\mathbf r')\,d^3r'.$$

This is one of the forms of the Lippmann-Schwinger equation, which is exactly equivalent to the differential Schrödinger equation (66) but is more convenient for some applications, in particular for the calculation of higher approximations $\psi_n$. Unfortunately, I will not have time to discuss this approach in detail and have to refer the reader, for example, either to Chapter 9 of the textbook by L. Schiff, Quantum Mechanics, 3rd ed., McGraw-Hill, 1968, or (for even more details) to the monograph by J. Taylor, Scattering Theory, Dover, 2006.
where $X = \tau a$, with $\tau$ being an arbitrary integer. One can say that the set of points $X$ forms a periodic 1D lattice in the direct ($x$-) space. We have also seen that each Bloch state (i.e., each eigenstate of the Schrödinger equation for such a periodic potential) is characterized by the quasi-momentum $\hbar q$, and its energy does not change if $q$ is changed by a multiple of $2\pi/a$. Hence, let us form, in the reciprocal ($k$-) space, a 1D lattice of points $Q = lb$, with $b = 2\pi/a$ and integer $l$. Then any pair of points from these two mutually reciprocal lattices satisfies the following rule:

$$\exp\{iQX\} = \exp\left\{il\frac{2\pi}{a}\tau a\right\} = e^{2\pi il\tau} = 1. \qquad (3.104)$$



In this form, the results of Sec. 2.5 may be readily extended to $d$-dimensional periodic potentials whose translational symmetry obeys the following generalization of Eq. (103):

$$U(\mathbf r + \mathbf R) = U(\mathbf r), \qquad (3.105)$$

where the points $\mathbf R$, which may be numbered by $d$ integers $\tau_j$,

$$\mathbf R = \sum_{j=1}^{d}\tau_j\mathbf a_j, \qquad (3.106)$$

form the so-called Bravais lattice34 with $d$ primitive vectors $\mathbf a_j$. The simplest example of a 3D Bravais lattice is the simple cubic lattice (Fig. 10a), which may be described by a system of mutually perpendicular primitive vectors $\mathbf a_j$ of equal length. However, these vectors are not perpendicular in every lattice. For example, Figs. 10b and 10c show possible sets of the primitive vectors describing the face-centered cubic (fcc) and the body-centered cubic (bcc) lattices. In 3D, the science of crystallography, based on group theory, distinguishes, by their symmetry properties, 14 Bravais lattices grouped into 7 different lattice systems.35




Fig. 3.10. The simplest (and most common) 3D Bravais lattices: (a) simple cubic, (b) face-centered cubic (fcc), and (c) body-centered cubic (bcc), and possible choices of their primitive vector sets (blue arrows).



Note, however, that not all highly symmetric sets of points form Bravais lattices. Probably the most striking example is the very simple 2D honeycomb lattice (Fig. 11a), whose nodes cannot be described by



34 Named after A. Bravais, the crystallographer who introduced this notion in 1850. 

35 The strongest motivation for the band theory is provided by the properties of solid crystals. Thus it is not surprising that perhaps the clearest, well-illustrated introduction to the Bravais lattices may be found in Chapters 4 and 7 of the famous textbook by N. Ashcroft and N. Mermin, Solid State Physics, Saunders College, 1976.
a Bravais lattice - while the 2D hexagonal lattice, shown in Fig. 11b, can. The most prominent 3D case of such a lattice is the diamond structure (Fig. 11c), which describes, in particular, the atoms of the world's most important crystal - silicon.36 In cases like these, the band theory is much facilitated by the fact that such point systems may be described by Bravais lattices of certain point assemblies (called primitive unit cells). For example, Fig. 11a shows a possible choice of primitive vectors for the honeycomb structure,37 with the primitive unit cell formed by any two adjacent points of the original lattice (say, within the dashed ellipses in Fig. 11a). Similarly, the diamond lattice may be described as the fcc Bravais lattice with a two-point primitive unit cell.38

Now we are ready for the following generalization of the 1D Bloch theorem, given by Eqs. (2.193) and (2.210), to higher dimensions. Any eigenfunction of the Schrödinger equation describing the particle's motion in the periodic potential (105) may be presented either as

$$\psi(\mathbf r + \mathbf R) = \psi(\mathbf r)\,e^{i\mathbf q\cdot\mathbf R}, \qquad (3.107)$$

or as

$$\psi(\mathbf r) = u(\mathbf r)\,e^{i\mathbf q\cdot\mathbf r}, \quad \text{with } u(\mathbf r + \mathbf R) = u(\mathbf r), \qquad (3.108)$$

where the quasi-momentum $\hbar\mathbf q$ is again a constant of motion, but is now a vector.

Fig. 3.11. Some important periodic structures that require two-point primitive cells for their Bravais lattice presentation: (a) 2D honeycomb lattice and its primitive vectors, and (c) 3D diamond lattice. For contrast, panel (b) shows the 2D hexagonal structure, which forms a Bravais lattice with a single-point primitive cell.



The key notion of the band theory is the reciprocal lattice in the wavevector space, formed as

$$\mathbf Q = \sum_{j=1}^{d}l_j\mathbf b_j, \qquad (3.109)$$



36 It may be best understood as the sum of two fcc lattices of side $a$, mutually shifted by the vector $\{1, 1, 1\}a/4$, so that the distances between each point of the combined lattice and its 4 nearest neighbors (see the thick gray lines in Fig. 11c) are all equal.

37 This structure is presently very popular due to the recent discovery of graphene - isolated monolayer sheets of carbon atoms arranged in a honeycomb lattice with an interatomic distance of 0.142 nm.

38 A harder case is presented by quasicrystals, which have high (say, 5-fold) rotational symmetry, but cannot be described by a Bravais lattice with any finite primitive unit cell. Their idea may be traced back to medieval Islamic tilings, but they were discovered in real crystals, by D. Shechtman et al., only in 1984. For a popular review of quasicrystals see, for example, P. Stephens and A. Goldman, Sci. Amer. 264, #4, 24 (1991).



with integer $l_j$, and vectors $\mathbf b_j$ selected in such a way that the following generalization of Eq. (104) is valid for any pair of points of the direct and reciprocal lattices:

$$e^{i\mathbf Q\cdot\mathbf R} = 1. \qquad (3.110)$$

The importance of lattice $\mathbf Q$ is immediately clear from the first formulation of the Bloch theorem, given by Eq. (107): if we add to $\mathbf q$ any vector $\mathbf Q$ of the reciprocal lattice, the wavefunction does not change. This means that all information about the system is contained in just one elementary cell of the reciprocal space $\mathbf q$. Its most frequent choice, called the 1st Brillouin zone, is the set of all points $\mathbf q$ that are closer to the origin than to any other point of lattice $\mathbf Q$.

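It is easy to see that the primitive vectors $\mathbf b_j$ of the reciprocal 3D lattice39 may be constructed from those of the initial, direct lattice as

$$\mathbf b_1 = 2\pi\frac{\mathbf a_2\times\mathbf a_3}{\mathbf a_1\cdot(\mathbf a_2\times\mathbf a_3)}, \quad \mathbf b_2 = 2\pi\frac{\mathbf a_3\times\mathbf a_1}{\mathbf a_1\cdot(\mathbf a_2\times\mathbf a_3)}, \quad \mathbf b_3 = 2\pi\frac{\mathbf a_1\times\mathbf a_2}{\mathbf a_1\cdot(\mathbf a_2\times\mathbf a_3)}. \qquad (3.111)$$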



Indeed, from the "operand rotation rule" of vector algebra40 it is evident that $\mathbf a_j\cdot\mathbf b_{j'} = 2\pi\delta_{jj'}$. Hence, the exponent in the left-hand part of Eq. (110) is reduced to

$$e^{i\mathbf Q\cdot\mathbf R} = \exp\{2\pi i(l_1\tau_1 + l_2\tau_2 + l_3\tau_3)\}. \qquad (3.112)$$

Since all $l_j$ and $\tau_j$ are integers, the expression in the parentheses is also an integer, so the exponent indeed equals 1, thus satisfying the definition of the reciprocal lattice given by Eq. (110).

As the simplest example, let us return to the simple cubic lattice of period $a$ (Fig. 10a), oriented in space so that

$$\mathbf a_1 = a\mathbf n_x, \quad \mathbf a_2 = a\mathbf n_y, \quad \mathbf a_3 = a\mathbf n_z. \qquad (3.113)$$

According to Eq. (111), its reciprocal lattice is (of course) also cubic:

$$\mathbf Q = \frac{2\pi}{a}\left(l_x\mathbf n_x + l_y\mathbf n_y + l_z\mathbf n_z\right), \qquad (3.114)$$

so that the 1st Brillouin zone is a cube with side $b = 2\pi/a$. Almost equally simple calculations show that the reciprocal lattice of fcc is bcc, and vice versa. Figure 12 shows the resulting 1st Brillouin zone of the fcc lattice.
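Eq. (111) is easy to implement directly. The plain-Python sketch below (hypothetical function names) builds the $\mathbf b_j$ from given $\mathbf a_j$ and checks the relations $\mathbf a_j\cdot\mathbf b_{j'} = 2\pi\delta_{jj'}$ and (110). It uses one standard choice of fcc primitive vectors, $\mathbf a_1 = (a/2)(0,1,1)$, $\mathbf a_2 = (a/2)(1,0,1)$, $\mathbf a_3 = (a/2)(1,1,0)$. This choice is an assumption: Fig. 10b may use a different but equivalent set. For it, the sketch returns vectors $(2\pi/a)(\pm1,\pm1,\pm1)$, i.e. primitive vectors of a bcc lattice with cube side $4\pi/a$.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def reciprocal(a1, a2, a3):
    """Primitive vectors b1, b2, b3 of the reciprocal lattice, Eq. (3.111)."""
    scale = 2 * math.pi / dot(a1, cross(a2, a3))
    return tuple(tuple(scale * c for c in cross(u, v))
                 for u, v in ((a2, a3), (a3, a1), (a1, a2)))

if __name__ == "__main__":
    a = 1.0
    fcc = ((0.0, a / 2, a / 2), (a / 2, 0.0, a / 2), (a / 2, a / 2, 0.0))
    b = reciprocal(*fcc)
    # a_j . b_j' / 2 pi should form the unit matrix
    print([[round(dot(aj, bk) / (2 * math.pi), 12) for bk in b] for aj in fcc])
    # Components in units of 2 pi / a: each vector is (+-1, +-1, +-1), i.e. bcc
    print([[round(c * a / (2 * math.pi), 12) for c in bk] for bk in b])
    # Check Eq. (3.110) for one arbitrary pair Q, R with integer coordinates
    l, tau = (1, -2, 3), (2, 5, -1)
    Q = [sum(lj * bk[i] for lj, bk in zip(l, b)) for i in range(3)]
    R = [sum(tj * aj[i] for tj, aj in zip(tau, fcc)) for i in range(3)]
    print(math.cos(dot(Q, R)), math.sin(dot(Q, R)))
```

The same function reproduces Eq. (114) when fed the simple cubic vectors (113).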

The notion of the reciprocal lattice41 makes the multi-dimensional band theory not much more complex than that in 1D, especially for numerical calculations, at least for single-point Bravais lattices. Indeed, repeating all the steps that have led to Eq. (2.218), but now with a $d$-dimensional Fourier expansion of the functions $U(\mathbf r)$ and $u(\mathbf r)$, we readily get its generalization:

$$\sum_{\mathbf l'}U_{\mathbf l-\mathbf l'}\,u_{\mathbf l'} = \left(E - E_{\mathbf l}\right)u_{\mathbf l}, \qquad (3.115)$$



39 For the 2D case ($j = 1, 2$), one may use, for example, the first two formulas of Eq. (111) with $\mathbf a_3 = \mathbf a_1\times\mathbf a_2$.

40 See, e.g., MA Eq. (7.6).

41 This notion is also the main starting point of X-ray diffraction studies of crystals, because it allows rewriting 
the well-known Bragg condition for diffraction peaks in an extremely simple form of the momentum conservation 
law: $\mathbf k = \mathbf k_0 + \mathbf Q$, where $\mathbf k_0$ and $\mathbf k$ are the wave vectors of the incident and diffracted photon, respectively.
where $\mathbf l$ is now a $d$-dimensional vector of integer indices $l_j$. The summation in Eq. (115) should be carried over all (essential) components of this vector (i.e. over all relevant nodes of the reciprocal lattice), so writing the corresponding computer code requires a bit more care than in 1D. However, this is just a homogeneous system of linear equations, and numerous routines for finding its eigenvalues $E$ are readily available from both public sources and commercial software packages.42



What is indeed more complex than in 1D is the presentation (and hence the comprehension :-) of the calculation results and experimental data. Typically, the presentation is limited to plotting the Bloch state eigenenergy as a function of the components of vector $\mathbf q$ along certain special directions in the reciprocal space of quasi-momentum (see, e.g., the lines shown in Fig. 12), usually on a single panel. Figure 13 shows perhaps the most famous (and certainly the most practically important) of such plots, the band structure of silicon. The dashed horizontal lines mark the "indirect" gap of width 1.12 eV between the "valence" and "conduction" energy bands, which is the playground of virtually all silicon-based electronics.




Fig. 3.12. 1st Brillouin zone of the fcc lattice, and the traditional notation of its main directions. Adapted from http://en.wikipedia.org/wiki/Band_structure .

Fig. 3.13. Band structure of silicon, along the special directions shown in Fig. 12. (Adapted from http://www.tf.uni-kiel.de/matwis/amat/semi_en/ .)
42 See, e.g., MA Sec. 16 (iv).
To understand the reason for this complexity of band structure presentation, let us see how we would start to develop the weak-potential approximation for the simplest case of a 2D square lattice (which is a subset of the cubic lattice, with $\tau_3 = 0$). Its 1st Brillouin zone is of course also a square, of area $(2\pi/a)^2$. Let us draw the lines of constant energy of a free particle ($U = 0$) in this zone. Repeating the arguments of Sec. 2.7 (see especially Fig. 2.28 and its discussion), we should conclude that Eq. (2.216) should now be generalized as follows,



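$$E = E_{\mathbf l} = \frac{\hbar^2}{2m}\left[\left(q_x - \frac{2\pi l_x}{a}\right)^2 + \left(q_y - \frac{2\pi l_y}{a}\right)^2\right], \qquad (3.116)$$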



with all possible integers $l_x$ and $l_y$. Considering the result only within the 1st Brillouin zone, we see that as energy $E$ grows, the lines of equal energy evolve as shown in Fig. 14. Just as in 1D, the weak-potential effects are only important at the Brillouin zone boundaries, and may be crudely considered as the appearance of narrow energy gaps. However, one can see that the band structure in $\mathbf q$-space is complex enough even without these effects.
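The degeneracies behind Fig. 14 can be seen by simply listing the lowest free-particle energies (116) at a given $\mathbf q$. In the sketch below (hypothetical names), $\mathbf q$ is measured in units of $\pi/a$ and $E$ in units of $E_1 = \pi^2\hbar^2/2ma^2$, the normalization used in Fig. 14. In these units, $E_{\mathbf l}/E_1 = (q_x - 2l_x)^2 + (q_y - 2l_y)^2$.

```python
def empty_lattice_levels(qx, qy, lmax=3, nlevels=8):
    """Lowest free-particle energies (3.116) of a 2D square lattice at quasi-momentum q.

    Units (an assumption matching Fig. 3.14): q in units of pi/a, E in units of
    E_1 = pi^2 hbar^2 / (2 m a^2), so that E_l / E_1 = (qx - 2 lx)^2 + (qy - 2 ly)^2.
    lmax bounds the reciprocal-lattice indices considered, which is enough for low levels.
    """
    levels = [(qx - 2 * lx) ** 2 + (qy - 2 * ly) ** 2
              for lx in range(-lmax, lmax + 1)
              for ly in range(-lmax, lmax + 1)]
    return sorted(levels)[:nlevels]

if __name__ == "__main__":
    print("zone center (0, 0):", empty_lattice_levels(0, 0))
    print("edge middle (1, 0):", empty_lattice_levels(1, 0))
    print("zone corner (1, 1):", empty_lattice_levels(1, 1))
```

The lowest level is nondegenerate at the zone center. It is twofold degenerate at the middle of a zone edge ($E = E_1$) and fourfold degenerate at a zone corner ($E = 2E_1$). These are exactly the points where a weak periodic potential would open the energy gaps mentioned above.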





Fig. 3.14. Lines of constant energy E of a free particle, within the 1st Brillouin zone of a square Bravais lattice, for: (a) E/E_1 ≈ 0.95, (b) E/E_1 ≈ 1.05, and (c) E/E_1 ≈ 2.05, where E_1 = π²ħ²/2ma².



The tight-binding approximation is usually easier to follow. For example, for the same square 2D lattice, we may repeat the arguments that have led us to Eq. (2.203), to write43

$$i\hbar\frac{da_{0,0}}{dt} = -\delta_n\left(a_{-1,0} + a_{+1,0} + a_{0,+1} + a_{0,-1}\right), \qquad (3.117)$$

where the indices correspond to the deviations of integers $\tau_x$ and $\tau_y$ from an arbitrarily selected minimum of the potential energy - and hence of the wavefunction's "hump" quasi-localized at this minimum. Now, looking for the stationary solution of these equations that corresponds to the Bloch theorem (107), instead of Eq. (2.206) we get

$$E = E_n + \varepsilon_n = E_n - \delta_n\left(e^{iq_xa} + e^{-iq_xa} + e^{iq_ya} + e^{-iq_ya}\right) = E_n - 2\delta_n\left(\cos q_xa + \cos q_ya\right). \qquad (3.118)$$



Figure 15 shows this result, within the 1st Brillouin zone, in two forms: as the color-coded lines of equal energy and as a 3D plot (also enhanced by color).
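A few lines of Python suffice to evaluate Eq. (118) along the two directions discussed in the next paragraph: the $q_x$ axis and the diagonal $q_x = q_y$. The function names and the choice $E_n = 0$, $\delta_n = a = 1$ are illustrative assumptions.

```python
import math

def tb_square(qx, qy, En=0.0, delta=1.0, a=1.0):
    """Tight-binding Bloch-state energy of Eq. (3.118) for the square lattice."""
    return En - 2 * delta * (math.cos(qx * a) + math.cos(qy * a))

def along(direction, npts=5, a=1.0):
    """Energies from the zone center to the zone boundary, either along the q_x axis
    ('axis', ending at q = (pi/a, 0)) or along the diagonal ('diagonal', ending at (pi/a, pi/a))."""
    out = []
    for i in range(npts):
        t = math.pi / a * i / (npts - 1)
        q = (t, 0.0) if direction == "axis" else (t, t)
        out.append(tb_square(*q, a=a))
    return out

if __name__ == "__main__":
    print("axis:    ", [round(e, 3) for e in along("axis")])
    print("diagonal:", [round(e, 3) for e in along("diagonal")])
```

Along the axis the energy rises from $E_n - 4\delta_n$ only to $E_n$, while along the diagonal it reaches the top of the band, $E_n + 4\delta_n$. So the two cuts through the same band surface look quite different.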



43 Actually, using the same values of $\delta_n$ in both directions implies some sort of symmetry of the quasi-localized states. For example, s-states of axially-symmetric potentials (see the next section) always have such a symmetry.
It is evident that the plots of this function along different lines on the $\mathbf q$-plane give different curves. Examples are the plots along one of the axes (say, $q_x$) and along a diagonal of the 1st Brillouin zone (say, $q_x = q_y$), which are qualitatively similar to those of silicon (Fig. 13). The latter structure is complicated by the fact that the primitive cell of its Bravais lattice contains more than one atom - see Fig. 11c and its discussion. In this case, even the tight-binding picture becomes more complex. Indeed, let the atoms in the different positions of the primitive unit cell be similar (as they are, for example, in both graphene and silicon). Then the potential well shape near those points and the corresponding local wavefunctions $u(\mathbf r)$ are similar as well. Still, the Bloch theorem (which only pertains to Bravais lattices!) does not forbid these functions to have different complex amplitudes $a(t)$, whose time evolution should be described by specific differential equations.

For example, to describe the honeycomb lattice shown in Fig. 11a, we have to prescribe different amplitudes to the "top" and "bottom" points of its primitive cell - say, $\alpha$ and $\beta$, respectively. Each of these points is surrounded by (and hence weakly interacts with) 3 neighbors of the opposite type. Hence, instead of Eq. (117), we have to write two equations

$$i\hbar\frac{d\alpha}{dt} = -\delta_n\sum_{j=1}^{3}\beta_j, \qquad i\hbar\frac{d\beta}{dt} = -\delta_n\sum_{j'=1}^{3}\alpha_{j'}, \qquad (3.119)$$

where each summation is over the 3 nearest-neighbor points. (I am using different summation indices just to emphasize that these directions are different for the "top" and "bottom" points of the primitive cell - see Fig. 11a.) Now using the Bloch theorem (107) in a form similar to Eq. (2.205), we get a system of two coupled linear algebraic equations:

$$(E - E_n)\,\alpha = -\delta_n\beta\sum_{j=1}^{3}e^{i\mathbf q\cdot\mathbf r_j}, \qquad (E - E_n)\,\beta = -\delta_n\alpha\sum_{j'=1}^{3}e^{i\mathbf q\cdot\mathbf r'_{j'}}, \qquad (3.120)$$

where $\mathbf r_j$ and $\mathbf r'_{j'}$ are the nearest-neighbor positions, as seen from the top and bottom points, respectively. Writing the condition of consistency of this system, we get two equal and opposite values of the energy correction for each value of $\mathbf q$:

$$E_\pm = E_n \pm \delta_n\Sigma^{1/2}, \qquad \text{where } \Sigma \equiv \sum_{j,j'=1}^{3}e^{i\mathbf q\cdot(\mathbf r_j + \mathbf r'_{j'})}. \qquad (3.121)$$
According to Eq. (120), these two energy bands correspond to phase shifts (on top of the regular Bloch shift $\mathbf q\cdot\Delta\mathbf r$) of either 0 or $\pi$ between the adjacent quasi-localized wavefunctions $u(\mathbf r)$.

The most interesting corollary of such energy symmetry, augmented by the honeycomb lattice symmetry, concerns certain values $\mathbf q_D$ of vector $\mathbf q$, which turn out to be at each of the 6 corners of the hexagonal 1st Brillouin zone. At these points the double sum $\Sigma$ vanishes, i.e. the two band surfaces $E_\pm(\mathbf q)$ touch each other. As a result, in the vicinity of these Dirac points44 the dispersion relation is linear:



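$$E_\pm \approx E_n \pm \hbar v_n|\tilde{\mathbf q}|, \qquad \text{where } \tilde{\mathbf q} \equiv \mathbf q - \mathbf q_D, \qquad (3.122)$$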



with $v_n \propto \delta_n$ being a constant with the dimension of velocity (for graphene, close to $10^6$ m/s). Such a linear dispersion relation ensures several interesting transport properties of graphene. For their discussion, I have to refer the reader to special literature.45
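The vanishing of $\Sigma$ at the zone corners and the conical shape (122) of the bands near them are easy to check numerically. The sketch below rests on several assumptions not spelled out in the text. It fixes a particular orientation of the three nearest-neighbor vectors $\mathbf r_j$ of length $D$, and uses $\mathbf r'_j = -\mathbf r_j$ for the bottom points, so that $\Sigma = |\sum_j e^{i\mathbf q\cdot\mathbf r_j}|^2$. It takes $E_n = 0$ and $\delta_n = D = 1$, and the names are hypothetical. In these units, the cone slope $dE/d\tilde q$ comes out as $3\delta_nD/2$, i.e. $\hbar v_n = (3/2)\delta_nD$.

```python
import cmath
import math

D = 1.0  # nearest-neighbor distance (0.142 nm in graphene); dimensionless units here
# Assumed orientation: vectors r_j from a "top" point to its 3 "bottom" neighbors.
# Seen from a "bottom" point, the neighbor vectors are r'_j = -r_j.
R_NN = [(0.0, D),
        (math.sqrt(3) / 2 * D, -D / 2),
        (-math.sqrt(3) / 2 * D, -D / 2)]

def structure_sum(qx, qy):
    """sum_j exp(i q . r_j) over the three nearest neighbors."""
    return sum(cmath.exp(1j * (qx * x + qy * y)) for x, y in R_NN)

def bands(qx, qy, En=0.0, delta=1.0):
    """E_-(q), E_+(q) of Eq. (3.121). With r'_j = -r_j, the double sum Sigma
    factorizes into |sum_j exp(i q . r_j)|^2, so Sigma^(1/2) = |structure_sum|."""
    s = abs(structure_sum(qx, qy))
    return En - delta * s, En + delta * s

# One of the six corners of the hexagonal 1st Brillouin zone (a Dirac point q_D)
K_POINT = (4 * math.pi / (3 * math.sqrt(3) * D), 0.0)

if __name__ == "__main__":
    print("Sigma^(1/2) at q_D:", abs(structure_sum(*K_POINT)))
    dq = 1e-4
    for phi in (0.0, 0.5, 1.0, 2.0):
        e_minus, e_plus = bands(K_POINT[0] + dq * math.cos(phi),
                                K_POINT[1] + dq * math.sin(phi))
        # slope dE/d|q~| in units of delta_n * D; the cone is isotropic, with slope 3/2
        print(phi, (e_plus - e_minus) / (2 * dq))
```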



3.5. Axially-symmetric systems 

I cannot conclude this chapter (and hence our review of wave mechanics) without addressing the 
issue of eigenstates and eigenvalues at full quantum confinement in multi-dimensional potentials U(r). 
For an arbitrary potential, the stationary Schrödinger equation does not have an analytical solution, but a substantial symmetry of the function $U(\mathbf r)$ may make such a solution possible. This pertains, in particular, to
the axial symmetry in 2D problems and the spherical symmetry in 3D problems, which are typical for 
several important situations (or their reasonable models), especially in atomic and nuclear physics. 

In rare cases such symmetry may be exploited by the separation of variables in Cartesian coordinates. The most famous example is the $d$-dimensional harmonic oscillator, i.e. a particle moving inside the potential

$$U = \frac{m\omega_0^2}{2}\sum_{j=1}^{d}r_j^2. \qquad (3.123)$$

Separating the variables exactly as we did for the rectangular quantum well (see Sec. 1.5), for each degree of freedom we get the Schrödinger equation (2.268) of a 1D oscillator, whose eigenfunctions are given by Eq. (2.278), and the energy spectrum is described by Eq. (2.114). As a result, the total energy spectrum may be indexed by a vector $\mathbf n = \{n_1, n_2, \ldots, n_d\}$ of $d$ independent integers ("quantum numbers"):



$$E_{\mathbf n} = \hbar\omega_0\left(\sum_{j=1}^{d}n_j + \frac{d}{2}\right), \qquad (3.124)$$

all of them ranging from 0 to $\infty$. Note that every energy level of this system, with the only exception of the ground state,


44 This term is based on a (pretty loose) analogy with the Dirac theory of relativistic quantum mechanics, which will be discussed in Chapter 9.

45 See, e.g., a recent review by A. Castro Neto et al., Rev. Mod. Phys. 81, 109 (2009). Note that the transport properties of graphene are determined by the coupling of the 2p_z electron states of carbon atoms, whose wavefunctions are proportional to $\exp\{\pm i\varphi\}$ rather than being axially-symmetric, as implied by Eqs. (120). However, due to the lattice symmetry, this fact does not affect the dispersion relation $E(\mathbf q)$.
$$\psi_{\mathbf 0} = \prod_{j=1}^{d}\psi_0(r_j) = \frac{1}{\pi^{d/4}x_0^{d/2}}\exp\left\{-\frac{1}{2x_0^2}\sum_{j=1}^{d}r_j^2\right\}, \qquad (3.125)$$

is degenerate: several different wavefunctions, each with its own set of quantum numbers $n_j$ but with the same value of their sum, have the same energy.
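The degeneracy can be counted explicitly. The number of ways to distribute $N$ quanta among $d$ independent 1D oscillators is the binomial coefficient $\binom{N+d-1}{d-1}$, a standard combinatorial fact not stated in the text. So, for example, for $d = 3$ the levels with $N = 0, 1, 2, 3$ are 1-, 3-, 6-, and 10-fold degenerate. The brute-force sketch below confirms this. Its function name is hypothetical, and it requires Python 3.8+ for `math.comb`.

```python
from itertools import product
from math import comb

def level_degeneracy(N, d):
    """Number of sets {n_1, ..., n_d} of non-negative integers with n_1 + ... + n_d = N,
    i.e. the degeneracy of the level E = hbar*omega_0*(N + d/2) of Eq. (3.124).
    Brute-force enumeration, fine for small N and d."""
    return sum(1 for n in product(range(N + 1), repeat=d) if sum(n) == N)

if __name__ == "__main__":
    for d in (1, 2, 3):
        counted = [level_degeneracy(N, d) for N in range(6)]
        formula = [comb(N + d - 1, d - 1) for N in range(6)]
        print(d, counted, formula)
```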

However, the harmonic oscillator problem is an exception: for other central- and spherically-symmetric problems the solution is made easier by using more appropriate coordinates. Let us start with the simplest axially-symmetric problem: the so-called 2D rotator, i.e. a particle constrained (quantum-confined) to move along a plane, round circle of radius R (Fig. 16). 46



Fig. 3.16. 2D rotator: a particle moving along a circle of radius R, with the arc coordinate l = Rφ.



Despite its common name, the 2D rotator has just one degree of freedom, say the displacement arc l = Rφ. So, its classical energy (and Hamiltonian function) is H = p_l²/2m, with p_l = mv = m(dl/dt). This function is similar to that of a free 1D particle (with the replacement x → l), and hence the rotator's quantum properties may be described by a similar Hamiltonian operator:



$$\hat H = \frac{\hat p^2}{2m}, \qquad\text{with}\quad \hat p = -i\hbar\frac{\partial}{\partial l}, \tag{3.126}$$

and its eigenfunctions have a similar structure:

$$\psi = C e^{ikl}. \tag{3.127}$$



The "only" new feature is that in the rotator, all observables should be 2πR-periodic functions of l, and hence, as we have already discussed in the context of the magnetic flux quantization (see Fig. 4 and its discussion), as the particle makes one turn about the center, its wavefunction's phase kl may only change by 2πn, with an arbitrary integer n (from -∞ to +∞):



$$\psi_n(l + 2\pi R) = \psi_n(l)\,e^{2\pi i n}. \tag{3.128}$$



With eigenfunctions (127), this immediately gives the condition k·2πR = 2πn. Thus, the wavenumber k can take only quantized values k_n = n/R, so that the eigenfunctions should be indexed by n:

$$\psi_n = C_n e^{ik_n l} = C_n e^{inl/R}, \tag{3.129}$$



and the energy spectrum is discrete:

$$E_n = \frac{\hbar^2 k_n^2}{2m} = \frac{\hbar^2 n^2}{2mR^2}. \tag{3.130}$$

46 This is a reasonable model for the confinement of light atoms, notably hydrogen, in some organic compounds.



So, while the free translational motion of a quantum particle is continuous, in the sense that its
momentum has a continuous spectrum, its rotation is quantized - the most important fact, which has so 
many implications (including the existence of atoms, molecules, and hence us humans, and hence 
science including this course :-). 

This simple model allows an exact analysis of external magnetic field effects on the quantum-confined motion of an electrically charged particle. Indeed, if this field is uniform and directed perpendicular to the rotator's plane, it does not violate the axial symmetry of the system. According to Eq. (26), in this case we have to generalize Eq. (126) as



$$\hat H = \frac{1}{2m}\left(-i\hbar\frac{\partial}{\partial l} - qA\right)^2. \tag{3.131}$$



Here, in contrast to the gauge choice (44), which was so instrumental in the Landau level problem, it is now clearly beneficial to take the vector-potential in a manifestly axially-symmetric form A = A(ρ)n_φ, where ρ = {x, y} is the 2D radius-vector. Using the well-known expression for the curl in cylindrical coordinates, 47 we can readily check that the requirement ∇×A = Bn_z, with B = const, is satisfied by the following function:



$$\mathbf A = \mathbf n_\varphi\,\frac{B\rho}{2}. \tag{3.132}$$



For the 2D rotator, ρ = R = const, so that the stationary Schrödinger equation becomes



$$\frac{1}{2m}\left(-i\hbar\frac{\partial}{\partial l} - q\frac{BR}{2}\right)^2\psi_n = E_n\psi_n. \tag{3.133}$$



A little bit surprisingly, this equation is still satisfied by the sine-wave eigenfunctions (127). Moreover, since the periodicity condition (128) is also unaffected by the applied magnetic field, we return to the field-independent eigenfunctions (129). However, the field does affect the system's energy:



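$$E_n = \frac{1}{2m}\left(\frac{\hbar n}{R} - \frac{qBR}{2}\right)^2 = \frac{\hbar^2}{2mR^2}\left(n - \frac{\Phi}{\Phi_0'}\right)^2, \tag{3.134}$$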



where Φ = πR²B is the magnetic flux through the area bounded by the particle's trajectory, and Φ_0' = 2πħ/q is the "normal" magnetic flux quantum we have already met in the AB effect context - see Eq. (34) and its discussion. The field also changes the electric current of the particle in the n-th state:



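$$I_n = q\frac{\hbar}{2im}\left[\psi_n^*\left(\frac{\partial}{\partial l} - \frac{iqRB}{2\hbar}\right)\psi_n - \text{c.c.}\right] = q\frac{\hbar}{mR}\left|C_n\right|^2\left(n - \frac{\Phi}{\Phi_0'}\right). \tag{3.135}$$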



Normalizing wavefunction (129) to have W_n = 1, we get |C_n| = (2πR)^(-1/2), so that Eq. (135) becomes

$$I_n = I_0\left(n - \frac{\Phi}{\Phi_0'}\right), \qquad\text{with}\quad I_0 \equiv \frac{\hbar q}{2\pi m R^2}. \tag{3.136}$$

47 See, e.g., MA Eq. (10.5).



Functions E_n(Φ) and I_n(Φ) are shown in Fig. 17. Note that since I_0 ∝ q while Φ_0' ∝ 1/q, the slope dI_n/dΦ = -I_0/Φ_0' is proportional to -q², i.e. is negative for any sign of the particle's charge. It is easy to check that this means that the current is diamagnetic, 48 i.e. obeys the Lenz rule of Faraday's electromagnetic induction: the field-induced current flows in such a direction that its own magnetic field tries to compensate the external magnetic flux applied to the loop.
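Eqs. (3.134) and (3.136) may be explored numerically in dimensionless form. The sketch below assumes units in which energy is measured in ħ²/2mR², current in I_0, and flux as f = Φ/Φ_0' (these unit choices and all function names are illustrative assumptions). In these units the relation I_n = -∂E_n/∂Φ takes the form I_n/I_0 = -(1/2) dE_n/df; the code checks it by finite differences, confirms that the slope of I_n is negative for every n, and finds which n gives the lowest energy at a given flux.

```python
def energy(n: int, f: float) -> float:
    """E_n in units of hbar^2/(2 m R^2); f = Phi/Phi_0' is the normalized flux, Eq. (3.134)."""
    return (n - f) ** 2


def current(n: int, f: float) -> float:
    """I_n in units of I_0 = hbar*q/(2*pi*m*R^2), Eq. (3.136)."""
    return n - f


def current_from_energy(n: int, f: float, h: float = 1e-6) -> float:
    """I_n/I_0 = -(1/2) dE_n/df in these units (i.e. I_n = -dE_n/dPhi), by central difference."""
    return -0.5 * (energy(n, f + h) - energy(n, f - h)) / (2 * h)


def ground_state_n(f: float, n_max: int = 20) -> int:
    """Quantum number n of the lowest level at normalized flux f."""
    return min(range(-n_max, n_max + 1), key=lambda n: energy(n, f))


if __name__ == "__main__":
    for f in (0.0, 0.3, 0.7, 1.2, 1.8):
        n0 = ground_state_n(f)
        print(f"f = {f:.1f}: ground-state n = {n0}, I/I0 = {current(n0, f):+.2f}")
```

Since both E_n and I_n depend only on (n - f), increasing the flux by one quantum Φ_0' merely relabels n, which is the periodicity of the ground-state curve in Fig. 3.17.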













Fig. 3.17. Effect of magnetic field on a charged 2D rotator: E_n and I_n versus Φ/Φ_0' for n = -1, 0, +1. Dashed lines show possible inelastic transitions between metastable and ground states, due to weak interaction with environment, as the magnetic field is being increased.



This result may be interpreted as a different implementation of the AB effect. 49 In contrast to the two-slit interference experiment that was discussed in Sec. 1, in the situation shown in Fig. 17 the particle is not absorbed by the detector, but travels around the ring continuously. As a result, its wavefunction is rigid: due to the boundary condition (128), the topological quantum number n is discrete, and the magnetic field cannot change the wavefunction gradually. In this sense, the system is similar to a superconducting loop - see Fig. 4 and its discussion. The difference between these systems is two-fold:

(i) For a single charged particle in a macroscopic system, with practicable values of q, R, and m, the current scale I_0 is very small. For example, for m = m_e, q = -e, and R = 1 μm, Eq. (136) yields |I_0| ≈ 3 pA. 50 The contribution LI ~ μ_0RI_0 ~ 10^-24 Wb of such a small current to the net magnetic flux is



48 This effect, whose qualitative features remain the same for all 2D or 3D localized states (see Chapter 6 below), 
is frequently referred to as the orbital diamagnetism. In magnetic materials consisting of particles with 
uncompensated spins, this effect competes with another effect, spin paramagnetism - see, e.g., EM Sec. 5.5. 

49 It is straightforward to check that Eqs. (133) and hence (135) remain valid even if the magnetic field lines do not touch the particle's trajectory, and the field is localized well inside the rotator's ring.

50 Such persistent, macroscopic diamagnetic currents in non-superconducting systems may be experimentally observed, for example, by measuring the weak magnetic field generated by electrons in a system of a large number (~10^7) of similar conducting rings - see, e.g., L. Levy et al., Phys. Rev. Lett. 64, 2074 (1990). Due to the
negligible in comparison with Φ_0' ≈ 4×10^-15 Wb, so that the quantization of n does not lead to magnetic flux quantization.

(ii) As soon as the magnetic field raises the eigenstate energy E_n above that of another eigenstate, E_n', the former state becomes metastable, and weak interactions of the system with its environment (which are neglected in our simple model) may induce a quantum transition of the system to the lower-energy state, thus reducing the diamagnetic current's magnitude - see the dashed lines in Fig. 17. The flux quantization in superconductors is much more robust to such perturbations. 51
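The order-of-magnitude estimates in item (i) can be reproduced with a few lines of arithmetic. The sketch below assumes CODATA values of the constants (not quoted in the notes) and, like the text, uses μ_0R only as a rough stand-in for the ring's self-inductance L; it is an illustration, not a precise inductance calculation.

```python
import math

# CODATA values in SI units (assumed here; the notes quote only the resulting estimates)
HBAR = 1.054571817e-34      # J*s
E_CHARGE = 1.602176634e-19  # C
M_E = 9.1093837015e-31      # kg
MU_0 = 1.25663706212e-6     # H/m


def current_scale(q: float, m: float, radius: float) -> float:
    """I_0 = hbar*q / (2*pi*m*R^2), Eq. (3.136); the sign follows the sign of q."""
    return HBAR * q / (2 * math.pi * m * radius**2)


R = 1e-6                                      # ring radius: 1 micrometer
I0 = abs(current_scale(-E_CHARGE, M_E, R))    # magnitude of I_0 for an electron
self_flux = MU_0 * R * I0                     # crude estimate L*I ~ mu_0*R*I_0
PHI0_PRIME = 2 * math.pi * HBAR / E_CHARGE    # "normal" flux quantum h/e

print(f"|I_0|     = {I0:.2e} A")
print(f"mu0*R*I_0 = {self_flux:.1e} Wb")
print(f"Phi_0'    = {PHI0_PRIME:.2e} Wb, ratio = {self_flux / PHI0_PRIME:.1e}")
```

The ratio of about 10^-9 is why the rotator's own flux plays no role in the quantization of n.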

Now let us return, one more time, to Eq. (129), and see what it gives for one more observable, the particle's angular momentum

$$\mathbf L = \mathbf r\times\mathbf p. \tag{3.137}$$

In our current problem, vector L has just one component, perpendicular to the rotator's plane:

$$L_z = Rp. \tag{3.138}$$

In classical mechanics, L_z of the rotator should be conserved (due to the absence of external torque), but can take arbitrary values. In quantum mechanics the situation changes: with p = ħk, our result k_n = n/R may be rewritten as

$$L_z = \langle L_z\rangle_n = R\hbar k_n = \hbar n. \tag{3.139}$$

Thus, the angular momentum is quantized: it may only be a multiple of the Planck constant ħ - confirming Bohr's guess, see Eq. (1.10). As we will see in Chapter 5, this result is very general (though it may be modified by spin effects), and wavefunctions (129) may be interpreted as eigenfunctions of the angular momentum operator.

In order to implement the 2D rotator in our 3D world, we needed to provide rigid confinement of the particle both in the motion plane and along the radius ρ. Let us proceed to the more general problem when only the former confinement is strict, i.e. to a 2D particle moving in an arbitrary centrally-symmetric potential

$$U(\boldsymbol\rho) = U(\rho). \tag{3.140}$$

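Using the well-known expression for the 2D Laplace operator in polar coordinates, 52 we may present the 2D stationary Schrödinger equation in the form

$$-\frac{\hbar^2}{2m}\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2}{\partial\varphi^2}\right]\psi + U(\rho)\psi = E\psi. \tag{3.141}$$

Separating the radial and angular variables as 53

$$\psi = \mathcal R(\rho)F(\varphi), \tag{3.142}$$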



dephasing effects of electron scattering by phonons and other electrons, the effect's observation requires 
submicron samples and millikelvin temperatures. 

51 Interrupting a superconducting ring with a weak link (Josephson junction), i.e. forming a SQUID, we may get 
the switching behavior similar to that shown with dashed arrows in Fig. 17 - see, e.g., EM Sec. 6.3. 

52 See, e.g., MA Eq. (10.3) with ∂/∂z = 0.

53 At this stage, I do not want to mark the particular solution (eigenfunction) ψ and the corresponding eigenenergy E by any index, because we may already suspect that in a 2D problem the role of this index will be played by two integers - two quantum numbers.
we get, after the division by ψ and multiplication by ρ², the following equation:

$$-\frac{\hbar^2}{2m}\left[\frac{\rho}{\mathcal R}\frac{d}{d\rho}\left(\rho\frac{d\mathcal R}{d\rho}\right) + \frac{1}{F}\frac{d^2F}{d\varphi^2}\right] + \rho^2U(\rho) = \rho^2E. \tag{3.143}$$



It is clear that the fraction (d²F/dφ²)/F should be a constant (because it may depend only on φ, while all other terms of the equation are functions of ρ alone), so that we get for the function F(φ) an ordinary differential equation,

$$\frac{d^2F}{d\varphi^2} + \nu^2F = 0, \tag{3.144}$$



where ν² is the variable separation constant. The fundamental solution of Eq. (144) is evidently F ∝ exp{±iνφ}. Now requiring, as we did for the 2D rotator, the 2π-periodicity of any observable, i.e.

$$F(\varphi + 2\pi) = F(\varphi)\,e^{2\pi i n}, \tag{3.145}$$

we see that the constant ν has to be an integer (say, n), and we can write: 54

$$F = C e^{in\varphi}. \tag{3.146}$$

Plugging the resulting relation (d²F/dφ²)/F = -n² into Eq. (143), we may rewrite it as

$$-\frac{\hbar^2}{2m}\left[\frac{1}{\rho\mathcal R}\frac{d}{d\rho}\left(\rho\frac{d\mathcal R}{d\rho}\right) - \frac{n^2}{\rho^2}\right] + U(\rho) = E. \tag{3.147}$$

The physical interpretation of this equation is that the full energy is a sum,

$$E = E_\rho + E_\varphi, \tag{3.148}$$

of the radial-motion part

$$E_\rho = -\frac{\hbar^2}{2m}\frac{1}{\rho\mathcal R}\frac{d}{d\rho}\left(\rho\frac{d\mathcal R}{d\rho}\right) + U(\rho), \tag{3.149}$$

and the angular-motion part

$$E_\varphi = \frac{\hbar^2 n^2}{2m\rho^2}. \tag{3.150}$$



Now let us notice that a similar separation exists in classical mechanics, 55 because the total 
energy of a particle moving in a central field may be presented, within the plane of motion, as 



$$E = \frac{m}{2}v^2 + U(\rho) = \frac{m}{2}\left(\dot\rho^2 + \rho^2\dot\varphi^2\right) + U(\rho) = E_\rho + E_\varphi, \tag{3.151}$$



where

$$E_\rho \equiv \frac{m}{2}\dot\rho^2 + U(\rho), \qquad E_\varphi \equiv \frac{m}{2}\rho^2\dot\varphi^2 = \frac{L_z^2}{2m\rho^2}. \tag{3.152}$$

54 Noting that for the 2D rotator (Fig. 16) l/R = φ, we can present Eq. (129) in a similar form. This is natural, because the rotator is just a particular case of our current problem - with a rigid confinement along the radial coordinate ρ.

55 See, e.g., CM Sec. 3.5.

The comparison of the latter relation with Eqs. (139) and (150) gives us grounds to suspect that the quantization rule L_z = nħ may be valid for this problem as well, and maybe in other cases too. In Sec. 5.6, we will see that this is indeed the case.

Returning to Eq. (147), on the basis of our experience with 1D wave mechanics we may expect that this ordinary, linear, second-order differential equation should have (for a motion confined to a certain finite region of its argument ρ), for any fixed n, a discrete energy spectrum described by some other integer quantum number (say, l). This means that eigenfunctions (142) and the corresponding eigenenergies (148) should be indexed by two quantum numbers. Note, however, that since the radial function obeys Eq. (147), which already depends on n, the function ℛ(ρ) should carry both indices, so the variable separation is not as "clean" as it was for the rectangular quantum well. Normalizing the angular function to the full circle, Δφ = 2π, we may rewrite Eq. (142) as

$$\psi_{n,l} = \mathcal R_{n,l}(\rho)F_n(\varphi) = \frac{1}{(2\pi)^{1/2}}\mathcal R_{n,l}(\rho)\,e^{in\varphi}. \tag{3.153}$$

A good (and important) example of a solvable problem of this type is a free 2D particle whose 
motion is rigidly confined to a disk of radius R: 

$$U(\rho) = \begin{cases} 0, & \text{for } 0 \le \rho < R,\\ +\infty, & \text{for } R < \rho.\end{cases} \tag{3.154}$$

In this case, the solutions ℛ_{n,l}(ρ) of Eq. (147) are proportional to the Bessel functions of the first kind, J_n(k_l ρ), 56 and the spectrum of possible values of the parameter k_l should be found from the boundary condition ℛ_{n,l}(R) = 0. Let me leave the detailed solution and analysis of this problem for the reader's exercise.
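For readers who want to check their solution of this exercise, here is a minimal numerical sketch. It assumes that inside the disk Eq. (147) reduces to the Bessel equation with k² = 2mE/ħ², so that the boundary condition gives k_{n,l} = j_{|n|,l}/R, where j_{|n|,l} is the l-th positive root of J_|n|, and hence E_{n,l} = ħ²j_{|n|,l}²/2mR². The Bessel function is summed from its power series (adequate only for moderate arguments), and the roots are found by a sign scan followed by bisection; all names are hypothetical.

```python
import math


def bessel_j(n: int, x: float, terms: int = 60) -> float:
    """Bessel function of the first kind J_n(x), n >= 0, from its power series.

    Adequate for the moderate arguments (x < ~15) needed here.
    """
    return sum(
        (-1) ** k * (x / 2) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
        for k in range(terms)
    )


def bessel_zeros(n: int, count: int, step: float = 0.05) -> list:
    """First `count` positive roots of J_n: sign scan on a grid, then bisection."""
    zeros = []
    x, f_x = step, bessel_j(n, step)
    while len(zeros) < count:
        x_next = x + step
        f_next = bessel_j(n, x_next)
        if f_x * f_next < 0:          # a root is bracketed by [x, x_next]
            lo, hi = x, x_next
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if bessel_j(n, lo) * bessel_j(n, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        x, f_x = x_next, f_next
    return zeros


if __name__ == "__main__":
    # Energies E_{n,l} in units of hbar^2/(2 m R^2) are the squared roots j_{|n|,l}^2
    for n in range(3):
        levels = [round(j * j, 4) for j in bessel_zeros(n, 3)]
        print(f"|n| = {n}: E/(hbar^2/2mR^2) = {levels}")
```

Levels with +n and -n coincide, so all states with n ≠ 0 are doubly degenerate, while the ground state (n = 0, l = 1) has E ≈ 5.78 ħ²/2mR².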



3.6. Spherically-symmetric systems: Brute force approach 

Now let us address the (mathematically more involved) case of 3D motion, with spherically- 
symmetric potential 

$$U(\mathbf r) = U(r). \tag{3.155}$$

Let me start, again, with a rotator - now a 3D rotator, i.e. a particle confined to move on the surface of a sphere of radius R. Despite the name, it has just 2 degrees of freedom, because any position on the spherical surface is completely described by two coordinates - say, the polar angle θ and the azimuthal angle φ. In this case, the kinetic energy we need to consider is limited to its angular part, so that in the Laplace operator in spherical coordinates 57 we may keep only the angular derivatives, with fixed r = R. Then the stationary Schrödinger equation becomes



56 A short summary of the properties of these functions, plus a few plots and a useful table of values, may be found in EM Sec. 2.4. For more on Bessel functions, see the literature recommended in MA Sec. 16(ii).

57 See, e.g., MA Eq. (10.9). 



Chapter 3 



Page 35 of 52 



Essential Graduate Physics 



QM: Quantum Mechanics 



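$$-\frac{\hbar^2}{2mR^2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\right]\psi = E\psi. \tag{3.156}$$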



(Again, I abstain from attaching any indices to ψ and E for the time being.) With the usual variable separation assumption,

$$\psi = \Theta(\theta)F(\varphi), \tag{3.157}$$



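Eq. (156), with all terms multiplied by sin²θ/ΘF, yields

$$-\frac{\hbar^2}{2mR^2}\left[\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{1}{F}\frac{d^2F}{d\varphi^2}\right] = E\sin^2\theta. \tag{3.158}$$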



Just as in Eq. (143), the fraction (d²F/dφ²)/F may be a function of φ only, and hence has to be constant, giving for it an equation similar to Eq. (144). So, the azimuthal functions are just the sine waves (146) again, and we can use the same periodicity condition (145) to write them in the normalized form 58

$$F_m(\varphi) = \frac{1}{(2\pi)^{1/2}}\,e^{im\varphi}. \tag{3.159}$$



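With that, the fraction (d²F/dφ²)/F equals (-m²), and Eq. (158), after multiplication by Θ/sin²θ, is reduced to the following ordinary, linear differential equation for the function Θ(θ):

$$-\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{m^2}{\sin^2\theta}\Theta = \varepsilon\Theta, \qquad\text{with}\quad \varepsilon \equiv E\bigg/\frac{\hbar^2}{2mR^2}. \tag{3.160}$$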



It is convenient to recast it into an equation for a new function P(ξ) ≡ Θ(θ), with ξ ≡ cos θ:

$$\frac{d}{d\xi}\left[\left(1 - \xi^2\right)\frac{dP}{d\xi}\right] + \left[l(l+1) - \frac{m^2}{1 - \xi^2}\right]P = 0, \tag{3.161}$$



where a new notation for the normalized energy is introduced: l(l + 1) ≡ ε. The motivation for such notation is that, according to a mathematical analysis, 59 Eq. (161) with integer m has solutions finite on the whole segment -1 ≤ ξ ≤ +1 only if the parameter l is an integer, l = 0, 1, 2, ..., and only if that integer is not smaller than |m|, i.e. if

$$-l \le m \le +l. \tag{3.162}$$

This immediately gives the following energy spectrum of the 3D rotator:

$$E_l = \frac{\hbar^2}{2mR^2}\,l(l+1), \tag{3.163}$$



58 Here, rather regrettably, I had to replace the notation of the integer from n to m, in order to comply with the generally accepted convention for this so-called magnetic quantum number. Let me hope that the difference between this integer and the particle's mass is absolutely clear from the context.

59 It was carried out by A.-M. Legendre (1752-1833). Just as a historic note: besides many original mathematical results, Dr. Legendre authored the famous textbook Éléments de Géométrie, which dominated the teaching of geometry through the 19th century.



so that the only effect of the magnetic quantum number m here is imposing the restriction (162) on the orbital quantum number l. This means, in particular, that each energy level (163) corresponds to (2l + 1) different values of m, i.e. is (2l + 1)-degenerate.

To understand the physics of this degeneracy, we need to explore the corresponding eigenfunctions of Eq. (161). They are naturally numbered by two integers, m and l, and are called the associated Legendre functions P_l^m(ξ). For the particular, simplest case m = 0, these functions are just the (Legendre) polynomials P_l(ξ) ≡ P_l^0(ξ), which may be either defined as the solutions of the Legendre equation following from Eq. (161) at m = 0:

$$\frac{d}{d\xi}\left[\left(1 - \xi^2\right)\frac{dP_l}{d\xi}\right] + l(l+1)P_l = 0, \tag{3.164}$$




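or calculated explicitly from the following Rodrigues formula: 60

$$P_l(\xi) = \frac{1}{2^l l!}\frac{d^l}{d\xi^l}\left(\xi^2 - 1\right)^l, \qquad l = 0, 1, 2, \ldots \tag{3.165}$$

Using this formula, it is easy to spell out a few lowest Legendre polynomials:

$$P_0(\xi) = 1, \quad P_1(\xi) = \xi, \quad P_2(\xi) = \frac{1}{2}\left(3\xi^2 - 1\right), \quad P_3(\xi) = \frac{1}{2}\left(5\xi^3 - 3\xi\right), \ \ldots \tag{3.166}$$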



though such expressions become more and more bulky as l is increased. As Fig. 18 shows, as the argument ξ is decreased from +1 to -1, all these functions start at the same point, P_l(+1) = +1, and end up either at the same point or at the opposite one: P_l(-1) = (-1)^l. On the way between these two end points, the l-th polynomial crosses the horizontal axis exactly l times, i.e. has l roots. 61 It may be shown that on the segment [-1, +1], the Legendre polynomials form a full orthogonal set of functions, with the following normalization rule:



$$\int_{-1}^{+1} P_l(\xi)P_{l'}(\xi)\,d\xi = \frac{2}{2l+1}\,\delta_{ll'}. \tag{3.167}$$
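The properties just listed (the end values, the l roots, and the normalization (3.167)) are easy to check numerically. The sketch below generates P_l with Bonnet's three-term recursion rather than the Rodrigues formula (165), simply because the recursion is more convenient numerically; both give the same polynomials. The Simpson-rule integrator and the function names are illustrative assumptions.

```python
def legendre(l: int, x: float) -> float:
    """P_l(x) from Bonnet's recursion (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3


def count_roots(l: int, samples: int = 1000) -> int:
    """Sign changes of P_l on a midpoint grid inside (-1, +1); the grid never hits x = 0."""
    xs = [-1.0 + (i + 0.5) * 2.0 / samples for i in range(samples)]
    vals = [legendre(l, x) for x in xs]
    return sum(1 for a, b in zip(vals, vals[1:]) if a * b < 0)


if __name__ == "__main__":
    for l in range(5):
        norm = simpson(lambda x: legendre(l, x) ** 2, -1.0, 1.0)
        print(f"l = {l}: P(+1) = {legendre(l, 1.0):+.0f}, P(-1) = {legendre(l, -1.0):+.0f}, "
              f"roots = {count_roots(l)}, norm = {norm:.6f} vs 2/(2l+1) = {2 / (2 * l + 1):.6f}")
```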



Fig. 3.18. A few lowest Legendre polynomials (horizontal axis: ξ = cos θ).



60 Derived independently by B. O. Rodrigues in 1816, J. Ivory in 1824, and C. Jacobi in 1827. 

61 In this behavior, we readily recognize the standing wave pattern typical for all 1D eigenproblems - cf. Fig. 1.7. The quantitative deviation from the sinusoidal waveform is due to the different metric of the sphere.
For m > 0, the associated Legendre functions may be expressed via the Legendre polynomials (165) using the following formula, which resembles Eq. (165):

$$P_l^m(\xi) = (-1)^m\left(1 - \xi^2\right)^{m/2}\frac{d^m}{d\xi^m}P_l(\xi), \tag{3.168}$$

while if the index m is negative, the following simple relation may be used:

$$P_l^{-m}(\xi) = (-1)^m\frac{(l-m)!}{(l+m)!}P_l^m(\xi). \tag{3.169}$$



On the segment ξ ∈ [-1, +1], each set of the associated Legendre functions with a fixed index m forms a full orthogonal set, with the normalization relation

$$\int_{-1}^{+1} P_l^m(\xi)P_{l'}^m(\xi)\,d\xi = \frac{2}{2l+1}\frac{(l+m)!}{(l-m)!}\,\delta_{ll'}, \tag{3.170}$$

which is evidently a generalization of Eq. (167) for arbitrary m.

Since the difference between angles θ and φ is to some extent artificial (caused by the arbitrary direction of the polar axis), physicists prefer to use not the functions Θ(θ) ∝ P_l^m(cos θ) and F_m(φ) ∝ exp{imφ} separately, but their products (157), which are called spherical harmonics:

$$Y_l^m(\theta,\varphi) = \left[\frac{2l+1}{4\pi}\frac{(l-m)!}{(l+m)!}\right]^{1/2}P_l^m(\cos\theta)\,e^{im\varphi}. \tag{3.171}$$



The specific coefficient in Eq. (171) is chosen so as to simplify the following two relations: the equation for negative m,

$$Y_l^{-m}(\theta,\varphi) = (-1)^m\left[Y_l^m(\theta,\varphi)\right]^*, \tag{3.172}$$

and the normalization relation

$$\oint Y_l^m(\theta,\varphi)\left[Y_{l'}^{m'}(\theta,\varphi)\right]^* d\Omega = \delta_{ll'}\delta_{mm'}, \tag{3.173}$$



with integration over the whole solid angle 4π. The last relation shows that the spherical harmonics form an orthonormal set of functions. This set is also full, so that any function defined on a sphere may be uniquely presented as a linear combination of Y_l^m.
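Relations (3.168) and (3.171)-(3.173) can be checked numerically. The sketch below builds P_l^m from Eq. (3.168) by differentiating the exact polynomial coefficients of P_l, obtains the harmonics with negative m from Eq. (3.172), and then evaluates the overlap integrals (3.173) for l ≤ 2, using Simpson's rule in ξ = cos θ and a uniform grid in φ. The quadrature choices and the function names are assumptions of this illustration.

```python
import cmath
import math
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def legendre_coeffs(l: int) -> tuple:
    """Exact coefficients of P_l(x), index = power of x (Bonnet's recursion)."""
    p_prev, p = [Fraction(1)], [Fraction(0), Fraction(1)]
    if l == 0:
        return tuple(p_prev)
    for k in range(1, l):
        nxt = [Fraction(0)] + [(2 * k + 1) * c for c in p]   # (2k+1) x P_k
        for i, c in enumerate(p_prev):
            nxt[i] -= k * c                                  # - k P_{k-1}
        p_prev, p = p, [c / (k + 1) for c in nxt]
    return tuple(p)


@lru_cache(maxsize=None)
def deriv_coeffs(l: int, m: int) -> tuple:
    """Float coefficients of the m-th derivative of P_l."""
    c = list(legendre_coeffs(l))
    for _ in range(m):
        c = [k * ck for k, ck in enumerate(c)][1:]
    return tuple(float(ck) for ck in c)


def assoc_legendre(l: int, m: int, x: float) -> float:
    """P_l^m(x) for 0 <= m <= l, Eq. (3.168), including the (-1)^m phase."""
    poly = sum(ck * x**k for k, ck in enumerate(deriv_coeffs(l, m)))
    return (-1) ** m * max(0.0, 1.0 - x * x) ** (m / 2) * poly


def sph_harm(l: int, m: int, theta: float, phi: float) -> complex:
    """Y_l^m(theta, phi) from Eq. (3.171); negative m via Eq. (3.172)."""
    if m < 0:
        return (-1) ** m * sph_harm(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * l + 1) / (4 * math.pi) * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)


def overlap_matrix(l_max: int = 2, n_xi: int = 200, n_phi: int = 16) -> dict:
    """Numerical integrals (3.173) over the sphere, using dOmega = d(cos theta) dphi."""
    states = [(l, m) for l in range(l_max + 1) for m in range(-l, l + 1)]
    h = 2.0 / n_xi
    nodes = []
    for i in range(n_xi + 1):                       # Simpson weights in xi = cos(theta)
        w_xi = h / 3 * (1 if i in (0, n_xi) else (4 if i % 2 else 2))
        theta = math.acos(max(-1.0, min(1.0, -1.0 + i * h)))
        for k in range(n_phi):                      # uniform (periodic) grid in phi
            nodes.append((theta, 2 * math.pi * (k + 0.5) / n_phi, w_xi * 2 * math.pi / n_phi))
    values = {s: [sph_harm(*s, th, ph) for th, ph, _ in nodes] for s in states}
    return {
        (a, b): sum(w * ya * yb.conjugate() for (_, _, w), ya, yb in zip(nodes, values[a], values[b]))
        for a in states
        for b in states
    }


if __name__ == "__main__":
    worst = max(abs(v - (1.0 if a == b else 0.0)) for (a, b), v in overlap_matrix().items())
    print(f"max deviation from orthonormality (l <= 2): {worst:.1e}")
```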

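Despite the somewhat intimidating formulas given above, they yield rather simple expressions for the lowest spherical harmonics:

$$l = 0:\quad Y_0^0 = \left(\frac{1}{4\pi}\right)^{1/2}, \tag{3.174}$$

$$l = 1:\quad \begin{cases} Y_1^{+1} = -\left(\dfrac{3}{8\pi}\right)^{1/2}\sin\theta\,e^{i\varphi},\\[4pt] Y_1^{0} = \left(\dfrac{3}{4\pi}\right)^{1/2}\cos\theta,\\[4pt] Y_1^{-1} = +\left(\dfrac{3}{8\pi}\right)^{1/2}\sin\theta\,e^{-i\varphi},\end{cases} \tag{3.175}$$

$$l = 2:\quad \begin{cases} Y_2^{+2} = +\left(\dfrac{15}{32\pi}\right)^{1/2}\sin^2\theta\,e^{2i\varphi},\\[4pt] Y_2^{+1} = -\left(\dfrac{15}{8\pi}\right)^{1/2}\sin\theta\cos\theta\,e^{i\varphi},\\[4pt] Y_2^{0} = \left(\dfrac{5}{16\pi}\right)^{1/2}\left(3\cos^2\theta - 1\right),\\[4pt] Y_2^{-1} = +\left(\dfrac{15}{8\pi}\right)^{1/2}\sin\theta\cos\theta\,e^{-i\varphi},\\[4pt] Y_2^{-2} = +\left(\dfrac{15}{32\pi}\right)^{1/2}\sin^2\theta\,e^{-2i\varphi}.\end{cases} \tag{3.176}$$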



It is important to understand the symmetry of these functions. Since spherical harmonics with m ≠ 0 are complex, the most popular way of their graphical representation is first to form their real combinations corresponding to two opposite values of m, 62

$$Y_{lm} \propto Y_l^m + \operatorname{sgn}(m)(-1)^m Y_l^{-m} \propto P_l^{|m|}(\cos\theta)\times\begin{cases}\cos m\varphi, & \text{for } m > 0,\\ \sin m\varphi, & \text{for } m < 0,\end{cases} \tag{3.177}$$



(for m = 0, Y_l0 = Y_l^0), and then plot the magnitude of these combinations in spherical coordinates as the distance from the origin, while using two colors to show their sign - see Fig. 19.



Fig. 3.19. Several lowest real spherical harmonics Y_lm (rows include l = 1 (p states) and l = 2 (d states), with different values of m along each row). (Adapted from Web site http://people.csail.mit.edu/sparis/ .)




62 Such real functions Y_lm, which also form a full set of orthonormal eigenfunctions and are frequently called the real spherical harmonics, are more convenient than the complex functions Y_l^m for several applications, especially when the variables of interest are real by definition.
Let us start from the simplest case l = 0. According to Eq. (162), there could be only one such s state, 63 with m = 0. The spherical harmonic corresponding to that state is just a constant, so that the wavefunction is uniformly distributed over the sphere. Since the function does not have a gradient in any direction, the kinetic energy (163) of the particle is zero.

For l = 1, there could be 3 different p states, with m = -1, 0, and +1. As the second row in Fig. 19 shows, these states are essentially identical in structure, and are just differently oriented in space, thus explaining the 3-fold degeneracy of the kinetic energy - see Eq. (163). This is not quite true for the 5 different d states (l = 2), shown in the bottom row of Fig. 19, as well as for states with higher l: despite their equal energies, they differ not only by their spatial orientation. The states with m = 0 have a gradient only in the θ direction, while the states with the extreme values of m (m = ±l) change only gradually (as sin^l θ) in the polar direction, while oscillating in the azimuthal direction. The states with intermediate values of m provide a crossover between these two extremes, oscillating in both directions, stronger and stronger in the direction of φ as |m| is increased. Still, surprisingly, the magnetic quantum number does not affect the energy for any l. Another surprising feature of the spherical harmonics follows from the comparison of Eq. (163) with the second of the classical relations (152). These expressions coincide if we interpret the constant

$$L^2 = \hbar^2 l(l+1) \tag{3.178}$$

as the value of the full angular momentum squared, L² = Σ_j L_j² (including both its θ and φ components), in the eigenstate with eigenfunction Y_l^m. On the other hand, the structure of the azimuthal component F(φ) of the wavefunction is exactly the same as in 2D axially-symmetric problems, suggesting that Eq. (139) still gives correct values (in our new notation, L_z = mħ) for the z-component of the angular momentum.

If this is so, why, for any state with l > 0, is L_z² = m²ħ² ≤ l²ħ² always less than L² = l(l + 1)ħ²? In other words, what prevents the angular momentum vector from being fully aligned with the z-axis?

Besides that issue, though the above analysis of the 3D rotator is formally (mathematically) complete, it is as unsatisfactory on the physics level as the harmonic oscillator analysis in Sec. 2.6. In particular, it does not explain the meaning of the extremely simple relations for the eigenvalues of energy and angular momentum against the backdrop of rather complicated wavefunctions.

We will obtain natural answers to all these questions and concerns in Sec. 5.6, but now let us complete our survey of wave mechanics by extending it to 3D motion in an arbitrary spherically-symmetric potential (155). In this case we have to use the full form of the Laplace operator in spherical coordinates. The variable separation procedure is an evident generalization of what we have done before, with the particular solution

$$\psi = \mathcal R(r)\Theta(\theta)F(\varphi), \tag{3.179}$$

whose substitution into the stationary Schrödinger equation yields



$$-\frac{\hbar^2}{2mr^2}\left[\frac{1}{\mathcal R}\frac{d}{dr}\left(r^2\frac{d\mathcal R}{dr}\right) + \frac{1}{\Theta\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{1}{\sin^2\theta}\frac{1}{F}\frac{d^2F}{d\varphi^2}\right] + U(r) = E. \tag{3.180}$$

63 The letter names for states with different values of l stem from the history of optical spectroscopy - for example, the letter "s", used for l = 0, originally denoted the "sharp" optical line series, etc. The sequence of the letters is as follows: s, p, d, f, g, h, and further in alphabetical order.
It is evident that the angular part (the two last terms in the square brackets) separates from the radial part, and for the former part we get Eq. (156) again, with the only change R → r. This change does not affect the fact that the eigenfunctions of that equation are the spherical harmonics (171), and the angular eigenenergy is given by Eq. (163), again with the replacement R → r. This means that for the radial function, Eq. (180) gives the following equation,



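$$-\frac{\hbar^2}{2m}\left[\frac{1}{r^2\mathcal R}\frac{d}{dr}\left(r^2\frac{d\mathcal R}{dr}\right) - \frac{l(l+1)}{r^2}\right] + U(r) = E. \tag{3.181}$$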



Note that no information about the magnetic quantum number m has crept into the radial equation (besides establishing the limitation (162) for possible values of l), so that this equation depends only on the orbital quantum number l.

The radial equation becomes rather simple for U(r) = 0, and may be used, for example, to solve the eigenproblem for the free 3D motion of a particle inside a sphere of radius R. Leaving that problem for the reader's exercise, I will proceed to the most important Bohr atom problem, i.e. motion in the so-called attractive Coulomb potential: 64



$$U(r) = -\frac{C}{r}, \qquad\text{with } C > 0. \tag{3.182}$$



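The natural scales of r and E are, respectively, 65

$$r_0 = \frac{\hbar^2}{mC} \qquad\text{and}\qquad E_0 = m\left(\frac{C}{\hbar}\right)^2. \tag{3.183}$$

In the normalized units ε = E/E_0 and ξ = r/r_0, Eq. (181) looks simpler,

$$\frac{d}{d\xi}\left(\xi^2\frac{d\mathcal R}{d\xi}\right) + \left[-l(l+1) + 2\xi^2\left(\varepsilon + \frac{1}{\xi}\right)\right]\mathcal R = 0, \tag{3.184}$$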



but unfortunately its eigenfunctions may be called elementary only in the most generous meaning of the word. With the adequate normalization,

$$\int_0^\infty \mathcal R_{n,l}\,\mathcal R_{n',l}\,r^2\,dr = \delta_{nn'}, \tag{3.185}$$

these (mutually orthogonal) functions may be presented as



64 Historically, the solution of this problem in 1926, which reproduced the main result (1.8)-(1.9) of the "old" quantum theory developed by N. Bohr in 1912, without its restrictive assumptions, was the decisive step for the general acceptance of Schrödinger's wave mechanics.

65 These two scales are obtained from the relations E_0 = ħ²/mr_0² = C/r_0, i.e. from the equality of the natural scales of the potential and kinetic energies, dropping all numerical coefficients. For the most important case of the hydrogen atom, C = e²/4πε_0, these scales are reduced, respectively, to the Bohr radius r_B (1.13) and the Hartree energy E_H (1.9). Note also that for a hydrogen-like atom (or rather ion), with C = Z(e²/4πε_0), these two key parameters are rescaled as r_0 = r_B/Z, E_0 = Z²E_H. We will use the last relations for our discussion of the helium atom in Sec. 8.2.
(3.186)



Here L_p^q(ξ) are the so-called associated Laguerre polynomials, which may be calculated as

$$L_p^q(\xi) = (-1)^q\frac{d^q}{d\xi^q}L_{p+q}(\xi) \tag{3.187}$$

from the simple Laguerre polynomials L_p(ξ) ≡ L_p^0(ξ). 66 In turn, the easiest way to obtain L_p(ξ) is to use the following Rodrigues formula: 67

(3.188)

Notice that in contrast with the associated Legendre functions P_l^m participating in spherical harmonics, L_p^q are just polynomials, and those with small indices p and q are indeed simple.
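Relation (3.187), and the fact that L_p^q has degree exactly p (which the argument in the next paragraph relies on), can be verified with exact rational arithmetic. This sketch assumes the modern normalization of the simple Laguerre polynomials, L_p(ξ) = (e^ξ/p!) d^p/dξ^p (ξ^p e^(-ξ)), which may differ by a constant factor from the convention of Eq. (3.188); the closed-form sum it compares against is the standard textbook expression for that normalization, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre(p: int) -> list:
    """Coefficients (index = power of xi) of L_p(xi), normalized so that L_p(0) = 1."""
    return [Fraction((-1) ** k * comb(p, k), factorial(k)) for k in range(p + 1)]


def derivative(coeffs: list) -> list:
    """Coefficients of the derivative of a polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def assoc_laguerre(p: int, q: int) -> list:
    """L_p^q(xi) = (-1)^q d^q/dxi^q L_{p+q}(xi), Eq. (3.187)."""
    c = laguerre(p + q)
    for _ in range(q):
        c = derivative(c)
    return [(-1) ** q * x for x in c]


def assoc_laguerre_closed_form(p: int, q: int) -> list:
    """Standard explicit sum: L_p^q(xi) = sum_k (-1)^k C(p+q, p-k) xi^k / k!."""
    return [Fraction((-1) ** k * comb(p + q, p - k), factorial(k)) for k in range(p + 1)]


if __name__ == "__main__":
    for p in range(3):
        for q in (1, 3):
            print(f"L_{p}^{q}:", [str(c) for c in assoc_laguerre(p, q)])
```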

Returning to Eq. (186), we see that the natural quantization of the radial equation (184) has brought us a new quantum number (integer) n. In order to understand its range, we should notice that according to Eq. (188), the highest power of terms in the polynomial L_{p+q} is (p + q), and hence, according to Eq. (187), that in L_p^q is p, so that the highest power in the polynomial participating in Eq. (186) is (n - l - 1). Since this power cannot be negative (to avoid the unphysical divergence of wavefunctions at r → 0), the radial quantum number n has to obey the restriction n ≥ l + 1. Since l, as we already know, may take values l = 0, 1, 2, ..., we may conclude that n may only take values



$$n = 1, 2, \ldots \tag{3.189}$$



What makes this relation important is the following, most surprising result of the theory: the eigenenergies corresponding to wavefunctions (179), which are indexed with 3 quantum numbers,

$$\psi_{n,l,m} = \mathcal R_{n,l}(r)\,Y_l^m(\theta,\varphi), \tag{3.190}$$

depend only on n and agree with Bohr's formula (1.8):

$$E_n = -\frac{mC^2}{2\hbar^2 n^2} = -\frac{E_0}{2n^2}. \tag{3.191}$$



For this reason, n is usually called the principal quantum number, and the above relation between it and the "more subordinate" l is rewritten as

$$l \le n - 1. \tag{3.192}$$



Together with inequality (162), this gives us the most important hierarchy of the 3 quantum numbers involved in the problem:

$$1 \le n < \infty, \qquad 0 \le l \le n - 1, \qquad -l \le m \le +l. \tag{3.193}$$
66 In Eqs. (187)-(188), p and q are non-negative integers, with no relation whatsoever to the particle's momentum or electric charge. Sorry for this notation, but it is absolutely common, and can hardly result in any confusion.

67 Named after the same B. O. Rodrigues, and belonging to the same class as his other key result, Eq. (165).
Taking into account the (2l + 1)-degeneracy related to the magnetic number m, and using the well-known formula for the arithmetic progression, 68 we see that each energy level (191) has the following orbital degeneracy:

$$g = \sum_{l=0}^{n-1}(2l + 1) = n^2. \tag{3.194}$$



Due to its importance for applications, let us spell out the quantum number hierarchy of a few lowest-energy states, using the traditional notation in which the value of n is followed by the letter that denotes the value of l:

n = 1:  l = 0 (one 1s state),      m = 0.                    (3.195)

n = 2:  l = 0 (one 2s state),      m = 0,
        l = 1 (three 2p states),   m = 0, ±1.                (3.196)

n = 3:  l = 0 (one 3s state),      m = 0,
        l = 1 (three 3p states),   m = 0, ±1,
        l = 2 (five 3d states),    m = 0, ±1, ±2.            (3.197)
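The bookkeeping of Eqs. (3.193)-(3.197) is easy to automate: enumerating the allowed (l, m) pairs for each n reproduces the list above and the n² degeneracy of Eq. (3.194). This is only an illustrative sketch with hypothetical names; its table of letters stops at h (l = 5), so it is meant for n ≤ 6.

```python
SPECTROSCOPIC = "spdfgh"  # letters for l = 0..5, see footnote 63


def hydrogen_states(n: int) -> list:
    """All (l, m) pairs allowed by the hierarchy (3.193) for principal quantum number n."""
    return [(l, m) for l in range(n) for m in range(-l, l + 1)]


def orbital_degeneracy(n: int) -> int:
    """Number of orbital states sharing the energy E_n = -E_0/(2 n^2)."""
    return len(hydrogen_states(n))


def describe(n: int) -> list:
    """Human-readable lines in the format of Eqs. (3.195)-(3.197)."""
    lines = []
    for l in range(n):
        ms = [m for (ll, m) in hydrogen_states(n) if ll == l]
        lines.append(f"n = {n}: l = {l} ({n}{SPECTROSCOPIC[l]}, {len(ms)} states), m = {ms}")
    return lines


if __name__ == "__main__":
    for n in range(1, 4):
        print("\n".join(describe(n)))
        print(f"   -> g = {orbital_degeneracy(n)}, E_n/E_0 = {-1 / (2 * n * n):.4f}")
```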
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Figure 20 shows plots of the radial functions (186) of the listed states. The most important of 
them is of course the ground (Is) state with n = 1 and hence E = - E 0 /2, whose radial function (186) is 
just 

(3.198) 

and the angular distribution is uniform - see Eq. (174). The gap between the ground energy and the energy E = -E_0/8 of the lowest excited states (with n = 2) in a hydrogen atom (in which E_0 = E_H ≈ 27.2 eV) is as large as ~10 eV. As a result, their thermal excitation requires temperatures as high as ~10^5 K, and the overwhelming majority of hydrogen atoms in the visible Universe are in their ground state. Taking into account that atomic hydrogen makes up about 75% of the "normal" matter, we are very fortunate that such simple formulas as Eqs. (174) and (198) describe the systems most frequently met in Mother Nature! 69
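The estimates in the paragraph above take only a few lines of arithmetic. This sketch assumes the level formula E_n = -E_0/(2n^2), which reproduces the values -E_0/2 and -E_0/8 quoted in the text. It also assumes E_0 = 27.2 eV and a rounded Boltzmann constant. The ratio (gap)/k_B is only an order-of-magnitude temperature scale. The room-temperature Boltzmann factor (degeneracy prefactors ignored) shows how strongly thermal excitation is suppressed.

```python
import math

E_0 = 27.2       # eV; E_0 = E_H for hydrogen, as quoted in the text
K_B = 8.617e-5   # eV/K; Boltzmann constant (rounded)

def level(n, e0=E_0):
    """E_n = -E_0 / (2 n^2), consistent with E = -E_0/2 (n = 1) and -E_0/8 (n = 2)."""
    return -e0 / (2 * n ** 2)

gap = level(2) - level(1)          # = 3 E_0 / 8, about 10.2 eV
T_scale = gap / K_B                # crude temperature scale, about 1.2e5 K
# Boltzmann factor at room temperature, ignoring level degeneracies
suppression_300K = math.exp(-gap / (K_B * 300.0))

print(f"E_1 = {level(1):.1f} eV, E_2 = {level(2):.1f} eV, gap = {gap:.2f} eV")
print(f"gap / k_B = {T_scale:.2e} K, exp(-gap / k_B T) at 300 K = {suppression_300K:.1e}")
```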



The radial functions of the next states, 2s and 2p, are also not too complex: 



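$$ \mathcal{R}_{2,0}(r) = \frac{1}{(2r_0)^{3/2}}\, 2\left(1 - \frac{r}{2r_0}\right) e^{-r/2r_0}, \qquad \mathcal{R}_{2,1}(r) = \frac{1}{(2r_0)^{3/2}}\, \frac{r}{3^{1/2} r_0}\, e^{-r/2r_0}. \qquad (3.199) $$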



(Note again that the former of these states (2s) can only have a uniform angular distribution, while the three 2p states have different values of m = 0, ±1, and hence different angular distributions - see Eq. (175) and the second row of Fig. 19.) The most important trend here is a larger decay radius of the exponent (2r_0 for n = 2 instead of r_0 for n = 1), and hence a larger radial extension of the states. This trend is confirmed by the following general formula: 70



68 See, e.g., MA Eq. (2.5a).

69 Forgetting for a minute about such new "dark clouds" on the horizon of physics as the hypothetical dark matter 
and dark energy. 

70 Note that even at the largest value of l, equal to (n - 1), the term l(l + 1) in Eq. (200) cannot compensate the term 3n².
$$ \langle r \rangle_{n,l} = \frac{r_0}{2}\left[3n^2 - l(l+1)\right]. \qquad (3.200) $$
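As a numerical sanity check of Eqs. (198)-(200), one may integrate the radial functions directly. The sketch below works in units r_0 = 1. It uses a plain composite Simpson rule (any quadrature would do) and assumes the normalization condition ∫ℛ²r²dr = 1. It also counts the radial roots of each function, which illustrates the "wiggle" count discussed in the next paragraph. The function names are illustrative.

```python
import math

# Units: r0 = 1. Radial functions transcribed from Eqs. (3.198)-(3.199).
def R10(r):
    return 2.0 * math.exp(-r)

def R20(r):
    return 2.0 ** -1.5 * 2.0 * (1.0 - r / 2.0) * math.exp(-r / 2.0)

def R21(r):
    return 2.0 ** -1.5 * (r / math.sqrt(3.0)) * math.exp(-r / 2.0)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def norm_and_mean_r(R, r_max=80.0):
    """Return (integral of R^2 r^2 dr, <r>), with the integrals cut off at r_max."""
    norm = simpson(lambda r: R(r) ** 2 * r ** 2, 0.0, r_max)
    mean = simpson(lambda r: R(r) ** 2 * r ** 3, 0.0, r_max)
    return norm, mean / norm

def eq_3_200(n, l):
    """<r>_{n,l} from Eq. (3.200), in units of r0."""
    return 0.5 * (3 * n ** 2 - l * (l + 1))

def radial_roots(R, r_max=40.0, steps=4000):
    """Count sign changes of R(r) on a grid that avoids r = 0."""
    xs = [r_max * (i + 0.5) / steps for i in range(steps)]
    return sum(1 for x, y in zip(xs, xs[1:]) if R(x) * R(y) < 0)

for name, R, n, l in [("1s", R10, 1, 0), ("2s", R20, 2, 0), ("2p", R21, 2, 1)]:
    norm, r_avg = norm_and_mean_r(R)
    print(f"{name}: norm = {norm:.6f}, <r> = {r_avg:.4f} "
          f"(Eq. 3.200: {eq_3_200(n, l)}), radial roots = {radial_roots(R)}")
```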

The second important trend is that at fixed n, the orbital quantum number l determines how fast the wavefunction changes with r near the origin, and how much it oscillates in the radial direction at larger r. For example, the 2s eigenfunction ℛ_{2,0}(r) is nonvanishing at r = 0, and makes one "wiggle" (has one root) in the radial direction, while the 2p eigenfunctions vanish at r = 0, and do not oscillate at all in the radial direction. Instead, those wavefunctions always oscillate as functions of some angle - see the second row of Fig. 19. The same trend is clearly visible for n = 3 (see Fig. 20), and continues for higher values of n.




Fig. 3.20. The lowest radial functions of the Bohr atom problem.




The interpretation of these results is that the states with l = l_max = n - 1 may be viewed as analogs of the circular motion of a particle in a plane whose orientation defines the quantum number m, with an almost fixed radius r ≈ r_0(n² + n/2). On the other hand, the best classical image of an s-state (l = 0) is the purely radial motion of the particle to and from the attracting center. (The latter image is especially imperfect, because the motion would need to happen simultaneously in all radial directions.) The classical language becomes reasonable only for the so-called Rydberg states, with n ≫ 1, whose linear superpositions may be used to compose wave packets closely following the classical, circular or elliptic trajectories of the particle - just as was discussed in Sec. 2.2 for the free 1D motion.
Besides Eq. (200), mathematics gives us several other simple relations for the radial functions ℛ_{n,l} (and, since the spherical harmonics are normalized to 1, for the eigenfunctions as a whole), including those that we will use later in the course:

$$ \left\langle \frac{1}{r} \right\rangle_{n,l} = \frac{1}{n^2 r_0}, \qquad \left\langle \frac{1}{r^2} \right\rangle_{n,l} = \frac{1}{n^3 (l + 1/2)\, r_0^2}, \qquad \left\langle \frac{1}{r^3} \right\rangle_{n,l} = \frac{1}{n^3 l (l + 1/2)(l + 1)\, r_0^3} \ \ (l \ge 1). \qquad (3.201) $$



In particular, the first of them means that for any eigenfunction ψ_{n,l,m}, with all its complicated radial and angular dependencies, there is a simple relation between the potential and total energies:

$$ \langle U \rangle_{n,l} = -C \left\langle \frac{1}{r} \right\rangle_{n,l} = -\frac{E_0}{n^2} = 2E_n, \qquad (3.202) $$



so that the average kinetic energy of the particle, ⟨T⟩_{n,l} = E_n - ⟨U⟩_{n,l}, is equal to |E_n| > 0.
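Relations of this kind can be verified exactly for the functions (198)-(199). Each of them is a polynomial times an exponent, so all averages reduce to the integrals ∫_0^∞ r^q e^{-ar} dr = q!/a^{q+1}. The sketch below assumes U(r) = -C/r, with r_0 and E_0 = C/r_0 defined as in Eqs. (182)-(183). It works in units r_0 = E_0 = 1, so that U = -1/r and E_n = -1/(2n^2). It checks ⟨1/r⟩ = 1/(n²r_0), ⟨U⟩ = 2E_n, ⟨T⟩ = |E_n|, and the standard result ⟨1/r²⟩ = 1/[n³(l + 1/2)r_0²], using exact fractions so that no rounding is involved. The table name and helper are hypothetical.

```python
from fractions import Fraction
from math import factorial

# Units: r0 = 1, E0 = 1, so U(r) = -1/r and E_n = -1/(2 n^2) (assumption stated above).
# Each |R_{n,l}(r)|^2 from Eqs. (3.198)-(3.199) is stored as {power p: c_p},
# meaning exp(-2r/n) * sum_p c_p r^p.
SQUARED_RADIAL = {
    (1, 0): {0: Fraction(4)},                                            # R = 2 e^{-r}
    (2, 0): {0: Fraction(1, 2), 1: Fraction(-1, 2), 2: Fraction(1, 8)},  # R = 2^{-1/2} (1 - r/2) e^{-r/2}
    (2, 1): {2: Fraction(1, 24)},                                        # R = 24^{-1/2} r e^{-r/2}
}

def moment(n, l, k):
    """Exact <r^k> = int_0^inf |R|^2 r^(2+k) dr, via int_0^inf r^q e^(-a r) dr = q!/a^(q+1)."""
    a = Fraction(2, n)
    return sum(c * factorial(p + 2 + k) / a ** (p + 3 + k)
               for p, c in SQUARED_RADIAL[(n, l)].items())

for (n, l) in SQUARED_RADIAL:
    E_n = Fraction(-1, 2 * n ** 2)
    U_avg = -moment(n, l, -1)        # <U> = -<1/r> in these units
    T_avg = E_n - U_avg              # <T> = E_n - <U>
    print(f"n={n}, l={l}: norm={moment(n, l, 0)}, <1/r>={moment(n, l, -1)}, "
          f"<1/r^2>={moment(n, l, -2)}, <U>={U_avg} (2E_n={2 * E_n}), <T>={T_avg}")
```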

These simple results are in sharp contrast with the rather complicated expressions for the eigenfunctions. They motivate a search for more general methods of quantum mechanics that would replace, or at least complement, our brute-force (wave-mechanics) approach, and would reveal the real nature of these results. Such an approach will be the main topic of the next chapter.



3.7. Atoms

Before proceeding to that chapter, let me show that, rather strikingly, the classification of quantum numbers in the simple potential well (182), carried out in the last section, together with very modest borrowings from the further theory, allows a semi-quantitative explanation of the whole system of chemical elements. The "only" two additions we need are the following facts:

(i) due to interaction with relatively low-temperature environments, atoms tend to relax into their lowest-energy state, and

(ii) due to the Pauli principle (valid for electrons as Fermi particles), each orbital eigenstate discussed above can be occupied by at most 2 electrons, with opposite spins.

Of course, atomic electrons do interact, so that their quantitative description requires the quantum mechanics of multiparticle systems, which is rather complex. (Its main concepts will be discussed in Chapter 8.) However, the lion's share of this interaction reduces to simple electrostatic screening, i.e. the partial compensation of the electric charge of the atomic nucleus, as felt by a particular electron, by the other electrons of the atom. This screening changes the quantitative results (such as the energy scale) dramatically; however, the quantum number hierarchy, and hence the classification of states, is not affected.

The system of atoms is most often presented as the famous periodic table of chemical elements, 71 whose simple version is shown in Fig. 21, while Fig. 22 presents a sequential list of the elements with their electron configurations. The numbers in the table's cells (and the first column in the list) are the atomic numbers Z, which physically are the numbers of protons in the atomic nucleus, and hence the numbers of electrons in the electrically neutral atom. The electron configuration in Fig. 22 follows the convention already used in Eqs. (195)-(197), with the additional upper index showing the number of electrons with the indicated values of quantum numbers n and l.

71 Also called the Mendeleev table of elements, after chemist D. Mendeleev who pioneered the concept in 1869. (The explanation of the underlying periodicity of chemical properties as functions of Z had to wait 60 more years, until the formulation of quantum mechanics in the late 1920s.)

The lightest atom, with Z = 1, is hydrogen (chemical symbol H), the only atom for which the theory discussed in Sec. 6 is quantitatively correct. 72 According to Eq. (191), the 1s ground state of its only electron corresponds to quantum numbers n = 1, l = 0, and m = 0 - see Eq. (195). In most versions of the periodic table, the cell of H is placed in the top left corner. In the next atom, helium (He, Z = 2), the same orbital quantum state (1s) is filled with two electrons with opposite spins. 73 Note that due to the twice higher electric charge of the nucleus, i.e. the twice higher value of constant C in Eq. (182), resulting in a 4-fold increase of constant E_0 (183), the binding energy of each electron is crudely 4 times higher than that of the hydrogen atom - though the electron interaction decreases it by about 25% - see Sec. 7.2. This is why removing one electron from the helium atom (i.e. its positive ionization) requires a very high energy, 24.6 eV, which is not available in usual chemical reactions. On the other hand, a neutral helium atom cannot bind one more electron (i.e. form a negative ion) either. As a result, helium, like all other elements with fully completed electron shells (sets of states with eigenenergies well separated from higher energy levels), is a chemically inert noble gas; it starts the right-most column of the periodic table, dedicated to such elements.
Fig. 3.21. The periodic table of elements, showing their atomic numbers, as well as their basic physical/chemical properties at the so-called ambient (meaning usual laboratory) conditions.



72 Besides very small "fine-structure" corrections - to be discussed in Chapters 6 and 9.

73 As will be discussed in detail in Chapter 8, electrons of the same atom are actually indistinguishable, and their 
quantum states are not independent, and frequently entangled. These factors are important for several properties 
of helium atoms (and heavier elements as well), especially for their response to external fields. However, for the 
atom classification purposes, they are not crucial. 
Fig. 3.22. Atomic electron configurations. The upper index shows the number of electrons in states with the indicated quantum numbers n (the first digit) and l (letter-coded as listed above).
The situation changes dramatically as we move to the next element, lithium (Li), with Z = 3. Two of its electrons are still accommodated by the inner shell n = 1 (listed in Fig. 22 as the helium shell [He]), but the third one has to reside in the next shell, with n = 2 and l = 0, i.e. in the 2s state. According to Eq. (191), the binding energy of this electron is much lower. This is especially true if we take into account that, according to Eq. (200), the 1s electrons of the [He] shell are much closer to the nucleus and almost completely compensate two thirds of its electric charge +3e. As a result, the 2s electron is reasonably well described by Eq. (199), with a binding energy of just 5.39 eV. So a lithium atom can give away that electron rather easily, either to atoms of other elements, to form chemical compounds, or into the common conduction band of solid lithium, and as a result it is a typical alkali metal. The similarity of the chemical properties of lithium and hydrogen, with the chemical valence of one, 74 places Li as the starting element of the second period (row), with the first period limited to only H and He.

In the next element, beryllium (Z = 4), the 2s state (n = 2, l = 0) picks up one more electron, with the opposite spin. Due to the higher electric charge of the nucleus, Q = 4e, with only half of it compensated by the 1s electrons of the [He] shell, the binding energy of the 2s electrons is higher than in lithium, so that the ionization energy increases to 9.32 eV. As a result, beryllium is also chemically active, but not as active as lithium, with the valence of two; it is also metallic in its solid phase, but does not conduct electric current as well as lithium.

Moving in this way along the second row of the periodic table (from Z = 3 to Z = 10), we see the gradual filling of all 4 different orbital states of the n = 2 shell, by 2 electrons each. The ionization energy grows gradually along the row (up to 21.6 eV in Ne, with Z = 10), i.e. the atoms become more and more reluctant to have metallic conductance or to form positive ions. However, the elements near the end of the row, such as oxygen (O, with Z = 8) and especially fluorine (F, with Z = 9), can readily pick up extra electrons to fill their 2p states, i.e. form negative ions. As a result, these elements are chemically active, with the double valence for oxygen and single valence for fluorine. However, the final element of this row, neon, has its n = 2 shell full, and cannot form a stable negative ion. This is why it is a noble gas, like helium. Traditionally, in the periodic table it is placed right under helium (Fig. 21), to emphasize the similarity of their chemical and physical properties. But this necessitates making a gap of at least 6 cells in the 1st row. (Actually, the gap is often made larger, to accommodate the next rows - keep reading.)

Period 3, i.e. the 3rd row of the table, starts exactly like period 2, with sodium (Na, with Z = 11). Sodium is also a chemically active alkali metal, whose atom features 10 electrons filling the shells with n = 1 and n = 2 (in Fig. 22 collectively called the neon shell, [Ne]), plus one electron in a 3s state (n = 3, l = 0, m = 0), which may be reasonably well described by the hydrogen atom theory - see, e.g., the red trace on the last panel of Fig. 20. Naively, we could expect that, according to Eq. (194) and with account of the double spin degeneracy, this period of the table should have 2n² = 2×3² = 18 elements, with gradual filling of two 3s states, six 3p states, and ten 3d states. However, here we run into a big surprise: after argon (Ar, with Z = 18), a relatively inert element with an ionization energy of 15.7 eV due to its fully filled 3s and 3p states, the next element, potassium (K, with Z = 19), is an alkali metal again!

74 Chemical valence is a relatively vague term describing the number of an atom's electrons involved in chemical reactions. For the same atom, the number may depend on the chemical compound formed.

The reason for that is the difference of the actual electron energies from those of the hydrogen atom, which is due mostly to inter-electron interactions and gradually accumulates with the growth of Z. It may be semi-quantitatively understood from the results of Sec. 6. In hydrogen-like atoms, electron state energies do not depend on the quantum number l (or m) - see Eq. (191). However, the orbital quantum number does affect the wavefunction of an electron. As Fig. 20 shows, the larger l, the lower the probability for an electron to be close to the nucleus, where its positive charge is less compensated by other electrons. As a result of this effect (and also of the relativistic corrections to be discussed in Sec. 6.3), the electron's energy grows with l. Actually, this effect was visible even in period 2: it manifests itself in the filling order (p states after s states). However, for potassium (K, with Z = 19) and calcium (Ca, with Z = 20), the energies of the 3d states become so high that the energies of the two 4s states (with opposite spins) are lower, and they are filled first. As described by the factor 3 in the square brackets of Eq. (200), and also by Eq. (201), the effect of the principal number n on the distance from the nucleus is stronger than that of l < n. Hence the 4s wavefunctions of K and Ca are relatively far from the nucleus, and determine the chemical valence (equal to 1 and 2, respectively) of these elements. The next atoms, from Sc (Z = 21) to Zn (Z = 30), with the gradually filled "internal" 3d states, are the so-called transition metals, whose (comparable) ionization energies and chemical properties are determined by 4s electrons.
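The filling order described above (s before p, and 4s before 3d) is often summarized by the empirical Madelung rule, which is not derived in these notes. By that rule, subshells are filled in order of increasing n + l and, for equal n + l, of increasing n. The sketch below assumes this rule and Pauli's limit of 2(2l + 1) electrons per subshell, and handles none of the known exceptions. It reproduces the standard ground-state configurations of the first 20 elements, including the [Ar]4s¹ configuration of potassium, and the 4s²3d¹ configuration of Sc. The function names are hypothetical.

```python
LETTERS = "spdf"   # letter codes for l = 0, 1, 2, 3

def filling_order(n_max=5):
    """Subshells (n, l) sorted by the empirical Madelung rule: n + l first, then n."""
    subshells = [(n, l) for n in range(1, n_max + 1) for l in range(min(n, len(LETTERS)))]
    return sorted(subshells, key=lambda nl: (nl[0] + nl[1], nl[0]))

def configuration(Z):
    """Ground-state configuration string, e.g. '1s2 2s1' for Li (no exceptions handled)."""
    parts, left = [], Z
    for n, l in filling_order():
        if left == 0:
            break
        k = min(left, 2 * (2 * l + 1))   # 2 electrons per orbital state, 2l + 1 states
        parts.append(f"{n}{LETTERS[l]}{k}")
        left -= k
    if left:
        raise ValueError("Z too large for this sketch")
    return " ".join(parts)

for Z, name in [(3, "Li"), (11, "Na"), (18, "Ar"), (19, "K"), (20, "Ca"), (21, "Sc")]:
    print(f"{name:2s} (Z = {Z}): {configuration(Z)}")
```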

This fact is the origin of the differences between various forms of the "periodic" table. In its most popular form, shown in Fig. 21, K is used to start the next period, period 4. After that, a new period is started each time (and only when) the first electron with the next principal quantum number (n) appears. 75 This topology provides a very clear mapping onto the chemical properties of the first element of each period (an alkali metal), as well as of its last element (a noble gas). This also automatically means making gaps in all previous rows. Usually, this gap is made between the atoms with completely filled s states and those with the first electron in a p state, because here the properties of the elements make a somewhat larger step. (For example, the step from Be to B makes the material an insulator, but it is not large enough to make a similar difference between Mg and Al.) As a result, elements of the same column have approximately similar chemical valence and physical properties.

However, for the longer lower rows, such a presentation is inconvenient, because the whole table would be too broad. This is why the lanthanides (often called rare earths, with Z from 57 to 70, of the 6th row, with gradual filling of 4f and 5d states) and the actinides (Z from 89 to 103, of the 7th row, with gradual filling of 5f and 6d states) are presented as separate, outlet lines (Fig. 21). This is quite acceptable for the purposes of standard chemistry, because the chemical properties of elements within each group are rather close.

To summarize, the "periodic table of elements" is not periodic in the strict sense of the word. Nevertheless, it has had an enormous historic significance for chemistry, as well as atomic and solid state physics, and is still very convenient for many purposes. For our course, the most important aspect of its discussion is a surprising possibility. At least for classification purposes, such a complex multi-electron system as an atom may be described as a set of quasi-independent electrons in certain quantum states, indexed with the same quantum numbers n, l, and m as those of the hydrogen atom. This fact enables the use of various perturbation theories, which give a more quantitative description of atomic properties. Some of these techniques will be reviewed in Chapters 6 and 8 of this course. 76



75 Another option is to return to the first column as soon as an atom has one electron in an s state (as it is in Cu, Ag, and Au, in addition to the alkali metals).

76 For a bit more detailed (but still very succinct) discussion of valence and other chemical aspects of atomic 
structure, I can recommend Chapter 5 of the classical text by L. Pauling, General Chemistry, Dover, 1988. 
3.8. Exercise problems



3.1. Use the Born approximation to calculate the angular dependence and the full cross-section of scattering of an incident plane wave, propagating along the x-axis, by the following pair of point inhomogeneities:

$$ U(\mathbf r) = W\left[\delta\left(\mathbf r - \mathbf n_z \frac{a}{2}\right) + \delta\left(\mathbf r + \mathbf n_z \frac{a}{2}\right)\right]. $$

Analyze the results in detail. Derive the condition of the Born approximation's validity for such delta-functional scatterers.



3.2. Use the Born approximation to calculate the differential and full cross-sections of a spherical scatterer:

$$ U(\mathbf r) = \begin{cases} U_0, & \text{for } r < R, \\ 0, & \text{otherwise}. \end{cases} $$

Analyze both results, especially the angular dependence of dσ/dΩ, in detail, for kR ≪ 1 and kR ≫ 1.



3.3. Reformulate the Born approximation for the 1D case. Use the result to find the scattering and transfer matrices of a "rectangular" scatterer

$$ U(x) = \begin{cases} U_0, & \text{for } |x| < d/2, \\ 0, & \text{otherwise}. \end{cases} $$

Compare the results with those of the exact calculations carried out earlier.



3.4. For the 2D hexagonal lattice (Fig. 11b):

(i) find the reciprocal lattice Q and the 1st Brillouin zone;

(ii) use the tight-binding approximation to calculate the dispersion relation E(q) for a 2D particle moving in a potential with such periodicity, close to the eigenenergy of an axially-symmetric state quasi-localized at the potential minima;

(iii) analyze and sketch (or plot) the resulting dispersion relation E(q) inside the 1st Brillouin zone.



3.5. Complete the tight-binding approximation calculation of the band structure of the honeycomb lattice, started at the end of Sec. 4. Analyze the results. Prove that the Dirac points q_D are located in the corners of the 1st Brillouin zone, and express the velocity v_n, participating in Eq. (122), in terms of the coupling energy δ_n. Show that the final results do not change if the quasi-localized wavefunctions are not axially-symmetric, but are proportional to exp{inφ} - as they are, with n = 1, for the 2p_z electrons of carbon atoms in graphene, which are responsible for its transport properties.
3.6. Examine basic properties of the so-called Wannier functions defined as

$$ \phi_{\mathbf R}(\mathbf r) = \text{const} \times \int_{\mathrm{BZ}} \psi_{\mathbf q}(\mathbf r)\, e^{-i\mathbf q \cdot \mathbf R}\, d^3 q, $$

where ψ_q(r) is the Bloch wavefunction (3.108), R is any vector of the Bravais lattice, and the integration over quasi-momentum q is extended over any (e.g., the first) Brillouin zone.

3.7. Evaluate the long-range electrostatic interaction (the so-called London dispersion force) between two similar, electrically neutral but polarizable molecules, modeling them as isotropic 3D harmonic oscillators.

Hint: Using the classical expression for the interaction between two electric dipoles, 77 try to represent the total Hamiltonian of the system as a sum of Hamiltonians of several independent harmonic oscillators, and calculate their ground-state energy as a function of the distance between the molecules.

3.8. Use the variable separation method to find expressions for the eigenfunctions and the corresponding eigenenergies of a free 2D particle confined inside a thin round disk of radius R:

$$ U = \begin{cases} 0, & \text{for } 0 \le \rho < R, \\ +\infty, & \text{for } R \le \rho, \end{cases} $$

where ρ = {x, y, 0}. What is the level degeneracy? Calculate the 5 lowest energy levels with accuracy better than 1%.

3.9. Spell out the explicit form of the spherical harmonics Y_4^0(θ, φ) and Y_4^4(θ, φ).

3.10. Calculate ⟨x⟩ and ⟨x²⟩ in the ground state of the 2D and 3D rotators (i.e. quantum particles free to move, respectively, along a plane ring of radius R and on a spherical surface of radius R). What can you say about the averages ⟨p_x⟩ and ⟨p_x²⟩?

3.11. Find the eigenfunctions and the energy spectrum of a 3D particle free to move inside a sphere of radius R:

$$ U = \begin{cases} 0, & \text{for } 0 \le r < R, \\ +\infty, & \text{for } R \le r. \end{cases} $$

Calculate the 5 lowest energy levels with 1% accuracy, and indicate the degeneracy of each of them.

77 See, e.g., EM Sec. 3.1.

3.12. Calculate the lifetime of the lowest metastable state in the spherical-shell potential

$$ U(r) = W\delta(r - R), \qquad W > 0, $$

in the limit of large W. Specify the limit of validity of your result.

3.13. Discuss the effect of the finite proton mass m_p on the eigenenergy spectrum of the hydrogen atom.
Chapter 4. Bra-ket Formalism



The objective of this chapter is a discussion of Dirac's bra-ket formalism of quantum mechanics, which not only overcomes some inconveniences of wave mechanics, but also allows a natural description of such "internal" properties of particles as their spin. In the course of the discussion of the formalism, I will give several simple examples of its use, leaving more involved applications for the following chapters.



4.1. Motivation

We have seen that wave mechanics gives many results of primary importance. Moreover, it is fully (or mostly) sufficient for many applications, for example, for solid state electronics and device physics. However, in the course of our survey we have filed several grievances about this approach. Let me briefly summarize these complaints:

(i) Wave mechanics is focused on the spatial dependence of wavefunctions. On the other hand, our attempts to analyze the temporal evolution of quantum systems within this approach (beyond the trivial time behavior of the eigenfunctions, described by Eq. (1.61)) run into technical difficulties. For example, we could derive Eq. (2.159), describing the time dynamics of the metastable state, or Eq. (2.185), describing quantum oscillations in coupled wells, only for the simplest potential profiles, though it is intuitively clear that these simple results should be common for all problems of this kind. Deriving the equations of such processes for arbitrary potential profiles is possible using perturbation theories (to be reviewed in Chapter 6), but in the wave mechanics language they would require very bulky formulas.

(ii) The same is true concerning other issues that are conceptually addressable within wave mechanics, e.g., the Feynman path integral approach, the description of coupling to the environment, etc. Addressing them in wave mechanics would lead to formulas so bulky that I have (wisely :-) postponed them until we have a more compact formalism at hand.

(iii) In the discussion of several key problems (for example, the harmonic oscillator and spherically-symmetric potentials) we have run into rather complicated eigenfunctions coexisting with simple energy spectra, which hint at some simple underlying physics. It is very important to reveal this physics.

(iv) In the wave mechanics postulates, formulated in Sec. 1.2, the quantum-mechanical operators of coordinate and momentum are treated very unequally - see Eqs. (1.26b). However, some key expressions, e.g., for the fundamental eigenfunction of a free particle,

$$ \psi_{\mathbf p}(\mathbf r) \propto \exp\left\{ \frac{i\,\mathbf p \cdot \mathbf r}{\hbar} \right\}, \qquad (4.1) $$

or the harmonic oscillator's Hamiltonian,

$$ \hat H = \frac{\hat{\mathbf p}^2}{2m} + \frac{m\omega_0^2 \hat{\mathbf r}^2}{2}, \qquad (4.2) $$

invite a similar treatment of momentum and coordinate.



© 2013 K. Likharev 



Open online access under CC BY-NC-SA license



Essential Graduate Physics 



QM: Quantum Mechanics 



However, the strongest motivation for a more general formalism comes from the fact that wave mechanics cannot describe the spin of elementary particles, or their other internal quantum degrees of freedom, such as quark flavors or lepton numbers. In this context, let us review the basic facts on spin (which is a very representative, and experimentally the most accessible, of all internal quantum numbers), so that we are aware of what a more general formalism should explain - as a minimum.

Figure 1 shows the conceptual scheme of the simplest spin-revealing experiment, first carried out by O. Stern and W. Gerlach in 1922. 1 A collimated beam of electrons is passed through a gap between the poles of a strong magnet. There, the magnetic field ℬ, whose orientation is taken for the z-axis in Fig. 1, is non-uniform, so that both ℬ_z and ∂ℬ_z/∂z are different from zero. As a result, the beam splits into two parts of equal intensity.



Fig. 4.1. The simplest Stern-Gerlach experiment: an electron source, a collimator, the magnet, and particle detectors, each of the two output beams being registered with probability W = 50%.



This simplest experiment can be semi-quantitatively explained on classical, though somewhat phenomenological, grounds by assuming that each electron has an intrinsic, permanent magnetic dipole moment m. Indeed, classical electrodynamics 2 tells us that the potential energy U of a magnetic dipole in an external magnetic field is equal to (-m·ℬ), so that the force acting on the particle,

$$ \mathbf F = -\nabla U = -\nabla(-\mathbf m \cdot \mathscr{B}), \qquad (4.3) $$

has a nonvanishing vertical component

$$ F_z = -\frac{\partial}{\partial z}\left(-m_z \mathscr{B}_z\right) = m_z \frac{\partial \mathscr{B}_z}{\partial z}. \qquad (4.4) $$

Hence, if we further postulate the existence of two possible, discrete values of m_z = ±μ, this explains the Stern-Gerlach effect qualitatively: the incident electrons have a random sign, but similar magnitude, of m_z. A quantitative explanation of the beam splitting angle requires μ to be equal (or close) to the so-called Bohr magneton 3



$$ \mu_B \equiv \frac{e\hbar}{2m_e} \approx 0.9274 \times 10^{-23}\ \frac{\mathrm{J}}{\mathrm{T}}. \qquad (4.5) $$



As we will see below, this value cannot be explained by any internal motion of the electron, say its rotation about the z-axis.
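Eq. (4.5) is easy to evaluate numerically. The sketch below assumes the CODATA 2018 values of the constants (any recent data set gives the same four digits). It also prints μ_B/k_B, i.e. the Bohr magneton in temperature units, which comes out of the order of 1 K/T.

```python
# CODATA 2018 recommended values (SI); e and k_B are exact by definition in the 2019 SI.
E_CHARGE = 1.602176634e-19    # elementary charge, C
HBAR = 1.054571817e-34        # reduced Planck constant, J s
M_E = 9.1093837015e-31        # electron mass, kg
K_B = 1.380649e-23            # Boltzmann constant, J/K

mu_B = E_CHARGE * HBAR / (2.0 * M_E)   # Eq. (4.5), in J/T
print(f"mu_B = {mu_B:.4e} J/T")
print(f"mu_B / k_B = {mu_B / K_B:.3f} K/T")   # of the order of 1 K/T
```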



1 To my knowledge, the concept of spin as an internal rotation of a particle was first suggested by R. Kronig, then 
a 20-year-old student, in January 1925, a few months before two other students, G. Uhlenbeck and S. Goudsmit - 
to whom the idea is usually attributed. The concept was then accepted and developed quantitatively by W. Pauli. 

2 See, e.g., EM Sec. 5.4, in particular Eq. (5.100). 

3 A convenient mnemonic rule is that, in temperature units, μ_B is close to 1 K/T. In the Gaussian units, μ_B = eħ/2m_e c ≈ 0.9274×10^-20 erg/G.






Much more importantly, this semi-classical language cannot explain, even qualitatively, the results of the following set of multi-stage Stern-Gerlach experiments, shown in Fig. 2. In the first of the experiments, the electron beam is first passed through a magnetic field oriented (together with its gradient) along the z-axis, just as in Fig. 1. Then one of the two resulting beams is absorbed (or otherwise removed from the setup), while the other one is passed through a similar but x-oriented field. The experiment shows that this beam is split again into two components of equal intensity. A classical explanation of this experiment would require a very unnatural suggestion that the initial electrons had random but discrete components of the magnetic moment simultaneously in two directions, z and x.

However, even this assumption cannot explain the results of the three-stage Stern-Gerlach experiment shown on the middle panel of Fig. 2. Here, the previous two-stage setup is complemented with one more absorber and one more magnet, now with the z-orientation again. Completely counter-intuitively, it again gives two beams of equal intensity, as if we had not filtered out the electrons with m_z corresponding to the lower beam at the first, z-stage.




Fig. 4.2. Three multi-stage Stern-Gerlach experiments. Boxes SG (...) denote magnets similar to the one shown in Fig. 1, with the axis oriented in the indicated direction. (Final-stage beam fractions: 50% and 50% on the top and middle panels; 100% and 0% on the bottom panel.)



The only way to save the classical explanation here is to say that maybe electrons somehow interact with the magnetic field, so that the x-polarized (non-absorbed) beam becomes spontaneously depolarized again somewhere between the magnetic stages. But any hope for such an explanation is ruined by the control experiment shown on the bottom panel of Fig. 2, whose results indicate that no such depolarization happens.

We will see below that all these (and many more) results find a natural explanation in the matrix mechanics pioneered by W. Heisenberg, M. Born, and P. Jordan in 1925. However, the matrix formalism is inconvenient for the solution of most problems discussed in Chapters 1-3, and for a time it was eclipsed by Schrödinger's wave mechanics, which had been put forward just a few months later. But very soon P. A. M. Dirac introduced a more general bra-ket formalism, which provides a generalization of both approaches and proves their equivalence. Let me describe it.
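As a preview, the three outcomes of Fig. 2 can be reproduced with a toy model. It assumes what the rest of this chapter develops: a spin-1/2 state described by a two-component complex vector, beam fractions given by squared overlaps, and an absorber that leaves the passed electrons in the state of the selected output. It also assumes that the bottom (control) panel is simply a repeated z-stage. The x-basis vectors are the usual choice, up to irrelevant phase factors, and all names are illustrative.

```python
import math

# Spin-1/2 states as 2-component complex vectors in the z basis (assumed representation).
S = 1.0 / math.sqrt(2.0)
BASIS = {
    "z": ((1 + 0j, 0j), (0j, 1 + 0j)),            # (up, down) along z
    "x": ((S + 0j, S + 0j), (S + 0j, -S + 0j)),   # (up, down) along x
}

def overlap(a, b):
    """Inner product <a|b> of two 2-component states."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def sg_stage(state, axis):
    """Fractions of the incoming beam sent into the 'up' and 'down' outputs of one SG stage."""
    return [abs(overlap(v, state)) ** 2 for v in BASIS[axis]]

def run(stages):
    """Stages are axes; after each stage but the last, an absorber keeps only the 'up' beam,
    leaving the electrons in the corresponding basis state."""
    state = BASIS[stages[0]][0]
    for axis in stages[1:-1]:
        state = BASIS[axis][0]
    return sg_stage(state, stages[-1])

for stages in (["z", "x"], ["z", "x", "z"], ["z", "z"]):
    up, down = run(stages)
    print(" -> ".join(f"SG({a})" for a in stages), f": {up:.0%} / {down:.0%}")
```

In this model the middle panel "forgets" the first z-stage simply because the x-stage absorber leaves the electrons in an x-basis state, whose overlaps with both z-basis states are equal.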
4.2. States, state vectors, and linear operators

The basic notion of the general formulation of quantum mechanics is the quantum state of a system. 4 To get some gut feeling for this notion, consider a quantum state α of a particle that may be adequately described by wave mechanics: this description is given by the corresponding wavefunction Ψ_α(r, t). Note, however, that the state as such is not a mathematical object (such as a function), 5 and can participate in mathematical formulas only as a "pointer" - e.g., the index of the function Ψ_α. On the other hand, the wavefunction is not a state, but a mathematical object (a complex function of space and time) giving a quantitative description of the state - just as the radius-vector as a function of time is a mathematical object describing the motion of a classical particle - see Fig. 3. Similarly, in the Dirac formalism a certain quantum state α is described by either of two mathematical objects, called the state vectors: the ket-vector |α⟩ and the bra-vector ⟨α|. 6

One should be cautious with the term "vector" here. Usual "geometric" vectors are defined in the usual geometric (say, Euclidean) space. In contrast, bra- and ket-vectors are defined in the abstract Hilbert space of a given system. 7 Despite certain similarities with the geometric vectors, they are new mathematical objects, so that we need new rules for handling them. The primary rules are essentially postulates and are justified only by the fact that their corollaries give a correct description (or prediction) of all experimental observations, while more complex rules may be derived from them. While there is a general consensus among physicists on what the rules should be, there are many possible ways to carve from them the set of basic postulates. Just as in Sec. 1.2, I will not try too hard to reduce the number of the postulates to the smallest possible minimum, trying instead to keep their physical meaning transparent.




Fig. 4.3. Particle's state and its mathematical descriptions: in classical mechanics, r(t); in wave mechanics, either Ψ_α(r, t) or Ψ*_α(r, t); in the bra-ket formalism, either |α⟩ or ⟨α|.



(i) Ket-vectors. Let us start with ket-vectors - sometimes called just kets for short. Perhaps the
most important property of these vectors concerns their linear superposition. Namely, if several
ket-vectors $|\alpha_j\rangle$ describe possible states of a quantum system, then any linear combination (superposition)

$$|\alpha\rangle = \sum_j c_j|\alpha_j\rangle, \qquad (4.6)$$

4 An attentive reader could notice my smuggling in the term "system" instead of "particle", which was used in the
previous chapters. Indeed, the bra-ket formalism allows the description of quantum systems much more complex
than a single spinless particle that is a typical (though not the only possible) subject of wave mechanics.

5 As was expressed nicely by A. Peres, one of the pioneers of quantum information theory, "quantum phenomena
do not occur in the Hilbert space, they occur in a laboratory".

6 The terms "bra" and "ket" stem from the fact that two vectors, say $|\alpha\rangle$ and $\langle\beta|$, may be considered as parts of
the combination $\langle\beta|\alpha\rangle$ (see below), which resembles an expression in angle brackets.

7 The Hilbert space of a given system is defined as the set of all its possible state vectors. As should be clear from
this definition, it is not advisable to speak about a "Hilbert space of quantum states".
where $c_j$ are any (possibly complex) c-numbers, also describes a possible state of the same system. (One
may say that vector $|\alpha\rangle$ belongs to the same Hilbert space as all $|\alpha_j\rangle$.) Actually, since ket-vectors are new
mathematical objects, the exact meaning of the right-hand part of Eq. (6) becomes clear only after we
have postulated the following rules of summation of these vectors,

$$|\alpha_j\rangle + |\alpha_{j'}\rangle = |\alpha_{j'}\rangle + |\alpha_j\rangle, \qquad (4.7)$$

and their multiplication by c-numbers:

$$c_j|\alpha_j\rangle = |\alpha_j\rangle c_j. \qquad (4.8)$$



Note that in the set of wave mechanics postulates, statements parallel to (7) and (8) were unnecessary, 
because wavefunctions are the usual (albeit complex) functions of space and time, and we know from 
the usual algebra that such relations are valid. 

As evident from Eq. (6), the complex coefficient $c_j$ may be interpreted as the "weight" of state $\alpha_j$
in the linear superposition $\alpha$, but the generally accepted term for $c_j$ is the probability amplitude (or just
the amplitude). One important particular case is $c_j = 0$, showing that state $\alpha_j$ does not participate in the
superposition $\alpha$. By the way, the corresponding term of sum (6), i.e. the product

$$0\,|\alpha_j\rangle, \qquad (4.9)$$

has a special name: the null-state vector. (It is important to avoid confusion between the null-state
corresponding to vector (9), and the ground state of the system, which is frequently denoted by
ket-vector $|0\rangle$. In some sense, the null-state does not exist at all, while the ground state does - and frequently
is the most important quantum state of the system.)

(ii) Bra-vectors and inner ("scalar") products. Bra-vectors $\langle\alpha|$, which obey rules similar to
Eqs. (7) and (8), are not new, independent objects: if a ket-vector $|\alpha\rangle$ is known, the corresponding
bra-vector $\langle\alpha|$ describes the same state. In other words, there is a unique dual correspondence between $|\alpha\rangle$
and $\langle\alpha|$, 8 very similar (though not identical) to that between a wavefunction $\Psi$ and its complex conjugate
$\Psi^*$. The correspondence between these vectors is described by the following rule: if a ket-vector of a
linear superposition is described by Eq. (6), then the corresponding bra-vector is

$$\langle\alpha| = \sum_j c_j^*\langle\alpha_j|. \qquad (4.10)$$

The mathematical convenience of using two types of vectors, rather than just one, becomes clear
from the notion of their inner product (also called the short bracket):

$$\langle\beta|\alpha\rangle. \qquad (4.11)$$



This is a (generally, complex) 9 scalar, whose main property is linearity with respect to either of its
component vectors. For example, if a linear superposition $\alpha$ is described by the ket-vector (6), then

$$\langle\beta|\alpha\rangle = \sum_j c_j\langle\beta|\alpha_j\rangle, \qquad (4.12)$$

while if Eq. (10) is true, then

$$\langle\alpha|\beta\rangle = \sum_j c_j^*\langle\alpha_j|\beta\rangle. \qquad (4.13)$$

In plain English, c-numbers may be moved either into or out of the inner products.
The second key property of the inner product is

$$\langle\alpha|\beta\rangle = \langle\beta|\alpha\rangle^*. \qquad (4.14)$$

It is compatible with Eq. (10); indeed, the complex conjugation of both parts of Eq. (12) gives:

$$\langle\beta|\alpha\rangle^* = \sum_j c_j^*\langle\beta|\alpha_j\rangle^* = \sum_j c_j^*\langle\alpha_j|\beta\rangle = \langle\alpha|\beta\rangle. \qquad (4.15)$$

8 Mathematicians like to say that the ket- and bra-vectors of the same quantum system are defined in two
isomorphic Hilbert spaces.

9 This is one of the differences of bra- and ket-vectors from the usual (geometric) vectors, whose scalar product
is always a real scalar.


Finally, one more rule: the inner product of the bra- and ket-vectors describing the same state
(called the norm squared) is real and non-negative,

$$\|\alpha\|^2 \equiv \langle\alpha|\alpha\rangle \geq 0. \qquad (4.16)$$

In order to give the reader some feeling for the meaning of this rule: we will show below that if state
$\alpha$ may be described by wavefunction $\Psi_\alpha(\mathbf{r}, t)$, then

$$\langle\alpha|\alpha\rangle = \int \Psi_\alpha^*\Psi_\alpha\, d^3r \geq 0. \qquad (4.17)$$



Hence the role of the bra-vector in the inner product is very similar to the complex conjugation of the wavefunction, and Eq.
(10) emphasizes this similarity. (Note that, by convention, there is no conjugation sign in the bra-part of
the inner product; its role is played by the angular bracket inversion.)
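The rules (12)-(16) may be checked numerically once states are given by complex components. The sketch below is only an illustration under assumptions not made in the notes: it anticipates the finite-basis representation of Sec. 4.3 (a ket is represented by a NumPy array of complex expansion coefficients, and $\langle\beta|\alpha\rangle$ is computed with `np.vdot`, which complex-conjugates its first argument); the dimension, the random seed, and the coefficients `c1`, `c2` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # assumed dimension of a finite Hilbert space (illustrative only)


def random_ket(n):
    """Complex expansion coefficients of an arbitrary ket (hypothetical test state)."""
    return rng.normal(size=n) + 1j * rng.normal(size=n)


def inner(bra_of, ket):
    """Short bracket <bra_of|ket>; np.vdot complex-conjugates its first argument."""
    return np.vdot(bra_of, ket)


alpha_1, alpha_2, beta = random_ket(N), random_ket(N), random_ket(N)
c1, c2 = 0.3 - 1.2j, 2.0 + 0.5j
alpha = c1 * alpha_1 + c2 * alpha_2  # superposition, Eq. (6)

# Eq. (12): c-numbers come out of the ket side unchanged
lhs_12 = inner(beta, alpha)
rhs_12 = c1 * inner(beta, alpha_1) + c2 * inner(beta, alpha_2)

# Eq. (13): from the bra side (10), they come out complex-conjugated
lhs_13 = inner(alpha, beta)
rhs_13 = np.conj(c1) * inner(alpha_1, beta) + np.conj(c2) * inner(alpha_2, beta)

# Eq. (14): swapping bra and ket complex-conjugates the inner product
swap_ok = np.isclose(inner(alpha, beta), np.conj(inner(beta, alpha)))

# Eq. (16): the norm squared is real and non-negative
norm_sq = inner(alpha, alpha)

print(np.isclose(lhs_12, rhs_12), np.isclose(lhs_13, rhs_13), swap_ok)
print(norm_sq.real >= 0, abs(norm_sq.imag) < 1e-12)
```

Note that the conjugation in Eq. (13) is carried entirely by `np.vdot`, mirroring the remark above that the bracket inversion plays the role of the conjugation sign.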

(iii) Operators. One more key notion of the Dirac formalism is that of quantum-mechanical linear
operators. Just as for the operators discussed in wave mechanics, the function of an operator is the
"generation" of one state from another: if $|\alpha\rangle$ is a possible ket of the system, and $\hat{A}$ is a legitimate
operator, then the following combination,

$$\hat{A}|\alpha\rangle, \qquad (4.18)$$

is also a ket-vector describing a possible state of the system, i.e. a ket-vector in the same Hilbert space
as the initial vector $|\alpha\rangle$. As follows from the adjective "linear", the main rule governing the operators is
their linearity with respect to both any superposition of vectors:

$$\hat{A}\Big(\sum_j c_j|\alpha_j\rangle\Big) = \sum_j c_j\hat{A}|\alpha_j\rangle, \qquad (4.19)$$

and any superposition of operators:

$$\Big(\sum_j c_j\hat{A}_j\Big)|\alpha\rangle = \sum_j c_j\hat{A}_j|\alpha\rangle. \qquad (4.20)$$
These rules are evidently similar to Eqs. (1.53)-(1.54) of wave mechanics.

The above rules imply that an operator "acts" on the ket-vector on its right; however, a
combination of the type $\langle\alpha|\hat{A}$ is also legitimate and presents a new bra-vector. It is important that,
generally, this vector does not represent the same state as ket-vector (18); instead, the bra-vector
isomorphic to ket-vector (18) is

$$\langle\alpha|\hat{A}^\dagger. \qquad (4.21)$$

This statement serves as the definition of the Hermitian conjugate (or "Hermitian adjoint") $\hat{A}^\dagger$ of the
initial operator $\hat{A}$. For an important class of operators, called the Hermitian operators, the conjugation
is inconsequential, i.e. for them

$$\hat{A}^\dagger = \hat{A}. \qquad (4.22)$$



(This equality, as well as any other operator equation below, means that these operators act similarly on 
any bra- or ket-vector.) 10 

To proceed further, we need an additional postulate, called the associative axiom of
multiplication: into any legitimate bra-ket expression, 11 not including an explicit summation, we may
insert or remove parentheses (just as in the ordinary product of scalars), meaning as usual that the
operation inside the parentheses is performed first. The first two examples of this postulate are given by
Eqs. (19) and (20), but the associative axiom is more general and says, for example:

$$\langle\beta|\big(\hat{A}|\alpha\rangle\big) = \big(\langle\beta|\hat{A}\big)|\alpha\rangle \equiv \langle\beta|\hat{A}|\alpha\rangle. \qquad (4.23)$$



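This equality serves as the definition of the last form, called the long bracket (evidently, also a scalar),
with an operator sandwiched between a bra-vector and a ket-vector. This definition, when combined
with the definition of the Hermitian conjugate and Eq. (14), yields an important corollary:

$$\langle\beta|\hat{A}|\alpha\rangle = \langle\beta|\big(\hat{A}|\alpha\rangle\big) = \Big(\big(\langle\alpha|\hat{A}^\dagger\big)|\beta\rangle\Big)^* = \langle\alpha|\hat{A}^\dagger|\beta\rangle^*, \qquad (4.24)$$

which is most frequently rewritten as

$$\langle\alpha|\hat{A}|\beta\rangle^* = \langle\beta|\hat{A}^\dagger|\alpha\rangle. \qquad (4.25)$$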
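Eq. (25), and the rule (21) behind it, can be illustrated numerically. This is a minimal sketch under stated assumptions: states and operators are represented in a finite orthonormal basis, where (as Eq. (65) of Sec. 4.3 shows) $\hat{A}^\dagger$ is represented by the complex-conjugated, transposed matrix; the random matrix and vectors are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3  # assumed Hilbert-space dimension


def random_ket(n):
    return rng.normal(size=n) + 1j * rng.normal(size=n)


# A generic (non-Hermitian) operator and its Hermitian conjugate in an orthonormal basis
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A_dag = A.conj().T

alpha, beta = random_ket(N), random_ket(N)


def long_bracket(bra_of, op, ket):
    """<bra_of|op|ket>: op acts on the ket first, then the short bracket is taken."""
    return np.vdot(bra_of, op @ ket)


# Eq. (25): <alpha|A|beta>^* = <beta|A^dagger|alpha>
lhs = np.conj(long_bracket(alpha, A, beta))
rhs = long_bracket(beta, A_dag, alpha)

# Eq. (21): the bra dual to A|alpha> is <alpha|A^dagger>, i.e. the row conj(alpha) @ A_dag
ket_A_alpha = A @ alpha
bra_row = alpha.conj() @ A_dag

print(np.isclose(lhs, rhs), np.allclose(bra_row, ket_A_alpha.conj()))
```

Since the random `A` is not Hermitian, the row `alpha.conj() @ A` would generally differ from `bra_row`, which is the point made above about $\langle\alpha|\hat{A}$ versus $\langle\alpha|\hat{A}^\dagger$.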



The associative axiom also enables us to readily explore the following definition of one more, the outer
product of bra- and ket-vectors:

$$|\beta\rangle\langle\alpha|. \qquad (4.26)$$

10 c-numbers may also be considered a particular type of operator, whose action is limited to the change of a
state's probability amplitude. According to Eqs. (11) and (21), for them the Hermitian conjugation is equivalent to
the simple complex conjugation, so that only a real c-number may be considered as a (particular case of)
Hermitian operator (22).

11 Here "legitimate" means "having a clear sense in the bra-ket formalism". Some examples of "illegitimate"
expressions: $|\alpha\rangle\hat{A}$, $\hat{A}\langle\alpha|$, $|\alpha\rangle|\beta\rangle$, and $\langle\alpha|\langle\beta|$. Note, however, that the last two expressions may be legitimate if $\alpha$ and $\beta$
are states of different systems, i.e. if their state vectors belong to different Hilbert spaces. We will run into such
tensor products of bra- and ket-vectors (sometimes denoted, respectively, as $|\alpha\rangle\otimes|\beta\rangle$ and $\langle\alpha|\otimes\langle\beta|$) in Chapters 6-8.

In contrast to the inner product (11), which is a scalar, this mathematical construct is an operator.
Indeed, the associative axiom allows us to remove parentheses in the following expression:

$$\big(|\beta\rangle\langle\alpha|\big)|\gamma\rangle = |\beta\rangle\langle\alpha|\gamma\rangle. \qquad (4.27)$$

But the last bra-ket is just a scalar; hence the mathematical object (26) acting on a ket-vector (in this
case, $|\gamma\rangle$) gives a new ket-vector, which is the essence of an operator's action. Very similarly,

$$\langle\delta|\big(|\beta\rangle\langle\alpha|\big) = \langle\delta|\beta\rangle\langle\alpha| \qquad (4.28)$$

- again a typical operator's action on a bra-vector.

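Now let us perform the following calculation. We may use the parentheses insertion into the
bra-ket equality following from Eq. (14),

$$\langle\gamma|\alpha\rangle\langle\beta|\delta\rangle = \big(\langle\delta|\beta\rangle\langle\alpha|\gamma\rangle\big)^*, \qquad (4.29)$$

to transform it to the following form:

$$\langle\gamma|\big(|\alpha\rangle\langle\beta|\big)|\delta\rangle = \Big(\langle\delta|\big(|\beta\rangle\langle\alpha|\big)|\gamma\rangle\Big)^*. \qquad (4.30)$$

Since this equation should be valid for any vectors $\langle\gamma|$ and $|\delta\rangle$, its comparison with Eq. (25) gives the
following operator equality:

$$\big(|\beta\rangle\langle\alpha|\big)^\dagger = |\alpha\rangle\langle\beta|. \qquad (4.31)$$

This is the conjugate rule for outer products; it resembles rule (14) for inner products, but involves the
Hermitian (rather than the usual complex) conjugation.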
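The operator character of the outer product, Eqs. (27)-(28), and its conjugation rule (31) can be checked in the same finite-basis picture. A minimal sketch, assuming (again ahead of Sec. 4.3) that $|\beta\rangle\langle\alpha|$ is represented by the matrix `np.outer(beta, alpha.conj())`; the helper name `ket_bra` and the random vectors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3  # assumed Hilbert-space dimension


def random_ket(n):
    return rng.normal(size=n) + 1j * rng.normal(size=n)


def ket_bra(ket, bra_of):
    """Matrix of the outer product |ket><bra_of|, Eq. (26), in an orthonormal basis."""
    return np.outer(ket, bra_of.conj())


alpha, beta, gamma, delta = (random_ket(N) for _ in range(4))
P = ket_bra(beta, alpha)  # |beta><alpha|

# Eq. (27): acting on |gamma> gives |beta> times the scalar <alpha|gamma>
print(np.allclose(P @ gamma, beta * np.vdot(alpha, gamma)))

# Eq. (28): acting on <delta| (a row of conjugated components) gives <delta|beta> <alpha|
print(np.allclose(delta.conj() @ P, np.vdot(delta, beta) * alpha.conj()))

# Eq. (31): (|beta><alpha|)^dagger = |alpha><beta|
print(np.allclose(P.conj().T, ket_bra(alpha, beta)))
```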



The associative axiom is also valid for the operator "multiplication":

$$\big(\hat{A}\hat{B}\big)|\alpha\rangle = \hat{A}\big(\hat{B}|\alpha\rangle\big), \qquad \langle\beta|\big(\hat{A}\hat{B}\big) = \big(\langle\beta|\hat{A}\big)\hat{B}, \qquad (4.32)$$

showing that the action of an operator product on a state vector is nothing more than the sequential
action of the operands. However, we have to be rather careful with operator products; generally, they
do not commute: $\hat{A}\hat{B}\neq\hat{B}\hat{A}$. This is why the commutator, the operator defined as

$$\big[\hat{A},\hat{B}\big] \equiv \hat{A}\hat{B} - \hat{B}\hat{A}, \qquad (4.33)$$

is a very useful notion. Another similar notion is the anticommutator: 12

$$\big\{\hat{A},\hat{B}\big\} \equiv \hat{A}\hat{B} + \hat{B}\hat{A}. \qquad (4.34)$$
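As a concrete illustration of non-commuting operators and of definitions (33)-(34), here is a small sketch with three 2×2 matrices acting in an assumed two-dimensional Hilbert space. These particular matrices (known as the Pauli matrices) are chosen here purely as a convenient example; nothing in this section depends on them.

```python
import numpy as np


def commutator(A, B):
    """[A, B] = AB - BA, Eq. (33)."""
    return A @ B - B @ A


def anticommutator(A, B):
    """{A, B} = AB + BA, Eq. (34)."""
    return A @ B + B @ A


# Assumed example operators in a two-state basis (the Pauli matrices)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

print(np.allclose(sx @ sy, sy @ sx))               # the products differ: they do not commute
print(np.allclose(commutator(sx, sy), 2j * sz))    # [sx, sy] = 2i sz
print(np.allclose(anticommutator(sx, sy), 0))      # {sx, sy} = 0
```

For this pair the commutator is nonzero while the anticommutator vanishes, which shows that the two notions carry independent information.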



Finally, the bra-ket formalism broadly uses two special operators: the null operator $\hat{0}$, defined
by the following relations:

$$\hat{0}|\alpha\rangle = 0|\alpha\rangle, \qquad \langle\alpha|\hat{0} = \langle\alpha|0, \qquad (4.35)$$

for an arbitrary state $\alpha$; we may say that the null operator "kills" any state, turning it into the null-state.
Another elementary operator is the identity operator, which is also defined by its action (or rather
"inaction" :-) on an arbitrary state vector:

$$\hat{I}|\alpha\rangle = |\alpha\rangle, \qquad \langle\alpha|\hat{I} = \langle\alpha|. \qquad (4.36)$$

12 Another popular notation for the anticommutator is $[\hat{A},\hat{B}]_+$; it will not be used in these notes.



4.3. State basis and matrix representation 

While some operations in quantum mechanics may be carried out in the general bra-ket
formalism outlined above, most calculations are done for specific quantum systems that feature at least
one full and orthonormal set {u} of states $u_j$, frequently called a basis. These terms mean that any state
vector of the system may be represented as a unique sum of the type (6) or (10) over its basis vectors:

$$|\alpha\rangle = \sum_j \alpha_j|u_j\rangle, \qquad \langle\alpha| = \sum_j \alpha_j^*\langle u_j|, \qquad (4.37)$$

(so that, in particular, if $\alpha$ is one of the basis states, say $u_{j'}$, then $\alpha_j = \delta_{jj'}$), and that

$$\langle u_j|u_{j'}\rangle = \delta_{jj'}. \qquad (4.38)$$



For the systems that may be described by wave mechanics, examples of the full orthonormal bases are 
represented by any orthonormal set of eigenfunctions calculated in the previous 3 chapters - as the 
simplest example, see Eq. (1.76). 

Due to the uniqueness of expansion (37), the full set of coefficients $\alpha_j$ gives a complete
description of state $\alpha$ (in a fixed basis {u}), just as the usual Cartesian components $A_x$, $A_y$, and $A_z$ give a
complete description of a usual geometric 3D vector $\mathbf{A}$ (in a fixed reference frame). Still, let me
emphasize some differences between the quantum-mechanical bra- and ket-vectors and the usual
geometric vectors:

(i) a basis set may have a large or even infinite number of states $u_j$, and

(ii) the expansion coefficients $\alpha_j$ may be complex.

With these reservations in mind, the analogy with geometric vectors may be pushed even further.
Let us inner-multiply both parts of the first of Eqs. (37) by a bra-vector $\langle u_{j'}|$ and then transform the
relation using the linearity rules discussed in the previous section, and Eq. (38):

$$\langle u_{j'}|\alpha\rangle = \langle u_{j'}|\sum_j\alpha_j|u_j\rangle = \sum_j\alpha_j\langle u_{j'}|u_j\rangle = \alpha_{j'}. \qquad (4.39)$$

Together with Eq. (14), this means that any of the expansion coefficients in Eq. (37) may be presented
as an inner product:

$$\alpha_j = \langle u_j|\alpha\rangle, \qquad \alpha_j^* = \langle\alpha|u_j\rangle; \qquad (4.40)$$

these relations are analogs of the equalities $A_j = \mathbf{n}_j\cdot\mathbf{A}$ of the usual vector algebra. Using these important
relations (which we will use on numerous occasions), expansions (37) may be rewritten as
$$|\alpha\rangle = \sum_j|u_j\rangle\langle u_j|\alpha\rangle \equiv \sum_j\hat{\Lambda}_j|\alpha\rangle, \qquad \langle\alpha| = \sum_j\langle\alpha|u_j\rangle\langle u_j| \equiv \sum_j\langle\alpha|\hat{\Lambda}_j. \qquad (4.41)$$

A comparison of these relations with Eq. (26) shows that the outer product defined as

$$\hat{\Lambda}_j \equiv |u_j\rangle\langle u_j| \qquad (4.42)$$

is a legitimate linear operator. Such an operator, acting on any state vector of the type (37), singles out
just one of its components, for example,

$$\hat{\Lambda}_j|\alpha\rangle = |u_j\rangle\langle u_j|\alpha\rangle = \alpha_j|u_j\rangle, \qquad (4.43)$$



i.e. kills all components of the linear superposition but one. In the geometric analogy, such an operator
"projects" the state vector on its j-th "direction", hence its name - the projection operator. Probably the
most important property of the projection operators, called the closure (or completeness) relation,
immediately follows from Eq. (41): their sum over the full basis is equivalent to the identity operator:

$$\sum_j|u_j\rangle\langle u_j| = \hat{I}. \qquad (4.44)$$



This means in particular that we may insert the left-hand part of Eq. (44) into any bra-ket relation, at any 
place - the trick that we will use again and again. 
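The projection operators (42) and the closure relation (44) are easy to check numerically. In this sketch the orthonormal basis {u} is an assumption of the illustration: it is generated as the columns of a unitary matrix obtained by a QR decomposition of a random complex matrix, with all vectors written in some fixed reference representation.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4  # assumed basis size

# Assumed orthonormal basis {u}: columns of a unitary matrix from a QR decomposition
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Q, _ = np.linalg.qr(M)
u = [Q[:, j] for j in range(N)]

# Orthonormality, Eq. (38): <u_j|u_k> = delta_jk
gram = np.array([[np.vdot(u[j], u[k]) for k in range(N)] for j in range(N)])

# Projection operators Lambda_j = |u_j><u_j|, Eq. (42)
Lam = [np.outer(u_j, u_j.conj()) for u_j in u]

alpha = rng.normal(size=N) + 1j * rng.normal(size=N)
coeffs = np.array([np.vdot(u_j, alpha) for u_j in u])  # alpha_j = <u_j|alpha>, Eq. (40)

print(np.allclose(gram, np.eye(N)))
# Eq. (43): each projector singles out one component of expansion (37)
print(all(np.allclose(Lam[j] @ alpha, coeffs[j] * u[j]) for j in range(N)))
# Closure relation, Eq. (44): the projectors add up to the identity matrix
print(np.allclose(sum(Lam), np.eye(N)))
```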

Let us see how expansions (37) transform all the notions introduced in the last section, starting
from the short bra-ket (11) (the inner product of two state vectors):

$$\langle\beta|\alpha\rangle = \sum_{j,j'}\beta_j^*\alpha_{j'}\langle u_j|u_{j'}\rangle = \sum_j\beta_j^*\alpha_j. \qquad (4.45)$$

Besides the complex conjugation, this expression is similar to the scalar product of the usual vectors.
Now, let us explore the long bra-ket (23):

$$\langle\beta|\hat{A}|\alpha\rangle = \sum_{j,j'}\beta_j^*\langle u_j|\hat{A}|u_{j'}\rangle\alpha_{j'} = \sum_{j,j'}\beta_j^*A_{jj'}\alpha_{j'}. \qquad (4.46)$$

Here, the last step uses a very important notion of the matrix elements of the operator, defined as

$$A_{jj'} \equiv \langle u_j|\hat{A}|u_{j'}\rangle. \qquad (4.47)$$



Operator's 

matrix 

elements 



As evident from Eq. (46), the full set of the matrix elements completely characterizes the operator, just
as the full set of expansion coefficients (40) fully characterizes a quantum state. The term "matrix"
means, first of all, that it is convenient to present the full set of $A_{jj'}$ as a square table (matrix), with the
linear dimension equal to the number of basis states $u_j$ of the system under consideration, i.e. the size
of its Hilbert space.

As the two simplest examples, all matrix elements of the null operator, defined by Eqs. (35), are
evidently equal to zero (in any basis), and hence it may be presented as a matrix of zeros (the null
matrix):

$$0 = \begin{pmatrix} 0 & 0 & \cdots \\ 0 & 0 & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}, \qquad (4.48)$$
while for the identity operator $\hat{I}$, defined by Eqs. (36), we readily get

$$I_{jj'} = \langle u_j|\hat{I}|u_{j'}\rangle = \langle u_j|u_{j'}\rangle = \delta_{jj'}, \qquad (4.49)$$

i.e. its matrix (called the identity matrix) is diagonal - also in any basis:

$$I = \begin{pmatrix} 1 & 0 & \cdots \\ 0 & 1 & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}. \qquad (4.50)$$



The convenience of the matrix language extends well beyond the presentation of particular
operators. For example, let us use definition (47) to calculate matrix elements for a product of two
operators:

$$(AB)_{jj''} = \langle u_j|\hat{A}\hat{B}|u_{j''}\rangle. \qquad (4.51)$$

Here we can use Eq. (44) for the first (but not the last!) time, inserting the identity operator between the
two operators, and then expressing it via a sum of projection operators:

$$(AB)_{jj''} = \langle u_j|\hat{A}\hat{I}\hat{B}|u_{j''}\rangle = \sum_{j'}\langle u_j|\hat{A}|u_{j'}\rangle\langle u_{j'}|\hat{B}|u_{j''}\rangle = \sum_{j'}A_{jj'}B_{j'j''}. \qquad (4.52)$$



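This result corresponds to the standard "row by column" rule of calculation of an arbitrary element of
the matrix product

$$AB = \begin{pmatrix} A_{11} & A_{12} & \cdots \\ A_{21} & A_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}\begin{pmatrix} B_{11} & B_{12} & \cdots \\ B_{21} & B_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}. \qquad (4.53)$$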



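Hence the product of operators may be presented (in a fixed basis!) by that of their matrices (in the same
basis). This is so convenient that the same language is often used to present not only the long bracket,

$$\langle\beta|\hat{A}|\alpha\rangle = \sum_{j,j'}\beta_j^*A_{jj'}\alpha_{j'} = \big(\beta_1^*, \beta_2^*, \ldots\big)\begin{pmatrix} A_{11} & A_{12} & \cdots \\ A_{21} & A_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \cdots\end{pmatrix}, \qquad (4.54)$$

but even the simpler short bracket:

$$\langle\beta|\alpha\rangle = \sum_j\beta_j^*\alpha_j = \big(\beta_1^*, \beta_2^*, \ldots\big)\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \cdots\end{pmatrix}, \qquad (4.55)$$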



despite the fact that these equalities involve the use of non-square matrices: rows of (complex- 
conjugate!) expansion coefficients for the presentation of bra-vectors, and columns of these coefficients 
for the presentation of ket-vectors. With that, the mapping of states and operators on matrices becomes 
completely general. 
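The mapping of Eqs. (47) and (52)-(55) onto ordinary matrix algebra can be verified numerically. This is a sketch under assumptions of the illustration only: two operators and two states are given as random arrays in some reference representation, the basis {u} is the set of columns of a random unitary matrix, and the helper `matrix_elements` is a hypothetical name implementing definition (47).

```python
import numpy as np

rng = np.random.default_rng(4)
N = 3  # assumed basis size


def rand_c(*shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)


# Hypothetical setup: operators and states in a reference representation,
# and an orthonormal basis {u} stored as columns of a unitary matrix Q
A_ref, B_ref = rand_c(N, N), rand_c(N, N)
Q, _ = np.linalg.qr(rand_c(N, N))
u = [Q[:, j] for j in range(N)]


def matrix_elements(op_ref):
    """A_jj' = <u_j|A|u_j'>, Eq. (47)."""
    return np.array([[np.vdot(u[j], op_ref @ u[k]) for k in range(N)] for j in range(N)])


A, B, AB = matrix_elements(A_ref), matrix_elements(B_ref), matrix_elements(A_ref @ B_ref)

# Eqs. (52)-(53): the matrix of an operator product is the product of the matrices
print(np.allclose(AB, A @ B))

# Expansion coefficients alpha_j = <u_j|alpha>, Eq. (40)
alpha_ref, beta_ref = rand_c(N), rand_c(N)
alpha = np.array([np.vdot(u_j, alpha_ref) for u_j in u])
beta = np.array([np.vdot(u_j, beta_ref) for u_j in u])

# Eq. (54): long bracket as (conjugated row) @ square matrix @ column
print(np.isclose(beta.conj() @ A @ alpha, np.vdot(beta_ref, A_ref @ alpha_ref)))
# Eq. (55): short bracket as (conjugated row) @ column
print(np.isclose(beta.conj() @ alpha, np.vdot(beta_ref, alpha_ref)))
```

The agreement with the reference-representation values illustrates that the brackets themselves do not depend on the chosen basis, even though the matrices and columns do.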

Now let us have a look at the outer product operator (26). Its matrix elements are just

$$\big(|\beta\rangle\langle\alpha|\big)_{jj'} = \langle u_j|\beta\rangle\langle\alpha|u_{j'}\rangle = \beta_j\alpha_{j'}^*. \qquad (4.56)$$

These are elements of a very special square matrix, whose filling requires the knowledge of just 2N
scalars (where N is the basis set size), rather than $N^2$ scalars as for an arbitrary operator. However, a
simple generalization of such an outer product may present an arbitrary operator. Indeed, let us insert two
identity operators (44), with different summation indices, on both sides of any operator:

$$\hat{A} = \hat{I}\hat{A}\hat{I} = \Big(\sum_j|u_j\rangle\langle u_j|\Big)\hat{A}\Big(\sum_{j'}|u_{j'}\rangle\langle u_{j'}|\Big), \qquad (4.57)$$

and use the associative axiom to rewrite this expression as

$$\hat{A} = \sum_{j,j'}|u_j\rangle\big(\langle u_j|\hat{A}|u_{j'}\rangle\big)\langle u_{j'}|. \qquad (4.58)$$

But the expression in the middle long bracket is just the matrix element (47), so that we may write

$$\hat{A} = \sum_{j,j'}|u_j\rangle A_{jj'}\langle u_{j'}|. \qquad (4.59)$$



The reader has to agree that this formula, which is a natural generalization of Eq. (44), is extremely 
elegant. Also note the following parallel: if we consider the matrix element definition (47) as some sort 
of analog of Eq. (40), then Eq. (59) is a similar analog of the expansion expressed by Eq. (37). 
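Eqs. (56) and (59) can be checked in a few lines. The sketch below works directly in an assumed orthonormal basis {u}, in which $|u_j\rangle$ is represented by the j-th unit column; the random matrix `A` stands for the matrix elements of a hypothetical operator.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 3  # assumed basis size


def rand_c(*shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)


# In its own orthonormal basis {u}, |u_j> is represented by the j-th unit column
basis = np.eye(N, dtype=complex)
A = rand_c(N, N)  # hypothetical matrix elements A_jj' of some operator

# Eq. (59): rebuild the operator as sum over j, j' of |u_j> A_jj' <u_j'|
A_rebuilt = sum(
    A[j, k] * np.outer(basis[:, j], basis[:, k].conj())
    for j in range(N)
    for k in range(N)
)

# Eq. (56): elements <u_j|beta><alpha|u_j'> of the outer product |beta><alpha|
alpha, beta = rand_c(N), rand_c(N)
outer_elements = np.array(
    [[np.vdot(basis[:, j], beta) * np.vdot(alpha, basis[:, k]) for k in range(N)]
     for j in range(N)]
)

print(np.allclose(A_rebuilt, A))
print(np.allclose(outer_elements, np.outer(beta, alpha.conj())))  # beta_j * alpha_j'^*
```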

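The matrix presentation is so convenient that it makes sense to move it one level lower - from
state vector products to "bare" state vectors resulting from an operator's action upon a given state. For
example, let us use Eq. (59) to present the ket-vector (18) as

$$|\alpha'\rangle \equiv \hat{A}|\alpha\rangle = \Big(\sum_{j,j'}|u_j\rangle A_{jj'}\langle u_{j'}|\Big)|\alpha\rangle = \sum_{j,j'}|u_j\rangle A_{jj'}\langle u_{j'}|\alpha\rangle. \qquad (4.60)$$

According to Eq. (40), the last short bracket is just $\alpha_{j'}$, so that

$$|\alpha'\rangle = \sum_{j,j'}|u_j\rangle A_{jj'}\alpha_{j'} = \sum_j\Big(\sum_{j'}A_{jj'}\alpha_{j'}\Big)|u_j\rangle. \qquad (4.61)$$

But the expression in the last parentheses is just the coefficient $\alpha'_j$ of expansion (37) of the resulting
ket-vector (60) in the same basis, so that

$$\alpha'_j = \sum_{j'}A_{jj'}\alpha_{j'}. \qquad (4.62)$$

This result corresponds to the usual rule of multiplication of a matrix by a column, so that we may
represent any ket-vector by its column matrix, with the operator action looking like

$$\begin{pmatrix}\alpha'_1 \\ \alpha'_2 \\ \cdots\end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & \cdots \\ A_{21} & A_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \cdots\end{pmatrix}. \qquad (4.63)$$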



Absolutely similarly, the operator action on the bra-vector (21), represented by its row-matrix, is

$$\big(\alpha_1'^*, \alpha_2'^*, \ldots\big) = \big(\alpha_1^*, \alpha_2^*, \ldots\big)\begin{pmatrix} (A^\dagger)_{11} & (A^\dagger)_{12} & \cdots \\ (A^\dagger)_{21} & (A^\dagger)_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}. \qquad (4.64)$$

By the way, Eq. (64) naturally raises the following question: what are the elements of the matrix
in its right-hand part, or more exactly, what is the relation between the matrix elements of an operator
and its Hermitian conjugate? The simplest way to get an answer is to use Eq. (25) with two arbitrary
states (say, $u_j$ and $u_{j'}$) of the same basis in the role of $\alpha$ and $\beta$. Together with the orthonormality relation
(38), this immediately gives 13

$$(A^\dagger)_{jj'} = (A_{j'j})^*. \qquad (4.65)$$



Thus, the matrix of the Hermitian conjugate operator is the complex conjugated and transposed matrix
of the initial operator. This result exposes very clearly the essence of the Hermitian conjugation. It also
shows that for the Hermitian operators, defined by Eq. (22),

$$A_{jj'} = A_{j'j}^*, \qquad (4.66)$$

i.e. any pair of their matrix elements, symmetric about the main diagonal, should be complex conjugates
of each other. As a corollary, the main-diagonal elements have to be real:

$$A_{jj} = A_{jj}^*, \quad \text{i.e.} \quad \mathrm{Im}\,A_{jj} = 0. \qquad (4.67)$$

(Matrix (50) evidently satisfies Eq. (66), so that the identity operator is Hermitian.) 
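Relations (65)-(67) translate directly into array operations. A minimal sketch, with a random matrix standing for an arbitrary operator; building a Hermitian example as $(A + A^\dagger)/2$ is an assumed convenience of this illustration, not a construction used in the notes.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 4  # assumed basis size

A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))  # matrix of an arbitrary operator
A_dag = A.conj().T  # Eq. (65): (A^dagger)_jj' = (A_j'j)^*

# (A + A^dagger)/2 is one convenient (assumed) way to manufacture a Hermitian matrix
H = (A + A_dag) / 2

print(np.isclose(A_dag[0, 1], np.conj(A[1, 0])))  # Eq. (65) for one pair of indices
print(np.allclose(H, H.conj().T))                 # Eq. (66): H_jj' = H_j'j^*
print(np.allclose(np.diag(H).imag, 0.0))          # Eq. (67): Im H_jj = 0
print(np.allclose(np.eye(N), np.eye(N).conj().T)) # the identity matrix (50) is Hermitian
```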

In order to fully appreciate the special role played by Hermitian operators in the quantum theory,
let us introduce the key notions of eigenstates $a_j$ (described by their eigenvectors $\langle a_j|$ and $|a_j\rangle$) and
eigenvalues (c-numbers) $A_j$ of an operator $\hat{A}$, defined by the equation they have to satisfy: 14

$$\hat{A}|a_j\rangle = A_j|a_j\rangle. \qquad (4.68)$$

Let us prove that eigenvalues of any Hermitian operator are real, 15

$$A_j = A_j^*, \quad \text{for } j = 1, 2, \ldots, N, \qquad (4.69)$$

13 For the sake of formula compactness, below I will use the shorthand notation in which the operands of this
equality are just $A^\dagger_{jj'}$ and $A^*_{j'j}$. I believe that it leaves little chance for confusion, because the Hermitian
conjugation sign $\dagger$ may pertain only to an operator (or its matrix), while the complex conjugation sign $*$ pertains to a
scalar - say, a matrix element.

14 This equation should look familiar to the reader - see the stationary Schrödinger equation (1.60), which was the
focus of our studies in the first three chapters. We will see soon that that equation is just a particular (coordinate)
representation of Eq. (68) for the Hamiltonian as the operator of energy.

15 The reciprocal statement is also true, provided that the operator's eigenstates form a full and orthonormal set: if
all eigenvalues of such an operator are real, it is Hermitian (in any basis). This statement may be readily proved by
applying Eq. (93) below to the case when $A_{kk'} = A_k\delta_{kk'}$, with $A_k^* = A_k$.
while the eigenstates corresponding to different eigenvalues are orthogonal:

$$\langle a_j|a_{j'}\rangle = 0, \quad \text{if } A_j \neq A_{j'}. \qquad (4.70)$$

The proof of both statements is surprisingly simple. Let us inner-multiply both sides of Eq. (68)
by the bra-vector $\langle a_{j'}|$. In the right-hand part of the result, the eigenvalue $A_j$, as a c-number, may be taken out
of the bra-ket, giving

$$\langle a_{j'}|\hat{A}|a_j\rangle = A_j\langle a_{j'}|a_j\rangle. \qquad (4.71)$$

This equality should hold for any pair of eigenstates, so that we may swap the indices in Eq. (71), and
complex-conjugate the result:

$$\langle a_j|\hat{A}|a_{j'}\rangle^* = A_{j'}^*\langle a_j|a_{j'}\rangle^*. \qquad (4.72)$$

Now using Eqs. (14) and (25), together with the Hermitian operator definition (22), we may transform
Eq. (72) to the following form:

$$\langle a_{j'}|\hat{A}|a_j\rangle = A_{j'}^*\langle a_{j'}|a_j\rangle. \qquad (4.73)$$

Subtracting this equation from Eq. (71), we get

$$0 = \big(A_j - A_{j'}^*\big)\langle a_{j'}|a_j\rangle. \qquad (4.74)$$

There are two possibilities to satisfy this equation. If indices j and j' are equal (denote the same
eigenstate), then the bra-ket is the state's norm squared, which cannot be equal to zero. Then the
parentheses (with j = j') have to be zero, i.e. Eq. (69) is valid. On the other hand, if j and j' correspond to
eigenstates with different eigenvalues, the parentheses cannot equal zero (we have just proved that all $A_j$ are real!), and
hence the state vectors indexed by j and j' should be orthogonal, i.e. Eq. (70) is valid.

As will be discussed below, these properties make Hermitian operators suitable for the
description of physical observables.
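Both results, Eqs. (69) and (70), can be observed numerically. In this sketch the Hermitian matrix is a random example built as $(M + M^\dagger)/2$ (an assumption of the illustration). The general eigensolver `np.linalg.eig` is used deliberately because it does not exploit Hermiticity, so the real eigenvalues and the orthogonality of eigenvectors come out as results rather than being imposed by the algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 5  # assumed basis size

M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (M + M.conj().T) / 2  # an assumed random Hermitian matrix, Eq. (22)

# General-purpose eigensolver: it makes no use of the Hermitian property;
# it returns normalized eigenvectors as the columns of `eigvecs`
eigvals, eigvecs = np.linalg.eig(H)

# Eq. (69): eigenvalues are real (up to round-off)
print(np.max(np.abs(eigvals.imag)) < 1e-10)

# Eq. (70): eigenvectors belonging to different eigenvalues are orthogonal
for j in range(N):
    for k in range(j + 1, N):
        if not np.isclose(eigvals[j], eigvals[k]):
            assert abs(np.vdot(eigvecs[:, j], eigvecs[:, k])) < 1e-8
print("pairwise orthogonality checked")
```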



4.4. Change of basis and matrix diagonalization 

From the discussion in the last section, it may look like the matrix language is fully similar to, and in
many instances more convenient than, the general bra-ket formalism. In particular, Eqs. (52), (54), and (55)
show that any part of any bra-ket expression may be directly mapped on a similar matrix expression,
with the only slight inconvenience of using not only columns, but also rows (with their elements
complex-conjugated), for state vector presentation. In this context, why do we need the bra-ket language
at all? The answer is that the elements of the matrices depend on the particular choice of the basis set,
very much like the Cartesian components of a usual vector depend on the particular choice of reference
frame orientation (Fig. 4), and very frequently it is convenient to use two or more different basis sets for
the same system.

With this motivation, let us study what happens if we change from one basis, {u}, to another
one, {v} - both full and orthonormal. First of all, let us prove that for each such pair of bases, there
exists an operator $\hat{U}$ such that, first,
$$|v_j\rangle = \hat{U}|u_j\rangle, \qquad (4.75)$$

and, second,

$$\hat{U}\hat{U}^\dagger = \hat{U}^\dagger\hat{U} = \hat{I}. \qquad (4.76)$$

(Due to the last property, 16 $\hat{U}$ is called a unitary operator, and Eq. (75), a unitary transformation.)

[Figure: the "old" reference frame {x, y} and the "new" frame {x', y'}, rotated relative to it.]
Fig. 4.4. Transformation of components of a 2D vector at a reference frame rotation.



A very simple proof of both statements may be achieved by construction. Indeed, let us take

$$\hat{U} = \sum_{j'}|v_{j'}\rangle\langle u_{j'}| \qquad (4.77)$$

- an evident generalization of Eq. (44). Then

$$\hat{U}|u_j\rangle = \sum_{j'}|v_{j'}\rangle\langle u_{j'}|u_j\rangle = \sum_{j'}|v_{j'}\rangle\delta_{j'j} = |v_j\rangle, \qquad (4.78)$$

so that Eq. (75) has been proved. Now, applying Eq. (31) to each term of sum (77), we get

$$\hat{U}^\dagger = \sum_{j'}|u_{j'}\rangle\langle v_{j'}|, \qquad (4.79)$$

so that

$$\hat{U}\hat{U}^\dagger = \sum_{j,j'}|v_j\rangle\langle u_j|u_{j'}\rangle\langle v_{j'}| = \sum_{j,j'}|v_j\rangle\delta_{jj'}\langle v_{j'}| = \sum_j|v_j\rangle\langle v_j|. \qquad (4.80)$$

But according to the closure relation (44), the last expression is just the identity operator, q.e.d. 17 (The
proof of the second equality in Eq. (76) is absolutely similar.)
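The construction (77) and its properties (75), (76), and (81) can be reproduced numerically. In this sketch both bases are assumptions of the illustration: each is generated as the columns of a random unitary matrix (via QR), written in a common reference representation, and `random_orthonormal_basis` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 3  # assumed basis size


def random_orthonormal_basis(n):
    """Columns of a unitary matrix from QR: an assumed way to get a full orthonormal basis."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return [Q[:, j] for j in range(n)]


u = random_orthonormal_basis(N)  # "old" basis
v = random_orthonormal_basis(N)  # "new" basis

# Eq. (77): U = sum over j of |v_j><u_j|
U = sum(np.outer(v_j, u_j.conj()) for u_j, v_j in zip(u, v))
U_dag = U.conj().T  # equals sum over j of |u_j><v_j|, Eq. (79)

print(all(np.allclose(U @ u_j, v_j) for u_j, v_j in zip(u, v)))               # Eq. (75)
print(np.allclose(U @ U_dag, np.eye(N)), np.allclose(U_dag @ U, np.eye(N)))    # Eq. (76)
print(all(np.allclose(U_dag @ v_j, u_j) for u_j, v_j in zip(u, v)))           # Eq. (81)
```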

As a by-product of our proof, we have also got another important expression (79). It implies, in
particular, that while, according to Eq. (77), operator $\hat{U}$ performs the transform from the "old" basis $u_j$
to the "new" basis $v_j$, its Hermitian adjoint $\hat{U}^\dagger$ performs the reciprocal unitary transform:

$$\hat{U}^\dagger|v_j\rangle = |u_j\rangle. \qquad (4.81)$$

16 An alternative way to express Eq. (76) is to write $\hat{U}^\dagger = \hat{U}^{-1}$, but I will try to avoid this language.

17 Quod erat demonstrandum (Lat.) - what needed to be proved.
Now, let us see what the matrix elements of the unitary transform operators look like.
Generally, as was stated above, an operator's elements depend on the basis we calculate them in, so we
should be careful - initially. For example, let us calculate the elements in basis {u}:

$$U_{jj'}\big|_{\text{in }u} \equiv \langle u_j|\hat{U}|u_{j'}\rangle = \langle u_j|\Big(\sum_k|v_k\rangle\langle u_k|\Big)|u_{j'}\rangle = \langle u_j|v_{j'}\rangle. \qquad (4.82)$$

Now performing a similar calculation in basis {v}, we get

$$U_{jj'}\big|_{\text{in }v} \equiv \langle v_j|\hat{U}|v_{j'}\rangle = \langle v_j|\Big(\sum_k|v_k\rangle\langle u_k|\Big)|v_{j'}\rangle = \langle u_j|v_{j'}\rangle. \qquad (4.83)$$

Surprisingly, the result is the same! This is of course true for the Hermitian conjugate of the unitary
transform operator as well:

$$U^\dagger_{jj'}\big|_{\text{in }u} = U^\dagger_{jj'}\big|_{\text{in }v} = \langle v_j|u_{j'}\rangle. \qquad (4.84)$$



These expressions may be used, first of all, to rewrite Eq. (75) in a more direct form. Applying
the first of Eqs. (41) to state $v_{j'}$ of the "new" basis, we get

$$|v_{j'}\rangle = \sum_j|u_j\rangle\langle u_j|v_{j'}\rangle = \sum_j|u_j\rangle U_{jj'}. \qquad (4.85)$$

Similarly, the reciprocal transform is

$$|u_{j'}\rangle = \sum_j|v_j\rangle\langle v_j|u_{j'}\rangle = \sum_j|v_j\rangle U^\dagger_{jj'}. \qquad (4.86)$$

These equations are very convenient for applications; we will use them later in this section.

Next, we may use Eqs. (83) and (84) to express the effect of the unitary transform on the expansion
coefficients (37) of vectors of an arbitrary state $\alpha$. In the "old" basis {u}, they are given by Eq. (40).
Similarly, in the "new" basis {v},

$$\alpha_j\big|_{\text{in }v} = \langle v_j|\alpha\rangle. \qquad (4.87)$$



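Again inserting the identity operator in the form of closure (44), with internal index j', and then using
Eq. (84), we get

$$\alpha_j\big|_{\text{in }v} = \langle v_j|\Big(\sum_{j'}|u_{j'}\rangle\langle u_{j'}|\Big)|\alpha\rangle = \sum_{j'}\langle v_j|u_{j'}\rangle\langle u_{j'}|\alpha\rangle = \sum_{j'}U^\dagger_{jj'}\langle u_{j'}|\alpha\rangle = \sum_{j'}U^\dagger_{jj'}\,\alpha_{j'}\big|_{\text{in }u}. \qquad (4.88)$$

The reciprocal transform is (of course) performed by matrix elements of operator $\hat{U}$:

$$\alpha_j\big|_{\text{in }u} = \sum_{j'}U_{jj'}\,\alpha_{j'}\big|_{\text{in }v}. \qquad (4.89)$$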



Both structurally and philosophically, these expressions are similar to the transformation of
components of a usual vector at coordinate frame rotation. For example, in two dimensions (Fig. 4):

$$\begin{pmatrix}a_{x'} \\ a_{y'}\end{pmatrix} = \begin{pmatrix}\cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi\end{pmatrix}\begin{pmatrix}a_x \\ a_y\end{pmatrix}. \qquad (4.90)$$

(In this analogy, the unitary property (76) of operators $\hat{U}$ and $\hat{U}^\dagger$ corresponds to the fact that the
rotation matrix in Eq. (90) is orthogonal, i.e. its transpose is equal to its inverse; in particular, its determinant equals 1.)
Please pay attention here: while the transform (75) from the "old" basis {u} to the "new" basis {v} is performed
by the unitary operator, the change (88) of a state vector's components at this transformation requires its
Hermitian conjugate. Actually, this is also natural from the point of view of the geometric analog of the unitary
transform (Fig. 4): if the "new" reference frame {x', y'} is obtained by a counterclockwise rotation of the "old"
frame {x, y} by some angle $\varphi$, then for an observer rotating with the frame, vector a (which is itself unchanged)
rotates clockwise. Due to the analogy between expressions (88) and (89) on one hand, and our old friend Eq.
(62) on the other hand, it is tempting to skip indices in our new results by writing

$$|\alpha\rangle\big|_{\text{in }v} = \hat{U}^\dagger|\alpha\rangle\big|_{\text{in }u}, \qquad |\alpha\rangle\big|_{\text{in }u} = \hat{U}|\alpha\rangle\big|_{\text{in }v}. \qquad (4.91)$$

Since the matrix elements of $\hat{U}$ and $\hat{U}^\dagger$ do not depend on basis, such language is not too bad; still, the
symbolic Eq. (91) should not be confused with genuine (basis-independent) bra-ket equalities.
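A numerical sketch of Eqs. (82)-(83) and (87)-(89), plus the geometric analog (90). Assumptions of this illustration: the two bases are the columns of random unitary matrices `Qu` and `Qv` in a common reference representation, so that the operator (77) is represented by `Qv @ Qu.conj().T`; the state and the rotation angle are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 3  # assumed basis size


def random_unitary(n):
    """Columns form an orthonormal basis (assumed construction via QR)."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return Q


Qu, Qv = random_unitary(N), random_unitary(N)  # bases {u}, {v} in a reference representation
U_op = Qv @ Qu.conj().T                        # Eq. (77): sum over j of |v_j><u_j|

# Eqs. (82)-(83): U has the same matrix, <u_j|v_j'>, in both bases
U_in_u = Qu.conj().T @ U_op @ Qu
U_in_v = Qv.conj().T @ U_op @ Qv
print(np.allclose(U_in_u, U_in_v), np.allclose(U_in_u, Qu.conj().T @ Qv))

alpha_ref = rng.normal(size=N) + 1j * rng.normal(size=N)
alpha_in_u = Qu.conj().T @ alpha_ref  # Eq. (40): <u_j|alpha>
alpha_in_v = Qv.conj().T @ alpha_ref  # Eq. (87): <v_j|alpha>

# Eq. (88): new coefficients via U^dagger; Eq. (89): old ones back via U
print(np.allclose(alpha_in_v, U_in_u.conj().T @ alpha_in_u))
print(np.allclose(alpha_in_u, U_in_u @ alpha_in_v))

# Geometric analog, Eq. (90): the frame-rotation matrix is orthogonal, with unit determinant
phi = 0.7  # arbitrary rotation angle
R = np.array([[np.cos(phi), np.sin(phi)], [-np.sin(phi), np.cos(phi)]])
print(np.allclose(R @ R.T, np.eye(2)), np.isclose(np.linalg.det(R), 1.0))
```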

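Now let us use the same trick of identity operator insertion, repeated twice, to find the
transformation rule for matrix elements of an arbitrary operator:

$$A_{jj'}\big|_{\text{in }v} \equiv \langle v_j|\hat{A}|v_{j'}\rangle = \langle v_j|\Big(\sum_k|u_k\rangle\langle u_k|\Big)\hat{A}\Big(\sum_{k'}|u_{k'}\rangle\langle u_{k'}|\Big)|v_{j'}\rangle = \sum_{k,k'}U^\dagger_{jk}\,A_{kk'}\big|_{\text{in }u}\,U_{k'j'}; \qquad (4.92)$$

absolutely similarly, we can get

$$A_{jj'}\big|_{\text{in }u} = \sum_{k,k'}U_{jk}\,A_{kk'}\big|_{\text{in }v}\,U^\dagger_{k'j'}. \qquad (4.93)$$

In the spirit of Eq. (91), we may present these results symbolically as well, in a compact bra-ket form:

$$\hat{A}\big|_{\text{in }v} = \hat{U}^\dagger\hat{A}\big|_{\text{in }u}\hat{U}, \qquad \hat{A}\big|_{\text{in }u} = \hat{U}\hat{A}\big|_{\text{in }v}\hat{U}^\dagger. \qquad (4.94)$$

As a sanity check, let us apply this result to the identity operator:

$$\hat{I}\big|_{\text{in }v} = \big(\hat{U}^\dagger\hat{I}\hat{U}\big)\big|_{\text{in }u} = \big(\hat{U}^\dagger\hat{U}\big)\big|_{\text{in }u} = \hat{I}\big|_{\text{in }u} \qquad (4.95)$$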



- as it should be. One more invariant of the basis change is the trace of any operator, defined as the sum 
of the diagonal terms of its matrix in a certain basis: 



$$ \operatorname{Tr} \hat A \equiv \operatorname{Tr} \mathrm A \equiv \sum_j A_{jj}. \tag{4.96} $$



The (easy) proof of this fact, using the relations we have already discussed, is left as an exercise for the reader.
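The transformation rules (92)-(94) and the trace invariance (96) are easy to check numerically. Below is a minimal sketch in Python with NumPy (the notes themselves use no code, so this choice is an assumption). A random unitary matrix, obtained from a QR decomposition, plays the role of U. The operator matrix is arbitrary and does not represent any particular physical operator.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3

# Matrix A_kk' of an arbitrary operator in the "old" basis {u}.
A_u = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))

# A random unitary matrix U; its column j holds U_kj = <u_k|v_j>.
U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
U_dag = U.conj().T

# Eq. (4.92) element by element: sum over k, k' of U+_jk A_kk' U_k'j'.
A_v_elementwise = np.einsum("kj,kl,lm->jm", U.conj(), A_u, U)

# Eq. (4.94) in compact matrix form, and the inverse transform (4.93).
A_v = U_dag @ A_u @ U
A_u_back = U @ A_v @ U_dag

# Eq. (4.96): the trace is the same in both bases.
trace_u, trace_v = np.trace(A_u), np.trace(A_v)
```

The `einsum` line mirrors the double sum in Eq. (92) literally. The matrix product beneath it is the compact form (94), and the two agree.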

So far, I have implied that both state bases {u} and {v} are known, and the natural question is where this information comes from in the quantum mechanics of actual physical systems. To get a partial answer to this question, let us return to Eq. (68) that defines eigenstates and eigenvalues of an operator. Let us assume that the eigenstates a_j of a certain operator A form a full and orthonormal set, and find the matrix elements of the operator in the basis of these states. For that, it is sufficient to inner-multiply both sides of Eq. (68), written for index j', by the bra-vector of an arbitrary state a_j of the same set:



$$ \langle a_j|\hat A|a_{j'}\rangle = \langle a_j|A_{j'}|a_{j'}\rangle. \tag{4.97} $$



The left-hand part is just the matrix element A_jj' we are looking for, while the right-hand part is just A_j'δ_jj'. As a result, we see that the matrix is diagonal, with the diagonal consisting of eigenvalues:

$$ A_{jj'} = A_j \delta_{jj'}. \tag{4.98} $$


In particular, in the eigenstate basis (but not necessarily in an arbitrary basis!), A_jj means the same as A_j. Thus the most important problem of finding the eigenvalues and eigenstates of an operator is equivalent to the diagonalization of its matrix, 18 i.e. finding the basis in which the corresponding operator acquires the diagonal form (98); then the diagonal elements are the eigenvalues, and the basis itself is the desired set of eigenstates.

Let us modify the above calculation by inner-multiplying Eq. (68) by a bra-vector of a different basis, say, the one, denoted {u}, in which we know the matrix elements A_kk'. The multiplication gives

$$ \langle u_k|\hat A|a_j\rangle = \langle u_k|A_j|a_j\rangle. \tag{4.99} $$



In the left-hand part we can (as usual :-) insert the identity operator between the operator and the ket-vector, and then use the closure relation (44), while in the right-hand part, we can move the eigenvalue A_j out of the bra-ket, and then insert a summation over a new index, compensating it with the proper Kronecker delta symbol:

$$ \sum_{k'} \langle u_k|\hat A|u_{k'}\rangle \langle u_{k'}|a_j\rangle = A_j \sum_{k'} \langle u_{k'}|a_j\rangle\, \delta_{kk'}. \tag{4.100} $$



Moving the summation over k' outside, and using definition (47) of the matrix elements, we get

$$ \sum_{k'} \left( A_{kk'} - A_j \delta_{kk'} \right) \langle u_{k'}|a_j\rangle = 0. \tag{4.101} $$



But the set of such equalities, for all N possible values of index k, is just a system of linear, homogeneous equations for the unknown c-numbers ⟨u_k|a_j⟩. According to Eqs. (82)-(84), these numbers are nothing else than the matrix elements U_kj of a unitary matrix providing the required transformation from the initial basis {u} to the basis {a} that diagonalizes matrix A. The system may be presented in the matrix form:

$$ \begin{pmatrix} A_{11}-A_j & A_{12} & \cdots \\ A_{21} & A_{22}-A_j & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix} \begin{pmatrix} U_{1j} \\ U_{2j} \\ \cdots \end{pmatrix} = 0, \tag{4.102} $$



18 Note that the expression "matrix diagonalization" is common and convenient, but dangerous, jargon. (A matrix is just a matrix, an ordered set of c-numbers, and cannot be diagonalized.) It is OK to use this jargon if you remember clearly what it actually means - see the definition above.
and the usual condition of its consistency,

$$ \begin{vmatrix} A_{11}-A & A_{12} & \cdots \\ A_{21} & A_{22}-A & \cdots \\ \cdots & \cdots & \cdots \end{vmatrix} = 0, \tag{4.103} $$



plays the role of the characteristic equation of the system. I could drop the index j in that equation because it has N roots for the parameter A, which we can number, in arbitrary order, as A_j. Plugging each of them back into system (102), we can use it to find N matrix elements U_kj (k = 1, 2, ... N) corresponding to this particular eigenvalue. However, since Eqs. (102) are homogeneous, they allow finding U_kj only up to a constant multiplier. In order to ensure their normalization, i.e. the unitary character of matrix U, we may use the condition that all eigenvectors are normalized (just as the basis vectors are):

$$ \langle a_j|a_j\rangle = \sum_k \langle a_j|u_k\rangle \langle u_k|a_j\rangle = \sum_k \left|U_{kj}\right|^2 = 1, \tag{4.104} $$



for each j. This normalization completes the diagonalization. 
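The recipe (99)-(104) can be mirrored numerically. In the minimal NumPy sketch below, the Hermitian matrix A is random, which is an assumption of this illustration. `np.linalg.eigh` replaces the explicit solution of the characteristic equation, so it is a swapped-in numerical routine rather than the hand procedure described above. Its output can still be checked against each step: the eigenvalues are roots of Eq. (103), the columns of U solve system (102), and they obey normalization (104).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = (M + M.conj().T) / 2            # Hermitian matrix A_kk' in basis {u}

# eigh returns real eigenvalues A_j (ascending) and a unitary U with U[k, j] = <u_k|a_j>.
eigvals, U = np.linalg.eigh(A)

# Each A_j is a root of the characteristic equation (4.103).
char_values = np.array([np.linalg.det(A - a_j * np.eye(N)) for a_j in eigvals])

# Column j solves the homogeneous system (4.101)-(4.102): (A - A_j I) U[:, j] = 0.
residual = A @ U - U * eigvals      # broadcasting scales column j by A_j

# Normalization (4.104): sum_k |U_kj|^2 = 1 for every j.
column_norms = np.sum(np.abs(U) ** 2, axis=0)

# In the eigenbasis, the matrix takes the diagonal form (4.98).
A_diag = U.conj().T @ A @ U
```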
Now (at last!) I can give the reader some examples. As a simple but very important case, let us diagonalize the operators described (in a certain two-state basis {u}) by the so-called Pauli matrices

$$ \sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y \equiv \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{4.105} $$



Though introduced by a physicist for the specific purpose of describing the electron's spin, these matrices have a general mathematical significance, because together with the 2×2 identity matrix I, they provide a full, linearly-independent 2×2 basis - meaning that an arbitrary 2×2 matrix may be presented as

$$ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = a_0 \mathrm I + a_x \sigma_x + a_y \sigma_y + a_z \sigma_z, \tag{4.106} $$

with a unique set of 4 (generally complex) coefficients a_0, a_x, a_y, a_z.
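The coefficients in Eq. (106) can be extracted with traces. This works because Tr σ_i = 0 and Tr(σ_i σ_j) = 2δ_ij for the Pauli matrices, while Tr I = 2. The NumPy sketch below illustrates this. The function name `pauli_coefficients` and the sample matrix are hypothetical, chosen only for this example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_coefficients(A):
    """Return (a0, ax, ay, az) such that A = a0*I + ax*sx + ay*sy + az*sz, Eq. (4.106).

    Each coefficient equals Tr(s A)/2, because Tr(s_i s_j) = 2 delta_ij
    for s_i, s_j taken from {I, sx, sy, sz}.
    """
    return tuple(np.trace(s @ A) / 2 for s in (I2, sx, sy, sz))

A = np.array([[1 + 2j, 3], [-1j, 4]], dtype=complex)   # an arbitrary example matrix
a0, ax, ay, az = pauli_coefficients(A)
A_rebuilt = a0 * I2 + ax * sx + ay * sy + az * sz
```

For this sample matrix, a_0 = 2.5 + i and a_z = -1.5 + i. The complex values illustrate that the coefficients are generally complex for a non-Hermitian matrix.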

Let us start with diagonalizing matrix σ_x. For it, the characteristic equation (103) is evidently

$$ \begin{vmatrix} -A & 1 \\ 1 & -A \end{vmatrix} = 0, \tag{4.107} $$



and has two roots, A_1,2 = ±1. (Again, the numbering is arbitrary!) The reader may readily check that the eigenvalues of matrices σ_y and σ_z are the same. However, the eigenvectors of the operators corresponding to all these matrices are different. To find them for σ_x, let us plug its first eigenvalue, A_1 = +1, back into Eqs. (101), written for this particular case:



$$ -\langle u_1|a_1\rangle + \langle u_2|a_1\rangle = 0, \qquad \langle u_1|a_1\rangle - \langle u_2|a_1\rangle = 0. \tag{4.108} $$



19 A possible slight complication here is presented by degenerate cases, when the characteristic equation gives equal eigenvalues corresponding to different eigenvectors. In this case, the mutual orthogonality of these states should be enforced additionally.
The equations are compatible (of course, because the used eigenvalue A_1 = +1 satisfies the characteristic equation), and either of them gives

$$ \langle u_1|a_1\rangle = \langle u_2|a_1\rangle, \quad \text{i.e.} \quad U_{11} = U_{21}. \tag{4.109} $$

With that, the normalization condition (104) yields

$$ \left|U_{11}\right|^2 = \left|U_{21}\right|^2 = \frac{1}{2}. \tag{4.110} $$



Although the normalization is insensitive to the simultaneous multiplication of U_11 and U_21 by the same phase factor exp{iφ} with any real φ, it is convenient to keep the coefficients real, for example by taking φ = 0, i.e. to get

$$ U_{11} = U_{21} = \frac{1}{\sqrt 2}. \tag{4.111} $$



Performing an absolutely similar calculation for the second characteristic value, A_2 = -1, we get U_12 = -U_22, and we may choose the common phase to get

$$ U_{12} = -U_{22} = \frac{1}{\sqrt 2}, \tag{4.112} $$

so that the whole unitary matrix for diagonalization of the operator corresponding to σ_x is 20

$$ \mathrm U_x = \mathrm U_x^\dagger = \frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \tag{4.113} $$



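For what follows, it will be convenient to have this result expressed in the ket-relation form - see Eqs. (85)-(86):

$$ |a_1\rangle = U_{11}|u_1\rangle + U_{21}|u_2\rangle = \frac{1}{\sqrt 2}\left(|u_1\rangle + |u_2\rangle\right), \qquad |a_2\rangle = U_{12}|u_1\rangle + U_{22}|u_2\rangle = \frac{1}{\sqrt 2}\left(|u_1\rangle - |u_2\rangle\right), \tag{4.114} $$

$$ |u_1\rangle = U^\dagger_{11}|a_1\rangle + U^\dagger_{21}|a_2\rangle = \frac{1}{\sqrt 2}\left(|a_1\rangle + |a_2\rangle\right), \qquad |u_2\rangle = U^\dagger_{12}|a_1\rangle + U^\dagger_{22}|a_2\rangle = \frac{1}{\sqrt 2}\left(|a_1\rangle - |a_2\rangle\right). \tag{4.115} $$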
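Eqs. (113)-(115) can be verified directly. This minimal NumPy sketch (the tool choice is an assumption) represents the basis vectors u_1 and u_2 by the columns (1, 0) and (0, 1).

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Eq. (4.113)
U_dag = U.conj().T

is_unitary = np.allclose(U_dag @ U, np.eye(2))
is_hermitian = np.allclose(U, U_dag)       # holds for this choice of phases only
sigma_x_new = U_dag @ sigma_x @ U          # Eq. (4.94): diagonal, with A_1 = +1, A_2 = -1

# Kets as columns in basis {u}: |a_j> is column j of U, Eq. (4.114).
u1, u2 = np.eye(2, dtype=complex)
a1, a2 = U[:, 0], U[:, 1]
```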



These results are already sufficient to understand the Stern-Gerlach experiments described in Sec. 1 - with two additional postulates. The first of them is specific to the electron's spin: the interaction of a free electron with an external magnetic field may be described by the following vector operator of the dipole magnetic moment:



$$ \hat{\mathbf m} = -g_e\frac{e}{2m_e}\hat{\mathbf S}, \tag{4.116} $$



where g_e ≈ 2 is the electron's g-factor, 21 and S is the vector operator of electron spin, 22 which is represented, in the so-called z-basis, by the following 3D vector of the Pauli matrices (105):



20 Note that though this particular unitary matrix is Hermitian, this is not true for an arbitrary choice of phases φ.

21 Actually, due to quantum electrodynamics effects, the electron's g-factor is slightly higher than two: g_e ≈ 2.002319304... ≈ 2(1 + α/2π + ...), where α ≡ e²/4πε₀ħc ≈ 1/137 is the fine structure constant.
$$ \mathrm S = \frac{\hbar}{2}\boldsymbol\sigma, \qquad \boldsymbol\sigma \equiv \mathbf n_x\sigma_x + \mathbf n_y\sigma_y + \mathbf n_z\sigma_z, \tag{4.117} $$



and n_x,y,z are the usual Cartesian unit vectors in 3D space. (In the quantum-mechanics sense, they are just c-numbers, or rather "c-vectors".) The z-basis, in which Eq. (117) is valid, is defined as an orthonormal basis of two states, frequently denoted ↑ and ↓, in which the z-component of the vector operator of spin is diagonal, with eigenvalues +ħ/2 and -ħ/2. Note that we do not "understand" what exactly these states are, 23 but loosely associate them with a certain internal rotation of the electron about the z-axis, with either positive or negative angular momentum component S_z. However, any attempt to use such a classical interpretation for quantitative predictions runs into fundamental difficulties - see Sec. 5.7 below.

The second new postulate describes the general relation between the bra-ket formalism and experiment. 24 Namely, in quantum mechanics, each real observable A is represented by a Hermitian operator Â = Â†, and a result of its measurement in a quantum state α, described by a linear superposition of the eigenstates a_j of the operator,

$$ |\alpha\rangle = \sum_j \alpha_j |a_j\rangle, \qquad \text{with} \quad \alpha_j = \langle a_j|\alpha\rangle, \tag{4.118} $$

may be only one of the corresponding eigenvalues A_j. 25 If state (118) and all eigenstates a_j are normalized to unity,

$$ \langle\alpha|\alpha\rangle = 1, \qquad \langle a_j|a_j\rangle = 1, \tag{4.119} $$

then the probability of outcome A_j is

$$ W_j = \alpha_j^* \alpha_j = \langle\alpha|a_j\rangle\langle a_j|\alpha\rangle. \tag{4.120} $$









This relation is evidently a generalization of Eq. (1.22) in wave mechanics. As a sanity check, let us assume that the set of eigenstates a_j is full, and calculate the sum of all the probabilities:

$$ \sum_j W_j = \sum_j \langle\alpha|a_j\rangle\langle a_j|\alpha\rangle = \langle\alpha|\hat I|\alpha\rangle = 1. \tag{4.121} $$
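Postulates (118)-(120) translate into a few lines of linear algebra. In the hedged NumPy sketch below, the state is written in the basis in which the Pauli matrices take the form (105). The amplitudes 0.6 and 0.8i are arbitrary illustrative values, and `outcome_probabilities` is a hypothetical helper name.

```python
import numpy as np

def outcome_probabilities(A, alpha):
    """Possible results A_j of measuring Hermitian A in the normalized state alpha,
    and their probabilities W_j = |<a_j|alpha>|^2 (Eqs. 4.118 and 4.120)."""
    eigvals, kets = np.linalg.eigh(A)       # columns of `kets` are orthonormal |a_j>
    amplitudes = kets.conj().T @ alpha      # alpha_j = <a_j|alpha>
    return eigvals, np.abs(amplitudes) ** 2

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
alpha = np.array([0.6, 0.8j])               # normalized: 0.36 + 0.64 = 1

vals_x, W_x = outcome_probabilities(sigma_x, alpha)
vals_z, W_z = outcome_probabilities(sigma_z, alpha)
```

In both cases, the probabilities add up to 1, as required by Eq. (121). The two operators give different probability sets because their eigenstates differ.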



Now returning to the Stern-Gerlach experiment, conceptually the description of the first (z-oriented) experiment shown in Fig. 1 is the hardest for us, because the statistical ensemble describing the unpolarized electron beam at its input is mixed ("incoherent"), and cannot be described by a pure ("coherent") superposition of the type (6) that has been the subject of our studies so far. (We will



22 Eq. (117) is valid for any particle with "spin-1/2" - see Sec. 5.7 below.

23 If you think about it, the word "understand" typically means that we can explain a new, more complex notion in terms of those discussed earlier and considered "known". In our example, we cannot express the spin states by some wavefunction ψ(r), or any other mathematical notion discussed earlier. The bra-ket formalism has been invented exactly to enable mathematical analysis of such "new" quantum states.

24 Here again, just like in Sec. 1.2, the statement implies the abstract (mathematical) notion of "ideal 
experiments", postponing the discussion of real (physical) measurements until Sec. 7.7. 

25 As a reminder, at the end of Sec. 3 we have already proved that such eigenstates corresponding to different A_j are orthogonal. If any of these values is degenerate, i.e. corresponds to several different eigenstates, they should also be selected orthogonal, in order for Eq. (118) to be valid.
discuss the mixed ensembles in Chapter 7.) However, it is intuitively clear that its results, and in particular Eq. (6), are compatible with the description of its two output beams as sets of electrons in pure states ↑ and ↓, respectively. The absorber following that first stage (Fig. 2) just takes all spin-down electrons out of the picture, producing an output beam of polarized electrons in a pure ↑ state. For such a beam, probabilities (120) are W_↑ = 1 and W_↓ = 0. This is certainly compatible with the result of the "control" experiment shown on the bottom panel of Fig. 2: the repeated SG (z) stage does not split such a beam, keeping the probabilities the same.

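Now let us discuss the double Stern-Gerlach experiment shown on the top panel of Fig. 2. For that, let us present the z-polarized beam in another basis of two states (I will denote them as → and ←) in which, by definition, the matrix of operator S_x is diagonal. But this is exactly the set we called a_1,2 in the σ_x matrix diagonalization problem solved above. On the other hand, states ↑ and ↓ are exactly what we called u_1,2 in that problem, because in this basis, matrices σ_z and hence S_z are diagonal. Hence, in application to the electron spin problem, we may rewrite Eqs. (114)-(115) as

$$ |\rightarrow\rangle = \frac{1}{\sqrt 2}\left(|\uparrow\rangle + |\downarrow\rangle\right), \qquad |\leftarrow\rangle = \frac{1}{\sqrt 2}\left(|\uparrow\rangle - |\downarrow\rangle\right), \tag{4.122} $$

$$ |\uparrow\rangle = \frac{1}{\sqrt 2}\left(|\rightarrow\rangle + |\leftarrow\rangle\right), \qquad |\downarrow\rangle = \frac{1}{\sqrt 2}\left(|\rightarrow\rangle - |\leftarrow\rangle\right). \tag{4.123} $$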

Currently, for us the first of Eqs. (123) is most important, because it shows that the quantum state of electrons entering the SG (x) stage may be presented as a coherent superposition of electrons with S_x = +ħ/2 and S_x = -ħ/2. Notice that the beams have equal probability amplitude moduli, so that according to Eq. (120), the split beams → and ← have equal intensities, in accordance with experiment. (The minus signs in these relations are of no consequence here, though they may have an impact on the outcome of other experiments - for example, if the → and ← beams are brought together again.)

Now, let us discuss the most mysterious (from the classical point of view) multi-stage SG experiment shown on the middle panel of Fig. 2. After the second absorber has taken out all electrons in, say, the ← state, the remaining electrons in state → are passed to the final, SG (z), stage. But according to the first of Eqs. (122), this state may be presented as a (coherent) linear superposition of the ↑ and ↓ states, with equal amplitudes. The stage separates these two states into separate beams, with equal probabilities W_↑ = W_↓ = 1/2 to find an electron in each of them, thus explaining the experimental results.
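The probabilities in the Stern-Gerlach sequences follow from Eqs. (120) and (122) alone. The minimal NumPy sketch below makes several idealizing assumptions. States ↑ and ↓ are the columns (1, 0) and (0, 1), the absorbers are ideal, and the beam entering the later stages is in the pure ↑ state. The helper name `prob` is hypothetical.

```python
import numpy as np

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
right = (up + down) / np.sqrt(2)      # first of Eqs. (4.122)
left = (up - down) / np.sqrt(2)       # second of Eqs. (4.122)

def prob(outcome, state):
    """Probability (4.120) of finding `state` in the basis state `outcome`."""
    return abs(np.vdot(outcome, state)) ** 2

# "Control" experiment: a second SG(z) stage does not split the up beam.
p_up_again, p_down_again = prob(up, up), prob(down, up)         # 1 and 0

# SG(x) stage acting on the up beam: equal split into "right" and "left".
p_right, p_left = prob(right, up), prob(left, up)               # 1/2 and 1/2

# The absorber keeps the "right" beam; the final SG(z) stage splits it again.
p_up_final, p_down_final = prob(up, right), prob(down, right)   # 1/2 and 1/2

# Fraction of the up-polarized beam that ends in the final "down" output.
fraction_down = p_right * p_down_final                          # 1/4
```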

To conclude our discussion of the multistage Stern-Gerlach experiment, let me note that though it cannot be explained in terms of wave mechanics (which operates with scalar de Broglie waves), it has an analogy in classical theories of vector fields, such as classical electrodynamics - see Fig. 3.






Fig. 4.3. Light polarization sequence similar to the 3-stage 
Stern-Gerlach experiment shown on the middle panel of Fig. 2. 
Let a plane electromagnetic wave propagate perpendicular to the plane of the drawing, and pass through linear polarizer 1. Similarly to the initial SG (z) stages (including the following absorbers) shown in Fig. 2, the polarizer produces a wave linearly polarized in one direction - the vertical direction in Fig. 3. Its electric field vector has no horizontal component, as may be revealed by the wave's full absorption in a perpendicular polarizer 3. However, let us pass the wave through polarizer 2 first. In this case, the output wave does acquire a horizontal component, as can be, again, revealed by passing it through polarizer 3. If the angles between polarization directions 1 and 2, and between 2 and 3, are both equal to π/4, each of polarizers 2 and 3 reduces the wave amplitude by a factor of √2, and hence the intensity by a factor of 2, exactly like in the multistage SG experiment, with polarizer 2 playing the role of the SG (x) stage. The "only" difference is that the necessary angle is π/4, rather than π/2 for the Stern-Gerlach experiment. In quantum electrodynamics (see Chapter 9 below), which confirms the classical predictions for this experiment, this difference is explained by that between the integer spin of the electromagnetic field quanta, photons, and the half-integer spin of electrons.
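For comparison, the classical intensities in Fig. 3 follow from Malus's law, I = I_0 cos²θ, for an ideal polarizer at angle θ to the incident polarization. This is a standard optics result that is not derived in these notes. Below is a minimal sketch, assuming ideal polarizers and an input wave already polarized along direction 1. The function name is hypothetical.

```python
import numpy as np

def transmitted_intensity(angles, I0=1.0):
    """Intensity after a chain of ideal linear polarizers (angles in radians).

    The incoming wave is assumed to be polarized along angles[0] already;
    each next polarizer multiplies the intensity by cos^2 of the angle
    between its axis and the previous one (Malus's law).
    """
    intensity = I0
    for previous, current in zip(angles[:-1], angles[1:]):
        intensity *= np.cos(current - previous) ** 2
    return intensity

blocked = transmitted_intensity([0.0, np.pi / 2])             # polarizers 1 and 3 only
passed = transmitted_intensity([0.0, np.pi / 4, np.pi / 2])   # polarizers 1, 2 and 3
spin_factor = np.cos((np.pi / 2) / 2) ** 2                    # spin-1/2 analog, cos^2(theta/2)
```

For a spin-1/2 particle, the analogous probability of passing a stage rotated by angle θ relative to the incoming polarization is cos²(θ/2). The SG (x) stage at θ = π/2 therefore gives the same factor 1/2 as a polarizer at θ = π/4, which is one way to see the angle difference mentioned above.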






4.5. Observables: Expectation values and uncertainties 

After this particular (and hopefully inspiring) example, let us discuss the general relation between the Dirac formalism and experiment in more detail. The expectation value of an observable over any statistical ensemble (not necessarily coherent) may always be calculated using the general rule (1.37). For the particular case of a coherent superposition (118), we can combine that definition with Eq. (120) and the second of Eqs. (118), and then use Eqs. (59) and (98) to write

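$$ \langle A\rangle = \sum_j A_j W_j = \sum_j \alpha_j^* A_j \alpha_j = \sum_j \langle\alpha|a_j\rangle A_j \langle a_j|\alpha\rangle = \sum_{j,j'} \langle\alpha|a_j\rangle\langle a_j|\hat A|a_{j'}\rangle\langle a_{j'}|\alpha\rangle. \tag{4.124} $$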

Now using the completeness relation (44) twice, with indices j and j', we arrive at a very simple and important formula 26

$$ \langle A\rangle = \langle\alpha|\hat A|\alpha\rangle. \tag{4.125} $$



This is a clear analog of the wave-mechanics formula (1.23) and, as we will see in the next chapter, may be used to derive it. A huge advantage of Eq. (125) is that it does not explicitly involve the eigenvector set of the corresponding operator, and allows the calculation to be performed in any convenient basis. 27
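The equivalence of Eqs. (124) and (125), and the freedom to use any basis, can be spot-checked numerically. The NumPy sketch below uses a random Hermitian matrix and a random normalized state, both chosen purely for illustration. The "other basis" is produced by a random unitary U, with the state and operator transformed as in Eqs. (91) and (94).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = (M + M.conj().T) / 2                    # Hermitian observable, basis {u}
alpha = rng.normal(size=N) + 1j * rng.normal(size=N)
alpha /= np.linalg.norm(alpha)              # normalization (4.119)

# Route 1, Eq. (4.124): needs the eigenvalues and eigenstates of A.
eigvals, kets = np.linalg.eigh(A)
W = np.abs(kets.conj().T @ alpha) ** 2
mean_from_probabilities = np.sum(eigvals * W)

# Route 2, Eq. (4.125): the "long bracket" <alpha|A|alpha>, no eigenstates needed.
mean_from_bracket = np.vdot(alpha, A @ alpha)

# Route 2 again, in another (random unitary) basis, using Eqs. (4.91) and (4.94).
U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
A_v, alpha_v = U.conj().T @ A @ U, U.conj().T @ alpha
mean_other_basis = np.vdot(alpha_v, A_v @ alpha_v)
```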

For example, let us consider an arbitrary state α of the electron's spin, and calculate the expectation values of spin components. The calculations are easiest in the z-basis, because we know the operators of the components in that basis - see Eq. (117). Representing the ket- and bra-vectors of our state as linear superpositions of vectors of the basis states ↑ and ↓,

$$ |\alpha\rangle = \alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle, \qquad \langle\alpha| = \langle\uparrow|\alpha_\uparrow^* + \langle\downarrow|\alpha_\downarrow^*, \tag{4.126} $$



26 This equality reveals the full beauty of Dirac's notation. Indeed, initially the quantum-mechanical brackets just resembled the angular brackets used for statistical averaging. Now we see that in this particular (but most important) case, the angular brackets of these two types may be indeed equal to each other!

27 Note that Eq. (120) may be rewritten in a form similar to Eq. (125): W_j = ⟨α|Λ̂_j|α⟩, where Λ̂_j ≡ |a_j⟩⟨a_j| is the operator (42) of projection upon the j-th eigenstate a_j.
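and plugging these expressions into Eq. (125) written for observable S_z, we get

$$ \langle S_z\rangle = \left(\langle\uparrow|\alpha_\uparrow^* + \langle\downarrow|\alpha_\downarrow^*\right)\hat S_z\left(\alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle\right) = \alpha_\uparrow\alpha_\uparrow^*\langle\uparrow|\hat S_z|\uparrow\rangle + \alpha_\downarrow\alpha_\uparrow^*\langle\uparrow|\hat S_z|\downarrow\rangle + \alpha_\uparrow\alpha_\downarrow^*\langle\downarrow|\hat S_z|\uparrow\rangle + \alpha_\downarrow\alpha_\downarrow^*\langle\downarrow|\hat S_z|\downarrow\rangle. \tag{4.127} $$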



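Now there are two equivalent ways (both very simple :-) to calculate the bra-kets in this expression. The first one is to represent each of them in the matrix form in the z-basis, in which bra- and ket-vectors of states ↑ and ↓ are, respectively, matrix-rows (1, 0) and (0, 1), or the similar matrix-columns. Another (perhaps more elegant) way is to use the general Eq. (59), for the z-basis, to write

$$ \hat S_x = \frac{\hbar}{2}\left(|\uparrow\rangle\langle\downarrow| + |\downarrow\rangle\langle\uparrow|\right), \qquad \hat S_y = \frac{i\hbar}{2}\left(|\downarrow\rangle\langle\uparrow| - |\uparrow\rangle\langle\downarrow|\right), \qquad \hat S_z = \frac{\hbar}{2}\left(|\uparrow\rangle\langle\uparrow| - |\downarrow\rangle\langle\downarrow|\right). \tag{4.128} $$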



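For our particular calculation, we may plug the last of these expressions into Eq. (127), and use the orthonormality conditions (119):

$$ \langle\uparrow|\hat S_z|\uparrow\rangle = +\frac{\hbar}{2}, \qquad \langle\downarrow|\hat S_z|\downarrow\rangle = -\frac{\hbar}{2}, \qquad \langle\uparrow|\hat S_z|\downarrow\rangle = \langle\downarrow|\hat S_z|\uparrow\rangle = 0. \tag{4.129} $$

Both calculations give (of course) the same result:

$$ \langle S_z\rangle = \frac{\hbar}{2}\left(\alpha_\uparrow\alpha_\uparrow^* - \alpha_\downarrow\alpha_\downarrow^*\right). \tag{4.130} $$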



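This particular result might also be obtained using Eq. (120) for probabilities W_↑ = α_↑α_↑* and W_↓ = α_↓α_↓*:

$$ \langle S_z\rangle = W_\uparrow\left(+\frac{\hbar}{2}\right) + W_\downarrow\left(-\frac{\hbar}{2}\right) = \frac{\hbar}{2}\left(\alpha_\uparrow\alpha_\uparrow^* - \alpha_\downarrow\alpha_\downarrow^*\right). \tag{4.131} $$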



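The formal way (127), based on Eq. (125), has, however, the advantage of being applicable, without any change, to finding the observables whose operators are not diagonal in the z-basis, as well. In particular, absolutely similar calculations give

$$ \langle S_x\rangle = \alpha_\uparrow\alpha_\uparrow^*\langle\uparrow|\hat S_x|\uparrow\rangle + \alpha_\downarrow\alpha_\uparrow^*\langle\uparrow|\hat S_x|\downarrow\rangle + \alpha_\uparrow\alpha_\downarrow^*\langle\downarrow|\hat S_x|\uparrow\rangle + \alpha_\downarrow\alpha_\downarrow^*\langle\downarrow|\hat S_x|\downarrow\rangle = \frac{\hbar}{2}\left(\alpha_\downarrow\alpha_\uparrow^* + \alpha_\uparrow\alpha_\downarrow^*\right), \tag{4.132} $$

$$ \langle S_y\rangle = \alpha_\uparrow\alpha_\uparrow^*\langle\uparrow|\hat S_y|\uparrow\rangle + \alpha_\downarrow\alpha_\uparrow^*\langle\uparrow|\hat S_y|\downarrow\rangle + \alpha_\uparrow\alpha_\downarrow^*\langle\downarrow|\hat S_y|\uparrow\rangle + \alpha_\downarrow\alpha_\downarrow^*\langle\downarrow|\hat S_y|\downarrow\rangle = \frac{i\hbar}{2}\left(\alpha_\uparrow\alpha_\downarrow^* - \alpha_\downarrow\alpha_\uparrow^*\right). \tag{4.133} $$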
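Formulas (130), (132) and (133) can be cross-checked against the direct bracket (125) in the z-basis matrix form. The sketch below uses NumPy and units with ħ = 1. Both choices, as well as the sample amplitudes, are assumptions of this illustration.

```python
import numpy as np

hbar = 1.0   # units with hbar = 1 (an assumption of this sketch)
Sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = hbar / 2 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)

def spin_means(a_up, a_down):
    """<S_x>, <S_y>, <S_z> from Eqs. (4.132), (4.133), (4.130)."""
    sx = hbar / 2 * (a_down * np.conj(a_up) + a_up * np.conj(a_down))
    sy = 1j * hbar / 2 * (a_up * np.conj(a_down) - a_down * np.conj(a_up))
    sz = hbar / 2 * (abs(a_up) ** 2 - abs(a_down) ** 2)
    return sx.real, sy.real, sz

a_up, a_down = 0.6, 0.8j                    # example z-basis coefficients, Eq. (4.126)
alpha = np.array([a_up, a_down])
direct = [np.vdot(alpha, S @ alpha).real for S in (Sx, Sy, Sz)]   # Eq. (4.125)
from_formulas = spin_means(a_up, a_down)
```

For this sample state, both routes give ⟨S_x⟩ = 0, ⟨S_y⟩ = 0.48ħ and ⟨S_z⟩ = -0.14ħ.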

Similarly, we can express, via the same coefficients α_↑ and α_↓, the r.m.s. fluctuations of all spin components. For example, let us have a good look at the spin state ↑. According to Eq. (126), in this state α_↑ = 1 and α_↓ = 0, so that Eqs. (130)-(133) yield:

$$ \langle S_z\rangle = \frac{\hbar}{2}, \qquad \langle S_x\rangle = \langle S_y\rangle = 0. \tag{4.134} $$



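Now let us use the same Eq. (125) to calculate the spin component uncertainties. According to Eqs. (105) and (117), the operators of spin components squared are equal to (ħ/2)²Î, so that the general Eq. (1.33) yields

$$ (\delta S_z)^2 = \langle S_z^2\rangle - \langle S_z\rangle^2 = \langle\uparrow|\hat S_z^2|\uparrow\rangle - \left(\frac{\hbar}{2}\right)^2 = 0, \tag{4.135a} $$

$$ (\delta S_x)^2 = \langle S_x^2\rangle - \langle S_x\rangle^2 = \langle\uparrow|\hat S_x^2|\uparrow\rangle - 0 = \left(\frac{\hbar}{2}\right)^2, \tag{4.135b} $$

$$ (\delta S_y)^2 = \langle S_y^2\rangle - \langle S_y\rangle^2 = \langle\uparrow|\hat S_y^2|\uparrow\rangle - 0 = \left(\frac{\hbar}{2}\right)^2. \tag{4.135c} $$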
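Results (134)-(135) for the ↑ state can be reproduced with the matrices of Eq. (117). The sketch below assumes NumPy and units with ħ = 1. In it, `mean` and `uncertainty` are hypothetical helper names.

```python
import numpy as np

hbar = 1.0   # units with hbar = 1 (an assumption of this sketch)
Sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = hbar / 2 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)

def mean(S, psi):
    """Expectation value <psi|S|psi>, Eq. (4.125)."""
    return np.vdot(psi, S @ psi).real

def uncertainty(S, psi):
    """r.m.s. uncertainty from Eq. (1.33); tiny negative round-off is clipped to 0."""
    return np.sqrt(max(mean(S @ S, psi) - mean(S, psi) ** 2, 0.0))

up = np.array([1, 0], dtype=complex)
squares_are_identity = all(np.allclose(S @ S, (hbar / 2) ** 2 * np.eye(2)) for S in (Sx, Sy, Sz))
means_up = [mean(S, up) for S in (Sx, Sy, Sz)]           # Eq. (4.134)
deltas_up = [uncertainty(S, up) for S in (Sx, Sy, Sz)]   # Eqs. (4.135b), (4.135c), (4.135a)
```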



While Eqs. (134) and (135a) are compatible with the classical notion of the spin being "definitely in the ↑ state", this correspondence should not be overstretched to the interpretation of this state as a certain (z) orientation of the electron's magnetic moment m, because such a classical picture cannot explain Eqs. (135b) and (135c). The best (but still imprecise!) classical image I can offer is the magnetic moment m oriented, on the average, in the z-direction, but still having x- and y-components strongly "wobbling" about their zero average values.

It is straightforward to verify that in the x-polarized and y-polarized states the situation is similar, with the corresponding change of indices. Thus, in neither state may all 3 components of the spin have exact values. Let me show that this is not just an accidental fact, but reflects the most profound property of quantum mechanics, the uncertainty relations. Consider 2 observables, A and B, that may be measured in the same quantum state. There are two possibilities here. If the operators corresponding to the observables commute,

$$ \left[\hat A, \hat B\right] = 0, \tag{4.136} $$



then all the matrix elements of the commutator in any orthogonal basis (in particular, in the basis of eigenstates a_j of operator A) are also zero. From here, we get

$$ \langle a_j|\left[\hat A, \hat B\right]|a_{j'}\rangle = \langle a_j|\hat A\hat B|a_{j'}\rangle - \langle a_j|\hat B\hat A|a_{j'}\rangle = 0. \tag{4.137} $$



In the first bra-ket of the middle expression, let us act by operator A on the bra-vector, while in the second one, on the ket-vector. According to Eq. (68), such action turns operators into the corresponding eigenvalues, so that we get

$$ A_j\langle a_j|\hat B|a_{j'}\rangle - A_{j'}\langle a_j|\hat B|a_{j'}\rangle = \left(A_j - A_{j'}\right)\langle a_j|\hat B|a_{j'}\rangle = 0. \tag{4.138} $$



This means that if the eigenstates of operator A are non-degenerate (i.e. A_j ≠ A_j' if j ≠ j'), the matrix of operator B has to be diagonal in basis a_j, i.e., the eigenstate sets of operators A and B coincide. Such pairs of observables, which share their eigenstates, are called compatible. For example, in the wave mechanics of a particle, momentum (1.26) and the kinetic energy (1.27) are compatible, sharing eigenfunctions (1.29). Now we see that this is not accidental, because each Cartesian component of the kinetic energy is proportional to the square of the corresponding component of the momentum, and any operator commutes with an arbitrary power of itself:

$$ \left[\hat A, \hat A^n\right] \equiv \left[\hat A, \underbrace{\hat A\hat A\ldots\hat A}_{n}\right] = \hat A\underbrace{\hat A\ldots\hat A}_{n} - \underbrace{\hat A\ldots\hat A}_{n}\hat A = 0. \tag{4.139} $$
Now, what if operators A and B do not commute? Then the following general uncertainty relation is valid: 28

$$ \delta A\,\delta B \geq \frac{1}{2}\left|\left\langle\left[\hat A, \hat B\right]\right\rangle\right|. \tag{4.140} $$



The proof of Eq. (140) may be divided into two steps, the first of which proves the so-called Schwarz inequality: 29

$$ \langle\alpha|\alpha\rangle\langle\beta|\beta\rangle \geq \left|\langle\alpha|\beta\rangle\right|^2. \tag{4.141} $$



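The proof may be started by using postulate (16) - that the norm of any legitimate state of the system cannot be negative. Let us apply this postulate to the state with the following ket-vector:

$$ |\delta\rangle = |\alpha\rangle - \frac{\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle}|\beta\rangle, \tag{4.142} $$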



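where α and β are possible, non-null states of the system, so that the denominator in Eq. (142) is not equal to zero. For this case, Eq. (16) gives

$$ \left(\langle\alpha| - \frac{\langle\alpha|\beta\rangle}{\langle\beta|\beta\rangle}\langle\beta|\right)\left(|\alpha\rangle - \frac{\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle}|\beta\rangle\right) \geq 0. \tag{4.143} $$

Opening the parentheses, we get

$$ \langle\alpha|\alpha\rangle - \frac{\langle\alpha|\beta\rangle\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle} - \frac{\langle\beta|\alpha\rangle\langle\alpha|\beta\rangle}{\langle\beta|\beta\rangle} + \frac{\langle\alpha|\beta\rangle\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle^2}\langle\beta|\beta\rangle \geq 0. \tag{4.144} $$

After the cancellation of one inner product ⟨β|β⟩ in the numerator and denominator of the last term, it cancels with the 2nd (or 3rd) term. Multiplying the remaining inequality by the positive number ⟨β|β⟩, and noting that ⟨α|β⟩⟨β|α⟩ = |⟨α|β⟩|², we arrive at the Schwarz inequality (141).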



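Now let us apply this inequality to states

$$ |\alpha\rangle = \tilde A|\gamma\rangle \qquad \text{and} \qquad |\beta\rangle = \tilde B|\gamma\rangle, \tag{4.145} $$

where, in both relations, γ is the same (but otherwise arbitrary) possible state of the system, and the deviation operators are defined similarly to observable deviations (see Sec. 1.2), for example,

$$ \tilde A \equiv \hat A - \langle A\rangle. \tag{4.146} $$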

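With this substitution, and taking into account that the observable operators A and B are Hermitian, Eq. (141) yields

$$ \langle\gamma|\tilde A^2|\gamma\rangle\langle\gamma|\tilde B^2|\gamma\rangle \geq \left|\langle\gamma|\tilde A\tilde B|\gamma\rangle\right|^2. \tag{4.147} $$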



28 Note that both sides of Eq. (140) are state-specific; the uncertainty relation statement is that this inequality 
should be valid for any possible quantum state of the system. 

29 This inequality is the quantum-mechanical analog of the usual vector algebra result α²β² ≥ |α·β|².
Since state γ is arbitrary, we may use Eq. (125) to rewrite this relation as an operator inequality:

$$ \delta A\,\delta B \geq \left|\left\langle\tilde A\tilde B\right\rangle\right|. \tag{4.148} $$



Actually, this is already an uncertainty relation, even "better" (stronger) than its standard form (140); moreover, it is more convenient in some cases. In order to proceed to Eq. (140), we need a couple more steps. First, let us notice that the operator product in Eq. (148) may be recast as

$$ \tilde A\tilde B = \frac{1}{2}\left\{\tilde A, \tilde B\right\} - \frac{i}{2}\hat C, \qquad \text{where} \quad \hat C \equiv i\left[\tilde A, \tilde B\right]. \tag{4.149} $$



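Any anticommutator of Hermitian operators, including that in Eq. (149), is a Hermitian operator, and its eigenvalues are purely real, so that its expectation value (in any state) is also purely real. On the other hand, the commutator part of Eq. (149) is just

$$ \hat C \equiv i\left[\tilde A, \tilde B\right] = i\left(\hat A - \langle A\rangle\right)\left(\hat B - \langle B\rangle\right) - i\left(\hat B - \langle B\rangle\right)\left(\hat A - \langle A\rangle\right) = i\left(\hat A\hat B - \hat B\hat A\right) \equiv i\left[\hat A, \hat B\right]. \tag{4.150} $$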



Second, according to Eqs. (52) and (65), the Hermitian conjugate of any product of Hermitian operators A and B is just the product of the same operators in the swapped order. Using this fact, we may write

$$ \hat C^\dagger = \left(i\left[\hat A, \hat B\right]\right)^\dagger = -i\left(\hat A\hat B\right)^\dagger + i\left(\hat B\hat A\right)^\dagger = -i\hat B\hat A + i\hat A\hat B = i\left[\hat A, \hat B\right] = \hat C, \tag{4.151} $$

so that operator C is also Hermitian, i.e. its eigenvalues are also real, and thus its average is purely real as well. As a result, the square of the average of the operator product (149) may be presented as



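$$ \left|\left\langle\tilde A\tilde B\right\rangle\right|^2 = \left\langle\frac{1}{2}\left\{\tilde A, \tilde B\right\}\right\rangle^2 + \left\langle\frac{1}{2}\hat C\right\rangle^2. \tag{4.152} $$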



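Since the first term in the right-hand part of this equality cannot be negative,

$$ \left|\left\langle\tilde A\tilde B\right\rangle\right|^2 \geq \left\langle\frac{1}{2}\hat C\right\rangle^2 = \frac{1}{4}\left|\left\langle\left[\hat A, \hat B\right]\right\rangle\right|^2, \tag{4.153} $$

and we can continue Eq. (148) as

$$ \delta A\,\delta B \geq \left|\left\langle\tilde A\tilde B\right\rangle\right| \geq \frac{1}{2}\left|\left\langle\left[\hat A, \hat B\right]\right\rangle\right|, \tag{4.154} $$

thus proving Eq. (140).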
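The chain of inequalities (148), (153)-(154) can be spot-checked on random examples. This is a sanity check, not a proof. In the hedged NumPy sketch below, the Hermitian matrices and states are random. `check` is a hypothetical helper returning δAδB, |⟨ÃB̃⟩| and |⟨[A, B]⟩|/2, which should be non-increasing from left to right.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def check(A, B, gamma):
    """Return (dA*dB, |<A~B~>|, |<[A, B]>|/2) for a normalized state gamma."""
    def mean(X):
        return np.vdot(gamma, X @ gamma)          # Eq. (4.125)
    I = np.eye(len(gamma))
    At = A - mean(A).real * I                     # deviation operators, Eq. (4.146)
    Bt = B - mean(B).real * I
    dA = np.sqrt(mean(At @ At).real)
    dB = np.sqrt(mean(Bt @ Bt).real)
    return dA * dB, abs(mean(At @ Bt)), abs(mean(A @ B - B @ A)) / 2

results = []
for _ in range(500):
    A, B = random_hermitian(N), random_hermitian(N)
    g = rng.normal(size=N) + 1j * rng.normal(size=N)
    results.append(check(A, B, g / np.linalg.norm(g)))
```

The middle quantity, |⟨ÃB̃⟩|, is the bound of Eq. (148). It is typically noticeably larger than the commutator bound, which illustrates why (148) is called the stronger form.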

For the particular case of operators x and p_x (or a similar pair of operators for another Cartesian coordinate), we can readily combine Eq. (140) with Eq. (2.14b) and prove the original Heisenberg's uncertainty relation (2.13). For the spin-1/2 operators defined by Eq. (117), it is straightforward (and highly recommended to the reader) to show that

$$ \left[\hat S_x, \hat S_y\right] = i\hbar\hat S_z, \tag{4.155} $$

with similar relations for other pairs of indices taken in the "correct" order (from x to y to z to x, etc.).
As a result, the uncertainty relations (140) for spin-1/2 particles, notably including electrons, are

$$ \delta S_x\,\delta S_y \geq \frac{\hbar}{2}\left|\langle S_z\rangle\right|, \quad \text{etc.} \tag{4.156} $$



In particular, in the ↑ state, the right-hand part of this relation equals (ħ/2)², and neither of the uncertainties δS_x, δS_y can equal zero. As a reminder, our direct calculation earlier in this section has shown that each of these uncertainties is equal to ħ/2, i.e. their product equals the lowest value allowed by the uncertainty relation (156). In this aspect, the spin-polarized states are similar to the Gaussian wave packets studied in Sec. 2.2.
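The commutation relations (155) and the bound (156) can be checked with the matrices of Eq. (117). The NumPy sketch below assumes units with ħ = 1 and uses random normalized states purely for illustration. The helper names are hypothetical.

```python
import numpy as np

hbar = 1.0   # units with hbar = 1 (an assumption of this sketch)
Sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = hbar / 2 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)

def comm(P, Q):
    return P @ Q - Q @ P

# Eq. (4.155) and its cyclic versions (x -> y -> z -> x).
cyclic_ok = all(np.allclose(comm(P, Q), 1j * hbar * R)
                for P, Q, R in [(Sx, Sy, Sz), (Sy, Sz, Sx), (Sz, Sx, Sy)])

def mean(S, psi):
    return np.vdot(psi, S @ psi).real

def delta(S, psi):
    return np.sqrt(max(mean(S @ S, psi) - mean(S, psi) ** 2, 0.0))

# Both sides of Eq. (4.156) for the "up" state: the bound is saturated.
up = np.array([1, 0], dtype=complex)
lhs_up, rhs_up = delta(Sx, up) * delta(Sy, up), hbar / 2 * abs(mean(Sz, up))

# The same inequality for random normalized states.
rng = np.random.default_rng(5)
bound_ok = True
for _ in range(200):
    psi = rng.normal(size=2) + 1j * rng.normal(size=2)
    psi /= np.linalg.norm(psi)
    holds = delta(Sx, psi) * delta(Sy, psi) >= hbar / 2 * abs(mean(Sz, psi)) - 1e-12
    bound_ok = bound_ok and bool(holds)
```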



4.6. Quantum dynamics: Three pictures 

So far in this chapter, I have shied away from the discussion of system dynamics, implying that the bra- and ket-vectors of the system are their "snapshots" at a certain instant t. Now we are sufficiently prepared to examine their time dependence. One of the most beautiful features of quantum mechanics is that the time evolution may be described using any of three alternative "pictures", giving exactly the same final results for the expectation values of all observables.

From the standpoint of our wave mechanics experience, the Schrödinger picture is the most natural. In this picture, the operators corresponding to time-independent observables (e.g., to the Hamiltonian function H of an isolated system) are also constant, while the bra- and ket-vectors of the quantum state of the system evolve in time as

$$ \langle\alpha(t)| = \langle\alpha(t_0)|\hat u^\dagger(t, t_0), \qquad |\alpha(t)\rangle = \hat u(t, t_0)|\alpha(t_0)\rangle, \tag{4.157} $$



where û(t, t_0) is the time-evolution operator, which obeys the following differential equation:

$$ i\hbar\dot{\hat u} = \hat H\hat u, \tag{4.158} $$



where Ĥ is the Hamiltonian operator of the system (which is always Hermitian, Ĥ† = Ĥ), and the dot means differentiation over argument t, but not t_0. While this equation is a very natural replacement of the wave-mechanical equation (1.25), and is also frequently called the Schrödinger equation, 30 it still should be considered as a new, more general postulate, which finds its final justification (as is usual in physics) in the agreement of its corollaries with experiment - more exactly, in having not a single credible contradiction with experiment.

Starting the discussion of Eqs. (157)-(158), let us first consider the case of a system described by a time-independent Hamiltonian, whose eigenstates a_n and eigenvalues E_n obey Eq. (68), 31

$$ \hat H|a_n\rangle = E_n|a_n\rangle, \tag{4.159} $$

and hence are also time-independent. (Similarly to the wavefunctions ψ_n defined by Eq. (1.60), a_n are called the stationary states of the system.) Let us use Eqs. (157)-(159) to calculate the law of time evolution of the expansion coefficients α_n, defined by Eq. (118), in the stationary state basis:



30 Moreover, we will be able to derive Eq. (1.25) from Eq. (158) - see Sec. 5.2.

31 Here I intentionally use index n rather than j to emphasize the special role of the stationary eigenstates a_n in quantum dynamics.
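$$ \dot\alpha_n(t) = \frac{d}{dt}\langle a_n|\alpha(t)\rangle = \frac{d}{dt}\langle a_n|\hat u(t, t_0)|\alpha(t_0)\rangle = \langle a_n|\dot{\hat u}(t, t_0)|\alpha(t_0)\rangle = \langle a_n|\frac{1}{i\hbar}\hat H\hat u(t, t_0)|\alpha(t_0)\rangle = \frac{E_n}{i\hbar}\langle a_n|\hat u(t, t_0)|\alpha(t_0)\rangle = \frac{E_n}{i\hbar}\langle a_n|\alpha(t)\rangle = -\frac{i}{\hbar}E_n\alpha_n. \tag{4.160} $$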

This is the same simple equation as Eq. (1.59), and its integration yields a similar result - cf. Eq. (1.61), just with the initial time t_0 rather than 0:

$$ \alpha_n(t) = \alpha_n(t_0)\exp\left\{-\frac{i}{\hbar}E_n(t - t_0)\right\}. \tag{4.161} $$
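Result (161) can be checked against a direct numerical integration of the ket form of Eqs. (157)-(158), iħ d|α⟩/dt = Ĥ|α⟩. The NumPy sketch below assumes units with ħ = 1, a random 3×3 Hermitian Hamiltonian, and a classical fourth-order Runge-Kutta integrator. These are illustrative choices, not part of the notes. The integrator never uses the eigenstates, so the comparison is not circular.

```python
import numpy as np

hbar = 1.0                      # units with hbar = 1 (assumption)
rng = np.random.default_rng(4)
N = 3
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (M + M.conj().T) / 2        # time-independent Hermitian Hamiltonian

def rk4_evolve(H, psi, duration, steps=2000):
    """Integrate i*hbar*d|psi>/dt = H|psi> over `duration` with classical RK4."""
    h = duration / steps
    def rhs(p):
        return -1j / hbar * (H @ p)
    for _ in range(steps):
        k1 = rhs(psi)
        k2 = rhs(psi + h / 2 * k1)
        k3 = rhs(psi + h / 2 * k2)
        k4 = rhs(psi + h * k3)
        psi = psi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return psi

E, kets = np.linalg.eigh(H)     # E_n and stationary states |a_n> (columns), Eq. (4.159)
alpha0 = rng.normal(size=N) + 1j * rng.normal(size=N)
alpha0 /= np.linalg.norm(alpha0)
dt = 0.7                        # t - t0

coef_t0 = kets.conj().T @ alpha0                          # alpha_n(t0)
coef_numeric = kets.conj().T @ rk4_evolve(H, alpha0, dt)  # alpha_n(t) from integration
coef_formula = coef_t0 * np.exp(-1j * E * dt / hbar)      # Eq. (4.161)
```

Only the phases of the coefficients change, while their moduli stay constant. This is why the probabilities (120) of energy measurement outcomes do not depend on time.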



In order to illustrate how this result works, let us consider the dynamics of an electron's spin in a time-independent, uniform external magnetic field ℬ, taking its direction for axis z. To construct the system's Hamiltonian, we may apply the correspondence principle to the classical expression for the energy of a magnetic moment m in the external magnetic field, 32

$$ U = -\mathbf m\cdot\mathscr B. \tag{4.162} $$



In the quantum case, we should describe the magnetic moment m with the operator described by Eq. (116), so that (neglecting the small difference between g_e and 2) the spin-field interaction Hamiltonian is

$$ \hat H = -\hat{\mathbf m}\cdot\mathscr B = \frac{e}{m_e}\hat{\mathbf S}\cdot\mathscr B = \frac{e\mathscr B}{m_e}\hat S_z, \tag{4.163} $$


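where Ŝ_z is the operator of the z-component of the electron's spin. According to Eq. (117), in the z-basis of states ↑ and ↓, the matrix of operator (163) is

$$ \mathrm H = \frac{\hbar e\mathscr B}{2m_e}\sigma_z \equiv \frac{\hbar\Omega}{2}\sigma_z, \qquad \text{with} \quad \Omega \equiv \frac{e\mathscr B}{m_e}. \tag{4.164} $$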



The constant Ω so defined coincides with the classical frequency of the precession of a symmetric top, with an angular momentum S and magnetic moment m = -(e/m_e)S, about axis z, induced by the external torque τ = m × ℬ: 33

$$ \Omega = \frac{m\mathscr B}{S} = \frac{e\mathscr B}{m_e}. \tag{4.165} $$



In order to apply the general Eq. (161), at this stage we would need to find the eigenstates a_n and eigenenergies E_n of our Hamiltonian. However, with our (smart :-) choice of the direction of axis z, the Hamiltonian matrix is already diagonal:

$$ \mathrm H = \frac{\hbar\Omega}{2}\sigma_z \equiv \frac{\hbar\Omega}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{4.166} $$



meaning that ↑ and ↓ are the eigenstates of the system, with eigenenergies, respectively,



32 See, e.g., EM Eq. (5.100). As a reminder, we have already used this expression for the derivation of Eq. (3). 

33 See, e.g., CM Sec. 6.5, in particular Eq. (6.72), and EM Sec. 5.5, in particular Eq. (5.114) and its discussion. 
$$E_\uparrow = +\frac{\hbar\Omega}{2} \quad\text{and}\quad E_\downarrow = -\frac{\hbar\Omega}{2}. \qquad (4.167)$$

(Note that their difference,

$$\Delta E \equiv E_\uparrow - E_\downarrow = \hbar\Omega = \frac{\hbar e\mathscr{B}}{m_e}, \qquad (4.168)$$

corresponds to the classical energy $2m\mathscr{B}$ of flipping a magnetic dipole with moment $m = -(e/m_e)\hbar/2$, oriented parallel to field $\mathscr{B}$.) With that, Eq. (161) immediately yields the following expressions for the time evolution of the expansion coefficients:



$$a_\uparrow(t) = a_\uparrow(t_0)\exp\left\{-\frac{i}{2}\Omega(t-t_0)\right\}, \qquad a_\downarrow(t) = a_\downarrow(t_0)\exp\left\{+\frac{i}{2}\Omega(t-t_0)\right\}, \qquad (4.169)$$



allowing a ready calculation of the time evolution of the expectation values of any observable.

In particular, we can calculate the expectation value of $S_z$ as a function of time by applying Eq. (130) to an arbitrary time moment t:



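$$\langle S_z\rangle(t) = \frac{\hbar}{2}\left[a_\uparrow(t)a_\uparrow^*(t) - a_\downarrow(t)a_\downarrow^*(t)\right] = \frac{\hbar}{2}\left[a_\uparrow(t_0)a_\uparrow^*(t_0) - a_\downarrow(t_0)a_\downarrow^*(t_0)\right] = \langle S_z\rangle(t_0). \qquad (4.170)$$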



Thus the expectation value of the spin component parallel to the applied magnetic field remains 
constant, regardless of the initial state of the system. However, this is not true for the components 
perpendicular to the field. For example, Eq. (132), applied to moment t, gives 



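$$\langle S_x\rangle(t) = \frac{\hbar}{2}\left[a_\uparrow^*(t)a_\downarrow(t) + a_\downarrow^*(t)a_\uparrow(t)\right] = \frac{\hbar}{2}\left[a_\uparrow^*(t_0)a_\downarrow(t_0)e^{i\Omega(t-t_0)} + a_\downarrow^*(t_0)a_\uparrow(t_0)e^{-i\Omega(t-t_0)}\right]. \qquad (4.171)$$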



Clearly, this expression describes sinusoidal oscillations with frequency (165). The amplitude and phase of these oscillations depend on initial conditions. For example, if at moment $t_0$ the spin's state was ↑, i.e. $a_\uparrow(t_0) = 1$, $a_\downarrow(t_0) = 0$, then both coefficient products in Eq. (171) are equal to zero, i.e. the oscillation amplitude vanishes. However, if the electron's spin was initially in state →, i.e. had the maximum value of component $S_x$ (in classics, we would say "was oriented in direction x"), then according to the first of Eqs. (122),



$$a_\uparrow(t_0) = a_\downarrow(t_0) = \frac{1}{\sqrt2}, \qquad (4.172)$$

so that Eq. (171) yields 34

$$\langle S_x\rangle(t) = \frac{\hbar}{4}\left[e^{i\Omega(t-t_0)} + e^{-i\Omega(t-t_0)}\right] = \frac{\hbar}{2}\cos\Omega(t-t_0), \qquad (4.173)$$

while an absolutely similar calculation using Eq. (133) gives

$$\langle S_y\rangle(t) = \frac{\hbar}{2}\sin\Omega(t-t_0). \qquad (4.174)$$

34 This is one more (hopefully, redundant :-) illustration of the difference between averaging over the statistical ensemble and over time: in Eqs. (170), (173)-(174), and quite a few relations below, only the former averaging has been performed, so the results are still functions of time.



These formulas may be interpreted as the torque-induced precession of the Cartesian components of the spin vector of length $S = \hbar/2$, confined in plane [x, y], with classical frequency (164) about axis z. Note, however, that this classical language does not describe the large quantum-mechanical uncertainties of these observables, which are absent in the classical picture of the precession, at least when it starts from a definite orientation of the angular momentum vector.
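The Schrodinger-picture results (169)-(174) are easy to reproduce numerically. The minimal Python sketch below assumes units with $\hbar = 1$ and an arbitrary value of $\Omega$; the function names are hypothetical. It advances the two coefficients according to Eq. (169) and evaluates $\langle S_x\rangle$, $\langle S_y\rangle$, $\langle S_z\rangle$ from the products $a_\uparrow^*a_\downarrow$ and $|a_{\uparrow,\downarrow}|^2$, as in Eqs. (170)-(171).

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (assumption for this sketch)


def evolve(a_up0: complex, a_down0: complex, omega: float, dt: float):
    """Eq. (169): the z-basis coefficients acquire opposite phases; dt = t - t0."""
    return (a_up0 * cmath.exp(-0.5j * omega * dt),
            a_down0 * cmath.exp(+0.5j * omega * dt))


def spin_expectations(a_up: complex, a_down: complex):
    """<S_x>, <S_y>, <S_z> of the state a_up|up> + a_down|down> (z-basis)."""
    cross = a_up.conjugate() * a_down           # the product a_up^* a_down
    s_x = HBAR * cross.real                     # (hbar/2)(a_up^* a_down + c.c.)
    s_y = HBAR * cross.imag                     # (hbar/2)(-i a_up^* a_down + c.c.)
    s_z = 0.5 * HBAR * (abs(a_up) ** 2 - abs(a_down) ** 2)
    return s_x, s_y, s_z


if __name__ == "__main__":
    omega = 2.0                                 # illustrative value of Omega
    a_up0 = a_down0 = 1 / math.sqrt(2)          # initial state "->", Eq. (172)
    for dt in (0.0, 0.25 * math.pi, 0.5 * math.pi):
        sx, sy, sz = spin_expectations(*evolve(a_up0, a_down0, omega, dt))
        print(f"t - t0 = {dt:.3f}: <Sx> = {sx:+.4f}, <Sy> = {sy:+.4f}, <Sz> = {sz:+.4f}")
```

Starting from state →, the printed values follow Eqs. (173)-(174); starting from ↑ (i.e. `evolve(1, 0, ...)`), the transverse components stay at zero, as discussed after Eq. (171).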

Now let us return to the discussion of the general Schrodinger equation (158) and prove the following fascinating fact: it is possible to write the general solution of this operator equation. In the easiest case, when the Hamiltonian is time-independent, this solution is an exact analog of Eq. (161):



$$\hat u(t,t_0) = \hat u(t_0,t_0)\exp\left\{-\frac{i}{\hbar}\hat H(t-t_0)\right\} = \exp\left\{-\frac{i}{\hbar}\hat H(t-t_0)\right\}. \qquad (4.175)$$



To start its proof, we should first of all understand what a function (in this case, the exponent) of an operator means. In operator (and matrix) algebra, such functions are defined by their Taylor expansions; in particular, Eq. (175) means that



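$$\hat u(t,t_0) = \hat I + \sum_{k=1}^{\infty}\frac{1}{k!}\left[-\frac{i}{\hbar}\hat H(t-t_0)\right]^k = \hat I + \frac{1}{1!}\left(-\frac{i}{\hbar}\right)\hat H(t-t_0) + \frac{1}{2!}\left(-\frac{i}{\hbar}\right)^2\hat H^2(t-t_0)^2 + \frac{1}{3!}\left(-\frac{i}{\hbar}\right)^3\hat H^3(t-t_0)^3 + \dots, \qquad (4.176)$$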



where $\hat H^2 \equiv \hat H\hat H$, $\hat H^3 \equiv \hat H\hat H\hat H$, etc. Working with such infinite series of operator products is not as hard as one could imagine, due to their regular structure. For example, let us differentiate Eq. (176) over t:



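$$\dot{\hat u}(t,t_0) = 0 + \frac{1}{1!}\left(-\frac{i}{\hbar}\right)\hat H + \frac{1}{2!}\left(-\frac{i}{\hbar}\right)^2\hat H^2\,2(t-t_0) + \frac{1}{3!}\left(-\frac{i}{\hbar}\right)^3\hat H^3\,3(t-t_0)^2 + \dots = -\frac{i}{\hbar}\hat H\left[\hat I + \frac{1}{1!}\left(-\frac{i}{\hbar}\right)\hat H(t-t_0) + \frac{1}{2!}\left(-\frac{i}{\hbar}\right)^2\hat H^2(t-t_0)^2 + \dots\right] = -\frac{i}{\hbar}\hat H\hat u(t,t_0), \qquad (4.177)$$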



so that the differential equation (158) is indeed satisfied. On the other hand, Eq. (175) also satisfies the 
initial condition 



$$\hat u(t_0,t_0) = \hat u^\dagger(t_0,t_0) = \hat I, \qquad (4.178)$$



which immediately follows from the definition (157) of the evolution operator, so it is indeed the 
(unique) solution for the time evolution operator - in the Schrodinger picture. 
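The Taylor-series definition (176) can be turned directly into a computation. The minimal Python sketch below (standard library only; the units with $\hbar = 1$, the truncation order and the sample Hamiltonian are assumptions) builds $\hat u(t, 0) = \exp\{-i\hat Ht/\hbar\}$ for a 2×2 Hermitian matrix by summing the series, then checks Eq. (177) with a central finite difference, as well as the unitarity of the result.

```python
"""Evolution operator u(t, 0) = exp(-i H t / hbar) via the Taylor series (176)."""

HBAR = 1.0  # units with hbar = 1 (assumption)
IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def max_diff(a, b):
    """Largest absolute difference between matrix elements."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def expm_series(m, terms=60):
    """exp(m) = I + m/1! + m^2/2! + ..., truncated after `terms` terms."""
    result = [row[:] for row in IDENTITY]
    term = [row[:] for row in IDENTITY]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]  # m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def evolution_operator(h, t):
    """Eq. (175) with t0 = 0, for a time-independent 2x2 Hamiltonian h."""
    return expm_series([[-1j * x * t / HBAR for x in row] for row in h])


if __name__ == "__main__":
    h = [[0.7 + 0j, 0.3 - 0.4j], [0.3 + 0.4j, -0.7 + 0j]]  # sample Hermitian matrix
    t, dt = 2.0, 1e-5
    u = evolution_operator(h, t)
    u_plus, u_minus = evolution_operator(h, t + dt), evolution_operator(h, t - dt)
    # Eq. (177): i*hbar*du/dt should equal H u; du/dt from a central difference
    lhs = [[1j * HBAR * (u_plus[i][j] - u_minus[i][j]) / (2 * dt) for j in range(2)]
           for i in range(2)]
    print("Schrodinger-equation residual:", max_diff(lhs, matmul(h, u)))
    print("unitarity residual:", max_diff(matmul(u, dagger(u)), IDENTITY))
```

Both residuals come out at the level of rounding and finite-difference errors; for a diagonal $\mathrm H$ the same function reproduces the phase factors of Eq. (161).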

Now let us allow operator H to be a function of time, but with the condition that its "values" (in 
fact, operators) at different instants commute with each other: 



$$\left[\hat H(t'), \hat H(t'')\right] = 0, \quad\text{for any } t', t''. \qquad (4.179)$$



(An important example of such a Hamiltonian is that of a particle under the effect of a classical, time-dependent force $\mathbf F(t)$:

$$\hat H_F = -\mathbf F(t)\cdot\hat{\mathbf r}. \qquad (4.180)$$



Indeed, the radius-vector operator $\hat{\mathbf r}$ does not depend explicitly on time and hence commutes with itself, as well as with the c-numbers $\mathbf F(t')$ and $\mathbf F(t'')$.) In this case it is sufficient to replace, in all the above formulas, the product $\hat H(t-t_0)$ with the corresponding integral over time; in particular, Eq. (175) is generalized as




$$\hat u(t,t_0) = \exp\left\{-\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t')\,dt'\right\}. \qquad (4.181)$$



This replacement means that the first form of Eq. (176) should be replaced with 



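$$\hat u(t,t_0) = \hat I + \sum_{k=1}^{\infty}\frac{1}{k!}\left[-\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t')\,dt'\right]^k = \hat I + \sum_{k=1}^{\infty}\frac{1}{k!}\left(-\frac{i}{\hbar}\right)^k\int_{t_0}^{t}dt_1\int_{t_0}^{t}dt_2\dots\int_{t_0}^{t}dt_k\,\hat H(t_1)\hat H(t_2)\dots\hat H(t_k). \qquad (4.182)$$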



The proof that the first form of Eq. (182) satisfies Eq. (158) is absolutely similar to the one carried out 
above. 

We may now use Eq. (181) to show that the time-evolution operator is unitary at any moment, 
even for the time-dependent Hamiltonian. Indeed, from that formula, 



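$$\hat u(t,t_0)\hat u^\dagger(t,t_0) = \hat I\exp\left\{-\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t')\,dt'\right\}\hat I\exp\left\{+\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t'')\,dt''\right\}. \qquad (4.183)$$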



Since each of the exponents may be presented with the Taylor series (182), and, thanks to Eq. (179), different components of these sums may be swapped at will, expression (183) may be manipulated exactly as a product of c-number exponents; in particular, it may be rewritten as



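$$\hat u(t,t_0)\hat u^\dagger(t,t_0) = \hat I\exp\left\{-\frac{i}{\hbar}\left[\int_{t_0}^{t}\hat H(t')\,dt' - \int_{t_0}^{t}\hat H(t'')\,dt''\right]\right\} = \hat I\exp\{0\} = \hat I. \qquad (4.184)$$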



This property ensures, in particular, that the system state's normalization does not depend on time: 



$$\langle\alpha(t)|\alpha(t)\rangle = \langle\alpha(t_0)|\hat u^\dagger(t,t_0)\hat u(t,t_0)|\alpha(t_0)\rangle = \langle\alpha(t_0)|\alpha(t_0)\rangle. \qquad (4.185)$$



The most difficult cases for the explicit solution of Eq. (158) are those when Eq. (179) is violated. 35 It may be proven that in these cases the integral limits in the last form of Eq. (182) should be truncated (with the factor 1/k! dropped), giving the so-called Dyson series



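$$\hat u(t,t_0) = \hat I + \sum_{k=1}^{\infty}\left(-\frac{i}{\hbar}\right)^k\int_{t_0}^{t}dt_1\int_{t_0}^{t_1}dt_2\dots\int_{t_0}^{t_{k-1}}dt_k\,\hat H(t_1)\hat H(t_2)\dots\hat H(t_k). \qquad (4.186)$$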



Since we will not have time to use this relation in our course, I will skip its proof. 36
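Although the proof of Eq. (186) is skipped, the role of condition (179) is easy to see numerically. The minimal Python sketch below (standard library only, $\hbar = 1$) approximates the solution of Eq. (158) by a product of short-time exponentials, with later times multiplied from the left, which is the time ordering that the nested integration limits of the Dyson series implement; this step-by-step product is a numerical stand-in, not a term-by-term summation of the series. It compares the result with the "naive" formula (181) for two hypothetical Hamiltonians: $\hat H(t) \propto \sigma_z$, whose "values" at different instants commute, and $\hat H(t) = (\sigma_z + t\sigma_x)/2$, for which they do not.

```python
"""Time-ordered evolution vs. the 'naive' exponent of Eq. (181); hbar = 1."""

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
SIGMA_X = [[0j, 1 + 0j], [1 + 0j, 0j]]
SIGMA_Z = [[1 + 0j, 0j], [0j, -1 + 0j]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lin(*pairs):
    """Linear combination sum(c * M) of 2x2 matrices, given (c, M) pairs."""
    return [[sum(c * m[i][j] for c, m in pairs) for j in range(2)] for i in range(2)]


def max_diff(a, b):
    """Largest absolute difference between matrix elements."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def expm_series(m, terms=40):
    """exp(m) from its truncated Taylor series, cf. Eq. (176)."""
    result, term = IDENTITY, IDENTITY
    for k in range(1, terms):
        term = lin((1 / k, matmul(term, m)))  # m^k / k!
        result = lin((1, result), (1, term))
    return result


def time_ordered_u(h, t0, t, steps=1000):
    """Product of exp(-i H(t_k) dt) at midpoints t_k, later times on the left."""
    dt = (t - t0) / steps
    u = IDENTITY
    for k in range(steps):
        t_k = t0 + (k + 0.5) * dt
        u = matmul(expm_series(lin((-1j * dt, h(t_k))), terms=12), u)
    return u


def naive_u(h, t0, t, steps=1000):
    """exp{-i * (integral of H)}, i.e. Eq. (181), with a midpoint-rule integral."""
    dt = (t - t0) / steps
    integral = lin(*[(dt, h(t0 + (k + 0.5) * dt)) for k in range(steps)])
    return expm_series(lin((-1j, integral)))


def h_commuting(t):
    """Hypothetical H(t) = (0.5 + 0.3 t) sigma_z: values at all t commute."""
    return lin((0.5 + 0.3 * t, SIGMA_Z))


def h_noncommuting(t):
    """Hypothetical H(t) = (sigma_z + t sigma_x) / 2: condition (179) is violated."""
    return lin((0.5, SIGMA_Z), (0.5 * t, SIGMA_X))


if __name__ == "__main__":
    for name, h in (("commuting", h_commuting), ("non-commuting", h_noncommuting)):
        d = max_diff(time_ordered_u(h, 0.0, 2.0), naive_u(h, 0.0, 2.0))
        print(f"{name:>13}: max |u_time-ordered - u_naive| = {d:.2e}")
```

In the commuting case the two results agree to rounding accuracy, while in the non-commuting case they differ by an amount of order 0.1, far exceeding the discretization error.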



35 We will run into such situations in Chapter 7, but will not need to apply Eq. (186). 

36 It may be found, for example, in Chapter 5 of the textbook by J. Sakurai, Modern Quantum Mechanics, Addison-Wesley, 1994.
Let me now return to the general discussion of quantum dynamics to outline its alternative, the Heisenberg picture. For that, let us recall that according to Eq. (125), in quantum mechanics the expectation value of any observable A is a long bra-ket. Below we will see that other quantities (for example, the rates of quantum transitions between pairs of different states, say α and β) may also be measured in experiment; the most general form for all such measurable quantities is the following long bracket:



$$\langle\alpha|\hat A|\beta\rangle. \qquad (4.187)$$



As has been discussed above, in the Schrodinger picture the bra- and ket-vectors of the states are time- 
dependent, while the variable operators stay constant (if the corresponding variables do not explicitly 
depend on time), so that Eq. (187), applied to moment t, may be presented as 



$$\langle\alpha(t)|\hat A_S|\beta(t)\rangle, \qquad (4.188)$$



where index "S" emphasizes the Schrodinger picture. Let us apply to the bra- and ket-vectors in this 
expression the evolution law (157): 



$$\langle\alpha|\hat A|\beta\rangle = \langle\alpha(t_0)|\hat u^\dagger(t,t_0)\hat A_S\hat u(t,t_0)|\beta(t_0)\rangle. \qquad (4.189)$$



This equality means that if we form a long bracket with bra- and ket-vectors of the initial-time states, together with the following time-dependent Heisenberg operator 37



$$\hat A_H(t) \equiv \hat u^\dagger(t,t_0)\hat A_S\hat u(t,t_0) = \hat u^\dagger(t,t_0)\hat A_H(t_0)\hat u(t,t_0), \qquad (4.190)$$

all experimentally measurable results will remain the same as in the Schrodinger picture:

$$\langle\alpha|\hat A|\beta\rangle = \langle\alpha(t_0)|\hat A_H(t)|\beta(t_0)\rangle. \qquad (4.191)$$



Let us see how the Heisenberg picture works for the same simple (but very important!) problem of the electron spin precession in a z-oriented magnetic field, described (in the z-basis) by the Hamiltonian matrix (166). In that basis, Eq. (158) for the time-evolution operator reads

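$$i\hbar\frac{d}{dt}\begin{pmatrix}u_{11} & u_{12}\\ u_{21} & u_{22}\end{pmatrix} = \frac{\hbar\Omega}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}u_{11} & u_{12}\\ u_{21} & u_{22}\end{pmatrix} \equiv \frac{\hbar\Omega}{2}\begin{pmatrix}u_{11} & u_{12}\\ -u_{21} & -u_{22}\end{pmatrix}. \qquad (4.192)$$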



We see that in this simple case the equations for different matrix elements of the evolution operator 
matrix are decoupled, and readily solvable, using the universal initial condition (160): 38 



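$$\mathrm u(t,0) = \begin{pmatrix}e^{-i\Omega t/2} & 0\\ 0 & e^{i\Omega t/2}\end{pmatrix} \equiv \mathrm I\cos\frac{\Omega t}{2} - i\sigma_z\sin\frac{\Omega t}{2}. \qquad (4.193)$$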



37 Note this relation is similar in structure to the symbolic Eqs. (94). 

38 We could of course use this result, together with Eq. (157), to obtain all the above results for this system within the Schrodinger picture. In our simple case, the use of Eqs. (161) for this purpose was more straightforward, but in some cases (e.g., for time-dependent Hamiltonians) an explicit calculation of the time-evolution matrix may be the only practicable way to proceed.



Chapter 4 



Page 33 of 40 



Essential Graduate Physics 



QM: Quantum Mechanics 



Now we can use Eq. (190) to find the Heisenberg-picture operators of spin components. Dropping index 
"H" for brevity (the Heisenberg-picture operators are clearly marked by their dependence on time 
anyway), we get 

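$$\hat S_x(t) = \hat u^\dagger(t,0)\hat S_x(0)\hat u(t,0) = \frac{\hbar}{2}\mathrm u^\dagger(t,0)\sigma_x\mathrm u(t,0) = \frac{\hbar}{2}\begin{pmatrix}e^{i\Omega t/2} & 0\\ 0 & e^{-i\Omega t/2}\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}e^{-i\Omega t/2} & 0\\ 0 & e^{i\Omega t/2}\end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix}0 & e^{i\Omega t}\\ e^{-i\Omega t} & 0\end{pmatrix} = \frac{\hbar}{2}\left[\sigma_x\cos\Omega t - \sigma_y\sin\Omega t\right] \equiv \hat S_x(0)\cos\Omega t - \hat S_y(0)\sin\Omega t. \qquad (4.194a)$$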



An absolutely similar calculation of the other spin components yields 



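$$\hat S_y(t) = \frac{\hbar}{2}\begin{pmatrix}e^{i\Omega t/2} & 0\\ 0 & e^{-i\Omega t/2}\end{pmatrix}\begin{pmatrix}0 & -i\\ i & 0\end{pmatrix}\begin{pmatrix}e^{-i\Omega t/2} & 0\\ 0 & e^{i\Omega t/2}\end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix}0 & -ie^{i\Omega t}\\ ie^{-i\Omega t} & 0\end{pmatrix} = \frac{\hbar}{2}\left[\sigma_y\cos\Omega t + \sigma_x\sin\Omega t\right] \equiv \hat S_y(0)\cos\Omega t + \hat S_x(0)\sin\Omega t, \qquad (4.194b)$$

$$\hat S_z(t) = \frac{\hbar}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix} = \frac{\hbar}{2}\sigma_z \equiv \hat S_z(0). \qquad (4.194c)$$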



A practical advantage of these formulas is that they describe the system's evolution for arbitrary initial conditions, thus making the analysis of initial-state effects very simple. For example, if the initial state was spin-up (↑), then Eqs. (194), plugged into Eqs. (191) for the spin component operators, immediately yield



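$$\langle S_x\rangle = \langle\uparrow|\hat S_x(t)|\uparrow\rangle = \begin{pmatrix}1 & 0\end{pmatrix}\mathrm S_x(t)\begin{pmatrix}1\\ 0\end{pmatrix} = \frac{\hbar}{2}\left[\cos\Omega t\begin{pmatrix}1 & 0\end{pmatrix}\sigma_x\begin{pmatrix}1\\ 0\end{pmatrix} - \sin\Omega t\begin{pmatrix}1 & 0\end{pmatrix}\sigma_y\begin{pmatrix}1\\ 0\end{pmatrix}\right] = 0, \qquad (4.195a)$$

and, acting absolutely similarly, we get

$$\langle S_y\rangle = 0, \qquad \langle S_z\rangle = \frac{\hbar}{2}. \qquad (4.195b)$$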



On the other hand, if the initial state was spin-right (→), then with help from the first of Eqs. (122) we get



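$$\langle S_x\rangle = \langle\rightarrow|\hat S_x(t)|\rightarrow\rangle = \frac{1}{2}\begin{pmatrix}1 & 1\end{pmatrix}\mathrm S_x(t)\begin{pmatrix}1\\ 1\end{pmatrix} = \frac{\hbar}{2}\cos\Omega t, \qquad (4.196a)$$

and, acting similarly,

$$\langle S_y\rangle = \frac{\hbar}{2}\sin\Omega t, \qquad \langle S_z\rangle = 0, \qquad (4.196b)$$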



i.e. the same result as given by Eqs. (173) and (174) with $t_0 = 0$.
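The matrix algebra of Eqs. (193)-(196) can be double-checked in a few lines. The minimal Python sketch below (standard library only; $\hbar = 1$, and the values of $\Omega$ and $t$ are arbitrary assumptions) forms the Heisenberg-picture matrices $\mathrm u^\dagger\sigma\mathrm u$ with the diagonal $\mathrm u(t, 0)$ of Eq. (193), compares them with the last forms of Eqs. (194), and evaluates the expectation value (196a) for the initial state →.

```python
import cmath
import math

SIGMA_X = [[0j, 1 + 0j], [1 + 0j, 0j]]
SIGMA_Y = [[0j, -1j], [1j, 0j]]
SIGMA_Z = [[1 + 0j, 0j], [0j, -1 + 0j]]
IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def lin(*pairs):
    """Linear combination sum(c * M) of 2x2 matrices, given (c, M) pairs."""
    return [[sum(c * m[i][j] for c, m in pairs) for j in range(2)] for i in range(2)]


def max_diff(a, b):
    """Largest absolute difference between matrix elements."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def u_matrix(omega, t):
    """Eq. (193): z-basis evolution matrix for H = (hbar*Omega/2) sigma_z."""
    return [[cmath.exp(-0.5j * omega * t), 0j], [0j, cmath.exp(0.5j * omega * t)]]


def heisenberg(sigma, omega, t):
    """Heisenberg-picture matrix u^dagger sigma u, cf. Eq. (190)."""
    u = u_matrix(omega, t)
    return matmul(dagger(u), matmul(sigma, u))


def expectation(state, op):
    """<state|op|state> for a z-basis column vector given as a 2-element list."""
    return sum(state[i].conjugate() * op[i][j] * state[j]
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    omega, t = 1.7, 0.9                      # arbitrary illustrative values
    c, s = math.cos(omega * t), math.sin(omega * t)
    print("Eq. (193):", max_diff(u_matrix(omega, t),
                                 lin((math.cos(omega * t / 2), IDENTITY),
                                     (-1j * math.sin(omega * t / 2), SIGMA_Z))))
    print("Eq. (194a):", max_diff(heisenberg(SIGMA_X, omega, t),
                                  lin((c, SIGMA_X), (-s, SIGMA_Y))))
    print("Eq. (194b):", max_diff(heisenberg(SIGMA_Y, omega, t),
                                  lin((c, SIGMA_Y), (s, SIGMA_X))))
    right = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]   # state "->"
    # Eq. (196a) with hbar = 1: <S_x> = (1/2) cos(Omega t)
    print(0.5 * expectation(right, heisenberg(SIGMA_X, omega, t)).real, 0.5 * c)
```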

Even without the calculation of averages, sometimes just the very expression for operator/matrix evolution tells it all. For example, Eq. (194c) clearly shows that regardless of the initial state of the system, the component of spin directed along the magnetic field does not change in time. Also, notice that the terms proportional to $\sigma_x$ in the square brackets of Eqs. (194a,b) parallel their averages (196) calculated for the initial state →, but the former expressions also have terms proportional to $\sigma_y$ that reflect the possible contribution of the initial y-component of the spin, if it is present in the initial state. Moreover, the last forms of Eqs. (194) for Heisenberg-picture operators formally coincide with the classical equations of the torque-induced precession for c-number variables. (In the next chapter, we will see that the same exact mapping is valid for the Heisenberg picture of the orbital motion.)

In order to see that the last fact is by no means a coincidence, let us combine Eqs. (158) and (190) to form an explicit differential equation of the Heisenberg operator evolution, for the simplest case when operator $\hat A_S$ does not depend explicitly on time. For that, let us differentiate Eq. (190) over time:



$$\dot{\hat A}_H = \dot{\hat u}^\dagger\hat A_S\hat u + \hat u^\dagger\hat A_S\dot{\hat u}. \qquad (4.197)$$



Plugging in the derivatives of the time evolution operator from Eq. (158) and its Hermitian conjugate, 
we get 



$$i\hbar\dot{\hat A}_H = -\hat u^\dagger\hat H\hat A_S\hat u + \hat u^\dagger\hat A_S\hat H\hat u. \qquad (4.198)$$



If the Schrodinger-picture Hamiltonian does not depend on time explicitly (and even if it does, but a condition similar to Eq. (179) is satisfied), then, according to Eqs. (177) or (182), it commutes with the time evolution operator and its Hermitian conjugate, and may be swapped with any of them. 39 Hence, we may rewrite Eq. (198) as



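$$i\hbar\dot{\hat A}_H = -\hat H\hat u^\dagger\hat A_S\hat u + \hat u^\dagger\hat A_S\hat u\hat H \equiv -\hat H\hat A_H + \hat A_H\hat H \equiv \left[\hat A_H, \hat H\right]. \qquad (4.199)$$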



This is the so-called Heisenberg equation of motion. 40



Let us see how this equation looks for the same problem of spin-½ precession in a z-oriented magnetic field. In the z-basis, Eq. (199) for the vector operator of spin reads



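$$i\hbar\frac{d}{dt}\begin{pmatrix}\mathbf S_{11} & \mathbf S_{12}\\ \mathbf S_{21} & \mathbf S_{22}\end{pmatrix} = \left[\begin{pmatrix}\mathbf S_{11} & \mathbf S_{12}\\ \mathbf S_{21} & \mathbf S_{22}\end{pmatrix}, \frac{\hbar\Omega}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\right] = \hbar\Omega\begin{pmatrix}0 & -\mathbf S_{12}\\ \mathbf S_{21} & 0\end{pmatrix}, \qquad (4.200)$$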



where I have used the particular form (166) of the Hamiltonian operator's matrix. Once again, the equations for different matrix elements are decoupled, so that, presenting Eq. (117) for the "initial" (Schrodinger-picture) operator as



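$$\hat{\mathbf S}(0) = \frac{\hbar}{2}\left[\mathbf n_x\sigma_x + \mathbf n_y\sigma_y + \mathbf n_z\sigma_z\right] = \frac{\hbar}{2}\begin{pmatrix}\mathbf n_z & \mathbf n_x - i\mathbf n_y\\ \mathbf n_x + i\mathbf n_y & -\mathbf n_z\end{pmatrix}, \qquad (4.201)$$

we can immediately write the solution of the differential equation (200) in either of two forms: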



39 Due to the same reason, $\hat H_H \equiv \hat u^\dagger\hat H_S\hat u = \hat u^\dagger\hat u\hat H_S = \hat H_S$, so that the index of the Hamiltonian operator may be dropped in the time evolution equation (199).

40 Reportedly, this equation was derived by P. A. M. Dirac who was so generous that he himself gave another 
person's name to this key result. (Just the opposite of what frequently happens in physics nowadays :-) 
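$$\hat{\mathbf S}(t) = \frac{\hbar}{2}\begin{pmatrix}\mathbf n_z & (\mathbf n_x - i\mathbf n_y)e^{i\Omega t}\\ (\mathbf n_x + i\mathbf n_y)e^{-i\Omega t} & -\mathbf n_z\end{pmatrix} \equiv \frac{\hbar}{2}\left[\mathbf n_x\begin{pmatrix}0 & e^{i\Omega t}\\ e^{-i\Omega t} & 0\end{pmatrix} + \mathbf n_y\begin{pmatrix}0 & -ie^{i\Omega t}\\ ie^{-i\Omega t} & 0\end{pmatrix} + \mathbf n_z\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\right]. \qquad (4.202)$$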



The simplicity of the first of these expressions is spectacular. (Remember, it covers any initial 
conditions, and all 3 spatial components of spin!) On the other hand, for some purposes the last form is 
more convenient; in particular, its Cartesian components immediately give the first forms of our earlier 
results (194). 
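The solution (202) can also be checked against the Heisenberg equation (199) directly. In the minimal Python sketch below (standard library only; $\hbar = 1$, and the values of $\Omega$ and of a real c-number vector $\mathbf w$, used to form the sample operator $\mathbf w\cdot\hat{\mathbf S}$, are assumptions), the operator given by Eq. (202), with constant diagonal elements and off-diagonal elements multiplied by $e^{\pm i\Omega t}$, is differentiated numerically and compared with the commutator $[\hat{\mathbf S}(t)\cdot\mathbf w, \hat H]$.

```python
import cmath

HBAR = 1.0  # units with hbar = 1 (assumption)


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    """[a, b] = ab - ba for 2x2 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def spin_component(w, omega, t):
    """w . S(t), with S(t) taken from Eq. (202); w is a real c-number 3-vector."""
    wx, wy, wz = w
    off = 0.5 * HBAR * (wx - 1j * wy)   # element 12 at t = 0, cf. Eq. (201)
    return [[0.5 * HBAR * wz + 0j, off * cmath.exp(1j * omega * t)],
            [off.conjugate() * cmath.exp(-1j * omega * t), -0.5 * HBAR * wz + 0j]]


def heisenberg_residual(w, omega, t, dt=1e-5):
    """max |i hbar dA/dt - [A, H]| for A = w . S(t), H = (hbar*Omega/2) sigma_z."""
    h = [[0.5 * HBAR * omega + 0j, 0j], [0j, -0.5 * HBAR * omega + 0j]]
    a_plus = spin_component(w, omega, t + dt)
    a_minus = spin_component(w, omega, t - dt)
    lhs = [[1j * HBAR * (a_plus[i][j] - a_minus[i][j]) / (2 * dt) for j in range(2)]
           for i in range(2)]
    rhs = commutator(spin_component(w, omega, t), h)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(heisenberg_residual(w=(0.3, -0.5, 0.8), omega=2.3, t=0.7))
```

With $\mathbf w = \mathbf n_x$ the function returns the matrix of Eq. (194a), so the same check covers the Cartesian components discussed above.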

One of the advantages of the Heisenberg picture is that it provides a clearer link between classical and quantum mechanics. Indeed, analytical classical mechanics may be used to derive the following equation of time evolution of an arbitrary function $A(q_j, p_j)$ of generalized coordinates and momenta of the system: 41

$$\dot A = -\{A, H\}, \qquad (4.203)$$

where H is the Hamiltonian function of the system, and {…,…} is the so-called Poisson bracket defined, for two arbitrary functions $A(t, q_j, p_j)$ and $B(t, q_j, p_j)$, as



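$$\{A, B\} \equiv \sum_j\left(\frac{\partial A}{\partial p_j}\frac{\partial B}{\partial q_j} - \frac{\partial A}{\partial q_j}\frac{\partial B}{\partial p_j}\right). \qquad (4.204)$$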



Comparing Eq. (203) with Eq. (199), we see that the correspondence between the classical and quantum 
mechanics (in the Heisenberg picture) is provided by the following symbolic relation 42 



$$\{A, B\} \leftrightarrow \frac{i}{\hbar}\left[\hat A, \hat B\right]. \qquad (4.205)$$



This relation may be used, in particular, for finding appropriate operators for system's observables, if 
their form is not immediately evident from the correspondence principle. We will develop this 
argumentation further in the next chapter where we revisit the wave mechanics, and also in Chapter 9. 
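With the bracket (204), in which derivatives over momenta come first, Eq. (203) carries a minus sign, and it is this sign that makes Eq. (205) consistent with Eq. (199). The minimal Python sketch below (standard library only) checks this for a hypothetical one-dimensional oscillator with $H = p^2/2m + \kappa q^2/2$, evaluating the bracket (204) with central finite differences: $-\{q, H\}$ should reproduce $\dot q = p/m$, $-\{p, H\}$ should give $\dot p = -\kappa q$, and $\{q, p\} = -1$ matches $(i/\hbar)[\hat x, \hat p] = (i/\hbar)(i\hbar) = -1$. The parameter values and function names are arbitrary illustrative choices.

```python
"""Poisson bracket with the sign convention of Eq. (204), one degree of freedom."""

MASS, KAPPA = 2.0, 3.0  # hypothetical oscillator parameters


def hamiltonian(q, p):
    return p ** 2 / (2 * MASS) + KAPPA * q ** 2 / 2


def coordinate(q, p):
    return q


def momentum(q, p):
    return p


def poisson_bracket(f, g, q, p, h=1e-4):
    """{f, g} = (df/dp)(dg/dq) - (df/dq)(dg/dp), via central differences."""
    def d_dq(fn):
        return (fn(q + h, p) - fn(q - h, p)) / (2 * h)

    def d_dp(fn):
        return (fn(q, p + h) - fn(q, p - h)) / (2 * h)

    return d_dp(f) * d_dq(g) - d_dq(f) * d_dp(g)


if __name__ == "__main__":
    q, p = 0.7, -1.1
    print("-{q, H} =", -poisson_bracket(coordinate, hamiltonian, q, p), "; p/m =", p / MASS)
    print("-{p, H} =", -poisson_bracket(momentum, hamiltonian, q, p), "; -kappa*q =", -KAPPA * q)
    print("{q, p}  =", poisson_bracket(coordinate, momentum, q, p))
```

Central differences are exact (up to rounding) for the quadratic functions used here, so the agreement is limited only by floating-point errors.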

41 See, e.g., CM Eq. (10.17). Also, please excuse my use, for the Poisson bracket, of the same (traditional) symbol {…,…} as for the anticommutator. We will not run into the Poisson brackets again in the course, leaving very little chance for confusion.

42 Since we have run into the commutator of Heisenberg-picture operators, let me emphasize again that the "values" of such an operator at different moments of time often do not commute. Perhaps the simplest example is the operator $\hat x$ of the coordinate of a free 1D particle, with Hamiltonian $\hat H = \hat p^2/2m$. Indeed, in this case Eq. (199) yields the equations $i\hbar\dot{\hat x} = [\hat x, \hat H] = i\hbar\hat p/m$ and $i\hbar\dot{\hat p} = [\hat p, \hat H] = 0$, with simple solutions (similar to those for the classical motion of the corresponding observables): $\hat p(t) = \text{const} = \hat p(0)$, $\hat x(t) = \hat x(0) + \hat p(0)t/m$, so that $[\hat x(0), \hat x(t)] = [\hat x(0), \hat p(0)]t/m = [\hat x_S, \hat p_S]t/m = i\hbar t/m \neq 0$ if $t \neq 0$.

Finally, let us discuss one more alternative picture of quantum dynamics. It is also attributed to P. A. M. Dirac, and is called either the "Dirac picture" or (more frequently) the interaction picture. The latter name stems from the fact that this picture is very useful for the perturbative (approximate) approach
to systems whose Hamiltonians may be partitioned into two parts, 



$$\hat H = \hat H_0 + \hat H_{\rm int}, \qquad (4.206)$$



where $\hat H_0$ is the sum of relatively simple Hamiltonians of non-interacting component sub-systems, while the second term in Eq. (206) represents their weak interaction. (Note, however, that the relations in the balance of this section are exact and not based on these assumptions.) In this case, it is natural to consider, together with the genuine unitary operator $\hat u(t,t_0)$ of the time evolution of the system, which obeys Eq. (158), a similarly defined unitary operator of evolution of the "unperturbed system" described by Hamiltonian $\hat H_0$ alone:



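$$i\hbar\dot{\hat u}_0 = \hat H_0\hat u_0, \qquad (4.207)$$

and also the following interaction evolution operator,

$$\hat u_I \equiv \hat u_0^\dagger\hat u. \qquad (4.208)$$

The sense of this definition becomes clearer if we insert the reciprocal relation,

$$\hat u = \hat u_0\hat u_I, \qquad (4.209)$$

and its Hermitian conjugate,

$$\hat u^\dagger = \left(\hat u_0\hat u_I\right)^\dagger = \hat u_I^\dagger\hat u_0^\dagger, \qquad (4.210)$$

into the basic Eq. (190) - which is valid in any picture: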

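$$\langle\alpha|\hat A|\beta\rangle = \langle\alpha(t_0)|\hat u^\dagger(t,t_0)\hat A_S\hat u(t,t_0)|\beta(t_0)\rangle = \langle\alpha(t_0)|\hat u_I^\dagger(t,t_0)\hat u_0^\dagger(t,t_0)\hat A_S\hat u_0(t,t_0)\hat u_I(t,t_0)|\beta(t_0)\rangle. \qquad (4.211)$$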



This relation shows that all calculations of the observable expectation values and transition rates 
(i.e. all the results of quantum mechanics that may be experimentally verified) are expressed by the 
following formula, with the standard bra-ket structure (187), 



$$\langle\alpha|\hat A|\beta\rangle = \langle\alpha_I(t)|\hat A_I(t)|\beta_I(t)\rangle, \qquad (4.212)$$



if we assume that both the state vectors and the operators evolve in time, with the vectors evolving due to the interaction operator $\hat u_I$,

$$\langle\alpha_I(t)| \equiv \langle\alpha(t_0)|\hat u_I^\dagger(t,t_0), \qquad |\beta_I(t)\rangle \equiv \hat u_I(t,t_0)|\beta(t_0)\rangle, \qquad (4.213)$$

while the operators' evolution is governed by the unperturbed operator $\hat u_0$:

$$\hat A_I(t) \equiv \hat u_0^\dagger(t,t_0)\hat A_S\hat u_0(t,t_0). \qquad (4.214)$$



These relations describe the interaction picture of quantum dynamics. Let me defer an example 
of its convenience until the perturbative analysis of open quantum systems in Sec. 7.6, and here end the 
discussion with a proof that the interaction evolution operator satisfies the Schrodinger equation, 
$$i\hbar\dot{\hat u}_I = \hat H_I\hat u_I, \qquad (4.215)$$

in which $\hat H_I$ is the interaction Hamiltonian transformed in accordance with rule (214):

$$\hat H_I(t) \equiv \hat u_0^\dagger(t,t_0)\hat H_{\rm int}\hat u_0(t,t_0). \qquad (4.216)$$

The proof is very straightforward: first using definition (208), and then Eqs. (158) and the Hermitian 
conjugate of Eq. (207), we may write 

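$$i\hbar\dot{\hat u}_I = i\hbar\frac{d}{dt}\left(\hat u_0^\dagger\hat u\right) = i\hbar\left(\dot{\hat u}_0^\dagger\hat u + \hat u_0^\dagger\dot{\hat u}\right) = -\hat H_0\hat u_0^\dagger\hat u + \hat u_0^\dagger\hat H\hat u = -\hat H_0\hat u_0^\dagger\hat u + \hat u_0^\dagger\left(\hat H_0 + \hat H_{\rm int}\right)\hat u$$
$$= -\hat H_0\hat u_0^\dagger\hat u + \hat u_0^\dagger\hat H_0\hat u + \hat u_0^\dagger\hat H_{\rm int}\hat u = \left(-\hat H_0\hat u_0^\dagger + \hat u_0^\dagger\hat H_0\right)\hat u + \hat u_0^\dagger\hat H_{\rm int}\hat u. \qquad (4.217)$$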

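Since $\hat u_0$ may be presented as an exponent of an integral of $\hat H_0$ (similar to Eq. (181) relating $\hat u$ and $\hat H$), these operators commute, so that the parentheses in the last form of Eq. (217) vanish. Now plugging $\hat u$ from Eq. (209), we get the equation

$$i\hbar\dot{\hat u}_I = \hat u_0^\dagger\hat H_{\rm int}\hat u_0\hat u_I \equiv \left(\hat u_0^\dagger\hat H_{\rm int}\hat u_0\right)\hat u_I, \qquad (4.218)$$

that is equivalent to the combination of Eqs. (215) and (216).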

Equation (215) shows that if the energy scale of the interaction $\hat H_{\rm int}$ is much smaller than that of the background Hamiltonian $\hat H_0$, the operators $\hat u_I$ and $\hat u_I^\dagger$, and hence the state vectors (213), evolve relatively slowly. Such an exclusion of fast background oscillations is very convenient for the perturbative approaches to complex interacting systems, in particular open quantum systems that weakly interact with their environment - see Sec. 7.6.
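The interaction-picture factorization can be verified on the simplest example. The minimal Python sketch below (standard library only; $\hbar = 1$, and the choice $\hat H_0 = (\Omega/2)\sigma_z$, $\hat H_{\rm int} = (\omega_1/2)\sigma_x$ with $\omega_1 \ll \Omega$ is a hypothetical illustration) integrates Eq. (215) with the transformed Hamiltonian (216) by the standard fourth-order Runge-Kutta method, then compares $\hat u_0\hat u_I$ with the exact $\hat u = \exp\{-i(\hat H_0 + \hat H_{\rm int})t\}$ of Eq. (175), i.e. checks Eq. (209).

```python
"""Interaction-picture check of Eqs. (209), (215)-(216); hbar = 1 (assumption)."""
import cmath

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
SIGMA_X = [[0j, 1 + 0j], [1 + 0j, 0j]]
SIGMA_Z = [[1 + 0j, 0j], [0j, -1 + 0j]]
OMEGA, OMEGA_1 = 5.0, 0.4  # hypothetical: fast H_0, weak H_int


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def lin(*pairs):
    """Linear combination sum(c * M) of 2x2 matrices, given (c, M) pairs."""
    return [[sum(c * m[i][j] for c, m in pairs) for j in range(2)] for i in range(2)]


def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def expm_series(m, terms=60):
    """exp(m) from its truncated Taylor series, cf. Eq. (176)."""
    result, term = IDENTITY, IDENTITY
    for k in range(1, terms):
        term = lin((1 / k, matmul(term, m)))  # m^k / k!
        result = lin((1, result), (1, term))
    return result


H_0 = lin((OMEGA / 2, SIGMA_Z))
H_INT = lin((OMEGA_1 / 2, SIGMA_X))


def u_0(t):
    """Solution of Eq. (207) for H_0 = (Omega/2) sigma_z, with t0 = 0."""
    return [[cmath.exp(-0.5j * OMEGA * t), 0j], [0j, cmath.exp(0.5j * OMEGA * t)]]


def h_i(t):
    """Eq. (216): the interaction Hamiltonian in the interaction picture."""
    return matmul(dagger(u_0(t)), matmul(H_INT, u_0(t)))


def u_i(t_end, steps=2000):
    """Integrate Eq. (215), du_I/dt = -i H_I(t) u_I, by 4th-order Runge-Kutta."""
    def rhs(t, u):
        return lin((-1j, matmul(h_i(t), u)))

    dt, t, u = t_end / steps, 0.0, IDENTITY
    for _ in range(steps):
        k1 = rhs(t, u)
        k2 = rhs(t + dt / 2, lin((1, u), (dt / 2, k1)))
        k3 = rhs(t + dt / 2, lin((1, u), (dt / 2, k2)))
        k4 = rhs(t + dt, lin((1, u), (dt, k3)))
        u = lin((1, u), (dt / 6, k1), (dt / 3, k2), (dt / 3, k3), (dt / 6, k4))
        t += dt
    return u


if __name__ == "__main__":
    t_end = 3.0
    u_exact = expm_series(lin((-1j * t_end, H_0), (-1j * t_end, H_INT)))  # Eq. (175)
    print("max |u - u_0 u_I| =", max_diff(u_exact, matmul(u_0(t_end), u_i(t_end))))
```

Note that the right-hand side of Eq. (215) has the small scale $\omega_1$, while the fast $\Omega$-dependence sits only in the phase factors of $\hat H_I(t)$, in line with the remark that $\hat u_I$ evolves relatively slowly.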



4.7. Exercise problems 

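4.1. Calculate all possible binary products $\sigma_j\sigma_{j'}$ (for j, j' = x, y, z) of the Pauli matrices (105),

$$\sigma_x = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \qquad \sigma_y = \begin{pmatrix}0 & -i\\ i & 0\end{pmatrix}, \qquad \sigma_z = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix},$$

and their commutators and anticommutators (defined similarly to those of the corresponding operators). Present the results using the Kronecker delta and Levi-Civita permutation symbols. 43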

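4.2. Calculate the following expressions,

(i) $(\mathbf c\cdot\boldsymbol\sigma)^n$, and then

(ii) $(b\mathrm I + \mathbf c\cdot\boldsymbol\sigma)^n$,

for the scalar product $\mathbf c\cdot\boldsymbol\sigma$ of the Pauli matrix vector $\boldsymbol\sigma \equiv \mathbf n_x\sigma_x + \mathbf n_y\sigma_y + \mathbf n_z\sigma_z$ by an arbitrary c-number vector $\mathbf c$, where n > 0 is an integer, and b is an arbitrary scalar c-number.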



43 See, e.g., MA Eq. (13.2). 
Hint: For task (ii), you may use the binomial theorem MA Eq. (2.8), and then transform the result in a way enabling you to use the same theorem backwards.



4.3. Use result (i) of Problem 2 to simplify the following operator: $\exp\{-i\theta\mathbf n\cdot\hat{\boldsymbol\sigma}\}$, where $\mathbf n$ is a c-number vector of unit length.



4.4. Use the results of Problem 2 to derive Eqs. (2.165)-(2.166) of the lecture notes for the transparency T of a system of N similar, equidistant, delta-functional tunnel barriers.



4.5. Prove that the matrix trace of an arbitrary operator does not change at an arbitrary unitary transformation.



4.6. Is the 1D scattering matrix S, defined by Eq. (2.133), unitary? What about the 1D transfer matrix T defined by Eq. (2.134)?



4.7. Calculate $\langle\sigma_z\rangle$ in a quantum state with the following ket-vector:

$$|\alpha\rangle = \text{const}\times\left(|\uparrow\rangle + |\downarrow\rangle + |\rightarrow\rangle + |\leftarrow\rangle\right),$$

where (↑, ↓) and (→, ←) are the eigenstates of the Pauli matrices $\sigma_z$ and $\sigma_x$, respectively.
Hint: Double-check whether the solution you are giving is general.



4.8. Find the eigenvalues of the following matrix:

$$\mathrm A = \mathbf a\cdot\boldsymbol\sigma \equiv a_x\sigma_x + a_y\sigma_y + a_z\sigma_z,$$

where $a_{x,y,z}$ are real c-numbers (scalars), and $\sigma_{x,y,z}$ are the Pauli matrices. Sketch the dependence of the eigenvalues on parameter $a_z$, with $a_x$ and $a_y$ fixed. Compare the result with Fig. 2.29.



4.9. At t = 0, the spin of an electron, whose interaction with an external field is described by the Hamiltonian

$$\hat H = \mathbf a\cdot\hat{\boldsymbol\sigma} \equiv a_x\hat\sigma_x + a_y\hat\sigma_y + a_z\hat\sigma_z$$

(where $a_{x,y,z}$ are real and constant c-numbers, and $\hat\sigma_{x,y,z}$ are the operators that, in the z-basis, are represented by the Pauli matrices $\sigma_{x,y,z}$), was in state ↑, one of the two eigenstates of operator $\hat\sigma_z$. Use the Schrodinger picture equations to calculate the time evolution of:

(i) the ket-vector $|\alpha\rangle$ of the system (in any stationary basis you like),

(ii) the probabilities to find the system in states ↑ and ↓, and

(iii) the expectation values of all 3 spatial components ($\langle S_x\rangle$, etc.) of the spin vector operator $\hat{\mathbf S} = (\hbar/2)\hat{\boldsymbol\sigma}$.

Analyze and interpret the results for the particular case $a_y = a_z = 0$.

4.10. For the same system as in Problem 9, use the Heisenberg picture equations to calculate the time evolution of:

(i) all three spatial components ($\hat S_x$, etc.) of the spin operator $\hat{\mathbf S}_H(t)$,

(ii) the expectation values of the spin components.

Compare the latter results with those of Problem 9.

4.11. For the same system as in Problems 9 and 10, calculate the matrix elements of operator $\hat\sigma_z$ in the basis of the Hamiltonian's eigenstates $a_{1,2}$.

Hint: In contrast to the cited problems, the answer evidently does not depend on the initial conditions.

4.12. Prove the Bloch theorem given by either Eq. (3.107) or Eq. (3.108).

Hint: Consider the translation operator $\hat T_{\mathbf R}$, defined by the following result of its action on an arbitrary function $f(\mathbf r)$:

$$\hat T_{\mathbf R}f(\mathbf r) = f(\mathbf r + \mathbf R),$$

where $\mathbf R$ is an arbitrary vector of the Bravais lattice (3.106). In particular, analyze the commutation properties of the operator, and apply them to an eigenfunction $\psi(\mathbf r)$ of the stationary Schrodinger equation for a particle in a 3D periodic potential described by Eq. (3.105).
Chapter 5. Some Exactly Solvable Problems

This chapter describes several of the simplest but important applications of the bra-ket formalism, notably including a few wave-mechanics problems we have already started to discuss in Chapters 2 and 3.



5.1. Two-level systems 

In the course of the discussion of the bra-ket formalism in the last chapter, we have already considered several examples of how it works for the electron's spin. We have seen, in particular, that in a magnetic field the electron has eigenenergies (4.167), i.e. two energy levels. As will be shown later in the course, such a two-energy-level picture is valid not only for electrons and other spin-½ elementary particles (such as muons and neutrinos), but may also give a good approximation for other important quantum systems. For example, as was already mentioned in Chapter 2, two energy levels are sufficient for an approximate description of the dynamics of two weakly coupled quantum wells (Sec. 2.6), and of level anticrossing in the weak-potential approximation of the band theory (Sec. 2.7). Such two-level systems (alternatively called "spin-½-like" systems) are nowadays the focus of additional attention in view of the prospects of their possible use for information processing and encryption. (In the latter context, to be discussed in Sec. 8.5, a two-level system is usually called a qubit.)

This is why, before proceeding to other problems, let us summarize in brief what we have already learned about the properties and dynamics of two-level systems, in a more convenient language. According to the general Eq. (4.6), a ket- (or bra-) vector of an arbitrary pure (coherent) state α of such a system may be presented, at any instant, as a linear combination of two basis vectors, for example

$$|\alpha\rangle = a_\uparrow|\uparrow\rangle + a_\downarrow|\downarrow\rangle, \qquad (5.1)$$

and hence is completely described by two complex coefficients (c-numbers), say $a_\uparrow$ and $a_\downarrow$. These two numbers are not completely arbitrary; they are restricted by the normalization condition. If the basis vectors are normalized, then to have the system in some basis state with a 100% probability, we need

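$$W_\Sigma \equiv \langle\alpha|\alpha\rangle = \left(\langle\uparrow|a_\uparrow^* + \langle\downarrow|a_\downarrow^*\right)\left(a_\uparrow|\uparrow\rangle + a_\downarrow|\downarrow\rangle\right) = a_\uparrow^*a_\uparrow + a_\downarrow^*a_\downarrow \equiv |a_\uparrow|^2 + |a_\downarrow|^2 = 1. \qquad (5.2)$$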

This requirement is automatically satisfied if we take the moduli of $a_\uparrow$ and $a_\downarrow$ equal to the cosine and sine of the same (real) angle. Thus we can write, for example,

$$a_\uparrow = \cos\frac{\theta}{2}\,e^{i\gamma}, \qquad a_\downarrow = \sin\frac{\theta}{2}\,e^{i(\gamma+\varphi)}. \qquad (5.3)$$

Moreover, according to the general Eq. (4.125), if we deal with just one system, 1 the common phase factor $e^{i\gamma}$ is unimportant for the calculation of any expectation values, and we can take $\gamma = 0$, so that Eq. (3) is reduced to



1 To recall why this condition is crucial, please revisit the beginning of Sec. 2.3. Note also that, in particular, the 
mutual phase shifts between different qubits are very important for quantum information processing (see Chapter 
7 below), so that most discussions of these applications have to start from Eq. (3) rather than Eq. (4). 



© 2013 K. Likharev 



Open online access under cc bv-nc-sa license 







$$a_\uparrow = \cos\frac{\theta}{2}, \qquad a_\downarrow = \sin\frac{\theta}{2}\,e^{i\varphi}. \qquad (5.4)$$

The reason why the argument of the sine and cosine functions is usually taken in the form $\theta/2$ becomes clear from Fig. 1a: Eq. (4) conveniently maps each state α onto a certain representation point of a unit-radius Bloch sphere, 2 with polar angle $\theta$ and azimuthal angle $\varphi$. In particular, state ↑ (with $a_\uparrow = 1$ and $a_\downarrow = 0$) corresponds to the North Pole of the sphere ($\theta = 0$), while state ↓ (with $a_\uparrow = 0$ and $a_\downarrow = 1$) corresponds to its South Pole ($\theta = \pi$). 3 Similarly, states → and ←, described by Eqs. (4.122), i.e. having $a_\uparrow = 1/\sqrt2$ and $a_\downarrow = \pm1/\sqrt2$, correspond to points with $\theta = \pi/2$ and, respectively, $\varphi = 0$ and $\varphi = \pi$. Two more special points (denoted in Fig. 1a as ⊙ and ⊗) are also located on the sphere's equator (at $\theta = \pi/2$ and $\varphi = \pm\pi/2$); it is easy to check that they correspond to the eigenstates of matrix $\sigma_y$ (in the same z-basis).

In order to understand why such a mutually perpendicular location of these three special point pairs on the Bloch sphere is not accidental, let us plug Eqs. (4) into Eqs. (4.131)-(4.133) for the expectation values of spin components. The result is



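$$\langle S_x\rangle = \frac{\hbar}{2}\sin\theta\cos\varphi, \qquad \langle S_y\rangle = \frac{\hbar}{2}\sin\theta\sin\varphi, \qquad \langle S_z\rangle = \frac{\hbar}{2}\cos\theta, \qquad (5.5)$$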



showing that the radius-vector of the representation point on the sphere is (after multiplication by $\hbar/2$) just the expectation value of the spin vector $\mathbf S$.
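The correspondence (4)-(5) between a state and a point on the Bloch sphere is easy to code. The minimal Python sketch below (standard library only; the function names are hypothetical) converts a normalized pair $(a_\uparrow, a_\downarrow)$, with an arbitrary common phase, into the angles $\theta$ and $\varphi$ of Eq. (4), and evaluates $2\langle\mathbf S\rangle/\hbar$ from the coefficients, which should coincide with the unit vector $(\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ of Eq. (5). As a usage example, it shows that the eigenstates of $\sigma_y$ land at $\theta = \pi/2$, $\varphi = \pm\pi/2$.

```python
import cmath
import math


def bloch_angles(a_up: complex, a_down: complex):
    """(theta, phi) of Eq. (4) for a normalized state; the common phase is dropped."""
    theta = 2 * math.atan2(abs(a_down), abs(a_up))
    if abs(a_up) < 1e-15 or abs(a_down) < 1e-15:
        return theta, 0.0                          # phi is undefined at the poles
    phi = cmath.phase(a_down * a_up.conjugate())   # relative phase, in (-pi, pi]
    return theta, phi


def bloch_vector(a_up: complex, a_down: complex):
    """(2/hbar) <S> computed from the coefficients, cf. Eqs. (4.131)-(4.133)."""
    cross = a_up.conjugate() * a_down
    return (2 * cross.real, 2 * cross.imag, abs(a_up) ** 2 - abs(a_down) ** 2)


if __name__ == "__main__":
    r = 1 / math.sqrt(2)
    gamma = 0.37  # an arbitrary common phase, which should not matter
    states = {"->": (r, r), "<-": (r, -r),
              "sigma_y = +1": (r, 1j * r), "sigma_y = -1": (r, -1j * r)}
    for label, (au, ad) in states.items():
        au, ad = au * cmath.exp(1j * gamma), ad * cmath.exp(1j * gamma)
        theta, phi = bloch_angles(au, ad)
        print(f"{label:>13}: theta/pi = {theta / math.pi:.3f}, "
              f"phi/pi = {phi / math.pi:+.3f}, vector = {bloch_vector(au, ad)}")
```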




Fig. 5.1. Bloch sphere: (a) notation, and presentation of spin precession in magnetic fields directed 
along: (b) axis z, and (c) axis x. 



2 Named after the same F. Bloch who pioneered the energy band theory discussed in Chapters 2-3.

3 In the quantum information literature, the ket-vectors |↑⟩ and |↓⟩ of these two states of a qubit are usually denoted as |1⟩ ("quantum one") and |0⟩ ("quantum zero").

Now let us see how the representation point moves in various cases. First of all, according to Eqs. (4.157)-(4.158), in the absence of an external field (when the Hamiltonian operator is equal to zero and hence the time-evolution operator is constant) the point does not move at all. Now, if we apply to an electron a magnetic field directed along axis z, then, according to Eqs. (4.202), the Heisenberg operator of $S_z$ (and hence the expectation value $\langle S_z\rangle$) remains constant, while angle $\varphi$ in Eq. (5) evolves in time as $\Omega t$ + const. This means that the torque-induced precession of the spin in a constant field $\mathscr{B} = \mathbf n_z\mathscr{B}$ is described by a circular rotation of the representation point about axis z (in Fig. 1b, in the horizontal plane) with the classical precession frequency $\Omega$. This is essentially the classical picture of rotation of the angular momentum vector about the precession axis z, with both its length and the z-component conserved. 4

It is straightforward to repeat all the calculations of Sec. 4.6 for a field of a different orientation and prove the (virtually evident) result that the representation point performs a similar rotation about the field direction. (Fig. 1c shows such a rotation for an x-directed field.) Finally, note that it is sufficient to turn off the field to stop the precession instantly. (Since Eq. (4.158) is a first-order differential equation, the representation point has no effective inertia. 5) Hence, by changing the direction and magnitude of the external field, it is possible to move the spin's representation point to any position on the Bloch sphere. (In Chapter 6 we will examine another method of manipulating the point position, which is based on an external rf field and is more convenient for some two-level systems.)

In the context of quantum information, this means that in the absence of uncontrollable interaction with the environment, it is possible to prepare a qubit in any pure quantum state, and then keep it unchanged. From here it is clear that a qubit is very much different from a classical bistable system used to store single bits of information, such as the voltage state of a usual SRAM cell (a positive-feedback loop of two transistor-based inverters). As Eq. (4) shows, a qubit's state is determined by two independent, continuous parameters $\theta$ and $\varphi$, so it may store much more information than one bit. (The difference is even more spectacular in qubit systems, to be discussed in Sec. 8.5.) However, classical bistable systems, due to their nonlinearity, are stable with respect to small perturbations, so that their operation is rather robust with respect to unintentional interaction with their environment. In contrast, a qubit's state may be readily disturbed (i.e. its representation point on the Bloch sphere shifted) by even minor perturbations, and it has no internal state-stabilization mechanism. 6 For this reason, qubit-based systems are rather vulnerable to environment-induced drifts, including dephasing and relaxation effects, which will be discussed in Chapter 7.

5.2. Revisiting wave mechanics 

In order to use the bra-ket formalism for the description of the "orbital" motion of a particle as a
whole, we have to rewrite, and in some cases even modify, some of its formulas for the case of observables with
a continuous spectrum of eigenvalues. (Examples we already know well are the momentum and kinetic
energy of a free particle.) In that case, all the above expressions for states, their bra- and ket-vectors, and
eigenvalues should be stripped of discrete indices, like index $j$ in the key equation (4.68) that determines
the eigenstates and eigenvalues of observable $A$. For that, Eq. (4.68) may be rewritten in the form



4 Still, it is crucial to appreciate the difference between the expectation values (5), i.e. c-numbers, and the actual
observables $S_x$, $S_y$, and $S_z$, which are described in quantum mechanics by operators. For example, according to Eq.
(4.156), for any position on the Bloch sphere, the Cartesian components cannot all have exact values, as they do
in the classical picture.

5 The same is true for the angular momentum $\mathbf L$ in the classical torque-induced precession - see, e.g., CM Sec. 6.5
and in particular Eq. (6.71).

6 In this respect as well, information processing systems based on qubits are closer to classical analog
computers than to classical digital ones.
$$\hat A\,|a_A\rangle = A\,|a_A\rangle. \qquad (5.6)$$



More importantly, all sums over such continuous eigenstate sets should be replaced by integrals.
For example, for a complete and orthonormal set of eigenstates (6), the closure relation (4.44) should be
replaced with

$$\int dA\,|a_A\rangle\langle a_A| = \hat I, \qquad (5.7)$$



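where the integral should be taken over the whole interval of possible values of observable $A$. Applying
this relation to the ket-vector of an arbitrary state $\alpha$ (generally, not an eigenstate of operator $\hat A$), we get

$$|\alpha\rangle = \hat I\,|\alpha\rangle = \int dA\,|a_A\rangle\langle a_A|\alpha\rangle = \int dA\,\langle a_A|\alpha\rangle\,|a_A\rangle. \qquad (5.8)$$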



This integral replaces sum (4.37) for the representation of an arbitrary ket-vector as an expansion over
eigenstates of an operator. For the particular case when $|\alpha\rangle = |a_{A'}\rangle$, this relation requires⁷

$$\langle a_A|a_{A'}\rangle = \delta(A - A'); \qquad (5.9)$$

this formula replaces the orthonormality condition (4.38).

According to Eq. (8), in the continuous case the bra-ket $\langle a_A|\alpha\rangle$ still plays the role of the
coefficient whose squared modulus determines the probability of state $a_A$ - see the last form of Eq. (4.120).
However, in the continuous-spectrum case the probability of finding the system exactly in a particular state
is infinitesimal. Instead, we should speak about the probability density $w(A) \propto |\langle a_A|\alpha\rangle|^2$ of finding the
observable within a small interval $dA$ around a certain value $A$. The coefficient in that relation may be
found by making the same change from summation to integration (without any additional coefficients)
in the normalization condition (4.121):

$$\int dA\,\langle\alpha|a_A\rangle\langle a_A|\alpha\rangle = 1. \qquad (5.10)$$

Since the total probability of the system being in some state should also equal $\int w(A)\,dA$, this means that

$$w(A) = \langle\alpha|a_A\rangle\langle a_A|\alpha\rangle = |\langle a_A|\alpha\rangle|^2. \qquad (5.11)$$

Now let us see how we can calculate expectation values of continuous observables, i.e. their
ensemble averages. If we speak about the same observable $A$ whose eigenstates are used as the basis (or
any compatible observable), everything is simple. Inserting Eq. (11) into the general statistical relation

$$\langle A\rangle = \int w(A)\,A\,dA, \qquad (5.12)$$

which is just the evident continuous version of Eq. (1.37), we get

$$\langle A\rangle = \int\langle\alpha|a_A\rangle\,A\,\langle a_A|\alpha\rangle\,dA. \qquad (5.13)$$

Presenting this expression as a double integral,

$$\langle A\rangle = \int dA\int dA'\,\langle\alpha|a_A\rangle\,A\,\delta(A - A')\,\langle a_{A'}|\alpha\rangle, \qquad (5.14)$$

7 Notice that, in contrast to the discrete-spectrum case, the bra- and ket-vectors so normalized are not
dimensionless: according to Eq. (9), each of them has the dimension of $A^{-1/2}$.



and using the continuous-spectrum version of Eq. (4.100), 

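$$\langle a_A|\hat A|a_{A'}\rangle = A\,\delta(A - A'), \qquad (5.15)$$

we may write

$$\langle A\rangle = \int dA\int dA'\,\langle\alpha|a_A\rangle\langle a_A|\hat A|a_{A'}\rangle\langle a_{A'}|\alpha\rangle = \langle\alpha|\hat A|\alpha\rangle, \qquad (5.16)$$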



so that Eq. (4.125) remains valid in the continuous-spectrum case without any changes. 
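As a rough illustration of the sum-to-integral replacement in Eqs. (10)-(12), the sketch below discretizes the spectrum of $A$ on a uniform grid and applies the trapezoidal rule. The Gaussian coefficients $\langle a_A|\alpha\rangle$ (centered at $A_0 = 1$, width $\sigma = 0.5$) and the helper names are hypothetical, chosen only because the exact answers, $\int w\,dA = 1$ and $\langle A\rangle = A_0$, are known in advance.

```python
import math

def amplitude(A, A0=1.0, sigma=0.5):
    """Hypothetical coefficients <a_A|alpha>: a real Gaussian centred at A0."""
    return (2 * math.pi * sigma**2) ** -0.25 * math.exp(-(A - A0) ** 2 / (4 * sigma**2))

def norm_and_mean(A0=1.0, sigma=0.5, n=4001):
    """Trapezoidal approximations of Eq. (5.10) and Eq. (5.12) on [A0 - 10 sigma, A0 + 10 sigma]."""
    lo, hi = A0 - 10 * sigma, A0 + 10 * sigma
    dA = (hi - lo) / (n - 1)
    norm = mean = 0.0
    for i in range(n):
        A = lo + i * dA
        w = abs(amplitude(A, A0, sigma)) ** 2      # Eq. (5.11): w(A) = |<a_A|alpha>|^2
        weight = 0.5 if i in (0, n - 1) else 1.0   # trapezoidal end-point weights
        norm += weight * w * dA                    # discrete sum standing in for the integral
        mean += weight * w * A * dA                # same sum weighted by A gives <A>
    return norm, mean

print(norm_and_mean())
```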



The situation is a bit more complicated for the expectation values of operators that do not
commute with the base-creating operator, because the matrix of such an operator in that basis may not be
diagonal. We will consider (and overcome :-) this technical difficulty very soon, but otherwise we are
ready for the discussion of wave mechanics. (For notational simplicity, I will discuss its 1D version;
the generalization to the 2D and 3D cases is straightforward.)

Let us consider what is called the coordinate representation, postulating the (intuitively almost
evident) existence of a quantum state basis whose ket-vectors, to be denoted $|x\rangle$, correspond to a
certain, exactly defined value $x$ of the particle's coordinate. Writing the following evident identity:



$$\hat x\,|x\rangle = x\,|x\rangle, \qquad (5.17)$$

and comparing this relation with Eq. (6), we see that they do not contradict each other if we assume that
$\hat x$ in the left-hand part of this equation is the coordinate operator, whose action on the ket-vector $|x\rangle$
(or the bra-vector $\langle x|$) is just its multiplication by the c-number $x$. (This looks like a proof, but is actually a
separate, independent postulate, no matter how plausible.)

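In this coordinate representation, the inner product $\langle a_A|\alpha(t)\rangle$ becomes $\langle x|\alpha(t)\rangle$, and Eq. (11) takes
the form

$$w(x,t) = \langle\alpha(t)|x\rangle\langle x|\alpha(t)\rangle = \langle x|\alpha(t)\rangle^*\,\langle x|\alpha(t)\rangle. \qquad (5.18)$$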

Comparing this formula with the basic postulate (1.22) of wave mechanics, we see that they coincide if
the Schrodinger wavefunction of the time-evolving state $\alpha(t)$ is identified with that bra-ket:⁸

$$\Psi_\alpha(x,t) = \langle x|\alpha(t)\rangle. \qquad (5.19)$$



This key formula provides the connection between the bra-ket formalism and wave mechanics,
and should not be too surprising for the (thoughtful :-) reader. Indeed, Eq. (4.45) shows that any inner
product of vectors describing two states is a measure of their coincidence - just as the scalar product of
two geometric vectors is. (The orthonormality condition (4.38) is a particular manifestation of this fact.) In
this language, the value (19) of the wavefunction $\Psi_\alpha$ at point $x$ and moment $t$ characterizes "how much of a
particular coordinate $x$" the state $\alpha$ contains at that particular instant. (Of course, this informal
language is too crude to describe the fact that $\Psi_\alpha(x,t)$ is a complex function, which has not only a
modulus but also a phase.)



8 I do not quite like expressions like $\langle x|\Psi\rangle$, used in some papers and even textbooks. Of course, one is free to
replace $\alpha$ with any other letter (including $\Psi$) to denote a quantum state, but then it is better not to use the same
letter to denote the wavefunction, i.e. an inner product of two state vectors, to avoid confusion.
Let us rewrite the most important formulas of the bra-ket formalism (so far, in the Schrodinger
picture) in the wave mechanics notation. In particular, let us use Eq. (19) to calculate the (partial) time
derivative of the wavefunction, multiplied by the usual coefficient $i\hbar$:

$$i\hbar\frac{\partial\Psi_\alpha}{\partial t} = i\hbar\frac{\partial}{\partial t}\langle x|\alpha(t)\rangle. \qquad (5.20)$$

Since the coordinate operator $\hat x$ does not depend on time explicitly, its eigenstates $|x\rangle$ are stationary, and
we can swap the time derivative and the time-independent bra-vector $\langle x|$. Making use of the
Schrodinger-picture equations (4.157) and (4.158), and then inserting the identity operator in the
continuous form (7) of the closure relation, written for the coordinate eigenstates,



$$\int dx'\,|x'\rangle\langle x'| = \hat I, \qquad (5.21)$$

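we may continue to develop the right-hand part of Eq. (20) as

$$\langle x|\,i\hbar\frac{\partial}{\partial t}\,|\alpha(t)\rangle = \langle x|\,i\hbar\frac{\partial}{\partial t}\hat u(t,t_0)\,|\alpha(t_0)\rangle = \langle x|\hat H\hat u(t,t_0)|\alpha(t_0)\rangle = \langle x|\hat H|\alpha(t)\rangle = \int dx'\,\langle x|\hat H|x'\rangle\langle x'|\alpha(t)\rangle = \int dx'\,\langle x|\hat H|x'\rangle\,\Psi_\alpha(x',t). \qquad (5.22)$$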



For a general Hamiltonian operator, we have to stop here, because if it does not commute with
the coordinate operator, its matrix in the $x$-basis is not diagonal, and integral (22) cannot be worked out
explicitly. However, there exists a broad set of space-local operators,⁹ whose arguments include just one
value of the spatial coordinate, for which we can move the bra-vector $\langle x|$ to the right:¹⁰

$$\langle x|\hat A|x'\rangle\,\Psi_\alpha(x',t) = \hat A\,\Psi_\alpha(x',t)\,\langle x|x'\rangle = \hat A\,\Psi_\alpha(x,t)\,\delta(x - x'), \qquad (5.23)$$



where operator $\hat A$ in the last two forms should be understood as its coordinate representation, which is
defined by Eq. (23) - if that relation is valid for a particular operator. For example, consider the 1D version of
operator (1.40),

$$\hat H = \frac{\hat p_x^2}{2m} + U(\hat x,t), \qquad (5.24)$$

which was the basis of all our discussions in Chapter 2. Its potential-energy part commutes with
operator $\hat x$, so its matrix in the $x$-basis is diagonal, meaning that this part of Hamiltonian (24) is clearly
local, with its coordinate representation given merely by the c-number function $U(x,t)$. The situation
with the kinetic energy, which is a function of the momentum operator $\hat p_x$ that does not commute with $\hat x$, is less
evident. Let me show that this operator is also local, and at the same time derive (the 1D version of) Eq.
(1.26), if we postulate the commutation relation (2.14):

$$\hat x\hat p_x - \hat p_x\hat x = i\hbar\hat I. \qquad (5.25)$$



9 Of all the operators we will encounter in this course, only the statistical operator $\hat w$ is substantially non-local -
see Sec. 7.2.

10 In the second equality, I have used Eq. (9) for variable $x$.



Chapter 5 



Page 6 of 46 



Essential Graduate Physics 



QM: Quantum Mechanics 



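For that, let us consider the following matrix element: $\langle x|\hat x\hat p_x - \hat p_x\hat x|x'\rangle$. On the one hand, we may
use Eq. (25) to write

$$\langle x|\hat x\hat p_x - \hat p_x\hat x|x'\rangle = \langle x|i\hbar\hat I|x'\rangle = i\hbar\langle x|x'\rangle = i\hbar\,\delta(x - x'). \qquad (5.26)$$

On the other hand, since $\hat x|x'\rangle = x'|x'\rangle$ and $\langle x|\hat x = \langle x|x$, we can write

$$\langle x|\hat x\hat p_x - \hat p_x\hat x|x'\rangle = \langle x|x\hat p_x - \hat p_x x'|x'\rangle = (x - x')\langle x|\hat p_x|x'\rangle. \qquad (5.27)$$

Comparing Eqs. (26) and (27), we may write

$$\langle x|\hat p_x|x'\rangle = i\hbar\frac{\delta(x - x')}{x - x'}. \qquad (5.28a)$$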



Thus $\hat p_x$ is a local operator. Since Eq. (28a) may be rewritten as¹¹

$$\langle x|\hat p_x|x'\rangle = -i\hbar\frac{\partial}{\partial x}\delta(x - x'), \qquad (5.28b)$$

its comparison with Eq. (23) shows that the formula used so much in Chapter 2,

$$\hat p_x = -i\hbar\frac{\partial}{\partial x}, \qquad (5.29)$$

is indeed valid, but only for the coordinate representation of the momentum operator. (Later in this
section we will see that in an alternative, momentum representation, this operator looks completely
different.)
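A quick numerical check of Eq. (29) is sketched below, assuming units with $\hbar = 1$ and a central finite difference in place of the exact derivative: applied to the de Broglie wave $e^{ipx/\hbar}$, the operator $-i\hbar\,\partial/\partial x$ should simply multiply it by $p$. The helper names are illustrative only.

```python
import cmath

HBAR = 1.0  # units with hbar = 1 (assumption of this sketch)

def plane_wave(x, p):
    """Unnormalized de Broglie wave exp(i p x / hbar)."""
    return cmath.exp(1j * p * x / HBAR)

def apply_p(f, x, h=1e-4):
    """Coordinate representation (5.29), p -> -i*hbar d/dx, via a central difference."""
    return -1j * HBAR * (f(x + h) - f(x - h)) / (2 * h)

p, x = 2.5, 0.7
ratio = apply_p(lambda s: plane_wave(s, p), x) / plane_wave(x, p)
print(ratio)  # approximately p, up to the finite-difference error
```

For this wave the central difference gives exactly $\sin(ph/\hbar)\,\hbar/h$, which differs from $p$ only by a term of order $p^3h^2/6\hbar^2$.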

It is straightforward to show (and virtually evident) that any operator $f(\hat p_x)$ is local as well, with
its coordinate representation being

$$f(\hat p_x) = f\!\left(-i\hbar\frac{\partial}{\partial x}\right). \qquad (5.30)$$



In particular, this pertains to the kinetic energy operator in Eq. (24), so that Eq. (20) is reduced to the
Schrodinger equation in its familiar wave-mechanics form (1.28), if by $\hat H$ we mean its coordinate
representation:

$$\hat H = \frac{1}{2m}\left(-i\hbar\frac{\partial}{\partial x}\right)^2 + U(x,t) = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + U(x,t). \qquad (5.31)$$



Now let us return, as was promised, to operators that do not commute with operator $\hat x$, and
hence do not have to share its continuous spectrum. Inner-multiplying both parts of the general Eq.
(4.68) by the bra-vector $\langle x|$, and inserting into the left-hand part the identity operator in the form (21), we get

$$\int dx'\,\langle x|\hat A|x'\rangle\langle x'|a_j\rangle = A_j\langle x|a_j\rangle, \qquad (5.32)$$



11 The equivalence of the two forms of Eq. (28) may be readily proven, for example, by comparing their effects
on any differentiable function $f(x, x')$, using its Taylor expansion in the difference $x - x'$ - a simple but
good exercise for the reader.
i.e., using the wavefunction definition (19),

$$\int dx'\,\langle x|\hat A|x'\rangle\,\Psi_j(x',t) = A_j\Psi_j(x,t). \qquad (5.33)$$

If the operator $\hat A$ is space-local, i.e. satisfies Eq. (23), then this result is immediately reduced to

$$\hat A\,\Psi_j(x,t) = A_j\Psi_j(x,t) \qquad (5.34)$$



(where the left-hand part implies the coordinate representation of the operator), even if the operator does
not commute with operator $\hat x$.¹² The most important case of this coordinate-representation form of the
eigenproblem (4.68) is the familiar Eq. (1.60) for the eigenvalues $E_n$ of energy. Hence, the energy spectrum
of a system (which, as we know very well from the first chapters of the course, may be discrete) is nothing
more than the set of eigenvalues of its Hamiltonian operator - a very important conclusion indeed.

The operator locality also simplifies the expression for its expectation value. Indeed, plugging
the closure relation in the form (21) into the general Eq. (4.125) twice (written in the first case for
$x$ and in the second case for $x'$), we get

$$\langle A\rangle = \int dx\int dx'\,\langle\alpha(t)|x\rangle\langle x|\hat A|x'\rangle\langle x'|\alpha(t)\rangle = \int dx\int dx'\,\Psi_\alpha^*(x,t)\,\langle x|\hat A|x'\rangle\,\Psi_\alpha(x',t). \qquad (5.35)$$

Now, Eq. (23) reduces this result to just

$$\langle A\rangle = \int dx\int dx'\,\Psi_\alpha^*(x,t)\,\hat A\,\Psi_\alpha(x,t)\,\delta(x - x') = \int\Psi_\alpha^*(x,t)\,\hat A\,\Psi_\alpha(x,t)\,dx, \qquad (5.36)$$

i.e. to Eq. (1.23), which we had to postulate in Chapter 1.
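Equation (36) is easy to evaluate numerically. The sketch below assumes $\hbar = 1$ and a hypothetical normalized Gaussian packet $\Psi(x) \propto \exp\{-(x - x_0)^2/4\sigma^2 + ip_0x/\hbar\}$, for which $\langle x\rangle = x_0$ and $\langle p\rangle = p_0$ exactly. The operator $\hat x$ acts by multiplication, $\hat p$ acts via Eq. (29) implemented as a central difference, and the integral is taken by the trapezoidal rule over a finite interval.

```python
import cmath
import math

HBAR = 1.0                      # units with hbar = 1 (assumption)
X0, P0, SIGMA = 0.3, 1.5, 0.7   # hypothetical wave-packet parameters

def psi(x):
    """Normalized Gaussian packet centred at X0 with mean momentum P0."""
    return ((2 * math.pi * SIGMA**2) ** -0.25
            * cmath.exp(-(x - X0) ** 2 / (4 * SIGMA**2) + 1j * P0 * x / HBAR))

def x_psi(x):
    """x is a local operator: plain multiplication."""
    return x * psi(x)

def p_psi(x, h=1e-5):
    """Eq. (5.29) via a central difference."""
    return -1j * HBAR * (psi(x + h) - psi(x - h)) / (2 * h)

def expectation(a_psi, lo=-8.0, hi=8.0, n=4001):
    """Eq. (5.36): <A> = integral of psi*(x) [A psi](x) dx, trapezoidal rule."""
    dx = (hi - lo) / (n - 1)
    total = 0j
    for i in range(n):
        x = lo + i * dx
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * psi(x).conjugate() * a_psi(x) * dx
    return total

print(expectation(x_psi), expectation(p_psi))
```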

So, we have essentially derived all the basic relations of wave mechanics from the bra-ket
formalism, which will also allow us to get some important new results in that area. Before doing that, let
us have a look at a pair of very interesting relations, together called the Ehrenfest theorem. In order to
derive them, let us calculate the following commutator:¹³

$$\left[\hat x, \hat p_x^2\right] = \hat x\hat p_x\hat p_x - \hat p_x\hat p_x\hat x. \qquad (5.37)$$

Rewriting Heisenberg's commutation relation (25) as

$$\hat x\hat p_x = \hat p_x\hat x + i\hbar\hat I, \qquad (5.38)$$

we can use it twice in the first term of the right-hand part of Eq. (37) to sequentially move the
momentum operators to the left:

$$\hat x\hat p_x\hat p_x = (\hat p_x\hat x + i\hbar)\hat p_x = \hat p_x\hat x\hat p_x + i\hbar\hat p_x = \hat p_x(\hat p_x\hat x + i\hbar) + i\hbar\hat p_x = \hat p_x\hat p_x\hat x + 2i\hbar\hat p_x. \qquad (5.39)$$



12 In some systems of quantum mechanics postulates, the Schrodinger equation (1.28) itself is considered as a sort
of eigenstate/eigenvalue problem (34) for the operator $i\hbar\,\partial/\partial t$. Notice that this construct is very close to that of the
momentum operator $-i\hbar\,\partial/\partial x$, and similar arguments may be given for both expressions, starting from the
invariance of the quantum state of a free particle with respect to translations in time and space, respectively.

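13 It is not important whether we speak about the Schrodinger or the Heisenberg picture here. Indeed, if three
operators in the former picture are related as $[\hat A_S, \hat B_S] = \hat C_S$, then according to Eq. (4.190), in the latter picture

$$\left[\hat A_H, \hat B_H\right] = \hat u^\dagger\hat A_S\hat u\,\hat u^\dagger\hat B_S\hat u - \hat u^\dagger\hat B_S\hat u\,\hat u^\dagger\hat A_S\hat u = \hat u^\dagger\left[\hat A_S, \hat B_S\right]\hat u = \hat u^\dagger\hat C_S\hat u = \hat C_H,$$

because $\hat u\hat u^\dagger = \hat I$.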
The first term of the result cancels with the second term of Eq. (37), so that the commutator is rather
simple:

$$\left[\hat x, \hat p_x^2\right] = 2i\hbar\hat p_x. \qquad (5.40)$$



Let us use this equality to calculate the Heisenberg-picture equation of motion for operator $\hat x$,
applying the general Heisenberg equation (4.199) to the orbital motion, when the Hamiltonian has the
form (31), with a time-independent potential $U(x)$:¹⁴

$$\frac{d\hat x}{dt} = \frac{1}{i\hbar}\left[\hat x, \hat H\right] = \frac{1}{i\hbar}\left[\hat x, \frac{\hat p_x^2}{2m} + U(\hat x)\right]. \qquad (5.41)$$



The potential energy operator commutes with the coordinate operator. Thus, the right-hand part of Eq.
(41) is proportional to commutator (40):

$$\frac{d\hat x}{dt} = \frac{\hat p_x}{m}. \qquad (5.42)$$

In this operator equality, we readily recognize the classical relation between a particle's momentum and
its velocity.
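The commutator (40) behind Eq. (42) can be spot-checked numerically in the coordinate representation, where $\hat p_x^2 = -\hbar^2\partial^2/\partial x^2$. The sketch below assumes $\hbar = 1$, an arbitrary smooth test function, and finite-difference derivatives; it compares $[\hat x, \hat p_x^2]f$ with $2i\hbar\,\hat p_xf$ at one point, so agreement holds only up to the discretization error.

```python
import cmath

HBAR = 1.0   # units with hbar = 1 (assumption)
H = 1e-3     # finite-difference step

def f(x):
    """Arbitrary smooth test function (an assumption of this sketch)."""
    return cmath.exp(-x * x + 2j * x)

def d1(g, x):
    return (g(x + H) - g(x - H)) / (2 * H)

def d2(g, x):
    return (g(x + H) - 2 * g(x) + g(x - H)) / H**2

def p_squared(g, x):
    """p^2 in the coordinate representation: -hbar^2 d^2/dx^2."""
    return -HBAR**2 * d2(g, x)

def commutator_x_p2(g, x):
    """([x, p^2] g)(x) = x (p^2 g)(x) - (p^2 [x g])(x)."""
    return x * p_squared(g, x) - p_squared(lambda s: s * g(s), x)

x = 0.4
lhs = commutator_x_p2(f, x)
rhs = 2j * HBAR * (-1j * HBAR * d1(f, x))   # 2 i hbar (p f)(x), cf. Eq. (5.40)
print(lhs, rhs)
```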



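Now let us see what a similar procedure gives for the momentum's derivative:

$$\frac{d\hat p_x}{dt} = \frac{1}{i\hbar}\left[\hat p_x, \hat H\right] = \frac{1}{i\hbar}\left[\hat p_x, \frac{\hat p_x^2}{2m} + U(\hat x)\right]. \qquad (5.43)$$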



The kinetic energy operator commutes with the momentum operator, and hence may be dropped from
the right-hand part of this equation. In order to calculate the remaining commutator of the momentum
and potential energy, let us use the fact that any smooth potential profile may be represented by its
Taylor expansion:

$$U(\hat x) = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{d^kU}{dx^k}\,\hat x^k, \qquad (5.44)$$
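where the derivatives of $U$ should be understood as c-numbers (evaluated at $x = 0$), so that we may write

$$\left[\hat p_x, U(\hat x)\right] = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{d^kU}{dx^k}\left[\hat p_x, \hat x^k\right] = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{d^kU}{dx^k}\Big(\hat p_x\underbrace{\hat x\hat x\ldots\hat x}_{k\ \text{times}} - \underbrace{\hat x\hat x\ldots\hat x}_{k\ \text{times}}\hat p_x\Big). \qquad (5.45)$$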



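Applying Eq. (38) $k$ times to the last term in the parentheses, exactly as we did in Eq. (39), we get

$$\left[\hat p_x, U(\hat x)\right] = -\sum_{k=1}^{\infty}\frac{1}{k!}\frac{d^kU}{dx^k}\,k\,i\hbar\,\hat x^{k-1} = -i\hbar\sum_{k=1}^{\infty}\frac{1}{(k-1)!}\frac{d^kU}{dx^k}\,\hat x^{k-1}. \qquad (5.46)$$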



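But the last sum is just the Taylor expansion of the derivative $\partial U/\partial x$. Indeed,

$$\frac{\partial U}{\partial x} = \sum_{k'=0}^{\infty}\frac{1}{k'!}\frac{d^{k'}}{dx^{k'}}\left(\frac{\partial U}{\partial x}\right)\hat x^{k'} = \sum_{k'=0}^{\infty}\frac{1}{k'!}\frac{d^{k'+1}U}{dx^{k'+1}}\,\hat x^{k'} = \sum_{k=1}^{\infty}\frac{1}{(k-1)!}\frac{d^kU}{dx^k}\,\hat x^{k-1}, \qquad (5.47)$$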



14 Since this Hamiltonian is time-independent, we may replace the partial derivative over time t with the full one. 
where at the last step I have replaced the summation index $k'$ with $k - 1$. As a result,
Eq. (43) yields:

$$\frac{d\hat p_x}{dt} = -\frac{\partial U(\hat x)}{\partial x}. \qquad (5.48)$$



This equation again coincides with the classical equation of motion! Discussing spin dynamics in
Secs. 4.6 and 5.1, we have already seen that this is very typical of the Heisenberg picture. Moreover,
averaging Eqs. (42) and (48) over the initial state (as Eq. (4.191) prescribes¹⁵), we get similar results for
the expectation values:¹⁶

$$\frac{d\langle x\rangle}{dt} = \frac{\langle p_x\rangle}{m}, \qquad \frac{d\langle p_x\rangle}{dt} = -\left\langle\frac{\partial U}{\partial x}\right\rangle. \qquad (5.49)$$



However, it is important to remember that the equivalence between these quantum-mechanical
equations and similar equations of classical mechanics is superficial, and the degree of similarity
between the two mechanics very much depends on the problem. As one extreme, let us consider the case
when a particle's state, at any moment between $t_0$ and $t$, may be accurately represented by a single, relatively
narrow wave packet. Then we may interpret Eqs. (49) as equations of essentially classical motion for the
wave packet's center, in accordance with the correspondence principle. However, even in this case it is
important to remember the purely quantum-mechanical effects of the nonvanishing wave packet width
and its spreading in time, which were discussed in Sec. 2.2.

In the opposite extreme, Eqs. (49), though valid, may tell almost nothing about the system's
dynamics. Perhaps the most apparent example is the "leaky" quantum well that was discussed in Sec. 2.5
- see Fig. 2.18 and its discussion. Since both the potential $U(x)$ and the initial state are symmetric
relative to the point $x = 0$, the right-hand parts of both Eqs. (49) are identically zero. Of course, the result
(that the average values of both momentum and coordinate remain zero at all times) is correct, but it does
not tell us much about the rich dynamics of the system (the finite lifetime of the metastable state, the
formation of two wave packets, their waveform and propagation speed), or about the important insight
the solution gives for the quantum measurement theory. Another similar example is the band theory
(Sec. 2.7), with its purely quantum effect of allowed energy bands and forbidden gaps, of which Eqs.
(49) give no clue.

To summarize, the Ehrenfest theorem is important as an illustration of the correspondence 
principle, but its predictive power should not be exaggerated. 
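The operator identity behind the second of Eqs. (49), $(1/i\hbar)[\hat p_x, U(\hat x)] = -\partial U/\partial x$ (Eqs. (43)-(48)), can also be spot-checked numerically. The sketch assumes $\hbar = 1$, a hypothetical anharmonic potential $U(x) = x^2/2 + 0.1x^4$, and an arbitrary smooth test function; it is only a consistency check of Eq. (48), not a solution of the dynamics.

```python
import cmath

HBAR = 1.0   # units with hbar = 1 (assumption)
H = 1e-5     # finite-difference step

def U(x):
    """Hypothetical anharmonic potential, for illustration only."""
    return 0.5 * x**2 + 0.1 * x**4

def dU_dx(x):
    return x + 0.4 * x**3

def psi(x):
    """Arbitrary smooth test wavefunction (assumption)."""
    return cmath.exp(-x * x / 2 + 1j * x)

def p_apply(g, x):
    """p = -i hbar d/dx in the coordinate representation (central difference)."""
    return -1j * HBAR * (g(x + H) - g(x - H)) / (2 * H)

def commutator_p_U(g, x):
    """([p, U] g)(x) = (p [U g])(x) - U(x) (p g)(x)."""
    return p_apply(lambda s: U(s) * g(s), x) - U(x) * p_apply(g, x)

x = 0.8
lhs = commutator_p_U(psi, x) / (1j * HBAR)   # (1/i hbar)[p, U] psi, cf. Eq. (5.43)
rhs = -dU_dx(x) * psi(x)                      # -(dU/dx) psi, cf. Eq. (5.48)
print(lhs, rhs)
```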

Now we are ready to patch some holes left during our studies of wave mechanics in Chapters 1-3.
First of all, I have promised to develop a more balanced view of the monochromatic de Broglie
waves (4.1), which would be more respectful of the evident $\mathbf r \leftrightarrow \mathbf p$ symmetry of the coordinate and
momentum. Let us do this for the 1D case, in which the wave may be represented as¹⁷



15 Indeed, acting exactly as in the derivation of Eq. (36), for a space-local Heisenberg operator we get
$\langle A\rangle(t) = \int\Psi^*(x,t_0)\,\hat A_H(t,t_0)\,\Psi(x,t_0)\,dx$.

16 The set of equations (49) constitutes the Ehrenfest theorem.

17 From this point on, for the sake of brevity I will drop index x in the notation of the momentum - just as it was 
done in Chapter 2. 
$$\psi_p(x) = a_p\exp\left\{i\frac{px}{\hbar}\right\}, \qquad \text{for } -\infty < x < +\infty. \qquad (5.50)$$

Let us have a good look at this function. Since it satisfies Eq. (34) for the 1D momentum operator
$\hat p = -i\hbar\,d/dx$,

$$\hat p\,\psi_p = p\,\psi_p, \qquad (5.51)$$

$\psi_p$ is an eigenfunction of the momentum operator. But this means that we can also write Eq. (6) for the
corresponding ket-vector:

$$\hat p\,|p\rangle = p\,|p\rangle, \qquad (5.52)$$

and according to Eq. (19), the wavefunction may be represented as

$$\psi_p(x) = \langle x|p\rangle. \qquad (5.53)$$

Expression (53) is quite remarkable in its $x \leftrightarrow p$ symmetry, which may be pursued further.
Before doing that, however, we have to discuss the normalization of such functions. Indeed, in this case the
probability density $w(x)$ (18) is constant, so that its integral

$$\int w(x)\,dx = \int\psi_p^*(x)\,\psi_p(x)\,dx \qquad (5.54)$$



diverges if $a_p \neq 0$. Earlier in the course, we discussed two ways to avoid this divergence. One is to use a
very large but finite integration volume - see Eq. (1.31). Another is to form
a wave packet of the type (2.20), possibly of a very large length and a very narrow spread of momenta $p$.
Then integral (54) may be required to equal 1 without any conceptual problem.

However, both these methods violate the $x \leftrightarrow p$ symmetry, and hence are inconvenient for our
current purposes. Instead, let us continue to identify the bra- and ket-vectors $\langle a_A|$ and $|a_A\rangle$ of the general
theory, developed at the beginning of this section, with the eigenvectors $\langle p|$ and $|p\rangle$ of momentum - just as
we have already done in Eq. (52). Then the normalization condition (9) becomes

$$\langle p|p'\rangle = \delta(p - p'). \qquad (5.55)$$

Inserting the identity operator in the form (21) (with the integration variable $x'$ replaced by $x$) into the
left-hand side of this equation, we can translate this normalization rule into the wavefunction language:

$$\int dx\,\langle p|x\rangle\langle x|p'\rangle = \int dx\,\psi_p^*(x)\,\psi_{p'}(x) = \delta(p - p'). \qquad (5.56)$$

Now using Eq. (50), this requirement turns into the following condition:

$$a_p^*a_{p'}\int_{-\infty}^{+\infty}\exp\left\{i\frac{(p' - p)x}{\hbar}\right\}dx = |a_p|^2\,2\pi\hbar\,\delta(p - p') = \delta(p - p'), \qquad (5.57)$$

so that, finally, $a_p = e^{i\varphi}/(2\pi\hbar)^{1/2}$, where $\varphi$ is an arbitrary (real) phase, and Eq. (50) becomes¹⁸



18 Repeating the calculation for each Cartesian component of a plane monochromatic wave of arbitrary
dimensionality $d$, we get $\psi_{\mathbf p} = (2\pi\hbar)^{-d/2}\exp\{i(\mathbf p\cdot\mathbf r/\hbar + \varphi)\}$.
$$\psi_p(x) = \frac{1}{(2\pi\hbar)^{1/2}}\exp\left\{i\left(\frac{px}{\hbar} + \varphi\right)\right\}. \qquad (5.58)$$



As was mentioned above, for finite-length wave packets such normalization is not really
necessary. However, it frequently makes sense to keep the pre-exponential coefficient in Eq. (58) even
for wave packets, for the following reason. Let us form a wave packet of the type (2.20), based
on wavefunctions (58), taking $\varphi = 0$ for notational brevity, because the phase may be incorporated into
the function $\varphi(p)$:

$$\psi(x) = \frac{1}{(2\pi\hbar)^{1/2}}\int\varphi(p)\exp\left\{i\frac{px}{\hbar}\right\}dp. \qquad (5.59)$$



From the mathematical point of view, this is just the equation of a 1D spatial Fourier transform, and its
reciprocal is

$$\varphi(p) = \frac{1}{(2\pi\hbar)^{1/2}}\int\psi(x)\exp\left\{-i\frac{px}{\hbar}\right\}dx. \qquad (5.60)$$



These expressions are completely symmetric and represent the same wave packet; this is why the functions
$\psi(x)$ and $\varphi(p)$ are frequently called, respectively, the coordinate ($x$-) and momentum ($p$-) representations
of the (same) state of the particle. Using Eqs. (53) and (58), they may be written in an even more
manifestly symmetric form,

$$\psi(x) = \int\varphi(p)\,\langle x|p\rangle\,dp, \qquad \varphi(p) = \int\psi(x)\,\langle p|x\rangle\,dx, \qquad (5.61)$$

in which the scalar products satisfy the basic postulate (4.14) of the bra-ket formalism:

$$\langle p|x\rangle = \frac{1}{(2\pi\hbar)^{1/2}}\exp\left\{-i\frac{px}{\hbar}\right\} = \langle x|p\rangle^*. \qquad (5.62)$$
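To see Eqs. (59)-(60) at work, the sketch below (assuming $\hbar = 1$ and a hypothetical normalized Gaussian $\psi(x)$ of width $\sigma = 1$) evaluates Eq. (60) by trapezoidal quadrature, compares the result with the closed-form Gaussian of width $\sigma_p = \hbar/2\sigma$, and checks that the normalization $\int|\varphi(p)|^2dp = 1$ survives the transform.

```python
import cmath
import math

HBAR = 1.0    # units with hbar = 1 (assumption)
SIGMA = 1.0   # coordinate width of a hypothetical Gaussian packet

def psi(x):
    """Normalized real Gaussian in the x-representation."""
    return (2 * math.pi * SIGMA**2) ** -0.25 * math.exp(-x * x / (4 * SIGMA**2))

def trapz(values, step):
    """Composite trapezoidal rule on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def phi(p, lo=-12.0, hi=12.0, n=801):
    """Eq. (5.60): phi(p) = (2 pi hbar)^(-1/2) * integral of psi(x) exp(-i p x / hbar) dx."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    vals = [psi(x) * cmath.exp(-1j * p * x / HBAR) for x in xs]
    return trapz(vals, dx) / math.sqrt(2 * math.pi * HBAR)

def phi_exact(p):
    """Closed form: a Gaussian of width sigma_p = hbar / (2 sigma)."""
    sigma_p = HBAR / (2 * SIGMA)
    return (2 * math.pi * sigma_p**2) ** -0.25 * math.exp(-p * p / (4 * sigma_p**2))

dp = 0.03
ps = [-6.0 + k * dp for k in range(401)]
norm_p = trapz([abs(phi(p)) ** 2 for p in ps], dp)   # normalization in the p-representation
print(norm_p, phi(0.7), phi_exact(0.7))
```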



We already know that in the $x$-representation, i.e. in the usual wave mechanics, the coordinate
operator $\hat x$ is reduced to multiplication by $x$, and the momentum operator is proportional to a
derivative over $x$:

$$\hat x\big|_{\text{in }x} = x, \qquad \hat p\big|_{\text{in }x} = -i\hbar\frac{\partial}{\partial x}. \qquad (5.63)$$


It is natural to guess that in the $p$-representation, the expressions for the operators would be reciprocal:

$$\hat x\big|_{\text{in }p} = +i\hbar\frac{\partial}{\partial p}, \qquad \hat p\big|_{\text{in }p} = p, \qquad (5.64)$$


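with the difference in one sign only, due to the opposite signs of the Fourier exponents in Eqs. (59) and
(60). The proof of Eqs. (64) is straightforward; for example, acting with the momentum operator on
wavefunction (59), we get

$$\hat p\,\psi(x) = -i\hbar\frac{\partial}{\partial x}\left[\frac{1}{(2\pi\hbar)^{1/2}}\int\varphi(p)\exp\left\{i\frac{px}{\hbar}\right\}dp\right] = \frac{1}{(2\pi\hbar)^{1/2}}\int p\,\varphi(p)\exp\left\{i\frac{px}{\hbar}\right\}dp, \qquad (5.65)$$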
and similarly for the operator $\hat x$ acting on the function $\varphi(p)$. Hence, the action of operators (63) on the
wavefunction $\psi$ (i.e. the state's $x$-representation) gives the same results as the action of operators (64) on
the function $\varphi$ (i.e. its $p$-representation).
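The first of Eqs. (64) can be checked in the same spirit: the $p$-image of $x\psi(x)$, computed from Eq. (60), should equal $i\hbar\,\partial\varphi/\partial p$. The sketch assumes $\hbar = 1$, a hypothetical normalized Gaussian $\psi(x)$, trapezoidal quadrature for Eq. (60), and a central difference in $p$; the helper `to_p` is illustrative only.

```python
import cmath
import math

HBAR = 1.0    # units with hbar = 1 (assumption)
SIGMA = 1.0   # width of a hypothetical Gaussian packet

def psi(x):
    return (2 * math.pi * SIGMA**2) ** -0.25 * math.exp(-x * x / (4 * SIGMA**2))

def to_p(f, p, lo=-12.0, hi=12.0, n=801):
    """Eq. (5.60) applied to an arbitrary function f(x), via the trapezoidal rule."""
    dx = (hi - lo) / (n - 1)
    vals = [f(lo + i * dx) * cmath.exp(-1j * p * (lo + i * dx) / HBAR) for i in range(n)]
    return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1])) / math.sqrt(2 * math.pi * HBAR)

p, h = 0.7, 1e-4
lhs = to_p(lambda x: x * psi(x), p)                                # p-image of x*psi(x)
rhs = 1j * HBAR * (to_p(psi, p + h) - to_p(psi, p - h)) / (2 * h)  # i*hbar d(phi)/dp
print(lhs, rhs)
```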

It is interesting to take one more, different look at this coordinate-to-momentum duality. For
that, notice that according to Eqs. (4.82)-(4.84), we may consider the bra-ket $\langle x|p\rangle$ as an element of the
(infinite-size) matrix $U_{xp}$ of the unitary transform from the $x$-basis to the $p$-basis. Now let us derive the
operator transform rule that would be a continuous version of Eq. (4.92). Say, we want to calculate a
matrix element of some operator in the $p$-representation:

$$\langle p|\hat A|p'\rangle. \qquad (5.66)$$

Inserting two identity operators (21) into this bra-ket, and then using Eq. (53) and its complex conjugate,
and also Eq. (23) (again, valid only for space-local operators!), we get



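$$\langle p|\hat A|p'\rangle = \int dx\int dx'\,\langle p|x\rangle\langle x|\hat A|x'\rangle\langle x'|p'\rangle = \int dx\int dx'\,\psi_p^*(x)\langle x|\hat A|x'\rangle\psi_{p'}(x') = \frac{1}{2\pi\hbar}\int dx\int dx'\exp\left\{-i\frac{px}{\hbar}\right\}\delta(x - x')\,\hat A\exp\left\{i\frac{p'x'}{\hbar}\right\} = \frac{1}{2\pi\hbar}\int dx\exp\left\{-i\frac{px}{\hbar}\right\}\hat A\exp\left\{i\frac{p'x}{\hbar}\right\}. \qquad (5.67)$$

For example, for the momentum operator itself, this relation yields:

$$\langle p|\hat p|p'\rangle = \frac{1}{2\pi\hbar}\int dx\exp\left\{-i\frac{px}{\hbar}\right\}\left(-i\hbar\frac{\partial}{\partial x}\right)\exp\left\{i\frac{p'x}{\hbar}\right\} = \frac{p'}{2\pi\hbar}\int dx\exp\left\{i\frac{(p' - p)x}{\hbar}\right\} = p'\,\delta(p - p'). \qquad (5.68)$$

Due to Eq. (52), this result is equivalent to the second of Eqs. (64).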

A natural question arises: why is the momentum representation used much less frequently than
the coordinate representation, i.e. wave mechanics? The answer is purely practical: besides the
special case of the harmonic oscillator (to be revisited in Sec. 4 below), the orbital motion Hamiltonian
(31) is not $x \leftrightarrow p$ symmetric, with the potential energy $U(x)$ typically being a more complex function
than the kinetic energy, which is quadratic in momentum. Because of that, it is easier to solve problems
while keeping the potential energy operator just a multiplier of the wavefunction, as it is in the coordinate
representation.

The most significant exception to this rule is the motion in a periodic potential, especially in the
presence of an additional external force $F(t)$, which may result in the effects discussed in Secs. 2.8 and 2.9
(the Bloch oscillations, Landau-Zener tunneling, etc.). Indeed, in this case the dispersion relation $E_n(q)$,
typically rather involved, plays the role of the effective kinetic energy, while the effective potential
energy $U_{\rm ef} = -F(t)x$ of the additional force is a simple function of $x$. This is why discussions
of these and more complex issues of the band theory (such as quasiparticle scattering, mobility,
excitation, etc., usually covered in solid state physics courses) are typically based on the momentum
representation.



5.3. Feynman's path integrals 

As has already been mentioned, even within the realm of wave mechanics, the bra-ket language
allows us to streamline some calculations that would be very bulky in the notation used in Chapters 1-3.
Probably the best example is the famous alternative, path-integral formulation of quantum mechanics,
developed in 1948 by R. Feynman.¹⁹ I will review this important concept, admittedly cutting one math
corner for brevity.²⁰ (This shortcut will be clearly marked.)

Let us inner-multiply both parts of Eq. (4.157), which is essentially the definition of the time-evolution
operator, by the bra-vector of state $x$:

$$\langle x|\alpha(t)\rangle = \langle x|\hat u(t,t_0)|\alpha(t_0)\rangle, \qquad (5.69)$$

insert the identity operator before the ket-vector in the right-hand part, and then use the closure
condition in the form of Eq. (21), with $x'$ replaced by $x_0$:

$$\langle x|\alpha(t)\rangle = \int dx_0\,\langle x|\hat u(t,t_0)|x_0\rangle\langle x_0|\alpha(t_0)\rangle. \qquad (5.70)$$

According to Eq. (19), this equality may be represented as

$$\Psi_\alpha(x,t) = \int dx_0\,\langle x|\hat u(t,t_0)|x_0\rangle\,\Psi_\alpha(x_0,t_0). \qquad (5.71)$$

Comparing this expression with Eq. (2.44), we see that the bra-ket in this relation is nothing else than
the 1D propagator, which was discussed in Sec. 2.2:

$$\langle x|\hat u(t,t_0)|x_0\rangle = G(x,t;x_0,t_0). \qquad (5.72)$$

As a reminder, we have already calculated the propagator for a free particle - see Eq. (2.49).

Now let us break the time segment $[t_0, t]$ into $N$ (for the time being, not necessarily equal) parts
by inserting $(N - 1)$ intermediate points (Fig. 2)

$$t_0 < t_1 < \ldots < t_k < \ldots < t_{N-1} < t, \qquad (5.73)$$

and rewrite the time-evolution operator in the form

$$\hat u(t,t_0) = \hat u(t,t_{N-1})\,\hat u(t_{N-1},t_{N-2})\ldots\hat u(t_2,t_1)\,\hat u(t_1,t_0), \qquad (5.74)$$

whose correctness is evident from the very definition (4.157) of the operator. Plugging Eq. (74) into Eq.
(72), let us insert the identity operator, again in the form (21) but written for $x_k$ rather than $x'$, between
each two partial evolution operators that share the time argument $t_k$. The result is

$$G(x,t;x_0,t_0) = \int dx_{N-1}\int dx_{N-2}\ldots\int dx_1\,\langle x|\hat u(t,t_{N-1})|x_{N-1}\rangle\langle x_{N-1}|\hat u(t_{N-1},t_{N-2})|x_{N-2}\rangle\ldots\langle x_1|\hat u(t_1,t_0)|x_0\rangle. \qquad (5.75)$$
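For a time-independent Hamiltonian, the factorization (74) can be verified directly on a two-level system, where $\exp\{-i\hat H\Delta t/\hbar\}$ has a closed form via the Pauli-matrix decomposition $\hat H = a\hat I + \mathbf b\cdot\hat{\boldsymbol\sigma}$. The 2×2 Hamiltonian and the unequal time grid below are hypothetical, and $\hbar = 1$ is assumed. Since all factors share the same $\hat H$, they commute here, so this check illustrates Eq. (74) without probing time ordering.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (assumption)

def u(H, dt):
    """Exact exp(-i H dt / hbar) for a 2x2 Hermitian H = a I + bx sx + by sy + bz sz."""
    a = (H[0][0] + H[1][1]).real / 2
    bz = (H[0][0] - H[1][1]).real / 2
    bx, by = H[0][1].real, -H[0][1].imag          # H[0][1] = bx - i by
    b = math.sqrt(bx * bx + by * by + bz * bz)
    nx, ny, nz = (bx / b, by / b, bz / b) if b > 0 else (0.0, 0.0, 0.0)
    c, s = math.cos(b * dt / HBAR), math.sin(b * dt / HBAR)
    ph = cmath.exp(-1j * a * dt / HBAR)
    # exp(-i theta n.sigma) = cos(theta) I - i sin(theta) n.sigma
    return [[ph * (c - 1j * s * nz), ph * (-1j * s * (nx - 1j * ny))],
            [ph * (-1j * s * (nx + 1j * ny)), ph * (c + 1j * s * nz)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

H = [[0.3 + 0j, 0.5 - 0.2j], [0.5 + 0.2j, -0.7 + 0j]]   # hypothetical Hamiltonian
times = [0.0, 0.3, 0.35, 1.1, 2.0]                      # t0 < t1 < ... < t, unequal steps

product = [[1 + 0j, 0j], [0j, 1 + 0j]]
for t_prev, t_next in zip(times, times[1:]):
    product = matmul(u(H, t_next - t_prev), product)    # later factors act from the left

direct = u(H, times[-1] - times[0])
err = max(abs(product[i][j] - direct[i][j]) for i in range(2) for j in range(2))
print(err)
```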



Fig. 5.2. Time partition and coordinate notation at the initial stage of the Feynman path integral derivation.



19 According to Feynman's recollections, his work was motivated by a "mysterious" remark by P. A. M. Dirac in his
pioneering 1930 textbook on quantum mechanics.

20 For a more thorough discussion of the path-integral approach, see the famous text by R. Feynman and A. Hibbs,
Quantum Mechanics and Path Integrals, first published in 1965. (For its latest edition by Dover in 2010, the book
was emended by D. Styer.) For a more recent monograph that reviews more applications, see L. Schulman,
Techniques and Applications of Path Integration, Wiley, 1981.






The physical sense of each integration variable $x_k$ is the wavefunction's argument at time $t_k$ - see
Fig. 2. Feynman's key breakthrough was the realization that if all intervals are equal and
sufficiently small, $t_k - t_{k-1} = d\tau \to 0$, all the partial bra-kets participating in Eq. (75) may be readily
expressed via Eq. (2.49), even if the particle is not free but moves in a stationary potential profile $U(x)$.
To show that, let us use either Eq. (4.175) or Eq. (4.181), which, for a small time interval $d\tau$, give the
same result:



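$$\hat u(\tau+d\tau,\tau) = \exp\left\{-\frac{i}{\hbar}\hat H\,d\tau\right\} = \exp\left\{-\frac{i}{\hbar}\left[\frac{\hat p^2}{2m}d\tau + U(\hat x)\,d\tau\right]\right\}. \tag{5.76}$$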



Generally, the exponent of a sum of two operators may be treated as that of c-number arguments, and in particular factored into a product of two exponents, only if the operators commute. (Indeed, in this case we can use all the standard algebra for exponents of c-number arguments.) In our case, this is not so, because operator $\hat p$ does not commute with $\hat x$, and hence with $U(\hat x)$. However, it may be shown 21 that for an infinitesimal time interval $d\tau$, the nonvanishing commutator



$$\left[\frac{\hat p^2}{2m}d\tau,\; U(\hat x)\,d\tau\right] \neq 0, \tag{5.77}$$



proportional to $(d\tau)^2$, is so small that in the first approximation in $d\tau$ its effects may be ignored. As a result, we may factor the right-hand part of Eq. (76) by writing



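$$\hat u(\tau+d\tau,\tau)\;\xrightarrow{d\tau\to 0}\;\exp\left\{-\frac{i}{\hbar}\frac{\hat p^2}{2m}d\tau\right\}\exp\left\{-\frac{i}{\hbar}U(\hat x)\,d\tau\right\}. \tag{5.78}$$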



(This approximation is very similar in spirit to the rectangle-formula approximation for an ordinary 1D integral, which is also asymptotically exact.)
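The claim behind Eq. (78), that factoring the exponent introduces only an error of the second order in $d\tau$, may be illustrated with a toy numerical check. The sketch below is an assumption-laden illustration rather than part of the derivation: it lets two non-commuting 2×2 matrices (the Pauli matrices $\sigma_x$ and $\sigma_z$, in units with $\hbar = 1$) stand in for $\hat p^2/2m$ and $U(\hat x)$, computes matrix exponentials by a truncated Taylor series, and compares the exact and factored short-time evolution operators.

```python
# Toy check of the short-time factorization behind Eq. (78), with hbar = 1.
# Assumption: two 2x2 Pauli matrices stand in for p^2/2m and U(x).

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def scale(m, c):
    return [[c * x for x in row] for row in m]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def expm(m, terms=30):
    """exp(m) via a truncated Taylor series (adequate for small matrices)."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for k in range(1, terms):
        term = scale(matmul(term, m), 1 / k)  # term = m^k / k!
        result = add(result, term)
    return result

KINETIC = [[0j, 1 + 0j], [1 + 0j, 0j]]     # sigma_x: hypothetical stand-in for p^2/2m
POTENTIAL = [[1 + 0j, 0j], [0j, -1 + 0j]]  # sigma_z: hypothetical stand-in for U(x)

def factorization_error(dtau):
    """Largest element of exp(-i(K+U)dtau) - exp(-iK dtau) exp(-iU dtau)."""
    exact = expm(scale(add(KINETIC, POTENTIAL), -1j * dtau))
    split = matmul(expm(scale(KINETIC, -1j * dtau)), expm(scale(POTENTIAL, -1j * dtau)))
    return max(abs(exact[i][j] - split[i][j]) for i in range(2) for j in range(2))

for dtau in (0.1, 0.05, 0.025):
    print(f"dtau = {dtau:<6} error = {factorization_error(dtau):.3e}")
```

Halving $d\tau$ should reduce the error by a factor close to four, i.e. the error of each step scales as $(d\tau)^2$, so that the error accumulated over the $N = (t - t_0)/d\tau$ steps still vanishes at $d\tau \to 0$.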

Since the second exponential function in the right-hand part of Eq. (78) commutes with the coordinate operator, we can move it out of each partial bra-ket participating in Eq. (75), with $U(\hat x)$ turning into a c-number function:



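$$\langle x_{\tau+d\tau}|\hat u(\tau+d\tau,\tau)|x_\tau\rangle = \langle x_{\tau+d\tau}|\exp\left\{-\frac{i}{\hbar}\frac{\hat p^2}{2m}d\tau\right\}|x_\tau\rangle\exp\left\{-\frac{i}{\hbar}U(x_\tau)\,d\tau\right\}. \tag{5.79}$$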



But the remaining bra-ket is just the propagator of a free particle, and we can use Eq. (2.49) for it: 



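$$\langle x_{\tau+d\tau}|\exp\left\{-\frac{i}{\hbar}\frac{\hat p^2}{2m}d\tau\right\}|x_\tau\rangle = \left(\frac{m}{2\pi i\hbar\,d\tau}\right)^{1/2}\exp\left\{\frac{i m (dx)^2}{2\hbar\,d\tau}\right\}, \qquad dx \equiv x_{\tau+d\tau} - x_\tau. \tag{5.80}$$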
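A quick consistency check of the short-time kernel (80): being the free-particle propagator, it should satisfy the free Schrödinger equation $i\hbar\,\partial G/\partial\tau = -(\hbar^2/2m)\,\partial^2 G/\partial x^2$ in its arguments. The sketch below, assuming units with $\hbar = m = 1$ and arbitrarily chosen test points, estimates both derivatives by central finite differences; the residual is limited only by the finite-difference step.

```python
import cmath
import math

HBAR = 1.0  # assumed units: hbar = m = 1
MASS = 1.0

def free_kernel(dx, dtau):
    """Short-time free-particle propagator of Eq. (80), dx = x_{tau+dtau} - x_tau."""
    prefactor = cmath.sqrt(MASS / (2j * math.pi * HBAR * dtau))
    return prefactor * cmath.exp(1j * MASS * dx**2 / (2 * HBAR * dtau))

def schrodinger_residual(dx, dtau, h=1e-4):
    """|i hbar dG/dtau + (hbar^2 / 2m) d2G/dx2|, estimated with central differences."""
    dg_dtau = (free_kernel(dx, dtau + h) - free_kernel(dx, dtau - h)) / (2 * h)
    d2g_dx2 = (free_kernel(dx + h, dtau) - 2 * free_kernel(dx, dtau)
               + free_kernel(dx - h, dtau)) / h**2
    return abs(1j * HBAR * dg_dtau + HBAR**2 / (2 * MASS) * d2g_dx2)

print(schrodinger_residual(0.3, 0.5))  # small: limited by the finite-difference error
```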



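As a result, the full propagator (75) takes the form

$$G(x,t;x_0,t_0) = \lim_{N\to\infty}\int dx_{N-1}\int dx_{N-2}\ldots\int dx_1\left(\frac{m}{2\pi i\hbar\,d\tau}\right)^{N/2}\exp\left\{\sum_{k=1}^{N}\left[\frac{i m (x_k - x_{k-1})^2}{2\hbar\,d\tau} - \frac{i}{\hbar}U(x_{k-1})\,d\tau\right]\right\}, \tag{5.81}$$

where $x_N \equiv x$.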



21 A strict proof of this intuitively evident statement would take more space and time than I can afford.
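At $N \to \infty$, and hence $d\tau = (t - t_0)/N \to 0$, the sum under the exponent in this expression tends to an integral:

$$\frac{i}{\hbar}\sum_{k=1}^{N}\left[\frac{m}{2}\left(\frac{x_k - x_{k-1}}{d\tau}\right)^2 - U(x_{k-1})\right]d\tau \;\to\; \frac{i}{\hbar}\int_{t_0}^{t}\left[\frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 - U(x)\right]d\tau, \tag{5.82}$$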

and the expression in square brackets is just the particle's Lagrangian function L. 22 The integral of this function over time is the classical action $\mathcal{S}$ calculated along a particular "path" $x(\tau)$. 23 As a result, defining the (1D) path integral as



$$\int(\ldots)\,D[x(\tau)] \equiv \lim_{N\to\infty}\left(\frac{m}{2\pi i\hbar\,d\tau}\right)^{N/2}\int dx_{N-1}\int dx_{N-2}\ldots\int dx_1\,(\ldots), \tag{5.83a}$$

we can bring our result to a superficially simple form

$$G(x,t;x_0,t_0) = \int\exp\left\{\frac{i}{\hbar}\int_{t_0}^{t}L(x,\dot x)\,d\tau\right\}D[x(\tau)]. \tag{5.83b}$$

The name "path integral" for the mathematical construct (83a) may be readily explained if we keep the number N of time intervals large but finite, and also approximate each of the enclosed integrals by a sum over $M \gg 1$ discrete points along the coordinate axis (Fig. 3a).



Fig. 5.3. Several 1D classical paths in (a) the discrete approximation and (b) the continuous limit.



Then the path integral is a product of (N - 1) sums corresponding to different values of time $\tau$, each of them with M terms, each of the terms representing the function under the integral at a particular spatial point. Multiplying those (N - 1) sums, we get a sum of $M^{N-1}$ terms, each corresponding to one particular choice of a spatial point at every intermediate instant $\tau_k$, i.e. to one broken-line path through the grid of $(N-1)M$ spatial-temporal points $[x, \tau]$. Hence these terms represent all possible different paths $x(\tau)$ from the initial point $[x_0, t_0]$ to the final point $[x, t]$. It is evident that this interpretation remains true even in the continuous limit $N, M \to \infty$ - see Fig. 3b.

Why does such a representation of the sum make sense? This is because in the classical limit the particle follows just one particular path, corresponding to the minimum of action $\mathcal{S}$. Hence, for all close trajectories, the difference $(\mathcal{S} - \mathcal{S}_{cl})$ is proportional to the square of the deviation from the classical trajectory. As a result, for a quasiclassical motion, with $\mathcal{S}_{cl} \gg \hbar$, there is a substantial bunch of close trajectories, with $(\mathcal{S} - \mathcal{S}_{cl}) \ll \hbar$, that give similar contributions to the path integral. On the other hand,



22 See, e.g., CM Sec. 2.1. 

23 See, e.g., CM Sec. 9.2. 
strongly non-classical trajectories, with $(\mathcal{S} - \mathcal{S}_{cl}) \gg \hbar$, give phases $\mathcal{S}/\hbar$ rapidly oscillating from one trajectory to the next one, and their contributions to the path integral are averaged out. 24 As a result, for the quasiclassical motion, the propagator's exponent may be evaluated on the classical path:




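$$G_{cl} \propto \exp\left\{\frac{i}{\hbar}\int_{t_0}^{t}\left[\frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 - U(x)\right]d\tau\right\}, \tag{5.84}$$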



The sum of the kinetic and potential energies is the full energy E of the particle, which remains constant for motion in a stationary potential $U(x)$, so that we may rewrite the expression under the integral as 25



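$$\left[\frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 - U(x)\right]d\tau = \left[m\left(\frac{dx}{d\tau}\right)^2 - \left(\frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 + U(x)\right)\right]d\tau = m\frac{dx}{d\tau}\,dx - E\,d\tau. \tag{5.85}$$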



With that replacement, Eq. (83b) yields 



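$$G_{cl} \propto \exp\left\{\frac{i}{\hbar}\int_{x_0}^{x}m\frac{dx}{d\tau}\,dx\right\}\exp\left\{-\frac{i}{\hbar}E(t - t_0)\right\} = \exp\left\{\frac{i}{\hbar}\int_{x_0}^{x}p(x)\,dx\right\}\exp\left\{-\frac{i}{\hbar}E(t - t_0)\right\}, \tag{5.86}$$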



where p is the classical momentum of the particle. But (at least, leaving the pre-exponential factor alone) 
this is exactly the WKB approximation result that was derived and studied in detail in Chapter 2! 

One may question the value of a calculation that yields results that could be readily obtained from Schrödinger's wave mechanics. Feynman's approach is indeed not used very often, but it has its merits. First, it has an important philosophical (and hence heuristic) value. Indeed, Eq. (83) may be interpreted by saying that the essence of quantum mechanics is the exploration, by the system, of all possible paths $x(\tau)$, each of them classical-like in the sense that the particle's coordinate x and velocity $dx/d\tau$ (and hence its momentum) are exactly defined simultaneously at each point. The resulting contributions to the path integral are added up coherently to form the final propagator G, and via it, the final probability $W \propto |G|^2$ of the particle's propagation from $[x_0, t_0]$ to $[x, t]$. Of course, as the scale of action (i.e. of the energy-by-time product) of the motion decreases and becomes comparable to $\hbar$, more and more paths make substantial contributions to this sum, and hence to W, ensuring a larger and larger difference between the quantum and classical properties of the system.

Second, the path integral provides a justification for some simple explanations of quantum phenomena. A typical example is the quantum interference effects discussed in Sec. 3.1 - see, e.g., Fig. 3.1 and the corresponding text. In that discussion, we used the Huygens principle to argue that in two-slit interference, the WKB approximation may be restricted to the effects of two paths that pass through different slits but otherwise consist of straight-line segments. To have another look at that assumption, let us generalize the path integral to multi-dimensional geometries. Fortunately, the simple structure of Eq. (83b) makes such a generalization virtually evident:



24 This fact may be proved by expanding the difference $(\mathcal{S} - \mathcal{S}_{cl})$ in the Taylor series in path variations (leaving only the leading quadratic terms) and working out the resulting Gaussian integrals. It is interesting that the integration, together with the pre-exponential coefficient in Eq. (83a), gives exactly the pre-exponential factor that we have already found in Sec. 2.4 when refining the WKB approximation.

25 The same trick is often used in analytical classical mechanics - say, for proving the Hamilton principle, and for the derivation of the Hamilton-Jacobi equations (see, e.g. CM Secs. 10.3-4).
$$G(\mathbf r,t;\mathbf r_0,t_0) = \int\exp\left\{\frac{i}{\hbar}\int_{t_0}^{t}\left[\frac{m}{2}\left(\frac{d\mathbf r}{d\tau}\right)^2 - U(\mathbf r)\right]d\tau\right\}D[\mathbf r(\tau)], \tag{5.87}$$



where definition (83a) of the path integral should also be modified correspondingly. (I will not go into these technical details.) For the Young-type experiment (Fig. 3.1), where a classical particle could reach the detector only after passing through one of the slits, the classical paths are the straight-line segments shown in Fig. 3.1, and if they are much longer than the de Broglie wavelength, the propagator may be well approximated by the sum of two terms, each proportional to $\exp\{(i/\hbar)\int\mathbf p(\mathbf r)\cdot d\mathbf r\}$ with the integral taken along the corresponding path, as was done in Sec. 3.1.

Last but not least, the path integral allows simple solutions of some problems that would be hard to get by other methods. As the simplest example, let us consider the problem of tunneling in multi-dimensional space, sketched in Fig. 4 for the 2D case, just for graphical simplicity. Here, potential U(x, y) has a "saddle" shape. (Another helpful image is a mountain pass between two summits, in Fig. 4 located at the top and at the bottom of the drawing.) A particle of energy E may move classically in the left and right regions with U(x, y) < E, but can pass from one of these regions to the other only via quantum-mechanical tunneling under the pass. Let us calculate the transparency of this tunnel barrier in the WKB approximation, ignoring the possible pre-exponential factor.




Fig. 5.4. Saddle-type 2D potential 
profile and the instanton trajectory of 
a particle of energy E (dashed line, 
schematically). 



According to the evident multi-dimensional generalization of Eq. (86), for the classically forbidden region, where E < U(x, y), the contributions to propagator (87) are proportional to

$$\exp\left\{-\int_{\mathbf r_0}^{\mathbf r}\boldsymbol\kappa(\mathbf r')\cdot d\mathbf r'\right\}\exp\left\{-\frac{i}{\hbar}E(t - t_0)\right\}, \tag{5.88}$$

where the magnitude of vector $\boldsymbol\kappa$ at each point may be calculated just as in the 1D case - see, e.g., Eq. (2.97):

$$\frac{\hbar^2\kappa^2(\mathbf r)}{2m} = U(\mathbf r) - E, \tag{5.89}$$

while its direction should be tangential to the path trajectory in space. Now the path integral is actually much simpler than in the classically allowed region, because the spatial exponents are purely real and there is no complex interference between them. Because of the minus sign in the exponent, the largest contribution to G evidently comes from the trajectory (or rather a narrow bundle of trajectories) for which the functional

$$\int_{\mathbf r_0}^{\mathbf r}\boldsymbol\kappa(\mathbf r')\cdot d\mathbf r' \tag{5.90}$$

has the smallest value, and the barrier transmission coefficient may be calculated as

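$$\mathcal{T} \approx \exp\left\{-2\min\int_{\mathbf r_0}^{\mathbf r}\boldsymbol\kappa(\mathbf r')\cdot d\mathbf r'\right\}, \tag{5.91}$$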




where $\mathbf r$ and $\mathbf r_0$ are certain points on the opposite classical turning-point surfaces: $U(\mathbf r) = U(\mathbf r_0) = E$. 26
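In the 1D case, where there is only one path and the minimization in Eq. (91) is trivial, the exponent reduces to $2\int\kappa(x)\,dx$ taken between the two turning points. The sketch below evaluates this integral numerically for an inverted parabolic barrier $U(x) = U_0 - m\omega^2x^2/2$ (the barrier shape and all parameter values are assumptions made for illustration) and compares it with the closed form $2\pi(U_0 - E)/\hbar\omega$, which follows from elementary integration for this particular shape.

```python
import math

HBAR = MASS = OMEGA = 1.0  # assumed units
U0 = 5.0                   # assumed barrier height

def kappa(x, energy):
    """kappa(x) from hbar^2 kappa^2 / 2m = U(x) - E, Eq. (89); zero outside the barrier."""
    excess = U0 - 0.5 * MASS * OMEGA**2 * x**2 - energy
    return math.sqrt(2 * MASS * excess) / HBAR if excess > 0 else 0.0

def wkb_exponent(energy, n=20000):
    """2 * integral of kappa between the classical turning points (midpoint rule)."""
    x_turn = math.sqrt(2 * (U0 - energy) / (MASS * OMEGA**2))
    step = 2 * x_turn / n
    return 2 * step * sum(kappa(-x_turn + (k + 0.5) * step, energy) for k in range(n))

energy = 3.0
numeric = wkb_exponent(energy)
analytic = 2 * math.pi * (U0 - energy) / (HBAR * OMEGA)
print(numeric, analytic, math.exp(-numeric))  # exponent and the transparency estimate
```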

Thus the tunneling problem is reduced to finding the trajectory (including points $\mathbf r$ and $\mathbf r_0$) that connects the two surfaces and minimizes functional (90). This is of course a well-known problem of the calculus of variations, 27 but it is interesting that the path integral provides a simple alternative way of solving it. Let us consider an auxiliary problem of the particle's motion in a potential profile $U_{inv}(\mathbf r)$ that is inverted relative to the particle's energy E, i.e. is defined by the following equality:

$$U_{inv}(\mathbf r) - E = E - U(\mathbf r). \tag{5.92}$$

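As was discussed above, at fixed energy E, the path integral for the WKB motion in the classically allowed region of potential $U_{inv}(x, y)$ (which coincides with the classically forbidden region of the original problem) is dominated by the classical trajectory corresponding to the minimum of

$$\mathcal{S}_{inv} = \int_{\mathbf r_0}^{\mathbf r}\mathbf p_{inv}(\mathbf r')\cdot d\mathbf r' = \hbar\int_{\mathbf r_0}^{\mathbf r}\mathbf k_{inv}(\mathbf r')\cdot d\mathbf r', \tag{5.93}$$

where $k_{inv}$ should be determined from the relation

$$\frac{\hbar^2k_{inv}^2(\mathbf r)}{2m} = E - U_{inv}(\mathbf r). \tag{5.94}$$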

But comparing Eqs. (89), (92), and (94), we see that $k_{inv} = \kappa$ at each point of space! This means that the tunneling path (in the WKB limit) corresponds to the classical (so-called instanton) 28 trajectory of the same particle in the inverted potential $U_{inv}(\mathbf r)$. If the initial point $\mathbf r_0$ is fixed, this trajectory may be readily found by means of classical mechanics. (Note that the initial velocity of the instanton launched from point $\mathbf r_0$ should be zero, because by the classical turning point definition, $U_{inv}(\mathbf r_0) = U(\mathbf r_0) = E$.) Thus the problem is reduced to a simpler task of maximizing the transparency (91) over the position of $\mathbf r_0$ on the equipotential surface $U(\mathbf r_0) = E$. Moreover, for many symmetric potentials, the position of this point may be readily guessed without calculations. (For an example, see the exercise problem list at the end of the chapter.)



26 One can argue that the pre-exponential coefficient in Eq. (91) should be close to 1, just like in Eq. (2.117), 
especially if the potential is smooth in the sense of Eq. (2.107), where x is the coordinate along the trajectory. 

27 For a concise introduction to the field see, e.g., I. Gelfand and S. Fomin, Calculus of Variations, Dover, 2000, 
or L. Elsgolc, Calculus of Variations, Dover, 2007. 

28 In quantum field theory, the instanton concept may be formulated somewhat differently, and has more complex applications - see, e.g., R. Rajaraman, Solitons and Instantons, North Holland, 1987.
5.4. Revisiting harmonic oscillator

Let us return to the 1D harmonic oscillator, i.e. any system described by Hamiltonian (2.50) with potential energy (2.111):

$$\hat H = \frac{\hat p^2}{2m} + \frac{m\omega_0^2\hat x^2}{2}. \tag{5.95}$$


In Sec. 2.10 we used the "brute-force" (wave-mechanics) approach to analyze the eigenfunctions $\psi_n(x)$ and eigenvalues $E_n$ of this Hamiltonian, and found that, unfortunately, that approach required relatively complex math that obscures the physics of these stationary ("Fock") states. Now let us use the bra-ket formalism to make the properties of these states much more transparent, using very simple calculations.



First, introducing normalized (dimensionless) operators of coordinate and momentum: 29

$$\hat\xi \equiv \frac{\hat x}{x_0}, \qquad \hat\zeta \equiv \frac{\hat p}{m\omega_0x_0}, \tag{5.96}$$

where $x_0 \equiv (\hbar/m\omega_0)^{1/2}$ is the natural coordinate scale (equal to $\sqrt2$ times the r.m.s. spread of the ground-state wavefunction), which was discussed in detail in Sec. 2.10, we can present Hamiltonian (95) in a very simple and $\xi \leftrightarrow \zeta$ symmetric form:



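$$\hat H = \frac{\hbar\omega_0}{2}\left(\hat\xi^2 + \hat\zeta^2\right). \tag{5.97}$$

Now, let us introduce a new operator

$$\hat a \equiv \frac{\hat\xi + i\hat\zeta}{\sqrt2}. \tag{5.98a}$$

Since both operators $\hat\xi$ and $\hat\zeta$ correspond to real observables, i.e. are Hermitian (self-adjoint) operators with real eigenvalues, the Hermitian conjugate of operator $\hat a$ is obtained simply by complex conjugation of the c-number coefficients:

$$\hat a^\dagger = \frac{\hat\xi - i\hat\zeta}{\sqrt2}. \tag{5.98b}$$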

Solving the system of two equations (98) for $\hat\xi$ and $\hat\zeta$, we readily get the reciprocal relations

$$\hat\xi = \frac{\hat a + \hat a^\dagger}{\sqrt2}, \qquad \hat\zeta = \frac{\hat a - \hat a^\dagger}{i\sqrt2}. \tag{5.99}$$




Our Hamiltonian (97) includes squares of these operators. Calculating them, we have to be careful to 
avoid swapping the new operators, because they do not commute. Indeed, for the normalized operators 
(96), Eq. (2.14) gives 



29 This normalization is not really necessary; it just makes the following calculations less bulky, and thus more aesthetically appealing.
$$[\hat\xi,\hat\zeta] = \frac{[\hat x,\hat p]}{x_0\,m\omega_0x_0} = i\hat I,$$

so that Eqs. (98) yield

$$[\hat a,\hat a^\dagger] = \frac{1}{2}\left[\hat\xi + i\hat\zeta,\;\hat\xi - i\hat\zeta\right] = -\frac{i}{2}\left([\hat\xi,\hat\zeta] - [\hat\zeta,\hat\xi]\right) = \hat I. \tag{5.100}$$



With such due caution, Eq. (99) gives 



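$$\hat\xi^2 = \frac{1}{2}\left(\hat a^2 + \hat a^{\dagger2} + \hat a\hat a^\dagger + \hat a^\dagger\hat a\right), \tag{5.101}$$

$$\hat\zeta^2 = -\frac{1}{2}\left(\hat a^2 + \hat a^{\dagger2} - \hat a\hat a^\dagger - \hat a^\dagger\hat a\right). \tag{5.102}$$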



Plugging these expressions back into Eq. (97), we get 



$$\hat H = \frac{\hbar\omega_0}{2}\left(\hat a\hat a^\dagger + \hat a^\dagger\hat a\right). \tag{5.103}$$



This expression is elegant enough, but may be recast into an even more convenient form. For 
that, let us rewrite the commutation relation (100) as 



$$\hat a\hat a^\dagger = \hat a^\dagger\hat a + \hat I, \tag{5.104}$$



and plug it into Eq. (103). The result is 



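$$\hat H = \hbar\omega_0\left(\hat a^\dagger\hat a + \frac{1}{2}\hat I\right) \equiv \hbar\omega_0\left(\hat N + \frac{1}{2}\hat I\right), \tag{5.105}$$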



where, in the last form, one more (evidently, Hermitian) operator, 



$$\hat N \equiv \hat a^\dagger\hat a, \tag{5.106}$$



has been introduced. Since, according to Eq. (105), operators $\hat H$ and $\hat N$ differ only by the addition of an identity operator and the multiplication by a c-number, these operators commute. Hence, according to the general arguments of Sec. 4.5, they share the set of stationary (Fock) eigenstates n, and we can write the eigenproblem for the new operator as

$$\hat N|n\rangle = N_n|n\rangle, \tag{5.107}$$

where $N_n$ are some eigenvalues that, according to Eq. (105), also determine the energy spectrum of the oscillator:

$$E_n = \hbar\omega_0\left(N_n + \frac{1}{2}\right). \tag{5.108}$$



So far, we know only that all eigenvalues $N_n$ are real, but not much more. In order to calculate them, let us carry out the following calculation, splendid in its simplicity and efficiency. Consider the result of the action of operator $\hat N$ on the ket-vector $\hat a^\dagger|n\rangle$. Using the definition (106) and the associative rule, we may write

$$\hat N\left(\hat a^\dagger|n\rangle\right) = \hat a^\dagger\hat a\left(\hat a^\dagger|n\rangle\right) = \hat a^\dagger\left(\hat a\hat a^\dagger\right)|n\rangle. \tag{5.109}$$
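Now using the commutation relation (104), and then Eq. (107), we may continue as

$$\hat a^\dagger\left(\hat a\hat a^\dagger\right)|n\rangle = \hat a^\dagger\left(\hat a^\dagger\hat a + \hat I\right)|n\rangle = \hat a^\dagger\left(\hat N + \hat I\right)|n\rangle = \hat a^\dagger\left(N_n + 1\right)|n\rangle = \left(N_n + 1\right)\left(\hat a^\dagger|n\rangle\right). \tag{5.110}$$

Let us summarize the result of this calculation:

$$\hat N\left(\hat a^\dagger|n\rangle\right) = \left(N_n + 1\right)\left(\hat a^\dagger|n\rangle\right). \tag{5.111}$$

Performing an absolutely similar calculation with operator $\hat a$, we can also get another formula:

$$\hat N\left(\hat a|n\rangle\right) = \left(N_n - 1\right)\left(\hat a|n\rangle\right). \tag{5.112}$$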

It is time to stop calculations and translate these results into plain English: if $|n\rangle$ is an eigenket of operator $\hat N$ with eigenvalue $N_n$, then $\hat a^\dagger|n\rangle$ and $\hat a|n\rangle$ are also eigenkets of that operator, with eigenvalues $(N_n + 1)$ and $(N_n - 1)$, respectively. This statement may be presented with the ladder diagram shown in Fig. 5.



Fig. 5.5. Hierarchy (the "ladder diagram") of eigenstates of a 1D harmonic oscillator. Arrows show the actions of the creation and annihilation operators on the eigenstates.



Operator $\hat a^\dagger$ moves the system one step up the ladder, while operator $\hat a$ brings it one step down. In other words, the former operator creates a new excitation of the system, 30 while the latter operator kills ("annihilates") such an excitation. This is why $\hat a^\dagger$ is called the creation operator, and $\hat a$, the annihilation operator. In its turn, according to Eq. (107), operator $\hat N$ does not change the state of the system, but "counts" its position on the ladder:

$$\langle n|\hat N|n\rangle = \langle n|N_n|n\rangle = N_n. \tag{5.113}$$

This is why $\hat N$ is called the number operator, in our current context meaning the number of the elementary excitations of the oscillator.

This calculation still needs to be completed. Indeed, we still do not know whether the ladder shown in Fig. 5 includes all eigenstates of the oscillator, and what exactly the numbers $N_n$ are. Remarkably, both questions may be answered by exploring a single paradox. Let us start with some state (step of the ladder), and keep going down it, applying operator $\hat a$ again and again. Each time, eigenvalue $N_n$ is decreased by one, so that eventually it should become negative. However, this cannot



30 For the electromagnetic field oscillators, such excitations are called photons; for the mechanical wave field 
oscillators, phonons, etc. 
happen, because any real eigenstate, including the states represented by kets $|d\rangle \equiv \hat a|n\rangle$ and $|n\rangle$, should have a positive norm - see Eq. (4.16). Comparing the norms,

$$\|n\|^2 = \langle n|n\rangle, \qquad \|d\|^2 = \langle n|\hat a^\dagger\hat a|n\rangle = \langle n|\hat N|n\rangle = N_n\langle n|n\rangle, \tag{5.114}$$

we see that both of them cannot be positive simultaneously if $N_n$ is negative.

The way toward the resolution of this paradox is to notice that the action of the creation and annihilation operators on the stationary states may consist not only in their promotion to the next step of the ladder diagram, but also in their multiplication by some c-numbers:

$$\hat a|n\rangle = A_n|n-1\rangle, \qquad \hat a^\dagger|n\rangle = A'_n|n+1\rangle. \tag{5.115}$$



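(Linear relations (111) and (112) clearly allow that.) Let us calculate coefficients $A_n$ assuming, for convenience, that all eigenstates, including states n and (n - 1), are normalized:

$$\langle n|n\rangle = 1, \qquad \langle n-1|n-1\rangle = \left(\langle n|\frac{\hat a^\dagger}{A_n^*}\right)\left(\frac{\hat a}{A_n}|n\rangle\right) = \frac{\langle n|\hat N|n\rangle}{A_n^*A_n} = \frac{N_n}{|A_n|^2}\langle n|n\rangle = 1. \tag{5.116}$$

From here, we get $|A_n| = N_n^{1/2}$, i.e.

$$\hat a|n\rangle = N_n^{1/2}e^{i\varphi_n}|n-1\rangle, \tag{5.117}$$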

where $\varphi_n$ is an arbitrary real phase. Now let us consider what happens if all numbers $N_n$ are integers. (Because of the definition of $N_n$, given by Eq. (107), it is convenient to call these integers n, i.e. by the same letter as the corresponding eigenstate.) Then when we have come down to the state with n = 0, an attempt to make one more step down gives

$$\hat a|0\rangle = 0\,|-1\rangle. \tag{5.118}$$

But in accordance with Eq. (4.9), the state in the right-hand part of this equation is the "null-state", i.e. does not exist. 31 This gives the (only known :-) resolution of the state ladder paradox: the ladder has the lowest step with $N_n = n = 0$.

As a by-product of our discussion, we have obtained a very important relation $N_n = n$, which means, in particular, that the state ladder includes all eigenstates of the oscillator. Plugging this relation into Eq. (108), we see that the full spectrum of eigenenergies of the harmonic oscillator is described by the simple formula

$$E_n = \hbar\omega_0\left(n + \frac{1}{2}\right), \qquad n = 0, 1, 2, \ldots, \tag{5.119}$$

which was already discussed in Sec. 2.10. It is rather remarkable that the bra-ket formalism has allowed us to derive it without calculating the corresponding (rather cumbersome) wavefunctions $\psi_n(x)$ - see Eq. (2.279).



31 Please note again the radical difference between the null-state in the right-hand part of Eq. (118) and the state described by ket-vector $|0\rangle$ in the left-hand part of that relation. The latter state does exist and, moreover, represents the most important state of the system, its ground state with n = 0 - see Eq. (2.269).
Moreover, the formalism may also be used to calculate virtually any bra-ket pertaining to the oscillator, without using $\psi_n(x)$. In order to illustrate that, let us first calculate the coefficient $A'_n$ participating in the latter of relations (115). This can be done absolutely similarly to the above calculation of $A_n$; alternatively, since we already know that $|A_n| = N_n^{1/2} = n^{1/2}$, we may notice that according to Eqs. (106) and (115), the eigenproblem (107), which in our new notation for $N_n$ becomes

$$\hat N|n\rangle = n|n\rangle, \tag{5.120}$$

may be rewritten as

$$n|n\rangle = \hat a^\dagger\hat a|n\rangle = \hat a^\dagger A_n|n-1\rangle = A_nA'_{n-1}|n\rangle. \tag{5.121}$$

Comparing the first and the last forms of this equality, we see that $|A'_{n-1}| = n/|A_n| = n^{1/2}$, i.e. $A'_n = (n+1)^{1/2}\exp\{i\varphi'_n\}$. Taking all phases $\varphi_n$ and $\varphi'_n$ equal to zero for simplicity, we may reduce Eqs. (115) to their final, standard form 32

$$\hat a^\dagger|n\rangle = (n+1)^{1/2}|n+1\rangle, \qquad \hat a|n\rangle = n^{1/2}|n-1\rangle. \tag{5.122}$$
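Relations (122) are easy to check with explicit matrices. The sketch below, an illustration rather than part of the text's argument, builds $\hat a$ and $\hat a^\dagger$ in a Fock basis truncated to the first n_max states (the truncation is an assumption that spoils commutation relation (100) only in the last diagonal element), and confirms that $\hat N = \hat a^\dagger\hat a$ is diagonal with eigenvalues 0, 1, 2, ..., in agreement with $N_n = n$.

```python
import math

def annihilation(n_max):
    """<n'|a|n> = sqrt(n) delta_{n', n-1}, Eq. (122), truncated to n_max states."""
    return [[math.sqrt(n) if row == n - 1 else 0.0 for n in range(n_max)]
            for row in range(n_max)]

def transpose(m):
    """Hermitian conjugate of a real matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

n_max = 6
a = annihilation(n_max)
a_dag = transpose(a)
number_op = matmul(a_dag, a)  # N = a^dagger a, Eq. (106)
commutator = [[x - y for x, y in zip(r1, r2)]
              for r1, r2 in zip(matmul(a, a_dag), number_op)]  # [a, a^dagger]

print([round(number_op[n][n], 12) for n in range(n_max)])   # eigenvalues N_n = n
print([round(commutator[n][n], 12) for n in range(n_max)])  # 1, except at the truncation edge
```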



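Now we can use these formulas to calculate, for example, the matrix elements of the coordinate operator in the Fock state basis:

$$\langle n'|\hat x|n\rangle = x_0\langle n'|\hat\xi|n\rangle = \frac{x_0}{\sqrt2}\langle n'|\left(\hat a + \hat a^\dagger\right)|n\rangle = \frac{x_0}{\sqrt2}\left(\langle n'|\hat a|n\rangle + \langle n'|\hat a^\dagger|n\rangle\right). \tag{5.123}$$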



To complete the calculation, we may now use Eqs. (122) and the Fock state orthonormality: 

$$\langle n'|n\rangle = \delta_{n'n}. \tag{5.124}$$

The result is 



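$$\langle n'|\hat x|n\rangle = \frac{x_0}{\sqrt2}\left(n^{1/2}\delta_{n',n-1} + (n+1)^{1/2}\delta_{n',n+1}\right) \equiv \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}\left(n^{1/2}\delta_{n',n-1} + (n+1)^{1/2}\delta_{n',n+1}\right). \tag{5.125}$$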



Proceeding absolutely similarly, for the momentum bra-kets we get a similar expression:

$$\langle n'|\hat p|n\rangle = i\left(\frac{\hbar m\omega_0}{2}\right)^{1/2}\left(-n^{1/2}\delta_{n',n-1} + (n+1)^{1/2}\delta_{n',n+1}\right). \tag{5.126}$$



Hence the matrices of both operators in the Fock-state basis have only two diagonals, adjacent to the 
main diagonal; all other elements (including the diagonal ones) are zeros. 

Matrix elements of higher powers of these operators, as well as of their products, may be handled similarly, though the higher the power, the bulkier the result. For example,

$$\begin{aligned}
\langle n'|\hat x^2|n\rangle &= \langle n'|\hat x\hat x|n\rangle = \sum_{n''=0}^{\infty}\langle n'|\hat x|n''\rangle\langle n''|\hat x|n\rangle \\
&= \frac{x_0^2}{2}\sum_{n''=0}^{\infty}\left[(n'')^{1/2}\delta_{n',n''-1} + (n''+1)^{1/2}\delta_{n',n''+1}\right]\left[n^{1/2}\delta_{n'',n-1} + (n+1)^{1/2}\delta_{n'',n+1}\right] \\
&= \frac{x_0^2}{2}\left\{[n(n-1)]^{1/2}\delta_{n',n-2} + [(n+1)(n+2)]^{1/2}\delta_{n',n+2} + (2n+1)\delta_{n',n}\right\}.
\end{aligned}\tag{5.127}$$

32 A useful mnemonic rule is that the c-number coefficient in any of these relations is equal to the square root of the larger of the two state numbers it relates.



For applications, the most important of these matrix elements are those on the main diagonal:

$$\langle x^2\rangle \equiv \langle n|\hat x^2|n\rangle = \frac{x_0^2}{2}(2n+1). \tag{5.128}$$



This expression shows, in particular, that the expectation value of the oscillator's potential energy in the n-th Fock state is

$$\langle U\rangle = \frac{m\omega_0^2}{2}\langle x^2\rangle = \frac{\hbar\omega_0}{2}\left(n + \frac{1}{2}\right). \tag{5.129}$$



This is exactly one half of the total energy (119) of the oscillator. As a sanity check, an absolutely similar calculation of the kinetic energy shows that

$$\left\langle\frac{p^2}{2m}\right\rangle = \frac{1}{2m}\langle n|\hat p^2|n\rangle = \frac{\hbar\omega_0}{2}\left(n + \frac{1}{2}\right), \tag{5.130}$$

i.e. both partial energies equal $E_n/2$, just as the time averages of these energies in a classical oscillator. 33
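Equations (125)-(130) may also be verified numerically. Assuming units with $\hbar = m = \omega_0 = 1$ (so that $x_0 = 1$) and a truncated Fock basis, the sketch below assembles the matrices of $\hat x$ and $\hat p$ from Eqs. (125)-(126), squares them by explicit summation over the intermediate states, as in Eq. (127), and compares the diagonal elements with Eqs. (128)-(130) for states well below the truncation edge.

```python
import math

HBAR = MASS = OMEGA0 = 1.0  # assumed units, so that x0 = 1
N_MAX = 12                  # size of the truncated Fock basis (assumption)

def x_matrix():
    """<n'|x|n> from Eq. (125); row index n', column index n."""
    c = math.sqrt(HBAR / (2 * MASS * OMEGA0))
    return [[c * (math.sqrt(n) * (row == n - 1) + math.sqrt(n + 1) * (row == n + 1))
             for n in range(N_MAX)] for row in range(N_MAX)]

def p_matrix():
    """<n'|p|n> from Eq. (126)."""
    c = 1j * math.sqrt(HBAR * MASS * OMEGA0 / 2)
    return [[c * (-math.sqrt(n) * (row == n - 1) + math.sqrt(n + 1) * (row == n + 1))
             for n in range(N_MAX)] for row in range(N_MAX)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N_MAX)) for j in range(N_MAX)]
            for i in range(N_MAX)]

x2 = matmul(x_matrix(), x_matrix())  # Eq. (127) by explicit summation over n''
p2 = matmul(p_matrix(), p_matrix())
for n in range(4):                   # stay far from the truncation edge
    potential = 0.5 * MASS * OMEGA0**2 * x2[n][n]
    kinetic = (p2[n][n] / (2 * MASS)).real
    print(n, potential, kinetic, HBAR * OMEGA0 * (n + 0.5) / 2)
```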



5.5. The Glauber and squeezed states 

There is evidently a huge difference between a quantum stationary (Fock) state of the oscillator 
and its classical state. Indeed, let us write the classical Hamilton equations of motion of the oscillator 
(using capital letters to distinguish the classical variables from arguments of quantum wavefunctions): 



$$\dot X = \frac{P}{m}, \qquad \dot P = -\frac{\partial U}{\partial X} = -m\omega_0^2X. \tag{5.131}$$



On the "phase plane" with Cartesian coordinates x and p (Fig. 6), these equations describe clockwise rotation of the representation point {X(t), P(t)} along an elliptic trajectory starting from the initial point {X(0), P(0)}. (The normalization of momentum by $m\omega_0$, similar to the one performed by the second of Eqs. (96), makes the trajectory pleasingly circular, with a constant radius equal to the oscillation amplitude A, reflecting the constant full energy

$$A^2 = [X(t)]^2 + \left[\frac{P(t)}{m\omega_0}\right]^2 = \text{const} = [X(0)]^2 + \left[\frac{P(0)}{m\omega_0}\right]^2, \tag{5.132}$$

determined by the initial conditions.)



For the forthcoming comparison with quantum states, it is convenient to describe this classical 
solution by the following dimensionless complex variable 



33 Note, however, that the operators of the partial (potential and kinetic) energies commute neither with each other nor with the full-energy (Hamiltonian) operator, so that the Fock states n are not their eigenstates.
$$\alpha(t) \equiv \frac{1}{\sqrt2\,x_0}\left[X(t) + i\frac{P(t)}{m\omega_0}\right], \tag{5.133}$$

which is essentially the standard complex-number representation of the system's position on the 2D phase plane, with $|\alpha| = A/\sqrt2\,x_0$. With this definition, Eqs. (131) are conveniently merged into one equation,

$$\dot\alpha = -i\omega_0\alpha, \tag{5.134}$$

with an evident, very simple solution

$$\alpha(t) = \alpha(0)e^{-i\omega_0t}, \tag{5.135}$$



where the constant $\alpha(0)$ may be complex, and is just the (normalized) classical complex amplitude of oscillations. 34 This equation describes sinusoidal oscillations of both $X(t) \propto \mathrm{Re}[\alpha(t)]$ and $P(t) \propto \mathrm{Im}[\alpha(t)]$, with a phase shift of $\pi/2$ between them.
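The merger of Eqs. (131) into Eq. (134) may be confirmed by integrating the classical equations directly. The sketch below uses arbitrary illustrative values of $m$, $\omega_0$, and the initial conditions (all assumptions), steps Eqs. (131) with a standard fourth-order Runge-Kutta scheme, and compares the resulting $\alpha(t)$, defined by Eq. (133), with the analytic solution (135); it also checks that $|\alpha|$, and hence the amplitude A of Eq. (132), stays constant.

```python
import cmath
import math

MASS, OMEGA0, HBAR = 2.0, 3.0, 1.0        # illustrative values (assumed)
X0 = math.sqrt(HBAR / (MASS * OMEGA0))    # x0 of Eq. (96)

def alpha(x, p):
    """Dimensionless complex amplitude of Eq. (133)."""
    return (x + 1j * p / (MASS * OMEGA0)) / (math.sqrt(2) * X0)

def rhs(state):
    """Classical Hamilton equations (131): dX/dt = P/m, dP/dt = -m omega0^2 X."""
    x, p = state
    return (p / MASS, -MASS * OMEGA0**2 * x)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step for the system (131)."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))

state = (0.7, -1.1)                       # arbitrary initial X(0), P(0)
alpha0 = alpha(*state)
dt, steps = 1e-3, 2000
for _ in range(steps):
    state = rk4_step(state, dt)

numeric = alpha(*state)
analytic = alpha0 * cmath.exp(-1j * OMEGA0 * dt * steps)  # Eq. (135)
print(abs(numeric - analytic), abs(numeric), abs(alpha0))
```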







Fig. 5.6. Schematic representation of various states of a 
harmonic oscillator on the phase plane. The bold black 
point represents a classical state, with the dashed line 
showing its trajectory. (Very imperfect) classical images 
of the Fock states with n = 0, 1, and 2 are shown in blue, 
while the blurred red spot is the (equally schematic) 
Glauber state's image. Finally, the magenta elliptical 
spot is a classical image of a squeezed ground state. 
Arrows show the direction of states' evolution in time. 



On the other hand, according to the basic Eqs. (4.157)-(4.158), the time dependence of a Fock state, as of any stationary state of the oscillator, is limited to the phase factor $\exp\{-iE_nt/\hbar\}$ in its wavefunction (rather than in its observables), and as a result gives time-independent expectation values of x, p, or of any function thereof. (Moreover, as Eqs. (125) and (126) show, $\langle x\rangle = \langle p\rangle = 0$.) Taking into account Eqs. (129) and (130), the closest (though very imperfect) geometric image 35 of such a state on the phase plane is a blurred circle of radius $A_n = x_0(2n+1)^{1/2}$, along which the wavefunction is uniformly spread as a wave - see the blue rings in Fig. 6. For the ground state (n = 0), with wavefunction (2.269), a better image is a blurred round spot, of radius $\sim x_0$, at the origin.



34 See, e.g., CM Chapter 4, especially Eqs. (4.4) and Fig. 4.9 and its discussion. 

35 I have to confess that such geometric mapping of a quantum state on the phase plane [x, p] is not exactly 
defined; you may think about colored areas in Fig. 6 as regions of pairs {x, p} most probably obtained in 
measurements. A quantitative definition of such a mapping will be given in Sec. 7.3 using the Wigner function, 
though, as we will see, even such imaging definition has certain internal contradictions. Still such cartoons may 
have considerable cognitive/heuristic value, if their limitations are kept in mind. 
However, the Fock states n are not the only possible quantum states of the oscillator: according to the basic Eq. (4.6), a state described by ket-vector

$$|\alpha\rangle = \sum_{n=0}^{\infty}\alpha_n|n\rangle, \tag{5.136}$$

with any set of (complex) c-numbers $\alpha_n$, is also its legitimate state, subject only to the normalization condition $\langle\alpha|\alpha\rangle = 1$, giving

$$\sum_{n=0}^{\infty}|\alpha_n|^2 = 1. \tag{5.137}$$



It is natural to ask: can we select coefficients $\alpha_n$ in such a special way that the state properties would be closer to the classical ones; in particular, the expectation values $\langle x\rangle$ and $\langle p\rangle$ of coordinate and momentum would evolve in time just as the classical values X(t) and P(t), while the uncertainties of these observables would be time-independent and the same as in the ground state:

$$\delta x = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}, \qquad \delta p = \left(\frac{\hbar m\omega_0}{2}\right)^{1/2}, \tag{5.138}$$



with the smallest possible value of the uncertainty product, $\delta x\,\delta p = \hbar/2$. 36 Let me show that such a Glauber state, 37 which is schematically represented in Fig. 6 by a blurred red spot around the classical point {X(t), P(t)}, is indeed possible.
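As a numerical illustration of Eq. (138), the sketch below takes the ground-state Gaussian displaced to $\langle x\rangle = X$ and given the mean momentum P, i.e. $\psi(x) \propto \exp\{-(x - X)^2/2x_0^2 + iPx/\hbar\}$. This explicit form is an assumption consistent with the verbal description of Eq. (139) in the text, not a quotation of that equation. Working in units with $\hbar = m = \omega_0 = 1$, the code evaluates $\delta x$ and $\delta p$ by quadrature and should find the minimal uncertainty product $\hbar/2$ for arbitrary X and P.

```python
import cmath
import math

HBAR = MASS = OMEGA0 = 1.0                 # assumed units
X0 = math.sqrt(HBAR / (MASS * OMEGA0))     # x0 of Eq. (96)

def packet(x, shift, momentum):
    """Normalized ground-state Gaussian displaced by X = shift, with mean momentum P (assumed form)."""
    norm = (math.pi * X0**2) ** -0.25
    return norm * cmath.exp(-(x - shift) ** 2 / (2 * X0**2) + 1j * momentum * x / HBAR)

def packet_derivative(x, shift, momentum):
    """Analytic d(psi)/dx of the same packet."""
    return packet(x, shift, momentum) * (-(x - shift) / X0**2 + 1j * momentum / HBAR)

def uncertainties(shift, momentum, half_width=12.0, n=4000):
    """Return (delta x, delta p, <x>, <p>) by midpoint quadrature on a grid centred at shift."""
    h = 2 * half_width / n
    grid = [shift - half_width + (k + 0.5) * h for k in range(n)]
    psi = [packet(x, shift, momentum) for x in grid]
    dpsi = [packet_derivative(x, shift, momentum) for x in grid]
    mean_x = h * sum(abs(f) ** 2 * x for f, x in zip(psi, grid))
    mean_x2 = h * sum(abs(f) ** 2 * x * x for f, x in zip(psi, grid))
    mean_p = (-1j * HBAR * h * sum(f.conjugate() * d for f, d in zip(psi, dpsi))).real
    mean_p2 = HBAR**2 * h * sum(abs(d) ** 2 for d in dpsi)  # <p^2> = hbar^2 <psi'|psi'>
    return (math.sqrt(mean_x2 - mean_x**2), math.sqrt(mean_p2 - mean_p**2),
            mean_x, mean_p)

delta_x, delta_p, mean_x, mean_p = uncertainties(shift=1.5, momentum=-0.8)
print(delta_x, delta_p, delta_x * delta_p)  # the product should equal hbar/2
```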

Conceptually, the simplest way to find the corresponding coefficients $\alpha_n$ would be to calculate $\langle x\rangle$, $\langle p\rangle$, $\delta x$, and $\delta p$ for an arbitrary set of $\alpha_n$, and then try to optimize these coefficients to reach our goal. However, this problem may be solved much more easily using wave mechanics. Indeed, let us consider the following wavefunction



(5.139) 




Its comparison with Eqs. (2.16) and (2.269) shows that this is just a Gaussian wave packet with the average momentum P and the coordinate width $\delta x$ given by Eq. (138), but shifted along axis x by X. Hence, this wavefunction satisfies all the above requirements, and a straightforward (though a bit bulky) differentiation over x and t shows that it also satisfies the oscillator's Schrödinger equation, provided that functions X(t) and P(t) satisfy the classical Eqs. (131).

This fact remains true even in the more general situation when the oscillator, initially in its ground state, 38 comes under the effect of a classical force F(t). (Evidently, for its description it is sufficient to add this function to the right-hand part of the second of Eqs. (131).) Moreover, the electromagnetic radiation



36 In the quantum theory of measurements, Eqs. (138) are frequently referred to as the standard quantum limit. 

37 Named after R. J. Glauber, who studied these states in detail in 1965, though they had been discussed briefly by E. Schrödinger as early as 1926. Another popular name for the Glauber states, "coherent", is very misleading, because all the quantum states we have studied so far (including the Fock states) may be presented as coherent (pure) superpositions of the basis states.

38 As will be discussed in Chapter 7, the ground state may be readily formed, for example, by providing a weak coupling of the oscillator to a low-temperature ($k_BT \ll \hbar\omega_0$) environment.
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formed in "good" (single-mode) lasers is also in the Glauber state. (As will be discussed in Chapter 9, 
the experimental formation of Fock states n, with the only exception of n = 0, i.e. the ground state, is 
much harder.) This is why the Glauber states are so important. 

Though Eq. (139) gives the full wave-mechanics description of a Glauber state, the bra-ket formalism is very useful here as well. For example, in order to calculate the corresponding coefficients in expansion (136),



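$$\alpha_n = \langle n|\alpha\rangle = \int dx\,\langle n|x\rangle\langle x|\alpha\rangle = \int \psi_n^*(x)\,\Psi_\alpha(x)\,dx, \qquad (5.140)$$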



we would need to use not only the simple Eq. (139), but also the Fock state wavefunctions $\psi_n(x)$, which are not very appealing (see Eq. (2.279)). Instead, this calculation may be readily done in the bra-ket formalism, giving us an important by-product result.

Let us start by expressing the double shift of the ground state (by $X$ and $P$), which has led us to Eq. (139), in the operator language. Forgetting about $P$ for a minute, let us find a translation operator $\hat T_X$ that produces the desirable shift of the coordinate by $X$ of an arbitrary wavefunction $\psi(x)$, say represented as the standard wave packet (59).³⁹ Evidently, the result of its action, in the coordinate representation, is



$$\hat T_X\psi(x) = \psi(x - X) = \frac{1}{(2\pi\hbar)^{1/2}}\int \varphi(p)\exp\left\{i\frac{p(x-X)}{\hbar}\right\}dp. \qquad (5.141)$$



Hence, the shift may be achieved by multiplying each Fourier component of the packet, with momentum $p$, by $\exp\{-ipX/\hbar\}$. This gives us a hint that the general form of the translation operator, valid in any representation, should be




$$\hat T_X = \exp\left\{-i\frac{\hat p X}{\hbar}\right\}. \qquad (5.142)$$



The proof of this formula is provided by the fact that any operator is uniquely determined by the set of its matrix elements in any full and orthogonal basis, in particular the basis of momentum states $p$. According to Eq. (141), the analog of Eq. (23) for the $p$-representation, applied to the translation operator (which is evidently local), is



$$\langle p|\hat T_X|p'\rangle\,\varphi(p') = \delta(p - p')\exp\left\{-i\frac{pX}{\hbar}\right\}\varphi(p), \qquad (5.143)$$



so that operator (142) does exactly the job we need it to. 
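As a quick numerical illustration of Eqs. (141)-(143), the sketch below multiplies each discrete Fourier component of a Gaussian packet by $\exp\{-ipX/\hbar\}$ and compares the result with $\psi(x - X)$. It is only a hypothetical example: the units ($\hbar = 1$), the periodic grid, the packet, and the helper name `translate` are all assumptions, and NumPy's discrete FFT stands in for the continuous Fourier transform.

```python
import numpy as np

hbar = 1.0                                       # assumed units
L_box, N = 40.0, 1024                            # periodic grid (arbitrary choice)
x = np.linspace(-L_box / 2, L_box / 2, N, endpoint=False)
dx = x[1] - x[0]
p = 2 * np.pi * hbar * np.fft.fftfreq(N, d=dx)   # momentum carried by each Fourier component

def translate(psi, X):
    """Shift psi(x) -> psi(x - X) by multiplying each Fourier component by exp(-i p X / hbar)."""
    return np.fft.ifft(np.fft.fft(psi) * np.exp(-1j * p * X / hbar))

k0, X = 0.7, 3.0                                 # packet wave number and shift, both arbitrary
psi = np.exp(-x ** 2 / 2 + 1j * k0 * x)          # Gaussian packet centered at x = 0
shifted = translate(psi, X)
expected = np.exp(-(x - X) ** 2 / 2 + 1j * k0 * (x - X))
print(np.max(np.abs(shifted - expected)))        # limited only by rounding
```

Because the packet is smooth and vanishes at the grid edges, the discrete shift agrees with $\psi(x - X)$ essentially to machine precision, even though $X$ is not a multiple of the grid step.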

The operator that provides the shift of momentum by $P$ is fully similar, with the opposite sign under the exponent, due to the opposite sign of the exponent in the reciprocal Fourier transform, so that the simultaneous shift by both $X$ and $P$ may be achieved by the following translation operator:

$$\hat T_\alpha = \exp\left\{i\frac{P\hat x - X\hat p}{\hbar}\right\}. \qquad (5.144)$$



39 Cf. Exercise 4.11. 
As we already know, for a harmonic oscillator the creation-annihilation operators are more natural, so that we may use Eqs. (96) and (99) to recast Eq. (144) as

$$\hat T_\alpha = \exp\{\alpha\hat a^\dagger - \alpha^*\hat a\}, \quad\text{with}\quad \hat T_\alpha^\dagger = \exp\{\alpha^*\hat a - \alpha\hat a^\dagger\}, \qquad (5.145)$$



where the c-number $\alpha$ (generally, a function of time) is defined by Eq. (133). Now, according to Eq. (139), we may form the Glauber state's ket-vector just as

$$|\alpha\rangle = \hat T_\alpha|0\rangle. \qquad (5.146)$$



This formula looks nice and simple, but practical calculations (say, of expectation values of observables) with the translation operator (144) are not too easy because of its exponent-of-operators form. Fortunately, it turns out that a much simpler representation of the Glauber state is possible. To show that, let us start with the following general (and very useful) property of exponential functions of operators: if

$$[\hat A, \hat B] = \mu\hat I, \qquad (5.147)$$

(where $\hat A$ and $\hat B$ are arbitrary operators, and $\mu$ is a c-number), then⁴⁰

$$\exp\{+\hat A\}\,\hat B\,\exp\{-\hat A\} = \hat B + \mu\hat I. \qquad (5.148)$$

Let us apply Eqs. (147)-(148) to two cases, both with

$$\hat A = \alpha^*\hat a - \alpha\hat a^\dagger, \quad\text{so that}\quad \exp\{+\hat A\} = \hat T_\alpha^\dagger, \quad \exp\{-\hat A\} = \hat T_\alpha. \qquad (5.149)$$

First, let us take $\hat B = \hat I$; then Eq. (147) is valid with $\mu = 0$, and Eq. (148) yields

$$\hat T_\alpha^\dagger\hat T_\alpha = \hat I. \qquad (5.150)$$



This equality means that the translation operator is unitary, which is not a big surprise: if we shift a classical point on the complex phase plane by $(+\alpha)$ and then by $(-\alpha)$, we certainly must come back to the initial position. Relation (150) means merely that this fact is also true for any quantum state.

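Second, let us take $\hat B = \hat a$; in order to verify Eq. (147) and find the corresponding $\mu$, let us calculate the commutator. Using, at the appropriate stage of the calculation, Eq. (104), we get

$$[\hat A, \hat B] = [\alpha^*\hat a - \alpha\hat a^\dagger, \hat a] = -\alpha[\hat a^\dagger, \hat a] = \alpha\hat I, \qquad (5.151)$$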



so that in this case $\mu = \alpha$, and Eq. (148) yields

$$\hat T_\alpha^\dagger\,\hat a\,\hat T_\alpha = \hat a + \alpha\hat I. \qquad (5.152)$$
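Relations (150) and (152) can be checked numerically in a truncated Fock basis. The sketch below is an illustration under stated assumptions: the truncation size `N`, the value of `alpha`, and the helper name `translation_operator` are hypothetical choices. Since truncation distorts only matrix elements near the cut-off, only the low-$n$ block of Eq. (152) is compared.

```python
import numpy as np

N = 80                                                      # Fock-space truncation (arbitrary)
a = np.diag(np.sqrt(np.arange(1, N)), k=1).astype(complex)  # a|n> = sqrt(n)|n-1>
ad = a.conj().T                                             # creation operator

def translation_operator(alpha):
    """T_alpha = exp(alpha a^dagger - alpha^* a), Eq. (145), in the truncated basis.

    The exponent G is anti-Hermitian, so with i*G = V diag(w) V^dagger (Hermitian),
    exp(G) = V diag(exp(-i w)) V^dagger.
    """
    G = alpha * ad - np.conj(alpha) * a
    w, V = np.linalg.eigh(1j * G)
    return (V * np.exp(-1j * w)) @ V.conj().T

alpha = 0.8 - 0.5j
T = translation_operator(alpha)
Td = T.conj().T
K = 20                                                      # low-n block, far below the cut-off
unitarity_err = np.max(np.abs(Td @ T - np.eye(N)))                                  # Eq. (150)
shift_err = np.max(np.abs((Td @ a @ T)[:K, :K] - (a + alpha * np.eye(N))[:K, :K]))  # Eq. (152)
print(unitarity_err, shift_err)
```

Exponentiating through `eigh` keeps the truncated $\hat T_\alpha$ exactly unitary, so Eq. (150) holds to rounding, while Eq. (152) holds to rounding for states with $n \ll N$.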
We have approached the summit of this beautiful calculation. Let us consider the operator

$$\hat T_\alpha\hat T_\alpha^\dagger\,\hat a\,\hat T_\alpha. \qquad (5.153)$$

40 The proof of Eq. (148) may be readily achieved by expanding the operator $f(\lambda) = \exp\{+\lambda\hat A\}\,\hat B\,\exp\{-\lambda\hat A\}$ in the Taylor series in the c-number parameter $\lambda$, and then evaluating the result at $\lambda = 1$.

Using Eq. (150), we may reduce this expression to $\hat a\hat T_\alpha$, while the application of Eq. (152) to the same expression yields $\hat T_\alpha\hat a + \alpha\hat T_\alpha$. Hence, we get the following operator equality:

$$\hat a\hat T_\alpha = \hat T_\alpha\hat a + \alpha\hat T_\alpha, \qquad (5.154)$$

which may be applied to any state. Now acting with both sides of this equality on the ground state $|0\rangle$, and using the facts that $\hat a|0\rangle$ is the null-state, while $\hat T_\alpha|0\rangle = |\alpha\rangle$, we finally get a very simple and elegant result:⁴¹

$$\hat a|\alpha\rangle = \alpha|\alpha\rangle. \qquad (5.155)$$



Thus any Glauber state is just one of the eigenstates of the annihilation operator, namely the one with the eigenvalue equal to the parameter $\alpha$, i.e. to the complex representation (133) of the classical state that is the center of the Glauber state's distribution.⁴² This fact makes the calculations of Glauber state properties much simpler. As the simplest example, let us use Eq. (155) to find $\langle x\rangle$ in the Glauber state:

$$\langle x\rangle = \langle\alpha|\hat x|\alpha\rangle = \frac{x_0}{\sqrt2}\langle\alpha|\hat a + \hat a^\dagger|\alpha\rangle = \frac{x_0}{\sqrt2}\left(\langle\alpha|\hat a|\alpha\rangle + \langle\alpha|\hat a^\dagger|\alpha\rangle\right). \qquad (5.156)$$



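In the first term in the parentheses, we can apply Eq. (155) directly, while in the second term we can use the bra-counterpart of that relation, $\langle\alpha|\hat a^\dagger = \alpha^*\langle\alpha|$. Now assuming that the Glauber state is normalized, $\langle\alpha|\alpha\rangle = 1$, and using Eq. (133), we get

$$\langle x\rangle = \frac{x_0}{\sqrt2}\left(\alpha\langle\alpha|\alpha\rangle + \alpha^*\langle\alpha|\alpha\rangle\right) = \frac{x_0}{\sqrt2}\left(\alpha + \alpha^*\right) = X. \qquad (5.157)$$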



Acting in the same way, we may readily extend this sanity check to verify that $\langle p\rangle = P$, and that $\delta x$ and $\delta p$ indeed obey Eq. (138).
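Eqs. (155)-(157) and the extended sanity check can also be verified numerically. The sketch below is only an illustration. It assumes units with $\hbar = m = \omega_0 = 1$ (so that $x_0 = 1$), a Fock basis truncated at `N = 80` states, and arbitrary values of `X` and `P`. It builds $|\alpha\rangle = \hat T_\alpha|0\rangle$ from Eq. (145) and then evaluates $\hat a|\alpha\rangle - \alpha|\alpha\rangle$, $\langle x\rangle$, $\langle p\rangle$, $\delta x$, and $\delta p$.

```python
import numpy as np

# Units hbar = m = omega_0 = 1, so x0 = 1; the truncation N is an arbitrary choice.
N = 80
a = np.diag(np.sqrt(np.arange(1, N)), k=1).astype(complex)
ad = a.conj().T
x_op = (a + ad) / np.sqrt(2)            # x = x0 (a + a^dagger) / sqrt(2)
p_op = -1j * (a - ad) / np.sqrt(2)      # p = -i m omega0 x0 (a - a^dagger) / sqrt(2)

X, P = 1.5, -0.7
alpha = (X + 1j * P) / np.sqrt(2)       # Eq. (133) with x0 = 1

G = alpha * ad - np.conj(alpha) * a     # anti-Hermitian exponent of Eq. (145)
w, V = np.linalg.eigh(1j * G)
T = (V * np.exp(-1j * w)) @ V.conj().T
ket = T[:, 0]                           # |alpha> = T_alpha |0>, Eq. (146)

def mean(op):
    """Expectation value <alpha|op|alpha> (real for Hermitian op)."""
    return np.vdot(ket, op @ ket).real

eig_err = np.linalg.norm(a @ ket - alpha * ket)          # Eq. (155)
dx = np.sqrt(mean(x_op @ x_op) - mean(x_op) ** 2)
dp = np.sqrt(mean(p_op @ p_op) - mean(p_op) ** 2)
print(eig_err, mean(x_op), mean(p_op), dx, dp)           # expect 0, X, P, 1/sqrt(2), 1/sqrt(2)
```

In these units, Eq. (138) predicts $\delta x = \delta p = 1/\sqrt2$, and the truncation error stays negligible as long as $|\alpha|^2 \ll N$.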

As a more thorough sanity check, let us use Eq. (155) to re-calculate the Glauber state's wavefunction (139). Inner-multiplying both sides of that relation by the bra-vector $\langle x|$, and using definition (98a) of the annihilation operator, we get

$$\frac{1}{\sqrt2\,x_0}\langle x|\left(\hat x + i\frac{\hat p}{m\omega_0}\right)|\alpha\rangle = \alpha\langle x|\alpha\rangle. \qquad (5.158)$$



41 It is also rather counter-intuitive. Indeed, according to Eq. (122), the annihilation operator $\hat a$, acting on a Fock state $n$, "beats it down" to the lower-energy state $(n - 1)$; see Eq. (119). However, according to Eq. (155), its action on a Glauber state $\alpha$ does not lead to a state change, and hence to an energy decrease! The resolution of this paradox may be achieved via the representation of the Glauber state as a series of Fock states; see Eq. (165) below. Operator $\hat a$ indeed transfers each Fock component to a lower-energy state, but it also re-weighs each term of the expansion, so that the complete energy of the Glauber state remains constant.

42 Note that the spectrum of eigenvalues $\alpha$ of eigenproblem (155) is continuous: it may be any complex number!
Since $\langle x|$ is the bra-vector of an eigenstate of the Hermitian operator $\hat x$, they may be swapped, with the operator giving its eigenvalue $x$; acting on that bra-vector with the (local!) operator of momentum, we have to use it in the coordinate representation (63). As a result, we get

$$\frac{1}{\sqrt2\,x_0}\left(x\langle x|\alpha\rangle + \frac{\hbar}{m\omega_0}\frac{\partial}{\partial x}\langle x|\alpha\rangle\right) = \alpha\langle x|\alpha\rangle. \qquad (5.159)$$







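But $\langle x|\alpha\rangle$ is nothing else than the Glauber state's wavefunction $\Psi_\alpha$, so that Eq. (159) gives for it the first-order differential equation

$$\frac{1}{\sqrt2\,x_0}\left(x\Psi_\alpha + \frac{\hbar}{m\omega_0}\frac{\partial\Psi_\alpha}{\partial x}\right) = \alpha\Psi_\alpha. \qquad (5.160)$$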



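Separating $\Psi_\alpha$ and $x$ on the opposite sides of the equation, and using definition (133) of the parameter $\alpha$, we may bring this equation to the form

$$\frac{d\Psi_\alpha}{\Psi_\alpha} = \frac{m\omega_0}{\hbar}\left[-x + \left(X + i\frac{P}{m\omega_0}\right)\right]dx. \qquad (5.161)$$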



Integrating both sides, we return to Eq. (139), which had been derived by wave-mechanics means.
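For completeness, here is the integration step written out. This is a worked sketch that assumes the oscillator scale $x_0 \equiv (\hbar/m\omega_0)^{1/2}$ and drops the integration constant, which only sets the normalization and the overall phase:

$$\ln\Psi_\alpha = \frac{m\omega_0}{\hbar}\left(-\frac{x^2}{2} + Xx\right) + i\frac{Px}{\hbar} + \text{const} = -\frac{(x - X)^2}{2x_0^2} + i\frac{Px}{\hbar} + \text{const}',$$

$$|\Psi_\alpha|^2 \propto \exp\left\{-\frac{(x - X)^2}{x_0^2}\right\} \equiv \exp\left\{-\frac{(x - X)^2}{2(\delta x)^2}\right\}, \qquad \delta x = \frac{x_0}{\sqrt2},$$

i.e. a Gaussian packet centered at $X$, carrying momentum $P$, with the width given by Eq. (138).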

Now we can use Eq. (155) to find the coefficients $\alpha_n$ in expansion (136) of the Glauber state $\alpha$ in a series over the Fock states $n$. Plugging Eq. (136) into each side of Eq. (155), using the first of Eqs. (122) in the left-hand side, and requiring the coefficients at each ket-vector $|n\rangle$ in both sides to be equal, we get the following recurrence relation for the coefficients:

$$\alpha_{n+1} = \frac{\alpha}{(n+1)^{1/2}}\,\alpha_n. \qquad (5.162)$$

Assuming some value of $\alpha_0$, and applying this relation sequentially for $n = 0, 1, 2$, etc., we get

$$\alpha_n = \frac{\alpha^n}{(n!)^{1/2}}\,\alpha_0. \qquad (5.163)$$



Now we can find $\alpha_0$ from the normalization requirement (137), getting

$$|\alpha_0|^2\sum_{n=0}^{\infty}\frac{|\alpha|^{2n}}{n!} = 1. \qquad (5.164)$$



In this sum, we may readily recognize the Taylor expansion of the function $\exp\{|\alpha|^2\}$, so that the final result (besides an arbitrary common phase multiplier) is

$$|\alpha\rangle = \exp\left\{-\frac{|\alpha|^2}{2}\right\}\sum_{n=0}^{\infty}\frac{\alpha^n}{(n!)^{1/2}}\,|n\rangle. \qquad (5.165)$$
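The path from recurrence (162) to the closed form (165) can be traced numerically. In the sketch below, the helper names, the value of `alpha`, and the truncation at $n = 60$ are illustrative assumptions. It generates the $\alpha_n$ from Eq. (162), fixes $\alpha_0$ by the normalization (164), and compares the result with the coefficients of Eq. (165), taking $\alpha_0$ real.

```python
import math

def glauber_coefficients(alpha, n_max):
    """Amplitudes alpha_n, n = 0..n_max, from recurrence (162), normalized as in Eq. (164)."""
    coeffs = [1.0 + 0.0j]                                   # provisional alpha_0 = 1
    for n in range(n_max):
        coeffs.append(alpha * coeffs[n] / math.sqrt(n + 1))  # Eq. (162)
    norm = math.sqrt(sum(abs(c) ** 2 for c in coeffs))
    return [c / norm for c in coeffs]                        # alpha_0 ends up real and positive

def closed_form(alpha, n):
    """Coefficient of |n> in Eq. (165)."""
    return math.exp(-abs(alpha) ** 2 / 2) * alpha ** n / math.sqrt(math.factorial(n))

alpha = 1.2 + 0.4j
c = glauber_coefficients(alpha, 60)                          # truncation at n = 60 (arbitrary)
print(max(abs(c[n] - closed_form(alpha, n)) for n in range(61)))
```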



Eq. (165) means, in particular, that the probability $W_n = \alpha_n\alpha_n^*$ of finding the system on the $n$-th energy level (119) obeys the well-known Poisson distribution (Fig. 7):

$$W_n = \frac{\langle n\rangle^n}{n!}\,e^{-\langle n\rangle}, \qquad (5.166)$$

Fig. 5.7. The Poisson distribution for several values of $\langle n\rangle$. Note that $W_n$ are defined only for integer values of $n$; lines are only guides for the eye.

where in our particular case

$$\langle n\rangle = |\alpha|^2. \qquad (5.167)$$

For applications, perhaps the most important mathematical property of this distribution is

$$\delta n = \langle n\rangle^{1/2}; \qquad (5.168)$$

note also that at $\langle n\rangle \gg 1$, and hence $\delta n \ll \langle n\rangle$, the Poisson distribution approaches the Gaussian ("normal") one.
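Eqs. (166)-(168) can be checked with a few lines of arithmetic. The sketch below (the helper name and the value of `alpha` are illustrative assumptions) evaluates $W_n$ in log space to avoid overflowing factorials, then sums the distribution to obtain its normalization, mean, and r.m.s. spread.

```python
import math

def poisson_weights(mean_n, n_max):
    """W_n = <n>^n exp(-<n>) / n!, Eq. (166), evaluated in log space to avoid overflow."""
    return [math.exp(n * math.log(mean_n) - mean_n - math.lgamma(n + 1)) for n in range(n_max + 1)]

alpha = 2.0 - 1.0j
mean_n = abs(alpha) ** 2                   # Eq. (167): here <n> = 5
W = poisson_weights(mean_n, 100)           # the tail beyond n = 100 is negligible for <n> = 5
avg = sum(n * w for n, w in enumerate(W))
spread = math.sqrt(sum((n - avg) ** 2 * w for n, w in enumerate(W)))
print(sum(W), avg, spread, math.sqrt(mean_n))   # spread should equal <n>^(1/2), Eq. (168)
```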

Now let us discuss the evolution of the Glauber state in time. In the Schrödinger picture, it is completely described by the dynamics (131) of the c-number shifts $X(t)$ and $P(t)$ participating in wavefunction (139). Note again that, in contrast to the spread of the wave packet of a free particle, discussed in Sec. 2.2, in the harmonic oscillator the Gaussian packet of the special width (138) does not spread at all!

An alternative and equivalent way to describe the dynamics is to use the Heisenberg equations of motion. As Eqs. (42) and (48) tell us, such equations for the Heisenberg operators of coordinate and momentum have to be similar to the classical equations (131):

$$\dot{\hat x}_H = \frac{\hat p_H}{m}, \qquad \dot{\hat p}_H = -m\omega_0^2\,\hat x_H. \qquad (5.169)$$



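Now using Eqs. (98), for the Heisenberg-picture creation and annihilation operators we get the equations

$$\dot{\hat a}_H = -i\omega_0\hat a_H, \qquad \dot{\hat a}_H^\dagger = +i\omega_0\hat a_H^\dagger, \qquad (5.170)$$

which are completely similar to the classical equation (134) for the c-number parameter $\alpha$ and its complex conjugate, and hence have solutions identical in form to Eq. (135):

$$\hat a_H(t) = \hat a_H(0)\,e^{-i\omega_0 t}, \qquad \hat a_H^\dagger(t) = \hat a_H^\dagger(0)\,e^{+i\omega_0 t}. \qquad (5.171)$$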
As was discussed in Sec. 4.6, such equations are very convenient because they enable a simple calculation of the time evolution of observables for any initial state of the oscillator (Fock, Glauber, or any other) using Eq. (4.191). Applied to a Glauber state $\alpha(0)$, such a calculation gives the same results as have already been derived earlier in this section; in particular, it confirms that the Gaussian wave packet of the special width (138) does not spread in time.
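The non-spreading of a Glauber packet can be illustrated numerically in the Schrödinger picture. The sketch below assumes $\hbar = m = \omega_0 = 1$, a Fock basis truncated at `N = 60`, and arbitrary initial values `X0`, `P0`. Each Fock amplitude of Eq. (165) acquires the phase factor $\exp\{-i(n + 1/2)t\}$, and the resulting $\langle x\rangle(t)$ is compared with the classical solution of Eqs. (131) while $\delta x(t)$ is tracked.

```python
import math
import numpy as np

# Units hbar = m = omega_0 = 1 (x0 = 1); the truncation N is an arbitrary choice.
N = 60
n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), k=1)            # annihilation operator (real matrix)
x_op = (a + a.T) / np.sqrt(2)               # x = x0 (a + a^dagger) / sqrt(2)

X0, P0 = 2.0, 0.5
alpha0 = (X0 + 1j * P0) / np.sqrt(2)        # Eq. (133) with x0 = 1
c0 = np.array([np.exp(-abs(alpha0) ** 2 / 2) * alpha0 ** k / math.sqrt(math.factorial(k))
               for k in range(N)])          # Fock amplitudes, Eq. (165)

ts = np.linspace(0.0, 2 * np.pi, 25)
mean_x, dx = [], []
for t in ts:
    c = c0 * np.exp(-1j * (n + 0.5) * t)    # Schrodinger evolution with E_n = n + 1/2
    m1 = np.vdot(c, x_op @ c).real
    m2 = np.vdot(c, x_op @ (x_op @ c)).real
    mean_x.append(m1)
    dx.append(math.sqrt(m2 - m1 ** 2))

X_classical = X0 * np.cos(ts) + P0 * np.sin(ts)   # solution of the classical Eqs. (131)
print(np.max(np.abs(np.array(mean_x) - X_classical)), min(dx), max(dx))
```

The center follows the classical trajectory, while $\delta x$ stays at $x_0/\sqrt2$ for all $t$, as Eq. (138) requires.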

Now let us consider what happens if the initial wave packet is still Gaussian, but has a different width, say $\delta x < x_0/\sqrt2$. As we already know from Sec. 2.2, the momentum spread $\delta p$ will be correspondingly larger, still with the smallest uncertainty product: $\delta x\,\delta p = \hbar/2$. Such a squeezed ground state $\zeta$, with zero expectation values of $x$ and $p$, may be generated from the Fock/Glauber ground state,

$$|\zeta\rangle = \hat S_\zeta|0\rangle, \qquad (5.172a)$$

using the so-called squeezing operator,

$$\hat S_\zeta \equiv \exp\left\{\frac{1}{2}\left(\zeta^*\hat a\hat a - \zeta\hat a^\dagger\hat a^\dagger\right)\right\}, \qquad (5.172b)$$



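which depends on a complex c-number parameter $\zeta = re^{i\theta}$. The parameter's modulus $r$ determines the squeezing degree; it is straightforward to use Eq. (172) to check that if $\zeta$ is real ($\theta = 0$, $\zeta = r$), then

$$\delta x = \frac{x_0}{\sqrt2}e^{-r} = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}e^{-r}, \qquad \delta p = \frac{m\omega_0 x_0}{\sqrt2}e^{r} = \left(\frac{\hbar m\omega_0}{2}\right)^{1/2}e^{r}, \qquad\text{so that}\qquad \delta x\,\delta p = \frac{\hbar}{2}. \qquad (5.173)$$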



On the phase plane (Fig. 6), this state, with $r > 0$, may be represented by an oval spot squeezed along axis $x$ (hence the state's name) and stretched along axis $p$; the same formulas but with $r < 0$ describe the opposite squeezing. On the other hand, the phase $\theta$ of the squeeze parameter $\zeta$ determines the angle $\theta/2$ of the oval's turn about the phase-plane origin (see the magenta ellipse in Fig. 6); if $\theta \neq 0$, Eqs. (173) are valid for the variables $\{x', p'\}$ obtained from $\{x, p\}$ via a clockwise rotation by that angle. For any such origin-centered squeezed state, time evolution reduces to an increase of this angle at the rate $\omega_0$, i.e. to a clockwise rotation of the ellipse, without its deformation, with the angular velocity $\omega_0$ (see the magenta arrows in Fig. 6). As a result, the uncertainties $\delta x$ and $\delta p$ oscillate in time with the double frequency $2\omega_0$, while their product never drops below its minimal possible value $\hbar/2$, reaching it each time the ellipse's axes are aligned with axes $x$ and $p$.
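Eq. (173) and the rotating-ellipse picture can be checked numerically. The sketch below assumes $\hbar = m = \omega_0 = 1$, a real squeeze parameter `r = 0.5`, and a Fock basis truncated at `N = 120`. It builds $\hat S_\zeta|0\rangle$ from Eq. (172b) and evolves each Fock amplitude with $\exp\{-i(n+1/2)t\}$. The reference curve $\delta x\,\delta p = \tfrac12[1 + \sinh^2(2r)\sin^2(2t)]^{1/2}$ is a short side calculation for the rotating ellipse, not a formula from the text.

```python
import numpy as np

# Units hbar = m = omega_0 = 1 (x0 = 1); N and r are arbitrary illustrative choices.
N, r = 120, 0.5
n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), k=1)             # annihilation operator (real matrix)
ad = a.T                                      # creation operator
x_op = (a + ad) / np.sqrt(2)
p_op = -1j * (a - ad) / np.sqrt(2)

# Squeezing operator (172b) for real zeta = r; its exponent G is anti-Hermitian,
# so exp(G) = V diag(exp(-i w)) V^dagger, where i*G = V diag(w) V^dagger.
G = 0.5 * r * (a @ a - ad @ ad)
w, V = np.linalg.eigh(1j * G)
S = (V * np.exp(-1j * w)) @ V.conj().T
c0 = S[:, 0]                                  # squeezed ground state, Eq. (172a)

def spreads(c):
    """Return (delta x, delta p) in the state with Fock amplitudes c."""
    mx, mp = np.vdot(c, x_op @ c).real, np.vdot(c, p_op @ c).real
    dx = np.sqrt(np.vdot(c, x_op @ (x_op @ c)).real - mx ** 2)
    dp = np.sqrt(np.vdot(c, p_op @ (p_op @ c)).real - mp ** 2)
    return dx, dp

ts = np.linspace(0.0, np.pi, 13)
spread = np.array([spreads(c0 * np.exp(-1j * (n + 0.5) * t)) for t in ts])
products = spread[:, 0] * spread[:, 1]
predicted = 0.5 * np.sqrt(1 + np.sinh(2 * r) ** 2 * np.sin(2 * ts) ** 2)
print(spread[0], np.exp(-r) / np.sqrt(2), np.exp(r) / np.sqrt(2))
print(np.max(np.abs(products - predicted)), products.min())
```

At $t = 0$ the spreads reproduce Eq. (173). The product equals $1/2$ (i.e. $\hbar/2$) whenever $\sin 2t = 0$ and exceeds it in between.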

Such squeezed ground states have important implications for quantum measurements (see Sec. 7.7 below), and may be formed, for example, by parametric excitation of the oscillator,⁴³ with a parameter modulation depth close to, but still below, the threshold of parametric oscillation excitation. Unfortunately, I do not have time for a further discussion of this interesting topic,⁴⁴ but still need to mention a more general class of squeezed states, centered at an arbitrary point $\{X, P\}$ rather than the origin, that may be formed by an additional action of the displacement operator (144) on the squeezed ground state (172). Calculations similar to those that led us from Eq. (145) to Eq. (155), but now for the product operator $\hat T_\alpha\hat S_\zeta$ rather than bare $\hat T_\alpha$, show that such a general squeezed state is an eigenstate of the following mixed operator

$$\hat b = \hat a\cosh r + \hat a^\dagger e^{i\theta}\sinh r, \qquad (5.174a)$$

with the eigenvalue

$$\beta = \alpha\cosh r + \alpha^* e^{i\theta}\sinh r. \qquad (5.174b)$$

For the particular case $\alpha = 0$, Eq. (174b) yields $\beta = 0$, i.e. the action of operator (174a) on the squeezed ground state $\zeta$ with the same $r$ and $\theta$ yields the null-state, thus generalizing Eq. (118), which is valid for the "usual" (non-squeezed) ground state.

43 For a discussion and classical theory of this effect, see, e.g., CM Sec. 4.5.

44 See, e.g., Chapter 7 in C. Gerry and P. Knight, Introductory Quantum Optics, Cambridge U. Press, 2005. I also invite the reader to have a look at the spectacular results of experimental measurements of the Glauber and squeezed states of electromagnetic (light) oscillators by G. Breitenbach et al., Nature 387, 471 (1997), and the very large, ten-fold squeezing in such oscillators by H. Vahlbruch et al., Phys. Rev. Lett. 100, 033602 (2008).
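The eigenvalue relation (174) for the general squeezed state can be spot-checked numerically. The sketch below assumes $\hbar = m = \omega_0 = 1$, a Fock basis truncated at `N = 150`, and arbitrary values of `r`, `theta`, and `alpha`; the helper `expm_antihermitian` is a hypothetical name. It forms $\hat T_\alpha\hat S_\zeta|0\rangle$ from Eqs. (145) and (172b) and measures how far it is from being an eigenstate of $\hat b$ with eigenvalue $\beta$.

```python
import numpy as np

# Units hbar = m = omega_0 = 1; N, r, theta, alpha are arbitrary illustrative values.
N, r, theta = 150, 0.4, 0.9
alpha = 0.6 + 0.3j
a = np.diag(np.sqrt(np.arange(1, N)), k=1).astype(complex)
ad = a.conj().T

def expm_antihermitian(G):
    """exp(G) for an anti-Hermitian matrix G, via eigen-decomposition of the Hermitian i*G."""
    w, V = np.linalg.eigh(1j * G)
    return (V * np.exp(-1j * w)) @ V.conj().T

zeta = r * np.exp(1j * theta)
S = expm_antihermitian(0.5 * (np.conj(zeta) * a @ a - zeta * ad @ ad))   # Eq. (172b)
T = expm_antihermitian(alpha * ad - np.conj(alpha) * a)                   # Eq. (145)
ket = T @ S[:, 0]                                                          # T_alpha S_zeta |0>

b = a * np.cosh(r) + ad * np.exp(1j * theta) * np.sinh(r)                 # Eq. (174a)
beta = alpha * np.cosh(r) + np.conj(alpha) * np.exp(1j * theta) * np.sinh(r)  # Eq. (174b)
residual = np.linalg.norm(b @ ket - beta * ket)
print(residual)                                                            # should be ~0
```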



5.6. Revisiting spherically-symmetric problems 

One more blank spot has been left in our study of the wave mechanics of spherically-symmetric 3D systems in Sec. 3.6. Indeed, while the eigenfunctions describing axially-symmetric 2D systems, and the azimuthal components of those in spherically-symmetric 3D systems, are very simple,

$$\psi_m(\varphi) = \frac{1}{(2\pi)^{1/2}}e^{im\varphi}, \qquad m = 0, \pm1, \pm2, \ldots, \qquad (5.175)$$



the polar components of the eigenfunctions in the latter case (i.e., of the spherical harmonics) include the associated Legendre functions $P_l^m(\cos\theta)$, which may be expressed via elementary functions only indirectly; see Eqs. (3.165) and (3.168). This makes all the calculations less than transparent and, in particular, does not allow a clear insight into the origin of the very simple eigenvalue spectrum; see, e.g., Eq. (3.163). The bra-ket formalism, applied to the angular momentum operator, allows one to get such insight, and also produces a very convenient tool for many calculations involving spherically-symmetric potentials.

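Let us start by using the correspondence principle to spell out the quantum-mechanical operator of the orbital angular momentum $\mathbf L = \mathbf r\times\mathbf p$ of a point particle:

$$\hat{\mathbf L} = \hat{\mathbf r}\times\hat{\mathbf p} = \begin{vmatrix}\mathbf n_x & \mathbf n_y & \mathbf n_z\\ \hat x & \hat y & \hat z\\ \hat p_x & \hat p_y & \hat p_z\end{vmatrix}, \quad\text{i.e.}\quad \hat L_x = \hat y\hat p_z - \hat z\hat p_y, \ \text{etc.} \qquad (5.176)$$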



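From this definition, we can readily calculate the commutation relations for all Cartesian components of the operators $\hat{\mathbf L}$, $\hat{\mathbf r}$, and $\hat{\mathbf p}$; for example,

$$[\hat L_x, \hat y] = [\hat y\hat p_z - \hat z\hat p_y, \hat y] = -\hat z[\hat p_y, \hat y] = i\hbar\hat z, \qquad (5.177)$$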



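etc. Using the sequential numbering of coordinate axes ($x \equiv r_1$, etc.), the summary of these calculations may be presented in similar, compact (and beautiful!) forms:

$$[\hat L_j, \hat r_{j'}] = i\hbar\sum_{j''}\hat r_{j''}\varepsilon_{jj'j''}, \qquad [\hat L_j, \hat p_{j'}] = i\hbar\sum_{j''}\hat p_{j''}\varepsilon_{jj'j''}, \qquad [\hat L_j, \hat L_{j'}] = i\hbar\sum_{j''}\hat L_{j''}\varepsilon_{jj'j''}, \qquad (5.178)$$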
where each of the indices $j$, $j'$, and $j''$ may take any of the values 1, 2, and 3, and $\varepsilon_{jj'j''}$ is the Levi-Civita symbol (or "permutation symbol").⁴⁵ Also introducing, in the natural way, the (scalar!) operator of $L^2 \equiv |\mathbf L|^2$,

$$\hat L^2 \equiv \hat L_x^2 + \hat L_y^2 + \hat L_z^2, \qquad (5.179)$$

it is straightforward to check that this operator commutes with each of the Cartesian components:

$$[\hat L^2, \hat L_j] = 0. \qquad (5.180)$$



This result, at first sight, may seem to contradict the last of Eqs. (178). Indeed, haven't we learned in Sec. 4.5 that commuting operators (e.g., $\hat L^2$ and any of $\hat L_j$) share their eigenstate sets? If yes, shouldn't that mean that this set has to be common for all 4 operators? The resolution of this paradox may be found in the condition that was mentioned just after Eq. (4.138), but (sorry!) not sufficiently emphasized there. According to that relation, if an operator has degenerate eigenstates (i.e. if $A_j = A_{j'}$ even for $j \neq j'$), they are not necessarily shared by another compatible operator. This is exactly the situation with the orbital angular momentum operators, which may be schematically shown on a Venn diagram (Fig. 8):⁴⁶ the set of eigenstates of operator $\hat L^2$ is highly degenerate,⁴⁷ and is broader than those of the component operators $\hat L_j$ (which, as will be shown below, are non-degenerate until we consider the particle's spin).




Fig. 5.8. Venn diagram showing (schematically) the partitioning of the set of eigenstates of operator $\hat L^2$. Each inner sector corresponds to the states shared with one of the Cartesian component operators $\hat L_j$, while the outer (shaded) ring represents the eigenstates of $\hat L^2$ that are not shared with any of $\hat L_j$, e.g., all linear combinations of eigenstates of different component operators.



Let us focus on just one of these 3 joint sets of eigenstates, by tradition, that of operators $\hat L^2$ and $\hat L_z$. (This tradition is due to the canonical form of spherical coordinates, in which the polar angle is measured from axis $z$. Indeed, using Eqs. (63), in the coordinate representation we get the following expression,



45 See, e.g., MA Eq. (13.2). 

46 This is just a particular example of Venn diagrams (introduced in the 1880s by J. Venn) that show possible 
relations (such as intersections, unions, complements, etc.) between various sets of objects, and are a very useful 
tool in the general set theory. 

47 Note that this particular result is consistent with the classical picture of the angular momentum vector: even when its length is fixed, the vector may be oriented in various directions, corresponding to different values of its Cartesian components. However, in the classical picture, all these components may be fixed simultaneously, while in the quantum picture this is not true.
$$\hat L_z = \hat x\hat p_y - \hat y\hat p_x = x\frac{\hbar}{i}\frac{\partial}{\partial y} - y\frac{\hbar}{i}\frac{\partial}{\partial x} = -i\hbar\frac{\partial}{\partial\varphi}. \qquad (5.181)$$



Writing the standard eigenproblem for this operator in this representation, $\hat L_z\psi_m = L_z\psi_m$, we see that it is satisfied by eigenfunctions (175), with eigenvalues $L_z = \hbar m$, as was already conjectured in Sec. 3.5.) More specifically, let us consider a set of eigenstates $|l, m\rangle$ corresponding to a certain degenerate eigenvalue of operator $\hat L^2$, but all possible eigenvalues of operator $\hat L_z$, i.e. all possible quantum numbers $m$. (At this point, $l$ is just some parameter that determines the eigenvalue of $\hat L^2$; it will be defined more explicitly in a minute.) In order to analyze this set, it is instrumental to introduce the so-called ladder (also called, respectively, "raising" and "lowering") operators

$$\hat L_\pm \equiv \hat L_x \pm i\hat L_y. \qquad (5.182)$$

Note the substantial similarity between this definition and Eqs. (98). It is straightforward to use this definition and the last of Eqs. (178) to calculate the following commutators:

$$[\hat L_+, \hat L_-] = 2\hbar\hat L_z, \qquad\text{and}\qquad [\hat L_z, \hat L_\pm] = \pm\hbar\hat L_\pm, \qquad (5.183)$$



and use Eq. (179) to prove another important relation:

$$\hat L^2 = \hbar\hat L_z + \hat L_z^2 + \hat L_-\hat L_+. \qquad (5.184)$$

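Now let us rewrite the last of Eqs. (183) as

$$\hat L_z\hat L_\pm = \hat L_\pm\hat L_z \pm \hbar\hat L_\pm, \qquad (5.185)$$

and act with both its sides on the ket-vector $|l, m\rangle$ of the set specified above:

$$\hat L_z\hat L_\pm|l, m\rangle = \hat L_\pm\hat L_z|l, m\rangle \pm \hbar\hat L_\pm|l, m\rangle. \qquad (5.186)$$

Since the eigenvalues of operator $\hat L_z$ are equal to $\hbar m$, in the first term of the right-hand side we may write

$$\hat L_z|l, m\rangle = \hbar m|l, m\rangle. \qquad (5.187)$$

With that, Eq. (186) may be recast as

$$\hat L_z\left(\hat L_\pm|l, m\rangle\right) = \hbar(m \pm 1)\left(\hat L_\pm|l, m\rangle\right). \qquad (5.188)$$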

In a spectacular similarity with Eqs. (111)-(112) for the harmonic oscillator, Eq. (188) means that the states $\hat L_\pm|l, m\rangle$ are also eigenstates of operator $\hat L_z$, corresponding to the eigenvalues $\hbar(m \pm 1)$. Thus the ladder operators act exactly as the creation and annihilation operators of the oscillator, moving the system up or down a ladder of eigenstates (Fig. 9). The most significant difference is that now the state ladder must end in both directions, because an infinite increase of $|m|$, with whatever sign, would cause the expectation values of the operator

$$\hat L_x^2 + \hat L_y^2 = \hat L^2 - \hat L_z^2, \qquad (5.189)$$

which corresponds to a non-negative observable, to become negative. Hence there should be two states at the ends of the ladder, $|l, m_{\max}\rangle$ and $|l, m_{\min}\rangle$, for which
$$\hat L_+|l, m_{\max}\rangle = 0, \qquad \hat L_-|l, m_{\min}\rangle = 0. \qquad (5.190)$$

Due to the symmetry of the whole problem with respect to the replacement $m \to -m$, we should have $m_{\min} = -m_{\max}$. This $m_{\max}$ is exactly the quantum number that is traditionally called $l$, so that

$$-l \le m \le +l. \qquad (5.191)$$



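Fig. 5.9. Hierarchy (ladder diagram) of the common eigenstates of operators $\hat L^2$ and $\hat L_z$: the eigenkets $|l, l\rangle, \ldots, |l, m+1\rangle, |l, m\rangle, |l, m-1\rangle, \ldots, |l, -l\rangle$, with $L_z$ eigenvalues (in units of $\hbar$) from $+l$ down to $-l$, connected by operators $\hat L_+$ (up) and $\hat L_-$ (down).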



Evidently, this relation of quantum numbers $m$ and $l$ is compatible with the almost-classical image of an angular momentum vector of the same length oriented in various directions, with its $z$-component taking several $(2l + 1)$ possible values $\hbar m$. In this simple picture, however, $L^2$ would be equal to the square of $(L_z)_{\max}$, i.e. to $(\hbar l)^2$, and this is not so. Indeed, applying the operator equality (184) to the top state $|l, m_{\max}\rangle = |l, l\rangle$, we get

$$\hat L^2|l, l\rangle = \hbar\hat L_z|l, l\rangle + \hat L_z^2|l, l\rangle + \hat L_-\hat L_+|l, l\rangle = \hbar^2 l|l, l\rangle + \hbar^2 l^2|l, l\rangle + 0 = \hbar^2 l(l + 1)|l, l\rangle. \qquad (5.192)$$



Since, by our initial assumption, all eigenvectors $|l, m\rangle$ correspond to the same eigenvalue of operator $\hat L^2$, this result means that all these eigenvalues are equal to $\hbar^2 l(l + 1)$. Just as in the case of the spin-½ vector operators, the deviation of this result from $\hbar^2 l^2$ may be interpreted as the result of unavoidable uncertainties ("fluctuations") of the $x$- and $y$-components of the angular momentum, which give a finite positive contribution to $L^2$ even if the angular momentum vector is aligned in the best possible way with the $z$-axis.
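The ladder algebra of Eqs. (182)-(192) can be checked with explicit matrices. The sketch below is illustrative: it sets $\hbar = 1$, picks $l = 2$, and builds $\hat L_\pm$ from the standard matrix element $\langle l, m+1|\hat L_+|l, m\rangle = \hbar[l(l+1) - m(m+1)]^{1/2}$. That element is not derived in the text above and is an assumption here, as are the helper names.

```python
import numpy as np

def ladder_matrices(l, hbar=1.0):
    """Return (m values, L_z, L_+, L_-) in the basis |l, m>, m = l, l-1, ..., -l.

    Assumes the standard matrix element <l, m+1|L_+|l, m> = hbar*sqrt(l(l+1) - m(m+1)).
    """
    ms = np.arange(l, -l - 1, -1)
    Lz = hbar * np.diag(ms).astype(complex)
    Lp = np.zeros((len(ms), len(ms)), dtype=complex)
    for k in range(1, len(ms)):          # column k holds m = ms[k]; L_+ raises it to ms[k-1]
        Lp[k - 1, k] = hbar * np.sqrt(l * (l + 1) - ms[k] * (ms[k] + 1))
    return ms, Lz, Lp, Lp.conj().T

def commutator(A, B):
    return A @ B - B @ A

l = 2
ms, Lz, Lp, Lm = ladder_matrices(l)
Lx, Ly = (Lp + Lm) / 2, (Lp - Lm) / 2j   # inverting Eq. (182)
L2 = Lx @ Lx + Ly @ Ly + Lz @ Lz          # Eq. (179)
I = np.eye(len(ms))

print(np.allclose(commutator(Lx, Ly), 1j * Lz))   # last of Eqs. (178), hbar = 1
print(np.allclose(commutator(Lp, Lm), 2 * Lz))    # Eq. (183)
print(np.allclose(commutator(Lz, Lp), Lp))        # Eq. (183), upper sign
print(np.allclose(L2, Lz + Lz @ Lz + Lm @ Lp))    # Eq. (184)
print(np.allclose(L2, l * (l + 1) * I))           # Eq. (192): l(l+1) for every m
```

All five checks print `True`. The last one confirms that $\hat L^2 = \hbar^2 l(l+1)$ on the whole $(2l+1)$-state ladder, not just on the top state.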

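Now let us compare our results with those of Sec. 3.6. Using the expressions of Cartesian coordinates via the spherical ones exactly as was done in Eq. (181), we get the following expressions for the ladder operators (182) in the coordinate representation:

$$\hat L_\pm = \hbar e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right). \qquad (5.193)$$

Now plugging this equation, together with Eq. (181), into Eq. (184), we get

$$\hat L^2 = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\right]. \qquad (5.194)$$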
But this is exactly the operator (besides its division by the constant parameter $2mR^2$) that stands in the left-hand side of Eq. (3.156). Hence that equation, which was explored by the "brute-force" (wave-mechanical) approach in Sec. 3.6, may be understood as the eigenproblem for operator $\hat L^2$ in the coordinate representation, with eigenfunctions $Y_l^m(\theta, \varphi)$ corresponding to eigenkets $|l, m\rangle$, and eigenvalues $L^2 = 2mR^2E$. As a reminder, the main result of that rather involved analysis was expressed by Eq. (3.163), which now may be rewritten as

$$L_l^2 = 2mR^2E_l = \hbar^2 l(l + 1), \qquad (5.195)$$

in full agreement with what was obtained in this section by much more efficient means based on the bra-ket formalism. In particular, it is fascinating to see how easy many operations with the eigenvectors $|l, m\rangle$ now are, although the wavefunctions of these states, the spherical harmonics $Y_l^m(\theta, \varphi)$, have a rather complex spatial behavior; please have one more look at Eq. (3.171) and Fig. 3.19.
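The eigenproblem interpretation of Eq. (194) can be spot-checked by finite differences. The sketch below sets $\hbar = 1$ and applies Eq. (194) to the $\theta$-factors of a few spherical harmonics, written in their standard unnormalized textbook forms (an assumption, since these explicit forms are not listed here). Each result should equal $l(l+1)$ times the same function. The grid avoids the poles, where $1/\sin\theta$ diverges.

```python
import numpy as np

def apply_L2(f_theta, m, theta):
    """Apply Eq. (194), with hbar = 1, to f(theta)*exp(i m phi); return the theta factor of the result."""
    df = np.gradient(f_theta, theta, edge_order=2)
    theta_part = np.gradient(np.sin(theta) * df, theta, edge_order=2) / np.sin(theta)
    phi_part = -m ** 2 * f_theta / np.sin(theta) ** 2   # d^2/dphi^2 of exp(i m phi) gives -m^2
    return -(theta_part + phi_part)

theta = np.linspace(0.2, np.pi - 0.2, 8001)   # stay away from the poles
# Unnormalized theta factors of Y_l^m (standard forms, assumed here)
cases = {(1, 0): np.cos(theta),
         (1, 1): np.sin(theta),
         (2, 0): 3 * np.cos(theta) ** 2 - 1,
         (2, 1): np.sin(theta) * np.cos(theta)}
max_residuals = {}
for (l, m), f in cases.items():
    residual = apply_L2(f, m, theta) - l * (l + 1) * f
    max_residuals[(l, m)] = np.max(np.abs(residual[5:-5]))   # skip a few edge points
print(max_residuals)
```

The residuals are limited only by the second-order finite-difference error, so they shrink roughly fourfold each time the grid step is halved.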



5.7. Spin and its addition to orbital angular momentum 

Surprisingly, the theory described in the last section is useful for much more than orbital motion analysis. In particular, it helps to generalize the spin-½ results discussed in Chapter 4 to other values of spin $s$, a parameter that still has to be defined. For that, let us notice that the commutation relations that were derived, for $s = 1/2$, from the Pauli matrix properties, may be rewritten in exactly the same form as Eqs. (178) and (180) for the orbital momentum:

$$[\hat S_j, \hat S_{j'}] = i\hbar\sum_{j''}\hat S_{j''}\varepsilon_{jj'j''}, \qquad [\hat S^2, \hat S_j] = 0. \qquad (5.196)$$

It has been postulated (and confirmed by numerous experiments) that these relations hold true for any quantum particle. Now note that all the calculations of the last section have been based almost exclusively on such relations; the exception will be discussed imminently. Hence, we may repeat them for spin operators, and get relations similar to Eqs. (187) and (192):

$$\hat S_z|s, m_s\rangle = \hbar m_s|s, m_s\rangle, \quad \hat S^2|s, m_s\rangle = \hbar^2 s(s + 1)|s, m_s\rangle, \quad 0 \le s, \quad -s \le m_s \le +s, \qquad (5.197)$$



where $m_s$ is a quantum number similar to the orbital number $m$, and the non-negative constant $s$ is defined as the maximum value of $|m_s|$. This parameter is exactly what is called the particle's spin, in the narrow sense of the word.

Now let us return to the only part of our orbital momentum calculations that has not been derived from the commutation relations. This was the fact, based on solution (175) of the orbital motion problem, that the quantum numbers $m$ (the analog of $m_s$) are integer. For spin, we do not have such a solution, so that the spectrum of numbers $m_s$ (and hence its limits $\pm s$) should be found from the looser requirement that the eigenstate ladder, extending from $-s$ to $+s$, has an integer number of steps. Hence, $2s$ has to be an integer, i.e. the spin $s$ of a quantum particle may be either integer (as it is, for example, for photons and gluons), or half-integer (e.g., for all quarks and leptons, including electrons).⁴⁸
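The integer-steps requirement can be made concrete with the squared norm of $\hat S_-|s, m\rangle$. By the spin analog of Eq. (184) together with $\hat S_+\hat S_- = \hat S_-\hat S_+ + 2\hbar\hat S_z$, this norm equals $\hbar^2[s(s+1) - m(m-1)]$. The hypothetical helper below walks down the ladder from $m = s$. When $2s$ is an integer, the norm hits exactly zero at $m = -s$, so the ladder ends cleanly. For any other $s$ it turns negative, which is impossible for a norm.

```python
def lowering_walk(s, max_steps=50):
    """Walk down from m = s, returning (m, ||S_- |s,m>||^2 / hbar^2) until the ladder must stop."""
    steps, m = [], s
    for _ in range(max_steps):
        norm2 = s * (s + 1) - m * (m - 1)     # = <s,m| S_+ S_- |s,m> / hbar^2
        steps.append((m, norm2))
        if norm2 <= 1e-12:                     # zero: ladder ends; negative: unphysical
            break
        m -= 1
    return steps

for s in (0.5, 1, 1.5, 0.7):
    m_last, norm2 = lowering_walk(s)[-1]
    print(f"s = {s}: stops at m = {m_last:+.2f}, squared norm = {norm2:+.3f}")
```

For $s = 1/2$, $1$, and $3/2$ the walk ends at $m = -s$ with a zero norm. For the non-physical value $s = 0.7$ it reaches a negative squared norm at $m = -1.3$.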



48 As a reminder, in the Standard Model of particle physics, such hadrons as mesons and baryons (notably including protons and neutrons) are essentially composite particles, with the spin equal to the sum of their component quark spins. However, at non-relativistic energies, protons and neutrons may be considered fundamental particles with $s = 1/2$.
For $s = 1/2$, this picture yields all spin properties of the electron that were derived in Chapter 4 from postulate (4.117). In particular, operators $\hat S^2$ and $\hat S_z$ have only 2 common eigenstates, with $S_z = \hbar m_s = \pm\hbar/2$, and both with $S^2 = s(s + 1)\hbar^2 = (3/4)\hbar^2$. Note that this analogy with the angular momentum sheds new light on the symmetry properties of electrons. Indeed, the fact that $m$ in Eq. (175) is integer was derived in Sec. 3.5 from the requirement that, after making a full circle around axis $z$, we should find the same value of the wavefunction $\psi_m$, which differs from the initial one by the inconsequential factor $\exp\{2\pi im\} = 1$. With the replacement $m \to m_s = \pm1/2$, such an operation would multiply the wavefunction by $\exp\{\pm i\pi\} = -1$, i.e. reverse its sign. Of course, spin cannot be described by a usual wavefunction, but this odd parity of electrons (and all other spin-½ particles) is clearly revealed in multiparticle systems; see Chapter 8.

Now we are sufficiently equipped to analyze particles that have both the orbital momentum and the spin. In classical mechanics, such a particle would be characterized by the total angular momentum vector $\mathbf J = \mathbf L + \mathbf S$. Following the correspondence principle, we may assume that the quantum-mechanical properties of this observable may be analyzed using the similarly defined vector operator:

$$\hat{\mathbf J} = \hat{\mathbf L} + \hat{\mathbf S}, \qquad (5.198)$$

with Cartesian components

$$\hat J_x = \hat L_x + \hat S_x, \ \text{etc.}, \qquad (5.199)$$

and the magnitude squared equal to

$$\hat J^2 = \hat J_x^2 + \hat J_y^2 + \hat J_z^2. \qquad (5.200)$$



Let us examine the properties of this vector operator. Since its two components describe different degrees of freedom of the particle (again, you may say they "belong to different Hilbert spaces"), they may be considered as completely commuting:

$$[\hat L_j, \hat S_{j'}] = 0, \qquad [\hat L^2, \hat S^2] = 0. \qquad (5.201)$$



The above equalities are sufficient to derive the commutation rules of the total angular momentum, and, not surprisingly, they turn out to be exactly similar to those of its components:

$$[\hat J_j, \hat J_{j'}] = i\hbar\sum_{j''}\hat J_{j''}\varepsilon_{jj'j''}, \qquad [\hat J^2, \hat J_j] = 0. \qquad (5.202)$$



Now repeating all the arguments of the last section, we may derive the following expressions for the common eigenstates of operators $\hat J^2$ and $\hat J_z$:

$$\hat J_z|j, m_j\rangle = \hbar m_j|j, m_j\rangle, \quad \hat J^2|j, m_j\rangle = \hbar^2 j(j + 1)|j, m_j\rangle, \quad 0 \le j, \quad -j \le m_j \le +j, \qquad (5.203)$$



where $j$ and $m_j$ are new quantum numbers. Repeating the arguments made for $m_s$, we may conclude that $j$ and $m_j$ may be either integer or half-integer.

Before we proceed, one remark on notation: it is very convenient to use the same letter $m$ for numbering the eigenstates of all momentum components participating in Eq. (199), with the corresponding indices ($j$, $l$, and $s$); in particular, to replace what we called $m$ with $m_l$. With this replacement, the main results of the last section may be summarized as

$$\hat L_z|l, m_l\rangle = \hbar m_l|l, m_l\rangle, \quad \hat L^2|l, m_l\rangle = \hbar^2 l(l + 1)|l, m_l\rangle, \quad 0 \le l, \quad -l \le m_l \le +l. \qquad (5.204)$$



In order to understand which eigenstates used in Eqs. (197), (203), and (204) are compatible with each other, let us use Eqs. (198)-(202) to calculate the mutual commutators of the operators squared and their z-components. The result is

$$\left[\hat J^2, \hat L^2\right] = 0, \qquad \left[\hat J^2, \hat S^2\right] = 0, \qquad (5.205)$$

$$\left[\hat J^2, \hat L_z\right] \neq 0, \qquad \left[\hat J^2, \hat S_z\right] \neq 0. \qquad (5.206)$$

This result may be presented schematically on the following Venn diagram (Fig. 10), in which the crossed arrows indicate the only non-commuting pairs of operators.



Fig. 5.10. Venn diagram for angular momentum operators, and their mutually-commuting groups: the operators diagonal in the uncoupled representation, and the operators diagonal in the coupled representation.



This means that just as for each component angular momentum ($\mathbf J$, $\mathbf L$, and $\mathbf S$) considered separately we could select a group of common eigenstates for its magnitude squared and the z-component, we also may find eigenstates shared by two broader groups of operators, encircled with colored lines in Fig. 10. The first group (within the red circle) consists of all operators but $\hat J^2$. This means that there are eigenstates shared by the 5 remaining operators, and they may be characterized by certain values of the corresponding quantum numbers: $l$, $m_l$, $s$, $m_s$, and $m_j$. Actually, only 4 of these numbers are independent, because due to Eq. (199) for these compatible operators, for each eigenstate of the group the "magnetic" numbers $m$ have to satisfy the following relation:

$$m_j = m_l + m_s. \qquad (5.207)$$

Hence the common eigenstates of the operators of this group are fully defined by just 4 quantum numbers, for example, $l$, $m_l$, $s$, and $m_s$. For some calculations, especially those for systems whose Hamiltonians include only operators of this group, it is convenient to use this set of eigenstates as the basis; this choice is frequently called the uncoupled representation.
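The counting behind Eq. (207) can be made concrete by listing the uncoupled basis explicitly. A minimal sketch, assuming $s = 1/2$ and using exact fractions for the half-integer quantum numbers (the function names are illustrative): it enumerates the states $|m_l, m_s\rangle$ for a given $l$ and groups them by $m_j = m_l + m_s$.

```python
from collections import defaultdict
from fractions import Fraction


def uncoupled_states(l, s=Fraction(1, 2)):
    """All pairs (m_l, m_s) with -l <= m_l <= +l and -s <= m_s <= +s, both in unit steps."""
    m_l_values = [Fraction(m) for m in range(-l, l + 1)]
    m_s_values = [-s + k for k in range(int(2 * s) + 1)]
    return [(ml, ms) for ml in m_l_values for ms in m_s_values]


def group_by_mj(states):
    """Group uncoupled states by m_j = m_l + m_s, Eq. (207)."""
    groups = defaultdict(list)
    for ml, ms in states:
        groups[ml + ms].append((ml, ms))
    return dict(groups)


l = 2
states = uncoupled_states(l)
groups = group_by_mj(states)
print(len(states))                              # 2(2l + 1) = 10 for l = 2
for mj in sorted(groups, reverse=True):
    print(mj, [(str(ml), str(ms)) for ml, ms in groups[mj]])
```

For $l = 2$ the list has $2(2l + 1) = 10$ entries, and no value of $m_j$ is shared by more than two uncoupled states.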

However, in some situations we cannot ignore interactions between the orbital and spin degrees of freedom (in the common jargon, the spin-orbit coupling), which leads in particular to a splitting (called the fine structure) of atomic energy levels even in the absence of an external magnetic field. I will discuss these effects in detail in the next chapter, and now will only note that they may be described by a separate term, proportional to the product $\hat{\mathbf L}\cdot\hat{\mathbf S}$, in the system's Hamiltonian. If this term is not negligible, the uncoupled representation becomes inconvenient. Indeed, writing

$$\hat J^2 = \left(\hat{\mathbf L} + \hat{\mathbf S}\right)^2 = \hat L^2 + \hat S^2 + 2\hat{\mathbf L}\cdot\hat{\mathbf S}, \qquad (5.208)$$



and looking at Fig. 10 again, we see that the operator $\hat{\mathbf L}\cdot\hat{\mathbf S}$, describing the spin-orbit coupling, does not commute with the operators $\hat L_z$ and $\hat S_z$. This means that stationary states of the system with such a term in the Hamiltonian do not belong to the uncoupled representation basis. On the other hand, Eq. (208) shows that the operator $\hat{\mathbf L}\cdot\hat{\mathbf S}$ does commute with all 4 operators of another group, encircled with the blue line in Fig. 10. According to Eqs. (201), (202), and (205), all operators of that group also commute with each other, so that they have common eigenstates that may be marked by the corresponding quantum numbers, $l$, $s$, $j$, and $m_j$. This group is the basis for the coupled representation of the particle's state.
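The statements illustrated by Fig. 10 and Eq. (208) can be checked on explicit matrices. The sketch below is illustrative only: it assumes $\hbar = 1$ and the example values $l = 1$, $s = 1/2$, repeats a set of hypothetical pure-Python matrix helpers so that it runs on its own, and prints the largest matrix element of several commutators, plus the residual of the identity (208).

```python
import math


def jmats(j):
    """Matrices (Jx, Jy, Jz) of angular momentum j, with hbar = 1 and basis m = j, j-1, ..., -j."""
    dim = int(round(2 * j)) + 1
    ms = [j - k for k in range(dim)]
    jp = [[0j] * dim for _ in range(dim)]
    for k in range(1, dim):  # J+|m> = sqrt(j(j+1) - m(m+1)) |m+1>
        jp[k - 1][k] = complex(math.sqrt(j * (j + 1) - ms[k] * (ms[k] + 1)))
    jm = [[jp[c][r].conjugate() for c in range(dim)] for r in range(dim)]
    jx = [[(jp[r][c] + jm[r][c]) / 2 for c in range(dim)] for r in range(dim)]
    jy = [[(jp[r][c] - jm[r][c]) / 2j for c in range(dim)] for r in range(dim)]
    jz = [[complex(ms[r]) if r == c else 0j for c in range(dim)] for r in range(dim)]
    return jx, jy, jz


def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(a, b, cb=1):
    n = len(a)
    return [[a[i][j] + cb * b[i][j] for j in range(n)] for i in range(n)]


def comm(a, b):
    return add(mul(a, b), mul(b, a), -1)


def kron(a, b):
    nb, n = len(b), len(a) * len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(n)] for i in range(n)]


def eye(n):
    return [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def norm(a):
    return max(abs(x) for row in a for x in row)


def dot(u, v):
    """u_x v_x + u_y v_y + u_z v_z for two triples of matrices."""
    return add(add(mul(u[0], v[0]), mul(u[1], v[1])), mul(u[2], v[2]))


L1, S1 = jmats(1), jmats(0.5)                      # l = 1, s = 1/2
nl, ns = len(L1[0]), len(S1[0])
L = [kron(m, eye(ns)) for m in L1]                 # acts on the orbital factor only
S = [kron(eye(nl), m) for m in S1]                 # acts on the spin factor only
J = [add(L[k], S[k]) for k in range(3)]
L2, S2, J2, LS = dot(L, L), dot(S, S), dot(J, J), dot(L, S)

checks = {
    "[J2, L2]": norm(comm(J2, L2)),                # Eq. (205): should vanish
    "[J2, S2]": norm(comm(J2, S2)),                # Eq. (205): should vanish
    "[J2, Lz]": norm(comm(J2, L[2])),              # Eq. (206): nonzero
    "[J2, Sz]": norm(comm(J2, S[2])),              # Eq. (206): nonzero
    "[LS, J2]": norm(comm(LS, J2)),                # coupled group: should vanish
    "[LS, Jz]": norm(comm(LS, J[2])),              # coupled group: should vanish
    "[LS, Lz]": norm(comm(LS, L[2])),              # uncoupled group: nonzero
    "Eq. (208)": norm(add(J2, add(add(L2, S2), LS, 2), -1)),   # J2 - (L2 + S2 + 2 L.S)
}
for name, value in checks.items():
    print(f"{name}: {value:.2e}")
```

Under these assumptions, the entries marked "should vanish" come out at rounding level, while $[\hat J^2, \hat L_z]$, $[\hat J^2, \hat S_z]$, and $[\hat{\mathbf L}\cdot\hat{\mathbf S}, \hat L_z]$ are of order unity, which is the pattern of Eqs. (205), (206) and of the two operator groups in Fig. 10.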

Excluding the quantum numbers $l$ and $s$, common for both groups, from the notation, it is convenient to denote the common ket-vectors of each group as, respectively,

$$|m_l, m_s\rangle \ \text{for the uncoupled representation's basis}, \qquad |j, m_j\rangle \ \text{for the coupled representation's basis}. \qquad (5.209)$$



As we will see in the next chapter, for the solution of some problems (e.g., the fine structure and the Zeeman effect), we will need the relation between kets $|j, m_j\rangle$ and kets $|m_l, m_s\rangle$. This relation has the structure of the usual linear superposition,

$$|j, m_j\rangle = \sum_{m_l, m_s}|m_l, m_s\rangle\langle m_l, m_s|j, m_j\rangle, \qquad (5.210)$$

whose bra-kets (c-numbers) are called the Clebsch-Gordan coefficients, and are essentially the elements of the unitary matrix of the transformation between the two eigenstate bases (209).

The best (though imperfect) classical image of Eq. (210) I can offer is as follows. If the lengths of vectors $\mathbf L$ and $\mathbf S$ (in quantum mechanics associated with the numbers $l$ and $s$, respectively), and also their scalar product $\mathbf L\cdot\mathbf S$, are all fixed, so is the length of the vector $\mathbf J = \mathbf L + \mathbf S$ (think about the quantum number $j$). A specific eigenket $|j, m_j\rangle$ maps on a classical state in which all these numbers, plus the z-component $J_z = \hbar m_j$ of the total momentum, are fixed. However, in classical mechanics even fixing $L^2$, $S^2$, $J^2$, and $J_z$ still allows for an arbitrary rotation of the pair of vectors $\mathbf L$ and $\mathbf S$ (with a fixed angle between them, and hence fixed $\mathbf L\cdot\mathbf S$) about the direction of $\mathbf J$ - see Fig. 11.





Fig. 5.11. Classical image of two states with the same $l$, $s$, $j$, and $m_j$, but different $m_l$ and $m_s$.



Hence the components $L_z$ and $S_z$ in these conditions are not fixed, and in classical mechanics may take a continuum of values, two of which are shown in Fig. 11. In quantum mechanics the states of these components are quantized and represented by the eigenkets $|m_l, m_s\rangle$, so that a linear combination of such kets is necessary to represent the ket $|j, m_j\rangle$. This is exactly what Eq. (210) does.
Some properties of the Clebsch-Gordan coefficients $\langle m_l, m_s|j, m_j\rangle$ may be readily established. For example, the coefficients may differ from zero only if the involved magnetic quantum numbers satisfy Eq. (207); let us prove this fact.$^{49}$ All matrix elements of the null-operator

$$\hat J_z - \left(\hat L_z + \hat S_z\right) = \hat 0 \qquad (5.211)$$

should equal zero in any basis; in particular

$$\langle j, m_j|\hat J_z - \left(\hat L_z + \hat S_z\right)|m_l, m_s\rangle = 0. \qquad (5.212)$$

Acting by the operator $\hat J_z$ on the bra-vector, and by the sum $(\hat L_z + \hat S_z)$ on the ket-vector, we get

$$\hbar\left[m_j - (m_l + m_s)\right]\langle j, m_j|m_l, m_s\rangle = 0, \qquad (5.213)$$

thus proving the restriction.

For the most important case of spin-1/2 particles ($s = 1/2$, $m_s = \pm 1/2$), whose uncoupled representation evidently includes $2\times(2l + 1)$ states, restriction (207) enables the representation of all nonvanishing Clebsch-Gordan coefficients on the simple diagram shown in Fig. 12.



Fig. 5.12. All possible sets of eigenvalues $m_l$, $m_s$ for a particle with fixed $l$, and $s = 1/2$. Each uncoupled-representation state is represented by a dot, while each coupled-representation state is represented by a single sloped line connecting the dots.



Indeed, each coupled-representation eigenket $|j, m_j\rangle$ with quantum number $m_j = m_l + m_s = m_l \pm 1/2$ may be related, with non-zero Clebsch-Gordan coefficients, to at most two uncoupled-representation eigenstates $|m_l, m_s\rangle$. Since $m_l$ may only take integer values from $-l$ to $+l$, $m_j$ may only take half-integer values on the interval $[-l - 1/2, l + 1/2]$. Hence, by the definition of $j$ as $(m_j)_{\max}$, its maximum value has to be $l + 1/2$, and for $m_j = l + 1/2$, this is the only possible value. This means that the uncoupled state with $m_l = l$ and $m_s = 1/2$ should be identical to the coupled-representation state with $j = l + 1/2$ and $m_j = l + 1/2$:

$$\left|j = l + \tfrac12, m_j = l + \tfrac12\right\rangle = \left|m_l = l, m_s = +\tfrac12\right\rangle. \qquad (5.214)$$



49 One may think that Eq. (207) is a trivial corollary of Eq. (199). However, now we should be a bit more careful, 
because in the Clebsch-Gordan coefficients, these quantum numbers characterize different groups of eigenstates. 
However, already for the next value, $m_j = l - 1/2$, we need to have two values of $j$, so that two $|m_l, m_s\rangle$ kets are to be related to two $|j, m_j\rangle$ kets by two Clebsch-Gordan coefficients. Since $j$ changes in unit steps, these values of $j$ have to be $l \pm 1/2$. This choice,

$$j = l \pm \tfrac12, \qquad (5.215)$$

evidently satisfies all lower values of $m_j$ (again, with only one value, $j = l + 1/2$, necessary for the lowest $m_j = -l - 1/2$) - see Fig. 12. Note that the total number of the coupled-representation states is $1 + 2\times 2l + 1 = 2(2l + 1)$, i.e. the same as in the uncoupled representation. So, each sum (210), for fixed $j$, $m_j$ (and fixed common parameter $l$), has at most 2 terms, i.e. involves at most 2 Clebsch-Gordan coefficients.
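The state count just described can be verified per value of $m_j$, not only in total. In this sketch (an illustration assuming $s = 1/2$; the helper names are made up), the coupled states $|j, m_j\rangle$ with $j = l \pm 1/2$ are tallied by $m_j$ and compared with the tally of uncoupled states $|m_l, m_s\rangle$ with $m_l + m_s = m_j$.

```python
from collections import Counter
from fractions import Fraction

HALF = Fraction(1, 2)


def coupled_mj_counts(l):
    """Tally of coupled states |j, m_j>, j = l +- 1/2 (only j = 1/2 if l = 0), by m_j."""
    js = [l + HALF] + ([l - HALF] if l > 0 else [])
    counts = Counter()
    for j in js:
        for k in range(int(2 * j) + 1):          # m_j = -j, -j + 1, ..., +j
            counts[-j + k] += 1
    return counts


def uncoupled_mj_counts(l):
    """Tally of uncoupled states |m_l, m_s>, m_s = +-1/2, by m_j = m_l + m_s."""
    return Counter(Fraction(ml) + ms for ml in range(-l, l + 1) for ms in (-HALF, HALF))


for l in range(5):
    c, u = coupled_mj_counts(l), uncoupled_mj_counts(l)
    print(l, sum(c.values()), sum(u.values()), c == u)   # expected: totals 2(2l + 1) and True
```

The per-$m_j$ agreement is a slightly stronger statement than equality of the totals: it is what allows each sum (210) to close on at most two terms.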

These coefficients may be calculated in two steps. First, Eq. (198) may be used to obtain recursion relations for the ladder operators $\hat J_\pm$, $\hat L_\pm$, and $\hat S_\pm$. Then, these relations may be applied recursively to adjacent states of both representations, starting from either of the two states common for them - for example, from the state with ket-vectors (214), corresponding to the top right point in Fig. 12. Let me leave these straightforward but somewhat tedious calculations for the reader's exercise, and just cite the final result of this procedure:$^{50}$



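$$\left\langle m_l = m_j - \tfrac12, m_s = +\tfrac12 \,\middle|\, j = l \pm \tfrac12, m_j\right\rangle = \pm\left(\frac{l \pm m_j + 1/2}{2l + 1}\right)^{1/2}, \qquad \left\langle m_l = m_j + \tfrac12, m_s = -\tfrac12 \,\middle|\, j = l \pm \tfrac12, m_j\right\rangle = \left(\frac{l \mp m_j + 1/2}{2l + 1}\right)^{1/2}. \qquad (5.216a)$$

For applications, it may be more convenient to use this result in the following equivalent form:

$$\left|j = l \pm \tfrac12, m_j\right\rangle = \pm\left(\frac{l \pm m_j + 1/2}{2l + 1}\right)^{1/2}\left|m_l = m_j - \tfrac12, m_s = +\tfrac12\right\rangle + \left(\frac{l \mp m_j + 1/2}{2l + 1}\right)^{1/2}\left|m_l = m_j + \tfrac12, m_s = -\tfrac12\right\rangle. \qquad (5.216b)$$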
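Eq. (216b) can be checked without any tables by applying $\hat J^2 = \hat L^2 + \hat S^2 + 2\hat{\mathbf L}\cdot\hat{\mathbf S}$ (Eq. 208) directly to the superposition, with $\hat{\mathbf L}\cdot\hat{\mathbf S} = \hat L_z\hat S_z + (\hat L_+\hat S_- + \hat L_-\hat S_+)/2$ and the standard ladder-operator matrix elements. The sketch below works under these assumptions ($\hbar = 1$, $s = 1/2$, hypothetical helper names); it is not the recursive derivation outlined in the text, just an a-posteriori check that each state (216b) is an eigenstate of $\hat J^2$ with eigenvalue $j(j + 1)$.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def cg_state(l, upper, mj):
    """|j = l +- 1/2, m_j> of Eq. (216b) as {(m_l, m_s): coefficient}; upper=True means j = l + 1/2."""
    sign = 1 if upper else -1
    a = math.sqrt((l + sign * mj + HALF) / (2 * l + 1))
    b = math.sqrt((l - sign * mj + HALF) / (2 * l + 1))
    state = {}
    if abs(mj - HALF) <= l:                 # m_l = m_j - 1/2 must lie in [-l, l]
        state[(mj - HALF, HALF)] = sign * a
    if abs(mj + HALF) <= l:                 # m_l = m_j + 1/2 must lie in [-l, l]
        state[(mj + HALF, -HALF)] = b
    return state


def ladder(j, m, step):
    """<m + step| J_(+/-) |m> = sqrt(j(j+1) - m(m + step)) for step = +1 or -1 (hbar = 1)."""
    return math.sqrt(j * (j + 1) - m * (m + step))


def apply_j2(l, state, s=HALF):
    """Apply J^2 = L^2 + S^2 + 2 LzSz + L+S- + L-S+ (Eq. 208 with L.S written via ladders)."""
    out = {}
    for (ml, ms), c in state.items():
        diag = l * (l + 1) + s * (s + 1) + 2 * ml * ms
        out[(ml, ms)] = out.get((ml, ms), 0.0) + c * float(diag)
        if ml < l and ms > -s:              # L+ S- term
            key = (ml + 1, ms - 1)
            out[key] = out.get(key, 0.0) + c * ladder(l, ml, 1) * ladder(s, ms, -1)
        if ml > -l and ms < s:              # L- S+ term
            key = (ml - 1, ms + 1)
            out[key] = out.get(key, 0.0) + c * ladder(l, ml, -1) * ladder(s, ms, 1)
    return out


def residual(l, upper, mj):
    """Largest component of (J^2 - j(j+1)) |j, m_j>; zero up to rounding for an eigenstate."""
    j = l + HALF if upper else l - HALF
    state = cg_state(l, upper, mj)
    image = apply_j2(l, state)
    keys = set(image) | set(state)
    return max(abs(image.get(k, 0.0) - float(j * (j + 1)) * state.get(k, 0.0)) for k in keys)


worst = 0.0
for l in range(5):
    for upper in (True, False):
        if l == 0 and not upper:
            continue                        # j = l - 1/2 does not exist for l = 0
        j = l + HALF if upper else l - HALF
        for k in range(int(2 * j) + 1):
            worst = max(worst, residual(l, upper, -j + k))
print(worst)
```

For $l = 1$, $m_j = 1/2$, for example, `cg_state` returns the coefficients $(2/3)^{1/2}$ and $(1/3)^{1/2}$ for $j = 3/2$, and $-(1/3)^{1/2}$ and $(2/3)^{1/2}$ for $j = 1/2$, i.e. two orthonormal columns of the unitary matrix mentioned after Eq. (210).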



We will use this relation in Sec. 6.4 for an analysis of the anomalous Zeeman effect, based on the 
perturbation theory. Moreover, most of the angular momentum addition theory described above is 
immediately applicable to the addition of angular momenta in multiparticle systems, so we will revisit it 
in Chapter 8. 

To conclude this section, I have to note that the Clebsch-Gordan coefficients (for arbitrary $s$) also participate in the so-called Wigner-Eckart theorem that expresses matrix elements of certain spherical tensors, in the coupled-representation basis $|j, m_j\rangle$, via a reduced set of matrix elements. Unfortunately, a discussion of this theorem and its applications would require a higher mathematical background than I can expect from my readers, and more time/space than I can afford.$^{51}$



50 For arbitrary spin $s$, the calculations and even the final expressions for the Clebsch-Gordan coefficients are rather bulky. They may be found, typically in a table form, in special monographs - see, e.g., A. R. Edmonds, Angular Momentum in Quantum Mechanics, Princeton U. Press, 1957.

51 For the interested reader I can recommend either Sec. 17.7 in E. Merzbacher, Quantum Mechanics, 3rd ed., Wiley, 1998, or Sec. 3.10 in J. Sakurai, Modern Quantum Mechanics, Addison-Wesley, 1994.
5.8. Exercise problems



5.1. A two-level system is in a quantum state $\alpha$, described by the ket-vector $|\alpha\rangle = \alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle$, with given (generally, complex) c-number coefficients $\alpha_{\uparrow\downarrow}$. Prove that we can always select a 3-component vector $\mathbf a = \{a_x, a_y, a_z\}$ of real c-numbers, such that $\alpha$ is an eigenstate of the operator $\mathbf a\cdot\hat{\boldsymbol\sigma}$, where $\hat{\boldsymbol\sigma}$ is the operator described, in the z-basis, by the Pauli matrix vector. Find all possible values of $\mathbf a$ satisfying this condition, and the second eigenstate of the operator $\mathbf a\cdot\hat{\boldsymbol\sigma}$, orthogonal to the given $\alpha$. Give a Bloch-sphere interpretation of your result.

5.2. A beam of electrons, fully spin-polarized in the z-direction, is propagating in the direction y - see, e.g., Fig. 4.1. Calculate the probabilities of the alternative results of a Stern-Gerlach experiment with the magnetic field directed along the axis $\mathbf n = \mathbf n_x\sin\varphi + \mathbf n_z\cos\varphi$.

5.3. An electron is in a constant vertical field, so that its Hamiltonian

but its spin's initial state is an eigenstate of another Hamiltonian (see Problems 4.9, 4.10, 5.1):

$$\mathbf a\cdot\hat{\boldsymbol\sigma} = a_x\hat\sigma_x + a_y\hat\sigma_y + a_z\hat\sigma_z.$$

Use any approach you like to calculate the time evolution of the expectation values of the spin components. Interpret the results.

5.4. For a particle moving in a 3D periodic potential, develop the bra-ket formalism for the q-representation, in which a complex amplitude similar to $a_q$ in Eq. (2.234) (but generalized to 3D and all energy bands) plays the role of the wavefunction. In particular, calculate the operators $\hat{\mathbf r}$ and $\hat{\mathbf v}$ in this representation, and use the result to prove Eq. (2.237) for 1D motion in the low-field limit.

Hint: Try to generalize the analysis of the momentum representation in Sec. 5.2.

5.5. Calculate, in the WKB approximation, the transmission coefficient $T$ for tunneling of a 2D particle with energy $E < U_0$ through a saddle-shaped potential "pass"

$$U(x, y) = U_0\left(1 + \ldots\right),$$

where $U_0$ and $a$ are real constants.



5.6. For a 1D harmonic oscillator with mass $m$ and frequency $\omega_0$, calculate:

where $|n\rangle$ are the Fock states.

5.7. Calculate the sum (over all $n > 0$) of the so-called oscillator strengths,

of quantum transitions between the $n$-th energy level and the ground state, for

(i) a 1D harmonic oscillator, and

(ii) a 1D particle confined in an arbitrary stationary potential.

5.8. Use the Heisenberg equation of motion (Eq. (4.199) of the lecture notes) for the direct derivation of the time evolution law for the creation and annihilation operators of a harmonic oscillator.

5.9. Find the expectation value of energy, and the time evolution of the expectation values of the coordinate and momentum of a 1D harmonic oscillator, provided that at the initial moment ($t = 0$) it was in the state

l«}=^H + l 16 »> 

where $|n\rangle$ are its stationary (Fock) states.

5.10. Re-derive the London dispersion force potential between two 3D harmonic oscillators (calculated in Problem 3.7), using the language of mutually-induced polarization.

5.11. The discussion of the Glauber state properties in Sec. 5 has used the following general statement: if

$$\left[\hat A, \hat B\right] = \mu\hat I,$$

where $\hat A$ and $\hat B$ are arbitrary operators, and $\mu$ is an arbitrary c-number, then

$$\exp\{+\hat A\}\,\hat B\,\exp\{-\hat A\} = \hat B + \mu\hat I.$$

Prove the statement.

Hint: One (of several) ways to prove the statement is to expand the operator $\hat f(\lambda) \equiv \exp\{+\lambda\hat A\}\,\hat B\,\exp\{-\lambda\hat A\}$ into the Taylor series in the c-number $\lambda$, and then evaluate it at $\lambda = 1$.

5.12. Calculate the energy of the squeezed ground state s of a harmonic oscillator, defined by Eq. (172).
5.13. Prove the following relations for the operators of the angular momentum:

$$\hat L^2 = \hat L_z^2 + \hat L_+\hat L_- - \hbar\hat L_z = \hat L_z^2 + \hat L_-\hat L_+ + \hbar\hat L_z.$$

(One of them, Eq. (184), was already used in Sec. 6.)

5.14. According to Eqs. (188) and their discussion, the action of the ladder operators on the common eigenkets $|l, m\rangle$ of the operators $\hat L^2$ and $\hat L_z$ may be described as

$$\hat L_\pm|l, m\rangle = L_\pm^{(l,m)}|l, m \pm 1\rangle.$$

Calculate the coefficients $L_\pm^{(l,m)}$, assuming that the eigenstates are normalized: $\langle l, m|l, m\rangle = 1$.

5.15. A particle is in a state a with an orbital wavefunction proportional to spherical harmonic
F/ {6, (p). Find the angular dependence of the states described by the following ket-vectors: 

(i) L x \a), (ii) L y \a), (iii) L z \a), (iv) L 2 \a) , and (v) L 2 \a) . 

5.16. The angular state of a spinless particle is described by the following ket-vector:

$$|\alpha\rangle = \frac{1}{\sqrt2}\left(|l = 3, m = 0\rangle + |l = 3, m = 1\rangle\right).$$

Find the expectation values of the x- and y-components of its angular momentum. Are they sensitive to a possible phase shift between the two eigenkets?

5.17. Express the commutators listed in Eq. (206), $[\hat J^2, \hat L_z]$ and $[\hat J^2, \hat S_z]$, via $\hat L_j$ and $\hat S_j$.



5.18. Define the appropriate operator $\hat T_\varphi$ of rotation by angle $\varphi$ about a certain axis, using the similarity of this operation with the shift of a Cartesian coordinate, and use it to re-solve Problem 2, i.e. find the probabilities of measurements of a beam of particles with z-polarized spin-1/2, by a Stern-Gerlach instrument turned by angle $\varphi$ within the [z, x] plane (where y is the axis of particle propagation - see Fig. 4.1).



5.19. Derive the general recurrence relations for the Clebsch-Gordan coefficients, and use them to prove Eq. (216) for spin-1/2 particles.

Hint: Using the similarity of commutation relations, discussed in Sec. 5.7, generalize the results of Problem 13 to all angular momentum operators, and apply them to Eq. (198).
Chapter 6. Perturbation Theories

This chapter discusses several perturbative approaches to problems of quantum mechanics, and their 
simplest applications including the Stark effect, the fine structure of atomic levels, and the Zeeman 
effect. Moreover, the discussion of the perturbation theory of transitions to continuous spectrum and the 
Golden Rule of quantum mechanics in the end of this chapter will naturally bring us to the issue of open 
quantum systems - to be discussed in more detail in the next chapter. 

6.1. Eigenvalue/eigenstate problems 

Unfortunately, only a few problems of quantum mechanics may be solved exactly in the 
analytical form. Actually, in the previous chapters we have solved a substantial fraction of such 
problems for a single particle, and for multi-particle problems the exactly solvable cases are even more 
rare. However, most practical problems of physics feature a certain small parameter, and this smallness 
may be exploited by various approximate analytical methods. Earlier in the course, we have explored 
one of them, the WKB approximation, which is adequate for a particle moving through a slowly 
changing potential profile. Now I will discuss alternative approaches that are more suitable for other 
cases. The historic name for these approaches is the perturbation theory, but it is more fair to speak 
about several such theories, because they differ depending on the type of the problem. 

The simplest perturbation theories address eigenproblems for systems described by time-independent Hamiltonians of the type

$$\hat H = \hat H^{(0)} + \hat H^{(1)}, \qquad (6.1a)$$

where the perturbation operator $\hat H^{(1)}$ is "small" - in the sense that its addition to the unperturbed operator $\hat H^{(0)}$ results in a relatively small change of the eigenvalues $E_n$ of the system. A typical problem of this type is the 1D weakly anharmonic oscillator (Fig. 1) described by Hamiltonian (1a) with

$$\hat H^{(0)} = \frac{\hat p^2}{2m} + \frac{m\omega_0^2\hat x^2}{2}, \qquad \hat H^{(1)} = \alpha\hat x^3 + \beta\hat x^4 + \ldots, \qquad (6.1b)$$

with small coefficients $\alpha, \beta, \ldots$.

I will use the anharmonic oscillator as our first particular example, but let me start by describing the perturbative approach to the general time-independent Hamiltonian (1a). In the bra-ket formalism, the eigenproblem for the perturbed system is

$$\left(\hat H^{(0)} + \hat H^{(1)}\right)|n\rangle = E_n|n\rangle. \qquad (6.2)$$

Let the eigenstates and eigenvalues of the unperturbed Hamiltonian, which satisfy the equation

$$\hat H^{(0)}|n^{(0)}\rangle = E_n^{(0)}|n^{(0)}\rangle, \qquad (6.3)$$

be known. In this case, to solve problem (2) means to find, first, its perturbed eigenvalues $E_n$ and, second, the coefficients $\langle n'^{(0)}|n\rangle$ of the expansion of the perturbed state vectors $|n\rangle$ in series over the unperturbed ones, $|n'^{(0)}\rangle$:

$$|n\rangle = \sum_{n'}|n'^{(0)}\rangle\langle n'^{(0)}|n\rangle. \qquad (6.4)$$

Fig. 6.1. The simplest problem for the perturbation theory application: a 1D weakly anharmonic oscillator. (Dashed lines characterize the unperturbed, harmonic oscillator.)



Let us plug Eq. (4), with the summation index $n'$ replaced with $n''$, into both parts of Eq. (2):

$$\sum_{n''}\langle n''^{(0)}|n\rangle\hat H^{(0)}|n''^{(0)}\rangle + \sum_{n''}\langle n''^{(0)}|n\rangle\hat H^{(1)}|n''^{(0)}\rangle = \sum_{n''}\langle n''^{(0)}|n\rangle E_n|n''^{(0)}\rangle, \qquad (6.5)$$

and then inner-multiply all terms by an arbitrary unperturbed bra-vector $\langle n'^{(0)}|$. Assuming that the system of unperturbed eigenstates is orthonormal, $\langle n'^{(0)}|n''^{(0)}\rangle = \delta_{n'n''}$, and using Eq. (3) in the first term of the left-hand part, we get the following system of linear equations

$$\sum_{n''}\langle n''^{(0)}|n\rangle H^{(1)}_{n'n''} = \langle n'^{(0)}|n\rangle\left(E_n - E_{n'}^{(0)}\right), \qquad (6.6)$$

where the matrix elements of the perturbation are calculated in unperturbed bra-kets:

$$H^{(1)}_{n'n''} \equiv \langle n'^{(0)}|\hat H^{(1)}|n''^{(0)}\rangle. \qquad (6.7)$$

The linear equation system (6) is still exact,$^{1}$ and is frequently used for numerical calculations. (Since the matrix elements (7) typically decrease when $n'$ and/or $n''$ become very large, the sum in the left-hand part of Eq. (6) may typically be truncated, still giving an acceptable accuracy of the solution.) For getting analytical results we need to make more explicit approximations. In the simple perturbation theory we are discussing now, this is achieved by the expansion of both eigenenergies and coefficients into the Taylor series in a certain small parameter $\mu$ of the problem:

$$E_n = E_n^{(0)} + E_n^{(1)} + E_n^{(2)} + \ldots, \qquad (6.8)$$

$$\langle n'^{(0)}|n\rangle = \langle n'^{(0)}|n\rangle^{(0)} + \langle n'^{(0)}|n\rangle^{(1)} + \langle n'^{(0)}|n\rangle^{(2)} + \ldots, \qquad (6.9)$$

where$^{2}$

$$E_n^{(k)} \propto \langle n'^{(0)}|n\rangle^{(k)} \propto \mu^k. \qquad (6.10)$$
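Because system (6) is exact, one can simply diagonalize a truncated matrix $H_{n'n''}$ numerically and use the result as a benchmark for the perturbative formulas. The sketch below is an illustration under stated assumptions: the quartic part of the weakly anharmonic oscillator (1b) in units $\hbar = m = \omega_0 = 1$ (so $x_0 = 1$), a basis truncated to the lowest 30 Fock states, and a textbook cyclic Jacobi eigenvalue routine written out in pure Python (the helper names are hypothetical). Building $\hat x^4$ as the fourth power of the truncated $\hat x$ matrix keeps it positive semidefinite, so the truncation cannot produce spurious low levels.

```python
import math


def quartic_oscillator_matrix(size, beta):
    """Truncated H = H0 + beta x^4 in the Fock basis |0>..|size-1>, hbar = m = omega0 = 1."""
    x = [[0.0] * size for _ in range(size)]
    for k in range(size - 1):                    # <k|x|k+1> = sqrt((k+1)/2), x0 = 1
        x[k][k + 1] = x[k + 1][k] = math.sqrt((k + 1) / 2)

    def mul(a, b):
        return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
                for i in range(size)]

    x2 = mul(x, x)
    x4 = mul(x2, x2)
    return [[(k + 0.5 if k == j else 0.0) + beta * x4[k][j] for j in range(size)]
            for k in range(size)]


def jacobi_eigenvalues(a, tol=1e-18, max_sweeps=50):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):               # update columns p, q (A <- A P)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):               # update rows p, q (A <- P^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


beta = 0.01
levels = jacobi_eigenvalues(quartic_oscillator_matrix(30, beta))
for n in range(3):
    first_order = n + 0.5 + 0.75 * beta * (2 * n * n + 2 * n + 1)   # E_n^(0) plus Eq. (15), x0 = 1
    print(n, round(levels[n], 6), round(first_order, 6))
```

With these assumptions the ground level should agree with the first-order estimate to within a few parts in $10^4$, while the discrepancy grows with $n$, together with the growth of $E_n^{(1)}$ in Eq. (15).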



1 Please note its similarity with Eq. (2.215) of the 1D band theory. Indeed, the latter equation is not much more than a particular form of Eq. (6) for 1D wave mechanics, and a specific (periodic) potential $U(x)$ considered as the perturbation. Moreover, the approximate treatment of the weak potential limit in Sec. 2.7 was essentially a particular case of the more general perturbation theory we are discussing now.

2 Note that, by definition, $\langle n'^{(0)}|n\rangle^{(0)} = \delta_{n'n}$.
In order to explore the 1st-order approximation, which ignores all terms $O(\mu^2)$ and higher, let us plug only the two first terms of expansions (8) and (9) into the basic system of equations (6):

$$\sum_{n''}H^{(1)}_{n'n''}\left(\delta_{n''n} + \langle n''^{(0)}|n\rangle^{(1)}\right) = \left(\delta_{n'n} + \langle n'^{(0)}|n\rangle^{(1)}\right)\left(E_n^{(0)} + E_n^{(1)} - E_{n'}^{(0)}\right). \qquad (6.11)$$



Now let us open the parentheses, and disregard all the remaining terms $O(\mu^2)$. The result is

$$H^{(1)}_{n'n} = \delta_{n'n}E_n^{(1)} + \langle n'^{(0)}|n\rangle^{(1)}\left(E_n^{(0)} - E_{n'}^{(0)}\right). \qquad (6.12)$$



This equation is valid for any set of indices $n$ and $n'$; let us start from the case $n = n'$, and immediately get a very simple (and the most important!) result:

$$E_n^{(1)} = H^{(1)}_{nn} \equiv \langle n^{(0)}|\hat H^{(1)}|n^{(0)}\rangle. \qquad (6.13)$$



For example, let us see what this result gives for the two first perturbation terms in the weakly anharmonic oscillator (1b):

$$E_n^{(1)} = \alpha\langle n^{(0)}|\hat x^3|n^{(0)}\rangle + \beta\langle n^{(0)}|\hat x^4|n^{(0)}\rangle. \qquad (6.14)$$

As the reader should know from the solution of Problem 5.6, the first term is zero, while the second one yields$^{3}$

$$E_n^{(1)} = \frac{3}{4}\beta x_0^4\left(2n^2 + 2n + 1\right). \qquad (6.15)$$
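The matrix elements behind Eqs. (14) and (15) can be generated mechanically from $\hat x = (x_0/\sqrt2)(\hat a + \hat a^\dagger)$, the standard ladder-operator form of the coordinate (this representation, rather than the route of Problem 5.6, is the assumption of this sketch). The pure-Python code stores a state as a dictionary {Fock number: amplitude}; `apply_x` and `x_power_on` are hypothetical helper names.

```python
import math


def apply_x(state, x0=1.0):
    """Apply x = (x0/sqrt(2))(a + a^dagger) to a state stored as {n: amplitude}."""
    out = {}
    for n, c in state.items():
        if n > 0:                                   # a|n> = sqrt(n)|n-1>
            out[n - 1] = out.get(n - 1, 0.0) + c * x0 * math.sqrt(n / 2)
        out[n + 1] = out.get(n + 1, 0.0) + c * x0 * math.sqrt((n + 1) / 2)   # a^dagger|n>
    return out


def x_power_on(n, k, x0=1.0):
    """Return x^k |n> as {n': <n'|x^k|n>}."""
    state = {n: 1.0}
    for _ in range(k):
        state = apply_x(state, x0)
    return state


x0 = 1.3                                            # arbitrary length unit
for n in range(6):
    diag3 = x_power_on(n, 3, x0).get(n, 0.0)        # <n|x^3|n>: first term of Eq. (14)
    diag4 = x_power_on(n, 4, x0).get(n, 0.0)        # <n|x^4|n>
    eq15 = 0.75 * x0 ** 4 * (2 * n * n + 2 * n + 1)  # Eq. (15) divided by beta
    print(n, diag3, round(diag4, 10), round(eq15, 10))
```

The cubic diagonal element vanishes identically because each application of $\hat x$ changes $n$ by one, so an odd power cannot return to the initial Fock state.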



Naturally, there should be some contribution from the (typically, larger) term proportional to $\alpha$, so we need to explore the 2nd approximation of the perturbation theory. However, before doing that, let us complete our discussion of its 1st order. For $n' \neq n$, Eq. (12) may be used to calculate the eigenstates rather than the eigenvalues:

$$\langle n'^{(0)}|n\rangle^{(1)} = \frac{H^{(1)}_{n'n}}{E_n^{(0)} - E_{n'}^{(0)}}, \qquad \text{for } n' \neq n. \qquad (6.16)$$



This means that the eigenket's expansion (4), in the 1st order, may be presented as

$$|n\rangle = C|n^{(0)}\rangle + \sum_{n'\neq n}\frac{H^{(1)}_{n'n}}{E_n^{(0)} - E_{n'}^{(0)}}|n'^{(0)}\rangle. \qquad (6.17)$$



The coefficient $C$ cannot be found from Eq. (12); however, requiring the final state $n$ to be normalized, we see that other terms may provide only corrections $O(\mu^2)$, so in the 1st order we should take $C = 1$. The most important feature of Eq. (17) is its denominator: the closer the (unperturbed) eigenenergies of two states, the larger their mutual contribution (hybridization) created by the perturbation.
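The role of the energy denominator in Eqs. (16)-(17) is easy to see on the smallest possible example: two nondegenerate levels coupled by a single off-diagonal element $V = H^{(1)}_{12} = H^{(1)}_{21}$ (taken real, an assumption made for simplicity). The sketch compares the exact admixture coefficient of the lower eigenstate with the first-order result (16); the function names are illustrative.

```python
import math


def exact_admixture(e1, e2, v):
    """c2/c1 of the exact lower eigenvector of [[e1, v], [v, e2]] (e1 < e2, v != 0)."""
    e_low = (e1 + e2) / 2 - math.sqrt(((e2 - e1) / 2) ** 2 + v ** 2)
    return (e_low - e1) / v            # from the first row: (e1 - E) c1 + v c2 = 0


def first_order_admixture(e1, e2, v):
    """Eq. (16) with n = 1, n' = 2, H21 = v, and C = 1 as in Eq. (17)."""
    return v / (e1 - e2)


for v in (0.01, 0.1, 0.5, 2.0):
    exact, approx = exact_admixture(0.0, 1.0, v), first_order_admixture(0.0, 1.0, v)
    print(f"V = {v:4}: exact {exact:+.5f}, first order {approx:+.5f}")
```

The two columns agree closely while $|V|$ is much smaller than the level spacing and part ways once $|V|$ becomes comparable with it.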



3 The result for $n = 0$ may be readily calculated in the wave-mechanics style as well, using Eq. (2.269) for the unperturbed ground state wavefunction, and the table integral MA Eq. (6.9d):

$$\langle 0^{(0)}|\hat x^4|0^{(0)}\rangle = \int_{-\infty}^{+\infty}\psi_0^*(x)\,x^4\,\psi_0(x)\,dx = \frac{1}{\pi^{1/2}x_0}\int_{-\infty}^{+\infty}x^4\exp\left\{-\frac{x^2}{x_0^2}\right\}dx = \frac{3}{4}x_0^4,$$

but for higher values of $n$, such calculations are much harder, because of the more involved Eq. (2.279) for $\psi_n(x)$.
This feature also affects the 1st approximation's validity condition that may be quantified using Eq. (16): the magnitudes of all the bra-kets it describes have to be much less than the unperturbed product $\langle n^{(0)}|n\rangle^{(0)} = 1$, so that all elements of the perturbation matrix have to be much less than the difference between the corresponding unperturbed energies. For the anharmonic oscillator's energy correction (15), this requirement is reduced to $|E_n^{(1)}| \ll \hbar\omega_0$.

Now we are ready to go after the 2nd order approximation to Eq. (6). Let us focus on the case $n' = n$, because as we already know, only this term will give us a correction to the eigenenergies. Moreover, we see that since the left-hand side of Eq. (6) already has the small factor $H^{(1)}_{nn''} \propto \mu$, the bra-ket coefficients in that part may be taken from the 1st order result (16). As a result, we get

$$E_n^{(2)} = \sum_{n''\neq n}H^{(1)}_{nn''}\langle n''^{(0)}|n\rangle^{(1)} = \sum_{n''\neq n}\frac{H^{(1)}_{nn''}H^{(1)}_{n''n}}{E_n^{(0)} - E_{n''}^{(0)}}. \qquad (6.18)$$



Since $\hat H^{(1)}$ represents an observable (energy), and hence has to be Hermitian, we may rewrite this expression as

$$E_n^{(2)} = \sum_{n'\neq n}\frac{\left|H^{(1)}_{n'n}\right|^2}{E_n^{(0)} - E_{n'}^{(0)}} \equiv \sum_{n'\neq n}\frac{\left|\langle n'^{(0)}|\hat H^{(1)}|n^{(0)}\rangle\right|^2}{E_n^{(0)} - E_{n'}^{(0)}}. \qquad (6.19)$$

This is the much celebrated 2nd order perturbation result that frequently (in sufficiently symmetric problems) is the first nonvanishing correction to the state energy - for example, from the cubic term (proportional to $\alpha$) in our weakly anharmonic oscillator problem (1). In order to calculate the corresponding correction, we may use another result of Problem 5.6:

$$\langle n'|\hat x^3|n\rangle = \left(\frac{x_0}{\sqrt2}\right)^3\left\{[n(n-1)(n-2)]^{1/2}\delta_{n',n-3} + 3n^{3/2}\delta_{n',n-1} + 3(n+1)^{3/2}\delta_{n',n+1} + [(n+1)(n+2)(n+3)]^{1/2}\delta_{n',n+3}\right\}. \qquad (6.20)$$

So, according to Eq. (19), we need to calculate

$$E_n^{(2)} = \alpha^2\left(\frac{x_0}{\sqrt2}\right)^6\sum_{n'\neq n}\frac{\left\{[n(n-1)(n-2)]^{1/2}\delta_{n',n-3} + 3n^{3/2}\delta_{n',n-1} + 3(n+1)^{3/2}\delta_{n',n+1} + [(n+1)(n+2)(n+3)]^{1/2}\delta_{n',n+3}\right\}^2}{\hbar\omega_0(n - n')}. \qquad (6.21)$$

The summation is actually not as cumbersome as it may look, because all mixed products are proportional to products of different Kronecker deltas and hence vanish. As a result, we need to sum up only the squares of each term in the braces:

$$E_n^{(2)} = \frac{\alpha^2}{\hbar\omega_0}\left(\frac{x_0}{\sqrt2}\right)^6\left[\frac{n(n-1)(n-2)}{3} + \frac{9n^3}{1} + \frac{9(n+1)^3}{-1} + \frac{(n+1)(n+2)(n+3)}{-3}\right] = -\frac{15}{4}\frac{\alpha^2x_0^6}{\hbar\omega_0}\left(n^2 + n + \frac{11}{30}\right). \qquad (6.22)$$
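The bookkeeping in Eqs. (20)-(22) can be cross-checked by brute force: build $\hat x^3|n\rangle$ from the ladder-operator form of $\hat x$, then evaluate the sum (19) term by term. The sketch assumes $x_0 = 1$, so results are in units of $\alpha^2x_0^6/\hbar\omega_0$; the helper names are illustrative.

```python
import math


def apply_x(state):
    """x|n> = (x0/sqrt(2))(sqrt(n)|n-1> + sqrt(n+1)|n+1>) with x0 = 1, on {n: amplitude}."""
    out = {}
    for n, c in state.items():
        if n > 0:
            out[n - 1] = out.get(n - 1, 0.0) + c * math.sqrt(n / 2)
        out[n + 1] = out.get(n + 1, 0.0) + c * math.sqrt((n + 1) / 2)
    return out


def second_order_cubic(n, alpha=1.0, hbar_omega=1.0):
    """Eq. (19) for H1 = alpha x^3: sum over n' != n of |<n'|H1|n>|^2 / (E_n - E_n')."""
    state = {n: 1.0}
    for _ in range(3):
        state = apply_x(state)                     # state = x^3 |n>
    return sum((alpha * amp) ** 2 / (hbar_omega * (n - m))
               for m, amp in state.items() if m != n)


def closed_form(n, alpha=1.0, hbar_omega=1.0):
    """Eq. (22) with x0 = 1."""
    return -15 / 4 * alpha ** 2 / hbar_omega * (n * n + n + 11 / 30)


for n in range(5):
    print(n, second_order_cubic(n), closed_form(n))
```

For the ground state both functions give $-11/8$, and the two columns should coincide up to rounding for every $n$.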
Notice that all these energy level corrections are negative, regardless of the sign of $\alpha$. On the contrary, the 1st order correction $E_n^{(1)}$ (15) depends on the sign of the parameter $\beta$, so that the net correction $E_n^{(1)} + E_n^{(2)}$ may be of any sign.

Results (17) and (19) are clearly inapplicable to the degenerate case where, in the absence of the perturbation, several states correspond to the same energy level, because of the divergence of their denominators.$^{4}$ This divergence hints that the largest effect of the perturbation in that case is the degeneracy lifting, i.e. the splitting of the initially degenerate energy level $E^{(0)}$ (Fig. 2), and that for the analysis of this case we can, to the first approximation, ignore the effect of all other energy levels. (A careful analysis shows that this is indeed the case until the level splitting becomes comparable with the distance to other energy levels.)



Fig. 6.2. Lifting the energy level degeneracy by a perturbation (schematically): an $N$-fold degenerate level of $\hat H = \hat H^{(0)}$ splits under $\hat H = \hat H^{(0)} + \hat H^{(1)}$.



Limiting the summation in Eq. (6) to the group of $N$ degenerate states with equal $E_n^{(0)} = E^{(0)}$, we reduce it to

$$\sum_{n''=1}^{N}\langle n''^{(0)}|n\rangle H^{(1)}_{n'n''} = \langle n'^{(0)}|n\rangle\left(E_n - E^{(0)}\right), \qquad (6.23)$$

where $n'$ and $n''$ number the $N$ states of the degenerate group.$^{5}$ Equation (23) may be rewritten as

$$\sum_{n''=1}^{N}\langle n''^{(0)}|n\rangle\left(H^{(1)}_{n'n''} - E_n^{(1)}\delta_{n'n''}\right) = 0, \qquad \text{where } E_n^{(1)} \equiv E_n - E^{(0)}. \qquad (6.24)$$

For each $n = 1, 2, \ldots N$, this is a system of $N$ linear, homogeneous equations (with $N$ terms each) for $N$ unknown coefficients $\langle n''^{(0)}|n\rangle$. In this problem, we readily recognize the problem of diagonalization of the perturbation matrix $H^{(1)}$ - cf. Sec. 4.4 and in particular Eq. (4.101). Just as in the general case, in the condition of self-consistency of the system, we may strip $E_n^{(1)}$ of its lower index:

$$\begin{vmatrix} H^{(1)}_{11} - E^{(1)} & H^{(1)}_{12} & \ldots \\ H^{(1)}_{21} & H^{(1)}_{22} - E^{(1)} & \ldots \\ \ldots & \ldots & \ldots \end{vmatrix} = 0. \qquad (6.25)$$



4 This is exactly the reason why such perturbation theories run into serious problems for systems with continuous 
spectrum, and other approximate techniques (such as the WKB approximation) are often necessary. 

5 Note that the choice of the basis is to some extent arbitrary, because due to the linearity of the equations of quantum mechanics, any linear combination of the states $n''^{(0)}$ is also an eigenstate of the unperturbed Hamiltonian. However, for using Eq. (24), these combinations have to be orthonormal, as was assumed at the derivation of Eq. (6).
According to the definition (24) of $E_n^{(1)}$, the resulting $N$ energy levels $E_n$ may be found as $E^{(0)} + E_n^{(1)}$, where $E_n^{(1)}$ are the $N$ roots of Eq. (25).

If the perturbation matrix is diagonal, the result is extremely simple,

$$E_n^{(1)} \equiv E_n - E^{(0)} = H^{(1)}_{nn}, \qquad (6.26)$$



and formally coincides with Eq. (13) for the non-degenerate case, but now may give a different result for 
each of N previously degenerate states n. 

Let us see what this theory gives for several important examples. First of all, let us consider a two-level system (or a system with two degenerate states with energy far from all other levels), with an arbitrary perturbation matrix$^{6}$

$$H^{(1)} = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}. \qquad (6.27a)$$



Since both the unperturbed Hamiltonian and the operator of its perturbation are Hermitian, the diagonal elements of the matrix $H^{(1)}$ are real, and its off-diagonal elements are complex conjugates of each other. As a result, we can present the matrix in the same form as in Eq. (4.106):

$$H^{(1)} = \begin{pmatrix} a_0 + a_z & a_x - ia_y \\ a_x + ia_y & a_0 - a_z \end{pmatrix} = a_0\mathrm I + a_x\sigma_x + a_y\sigma_y + a_z\sigma_z \equiv a_0\mathrm I + \mathbf a\cdot\boldsymbol\sigma, \qquad (6.27b)$$



where the scalar $a_0$ and the Cartesian components of the vector $\mathbf a$ are real c-number coefficients. The corresponding characteristic equation,

$$\begin{vmatrix} a_0 + a_z - E^{(1)} & a_x - ia_y \\ a_x + ia_y & a_0 - a_z - E^{(1)} \end{vmatrix} = 0, \qquad (6.28)$$



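has the solution that is familiar to the reader from Chapters 2 and 4:

$$E_\pm = E^{(0)} + E_\pm^{(1)}, \qquad E_\pm^{(1)} = a_0 \pm a = a_0 \pm \left(a_x^2 + a_y^2 + a_z^2\right)^{1/2} \equiv \frac{H_{11} + H_{22}}{2} \pm \left[\left(\frac{H_{11} - H_{22}}{2}\right)^2 + H_{12}H_{21}\right]^{1/2}. \qquad (6.29)$$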
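Eq. (29) can be checked against the characteristic equation (28) for randomly drawn Hermitian perturbation matrices. This is only a sanity-check sketch (the random sampling and the tolerance are arbitrary choices): it computes $E^{(1)}_\pm$ from $a_0$ and $a = (a_x^2 + a_y^2 + a_z^2)^{1/2}$, using $a_x^2 + a_y^2 = |H_{12}|^2$, and confirms that both roots make the determinant vanish.

```python
import cmath
import random


def two_level_corrections(h11, h22, h12):
    """E^(1)_+ and E^(1)_- from Eq. (29) for the Hermitian matrix [[h11, h12], [conj(h12), h22]]."""
    a0 = (h11 + h22) / 2
    a = ((h11 - h22) ** 2 / 4 + abs(h12) ** 2) ** 0.5    # (a_x^2 + a_y^2 + a_z^2)^(1/2)
    return a0 + a, a0 - a


def characteristic(h11, h22, h12, e):
    """Left-hand side of Eq. (28): (a0 + az - E)(a0 - az - E) - |h12|^2."""
    return (h11 - e) * (h22 - e) - abs(h12) ** 2


random.seed(0)
worst = 0.0
for _ in range(1000):
    h11, h22 = random.uniform(-1, 1), random.uniform(-1, 1)
    h12 = cmath.rect(random.uniform(0, 1), random.uniform(-cmath.pi, cmath.pi))
    for e in two_level_corrections(h11, h22, h12):
        worst = max(worst, abs(characteristic(h11, h22, h12, e)))
print(worst)
```

Setting `h11 == h22` reproduces the middle of the anticrossing diagram, where the splitting reduces to its minimum value $2|H_{12}|$.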



Let us discuss the physics of this simple result. The parameter $a_0 = (H_{11} + H_{22})/2$ is evidently the correction to the average energy of both states, which does not give any contribution to the level splitting. The splitting, $\Delta E = E_+ - E_-$, is a hyperbolic function of the coefficient $a_z = (H_{11} - H_{22})/2$ that describes the direct contributions (13) to the energies of the two states due to the perturbation. A plot of this function is the famous level-anticrossing diagram (Fig. 3) that has already been discussed in Sec. 2.5 in the particular context of the weak potential limit of the 1D band theory - see Fig. 2.29.

Now we see that this is a general result for any two-level system. The examples of this behavior that we already know include the coupled quantum wells (see Fig. 2.29 and its discussion), band theory in the weak coupling limit (Sec. 2.5), and spin-1/2 systems discussed throughout Chapter 4 and in Sec. 5.1. By the way, from Sec. 4.4 we already know the perturbed states in the middle of the anticrossing diagram (at $a_z = 0$). For example, if $a_y = 0$, then our perturbation Hamiltonian matrix (27), besides the trivial term proportional to $a_0$, is proportional to $\sigma_x$, and hence we can use the result (4.114) to write:$^{7}$

$$|\pm\rangle = \frac{1}{\sqrt2}\left(|1^{(0)}\rangle \pm |2^{(0)}\rangle\right), \qquad (6.30)$$

where $1^{(0)}$ and $2^{(0)}$ are the system's states in the absence of the perturbation.

6 For brevity, I am dropping the upper index (1) in the matrix elements.



Fig. 6.3. Level-anticrossing diagram for an arbitrary two-level system: the normalized level positions $[E_\pm - (E^{(0)} + a_0)]/(a_x^2 + a_y^2)^{1/2}$ as functions of $a_z/(a_x^2 + a_y^2)^{1/2}$.



This analysis shows that other results of our discussions of particular two-level systems in Secs. 2.6 and 4.6 are also general. For example, if we put any such two-level system into an initial state different from one of the eigenstates $|\pm\rangle$, the probability of finding it in either of the states $1^{(0)}$ or $2^{(0)}$ will oscillate with frequency

$$\Omega = \frac{\Delta E}{\hbar}. \qquad (6.31)$$
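For the two-level problem the evolution operator has the closed form $e^{-i\hat Ht/\hbar} = e^{-ia_0t/\hbar}[\cos(at/\hbar)\hat I - i\sin(at/\hbar)\,\mathbf n\cdot\hat{\boldsymbol\sigma}]$, with $\mathbf n = \mathbf a/a$; this standard Pauli-matrix identity is an assumption of the sketch rather than something derived in this section, and the common energy $E^{(0)}$ is dropped because it only adds a global phase. The code evolves the unperturbed state $1^{(0)}$ and prints the probability of finding the system in $2^{(0)}$; units $\hbar = 1$ and the numerical coefficients are arbitrary choices.

```python
import cmath
import math


def evolve(a0, ax, ay, az, t, psi, hbar=1.0):
    """exp(-i H t / hbar) psi for H = a0 I + a.sigma, via cos(theta) I - i sin(theta) n.sigma."""
    a = math.sqrt(ax * ax + ay * ay + az * az)
    nx, ny, nz = ax / a, ay / a, az / a
    c, s = math.cos(a * t / hbar), math.sin(a * t / hbar)
    phase = cmath.exp(-1j * a0 * t / hbar)
    u11, u12 = c - 1j * s * nz, -1j * s * (nx - 1j * ny)
    u21, u22 = -1j * s * (nx + 1j * ny), c + 1j * s * nz
    return (phase * (u11 * psi[0] + u12 * psi[1]),
            phase * (u21 * psi[0] + u22 * psi[1]))


a0, ax, ay, az = 0.3, 0.4, 0.0, 0.3                    # arbitrary coefficients of Eq. (27b), hbar = 1
delta_e = 2 * math.sqrt(ax ** 2 + ay ** 2 + az ** 2)   # E+ - E- from Eq. (29)
period = 2 * math.pi / delta_e                         # 2 pi / Omega, with Omega from Eq. (31)
for k in range(9):
    t = k * period / 8
    p2 = abs(evolve(a0, ax, ay, az, t, (1.0, 0.0))[1]) ** 2   # start in 1^(0)
    print(f"t = {t:6.3f}   P(2) = {p2:.4f}")
```

The printed probability equals $[(a_x^2 + a_y^2)/a^2]\sin^2(at/\hbar)$, which returns to zero after each time interval $2\pi/\Omega$; its maximum is below unity unless $a_z = 0$, i.e. unless the system sits in the middle of the anticrossing diagram.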



Hence, for a spin-½ particle in a z-oriented magnetic field, the periodic oscillations of the x- and y-components of the spin vector, described by Eqs. (4.196) and (4.202), may be interpreted not only as the torque-induced precession of spin within the [x, y] plane, but alternatively as quantum oscillations (beats) between the states ↑ and ↓, with energies $E_\uparrow$ and $E_\downarrow$ given by Eq. (4.167), at the frequency $\Omega = |E_\uparrow - E_\downarrow|/\hbar$.
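The oscillations at frequency $\Omega = \Delta E/\hbar$ may be illustrated by evolving a two-level state numerically. This sketch assumes ħ = 1 and arbitrary coefficients, starts the system in $|1^{(0)}\rangle$, applies the exact propagator $\exp(-iHt/\hbar)$ built from NumPy's eigendecomposition, and compares the occupation of $|2^{(0)}\rangle$ with the closed form $[(a_x^2 + a_y^2)/a^2]\,(1 - \cos\Omega t)/2$, where $a = (a_x^2 + a_y^2 + a_z^2)^{1/2} = \Delta E/2$. That closed form is a standard two-level result quoted here as an assumption, not taken from the text.

```python
import numpy as np

hbar = 1.0
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

ax, ay, az = 0.3, 0.4, 0.2           # illustrative coefficients (a_0 only adds a global phase)
H = ax * SX + ay * SY + az * SZ
a = np.sqrt(ax**2 + ay**2 + az**2)   # half of the level splitting Delta E
Omega = 2 * a / hbar                 # Omega = Delta E / hbar, Eq. (6.31)

w, V = np.linalg.eigh(H)


def evolve(psi0, t):
    """Apply exp(-i H t / hbar) using the eigendecomposition of H."""
    phases = np.exp(-1j * w * t / hbar)
    return V @ (phases * (V.conj().T @ psi0))


psi0 = np.array([1, 0], dtype=complex)   # start in |1^(0)>
times = np.linspace(0, 4 * np.pi / Omega, 200)
P2 = np.array([abs(evolve(psi0, t)[1]) ** 2 for t in times])

# Closed form: P2 = (ax^2 + ay^2)/a^2 * (1 - cos(Omega t)) / 2
P2_exact = (ax**2 + ay**2) / a**2 * (1 - np.cos(Omega * times)) / 2
print("max deviation:", np.max(np.abs(P2 - P2_exact)))
```

The occupation oscillates with the full period $2\pi/\Omega$, and its amplitude is reduced below 1 whenever $a_z \neq 0$.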

Some other examples of such oscillations may be rather unexpected. For example, the ammonia molecule NH₃ (Fig. 4) has two symmetric states which differ by the inversion of the nitrogen atom relative to the plane of the three hydrogen atoms, and are coupled due to the quantum-mechanical tunneling of the nitrogen atom through the plane of the hydrogen atoms. 8 Since for this molecule the level splitting $\Delta E$ corresponds to an experimentally convenient frequency $\Omega/2\pi \approx 24$ GHz, it played an important historic role in the development of the first atomic frequency standards and microwave quantum generators (masers) in the 1950s-60s, which paved the way toward the development of the whole laser technology.



7 Alternatively, if $a_x = 0$, then $|\pm\rangle = (1/\sqrt{2})(|1^{(0)}\rangle \pm i|2^{(0)}\rangle)$. Note that, apart from a phase factor, these states are similar in that both present a coherent superposition of the unperturbed states, with a 50/50 chance to find the perturbed system in either of those states. In that sense, the effects of the perturbation coefficients $a_x$ and $a_y$ are similar.

8 Since the hydrogen atoms are much lighter, it is more accurate to speak about their correlated tunneling around the (nearly immobile) nitrogen atom.
6.2. The Stark effect



Another example of a level degeneracy lifted by a perturbation is the linear Stark effect - atomic level splitting by an external electric field. Let us study this effect, in the linear approximation, for a hydrogen-like atom. Taking the direction of the external electric field $\mathscr{E}$ (which is practically uniform on the atomic scale) for the z-axis, the perturbation may be represented by the following Hamiltonian:



$$\hat H^{(1)} = -q\mathscr{E}z = -q\mathscr{E}r\cos\theta. \tag{6.32}$$



(Since we will work in the coordinate representation, we may skip the operator sign from this point on.) 

As you (should :-) remember, the energy levels of a hydrogen-like atom depend only on the principal quantum number n - see Eq. (3.191); hence all states but the ground state n = 1 ("1s" in the spectroscopic nomenclature), in which l = m = 0, have some degeneracy that grows rapidly with n. This is why I will carry out the calculations only for the lowest degenerate level, with n = 2. Since generally 0 ≤ l ≤ n − 1, here l may be equal either to 0 (one 2s state, with m = 0) or to 1 (three 2p states, with m = 0, ±1). Due to this 4-fold degeneracy, $H^{(1)}$ is a 4×4 matrix with 16 elements:



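$$H^{(1)} = \begin{pmatrix} H_{11} & H_{12} & H_{13} & H_{14} \\ H_{21} & H_{22} & H_{23} & H_{24} \\ H_{31} & H_{32} & H_{33} & H_{34} \\ H_{41} & H_{42} & H_{43} & H_{44} \end{pmatrix}, \tag{6.33}$$

where the rows and columns correspond, in this order, to the states with l = 0, m = 0 (2s); l = 1, m = 0; l = 1, m = +1; and l = 1, m = −1 (2p).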



However, please do not be scared. First, due to the Hermitian character of the operator, only 10 of the matrix elements (4 diagonal ones and 6 off-diagonal elements) may be independent. Moreover, due to the high symmetry of the problem, there are a lot of zeros even among these elements. Indeed, let us have a look at the angular components $Y_l^m$ of the corresponding wavefunctions, described by Eqs. (3.174)-(3.175). For states with m = ±1, the azimuthal parts of the wavefunctions are proportional to $\exp\{\pm i\varphi\}$; hence the off-diagonal elements $H_{34}$ and $H_{43}$ of matrix (33), relating these functions, are proportional to



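$$\oint d\Omega\, Y_1^{\pm1*} H^{(1)} Y_1^{\mp1} \propto \int_0^{2\pi} d\varphi \left(e^{\pm i\varphi}\right)^* e^{\mp i\varphi} = \int_0^{2\pi} e^{\mp 2i\varphi}\, d\varphi = 0. \tag{6.34}$$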
The azimuthal-angle symmetry also kills the off-diagonal elements $H_{13}$, $H_{14}$, $H_{23}$, $H_{24}$ (and hence their complex conjugates $H_{31}$, $H_{41}$, $H_{32}$, and $H_{42}$), because they relate states with m = 0 and m ≠ 0, and are proportional to

$$\oint d\Omega\, Y_l^{0*} H^{(1)} Y_1^{\pm1} \propto \int_0^{2\pi} e^{\pm i\varphi}\, d\varphi = 0. \tag{6.35}$$

For the diagonal elements $H_{33}$ and $H_{44}$, corresponding to m = ±1, the azimuthal-angle integral does not vanish, but since the spherical functions depend on the polar angle as $\sin\theta$, the matrix elements are proportional to

$$\oint d\Omega\, Y_1^{\pm1*} H^{(1)} Y_1^{\pm1} \propto \int_0^{\pi} \sin\theta\, d\theta\, \sin\theta\cos\theta\sin\theta = \int_{-1}^{+1} \cos\theta\left(1 - \cos^2\theta\right) d(\cos\theta), \tag{6.36}$$

i.e. are equal to zero, as is any integral of an odd function over a symmetric interval. Finally, for the states 2s and 2p with m = 0, the diagonal elements $H_{11}$ and $H_{22}$ are also killed by the polar-angle integration:



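$$\oint d\Omega\, Y_0^{0*} H^{(1)} Y_0^{0} \propto \int_0^{\pi} \sin\theta\, d\theta\, \cos\theta = \int_{-1}^{+1} \cos\theta\, d(\cos\theta) = 0,$$
$$\oint d\Omega\, Y_1^{0*} H^{(1)} Y_1^{0} \propto \int_0^{\pi} \sin\theta\, d\theta\, \cos^3\theta = \int_{-1}^{+1} \cos^3\theta\, d(\cos\theta) = 0. \tag{6.37}$$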



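Hence, the only nonvanishing matrix elements are the two off-diagonal elements $H_{12}$ and $H_{21}$ relating different states with m = 0, because they are proportional to

$$\oint d\Omega\, Y_0^{0*} \cos\theta\, Y_1^{0} = \frac{\sqrt{3}}{4\pi}\int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\, d\theta\, \cos^2\theta = \frac{1}{\sqrt{3}} \neq 0. \tag{6.38}$$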
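The selection argument of Eqs. (34)-(38) may be cross-checked symbolically. This SymPy sketch writes out the l = 0, 1 spherical harmonics explicitly in the standard Condon-Shortley phase convention (an assumption about the phase choice of Eqs. (3.174)-(3.175); it affects neither the zeros nor the m = 0 element) and computes the angular factor of every element of matrix (33). Only the pair $H_{12}$, $H_{21}$ should survive, with the value $1/\sqrt{3}$.

```python
import sympy as sp

theta, phi = sp.symbols("theta phi", real=True)

# Spherical harmonics for l = 0, 1 (Condon-Shortley phases), keyed by (l, m)
Y = {
    (0, 0): 1 / sp.sqrt(4 * sp.pi),
    (1, 0): sp.sqrt(3 / (4 * sp.pi)) * sp.cos(theta),
    (1, 1): -sp.sqrt(3 / (8 * sp.pi)) * sp.sin(theta) * sp.exp(sp.I * phi),
    (1, -1): sp.sqrt(3 / (8 * sp.pi)) * sp.sin(theta) * sp.exp(-sp.I * phi),
}
# Basis order of matrix (6.33): 2s, then 2p with m = 0, +1, -1
basis = [(0, 0), (1, 0), (1, 1), (1, -1)]


def angular_element(bra, ket):
    """Integral of conj(Y_bra) * cos(theta) * Y_ket over the full solid angle."""
    integrand = sp.conjugate(Y[bra]) * sp.cos(theta) * Y[ket] * sp.sin(theta)
    return sp.simplify(sp.integrate(integrand, (phi, 0, 2 * sp.pi), (theta, 0, sp.pi)))


A = sp.Matrix(4, 4, lambda i, j: angular_element(basis[i], basis[j]))
sp.pprint(A)
```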

What remains is to use the radial parts of these functions to finish the calculation of those two matrix elements:

$$H_{12} = H_{21} = -\frac{q\mathscr{E}}{\sqrt{3}}\int_0^\infty r^2 dr\, \mathcal{R}_{2,0}(r)\, r\, \mathcal{R}_{2,1}(r), \tag{6.39}$$

where the radial functions are given by Eqs. (3.199). Due to the structure of the function $\mathcal{R}_{2,0}(r)$, the integral falls into a sum of two parts, both of the type we have already met. 9 The final result is

$$H_{12} = H_{21} = 3q\mathscr{E}r_0, \tag{6.40}$$

where $r_0$ is the radius scale given by Eq. (3.183); for the hydrogen atom it is just the Bohr radius $r_B$ (1.13).
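The radial integral in Eq. (39) may be evaluated symbolically as well. The sketch below writes out the standard normalized n = 2 radial functions in terms of the radius scale $r_0$; these explicit forms, and their sign convention (which fixes the sign of $H_{12}$), are an assumption about what Eqs. (3.199) contain. With them, the integral equals $-3\sqrt{3}\,r_0$, so that $H_{12} = 3q\mathscr{E}r_0$.

```python
import sympy as sp

r, r0, q, field = sp.symbols("r r_0 q E_field", positive=True)

# Assumed n = 2 radial functions of a hydrogen-like atom, r0 being the radius scale
R20 = (2 - r / r0) * sp.exp(-r / (2 * r0)) / (2 * sp.sqrt(2) * r0**sp.Rational(3, 2))
R21 = (r / r0) * sp.exp(-r / (2 * r0)) / (2 * sp.sqrt(6) * r0**sp.Rational(3, 2))

# Sanity check: both functions are normalized, i.e. integral of r^2 R^2 dr = 1
norm20 = sp.integrate(r**2 * R20**2, (r, 0, sp.oo))
norm21 = sp.integrate(r**2 * R21**2, (r, 0, sp.oo))

# Radial integral of Eq. (6.39) and the resulting matrix element
radial = sp.integrate(r**2 * R20 * r * R21, (r, 0, sp.oo))
H12 = sp.simplify(-q * field / sp.sqrt(3) * radial)
print(norm20, norm21, radial, H12)
```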

Thus, for our case the perturbation matrix (33) is reduced to 



9 See, e.g., MA Eq. (6.7b). 






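$$H^{(1)} = \begin{pmatrix} 0 & 3q\mathscr{E}r_0 & 0 & 0 \\ 3q\mathscr{E}r_0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \tag{6.41}$$

so that the condition (25) of self-consistency is

$$\begin{vmatrix} -E^{(1)} & 3q\mathscr{E}r_0 & 0 & 0 \\ 3q\mathscr{E}r_0 & -E^{(1)} & 0 & 0 \\ 0 & 0 & -E^{(1)} & 0 \\ 0 & 0 & 0 & -E^{(1)} \end{vmatrix} = 0, \tag{6.42}$$

giving a very simple characteristic equation

$$\left(E^{(1)}\right)^2\left[\left(E^{(1)}\right)^2 - \left(3q\mathscr{E}r_0\right)^2\right] = 0, \tag{6.43}$$

with the roots

$$E^{(1)} = 0 \ \ \text{(double root)}, \qquad E^{(1)} = \pm 3q\mathscr{E}r_0, \tag{6.44}$$

so that the degeneracy is only partly lifted - see Fig. 5.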



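[Figure: the level $E^{(0)}$ splits into $E^{(0)} + 3q\mathscr{E}r_0$ and $E^{(0)} - 3q\mathscr{E}r_0$ (both with m = 0), while the m = ±1 level stays at $E^{(0)}$.]

Fig. 6.5. Linear Stark effect for level n = 2 of a hydrogen-like atom.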



Generally, in order to understand the nature of the states corresponding to these levels, we should go back to Eq. (24) with each calculated value of $E_n^{(1)}$, and calculate the corresponding expansion coefficients $\langle n'^{(0)}|n\rangle$, which describe the perturbed states. However, in our simple case the outcome of this procedure is clear in advance. Indeed, since the states with m = ±1 are not affected by the perturbation (in the linear approximation in the electric field), their degeneracy is not lifted, and their energy is unaffected - see the middle level in Fig. 5. On the other hand, the perturbation matrix connecting the states 2s and 2p (with m = 0), i.e. the top left 2×2 part of the full matrix (41), is proportional to the Pauli matrix $\sigma_x$, and we already know the result of its diagonalization - see Eqs. (4.114). This means that the upper and lower split levels correspond to very simple linear combinations of the previously degenerate states,



$$|\pm\rangle = \frac{1}{\sqrt{2}}\left(|2s\rangle \pm |2p\rangle\right), \tag{6.45}$$

both with m = 0.
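A quick check of the diagonalization behind Eqs. (44)-(45): the sketch builds the 4×4 matrix (41) in units where $q\mathscr{E}r_0 = 1$ (an arbitrary scale choice) and diagonalizes it with NumPy's `eigh`. For the doubly degenerate zero eigenvalue, `eigh` may return any orthonormal pair of vectors within the m = ±1 subspace, so only the split levels are inspected.

```python
import numpy as np

# Basis order: 2s, 2p(m = 0), 2p(m = +1), 2p(m = -1); energies in units of q*E*r0
h = 3.0
H1 = np.zeros((4, 4))
H1[0, 1] = H1[1, 0] = h

energies, states = np.linalg.eigh(H1)   # eigenvalues in ascending order
print("E(1):", energies)
print("upper state:", np.round(states[:, 3], 4))   # +-(1, 1, 0, 0)/sqrt(2)
print("lower state:", np.round(states[:, 0], 4))   # +-(1, -1, 0, 0)/sqrt(2)
```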
Finally, let us estimate the magnitude of the linear Stark effect for a hydrogen atom. For a very high electric field of $\mathscr{E} = 3\times10^6$ V/m, 10 $q = e \approx 1.6\times10^{-19}$ C, and $r_0 = r_B \approx 0.5\times10^{-10}$ m, we get a level splitting of $3q\mathscr{E}r_0 \sim 0.8\times10^{-22}$ J ≈ 0.5 meV. This number is much smaller than the unperturbed energy of the level, $E_2 = -E_H/(2\times2^2) \approx -3.4$ eV, so that the perturbation result is quite valid. On the other hand, the splitting is much larger than the resolution limit imposed by the natural linewidth (~$10^{-7}E_2$, see Chapter 9), so that the effect is quite observable even in substantially lower electric fields.
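The arithmetic of this estimate, written out as a few lines of Python with rounded values of the constants (the Hartree energy $E_H \approx 27.2$ eV is taken as given):

```python
# Order-of-magnitude estimate of the linear Stark splitting 3 q E r0 for hydrogen, n = 2
e = 1.602e-19        # elementary charge, C
field = 3e6          # V/m, the (very high) field used in the text
r_B = 0.529e-10      # Bohr radius, m
E_H_eV = 27.2        # Hartree energy, eV

splitting_J = 3 * e * field * r_B
splitting_eV = splitting_J / e
E2_eV = -E_H_eV / (2 * 2**2)

print(f"3 q E r0 = {splitting_J:.2e} J = {splitting_eV * 1e3:.2f} meV")
print(f"E_2 = {E2_eV:.2f} eV; |splitting / E_2| = {abs(splitting_eV / E2_eV):.1e}")
```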



6.3. Fine structure of atomic levels 



Now let us analyze, for the simplest case of a hydrogen-like atom, the so-called fine structure of atomic levels - their degeneracy lifting even in the absence of external fields. In the limit when the effective speed v of the electron's motion is much smaller than the speed of light c (as it is in the hydrogen atom), the fine structure may be analyzed as a sum of two small relativistic effects. To analyze the first of these effects, let us expand the well-known classical relativistic expression 11 for the kinetic energy $T = E - mc^2$ of a free particle with the rest mass m,



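$$T = \left(m^2c^4 + p^2c^2\right)^{1/2} - mc^2 = mc^2\left[\left(1 + \frac{p^2}{m^2c^2}\right)^{1/2} - 1\right], \tag{6.46}$$

into the Taylor series with respect to the small ratio $(p/mc)^2 \sim (v/c)^2$:

$$T = mc^2\left[1 + \frac{1}{2}\left(\frac{p}{mc}\right)^2 - \frac{1}{8}\left(\frac{p}{mc}\right)^4 + \ldots - 1\right] = \frac{p^2}{2m} - \frac{p^4}{8m^3c^2} + \ldots, \tag{6.47}$$

and neglect all the terms besides the first (non-relativistic) one and the next term, representing the first nonvanishing relativistic correction to T.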
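The expansion (47) may be reproduced with SymPy's `series`; this is only a check of the algebra, with p, m, and c treated as positive symbols.

```python
import sympy as sp

p, m, c = sp.symbols("p m c", positive=True)

T = sp.sqrt(m**2 * c**4 + p**2 * c**2) - m * c**2   # Eq. (6.46)
expansion = sp.series(T, p, 0, 6).removeO()          # keep terms up to p^4
print(sp.expand(expansion))
```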

In accordance with the correspondence principle, the quantum-mechanical problem in this approximation may be described by the perturbative Hamiltonian (1a), where the unperturbed (non-relativistic) Hamiltonian of the problem, whose eigenstates and eigenenergies were discussed in Sec. 3.5, is



$$\hat H^{(0)} = \frac{\hat p^2}{2m} + U(r), \qquad U(r) = -\frac{C}{r}, \tag{6.48}$$

while the small kinetic-relativistic perturbation is

$$\hat H^{(1)} = -\frac{\hat p^4}{8m^3c^2}. \tag{6.49a}$$



Using Eq. (48), we may rewrite the last formula as 



10 This value approximately corresponds to the threshold of electric breakdown in air, due to the impact ionization 
on the surface of typical metallic electrodes. (Reducing air pressure only enhances the ionization and lowers the 
breakdown threshold.) As a result, experiments with higher fields are rather difficult. 

11 See, e.g., EM Sec. 9.3, in particular Eq. (9.78) - or any undergraduate text on special relativity. 
$$\hat H^{(1)} = -\frac{1}{2mc^2}\left(\frac{\hat p^2}{2m}\right)^2 = -\frac{1}{2mc^2}\left(\hat H^{(0)} - U(r)\right)^2, \tag{6.49b}$$



so that its matrix elements, participating in the characteristic equation (25) for a given degenerate energy level (3.191), i.e. a given principal quantum number n, are

$$\langle nlm|\hat H^{(1)}|nl'm'\rangle = -\frac{1}{2mc^2}\langle nlm|\left(\hat H^{(0)} - U(r)\right)\left(\hat H^{(0)} - U(r)\right)|nl'm'\rangle, \tag{6.50}$$

where the bra- and ket-vectors describe the unperturbed eigenstates whose eigenfunctions (in the coordinate representation) are given by Eq. (3.190): $\psi_{n,l,m} = \mathcal{R}_{n,l}(r)\,Y_l^m(\theta,\varphi)$.

It is straightforward (and hence left for the reader :-) to prove that all off-diagonal elements of the set (50) are equal to 0. Thus we may use Eq. (26) for each set of quantum numbers {n, l, m}:



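$$E^{(1)}_{n,l,m} = \langle nlm|\hat H^{(1)}|nlm\rangle = -\frac{1}{2mc^2}\left\langle\left(E_n^{(0)} - U(r)\right)^2\right\rangle_{n,l} = -\frac{1}{2mc^2}\left(E_n^2 - 2E_n\langle U\rangle_{n,l} + \langle U^2\rangle_{n,l}\right) = -\frac{1}{2mc^2}\left(E_n^2 + 2E_nC\left\langle\frac{1}{r}\right\rangle_{n,l} + C^2\left\langle\frac{1}{r^2}\right\rangle_{n,l}\right), \tag{6.51}$$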



where the index m has been dropped, because the radial wavefunctions $\mathcal{R}_{n,l}(r)$, which affect the averages, do not depend on that quantum number. Now using Eqs. (3.183), (3.191), and the first two of Eqs. (3.201), we finally get





$$E^{(1)}_{n,l} = -\frac{mC^4}{2\hbar^4c^2n^4}\left(\frac{n}{l+1/2} - \frac{3}{4}\right) = -\frac{2E_n^2}{mc^2}\left(\frac{n}{l+1/2} - \frac{3}{4}\right). \tag{6.52}$$



Let us discuss this result. First of all, its last form confirms that the correction (52) is indeed much smaller than the unperturbed energy $E_n$ (and hence the perturbation theory is solid) if the latter is much smaller than the relativistic rest energy $mc^2$ of the particle. Next, since in the Bohr problem $n \geq l + 1$, the first fraction in the parentheses of Eq. (52) is always larger than 1, so that the relativistic correction to kinetic energy is negative for all n and l. (This is already evident from Eqs. (6.49), which show that the correction Hamiltonian is a negative-definite form.) Finally, at a fixed principal number n, the negative correction's magnitude decreases with the growth of l. This fact may be classically interpreted using Eq. (3.200): the larger is l (at fixed n), the smaller is the particle's average distance from the center, and hence the smaller is its effective velocity, and the smaller is the magnitude of the quantum-mechanical average of the negative relativistic correction (49a) to the kinetic energy.
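The step from Eq. (51) to Eq. (52) may be checked with exact rational arithmetic. The sketch assumes units with m = C = ħ = 1 (so that $r_0 = 1$ and $E_n = -1/2n^2$) and the standard hydrogen-like averages $\langle 1/r\rangle = 1/n^2r_0$ and $\langle 1/r^2\rangle = 1/[n^3(l+1/2)r_0^2]$, which are assumed to be the first two of Eqs. (3.201).

```python
from fractions import Fraction as F


def bracket_from_averages(n, l):
    """E_n^2 + 2 E_n C <1/r> + C^2 <1/r^2>, in units m = C = hbar = 1 (r0 = 1)."""
    E_n = F(-1, 2 * n**2)                     # E_n = -E_0 / (2 n^2), with E_0 = 1
    inv_r = F(1, n**2)                        # <1/r>   = 1 / (n^2 r0)
    inv_r2 = F(1, n**3) / (l + F(1, 2))       # <1/r^2> = 1 / (n^3 (l + 1/2) r0^2)
    return E_n**2 + 2 * E_n * inv_r + inv_r2


def bracket_closed_form(n, l):
    """4 E_n^2 (n/(l + 1/2) - 3/4), i.e. Eq. (6.52) multiplied by -2 m c^2."""
    E_n = F(-1, 2 * n**2)
    return 4 * E_n**2 * (n / (l + F(1, 2)) - F(3, 4))


for n in range(1, 7):
    for l in range(n):
        assert bracket_from_averages(n, l) == bracket_closed_form(n, l)

# E^(1) in units of E_n^2 / (m c^2): -2 (n/(l + 1/2) - 3/4), negative and shrinking with l
n = 3
for l in range(n):
    print(f"n = {n}, l = {l}: {-2 * (n / (l + F(1, 2)) - F(3, 4))}")
```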

Result (52) is conceptually valid for any interaction of the form $U(r) = -C/r$, whatever its physics. However, if the interaction is Coulombic, say between an electron with charge (−e) and a nucleus of charge (+Ze), there is also another relativistic correction to the energy, due to the so-called spin-orbit interaction. Its physics may be understood from the following semi-qualitative, classical reasoning: from "the point of view" of an electron rotating about the nucleus at a constant distance r with velocity v, it is the nucleus, of charge Ze, that rotates about the electron with velocity (−v), and hence with the time period $\mathcal{T} = 2\pi r/v$. From the point of view of magnetostatics, such circular motion of electric charge Q = Ze is equivalent to a constant circular electric current $I = Q/\mathcal{T} = (Ze)(v/2\pi r)$, which creates, at the electron's location, i.e. at the center of the current loop, a magnetic field with magnitude 12



$$\mathscr{B} = \frac{\mu_0 I}{2r} = \frac{\mu_0 Zev}{4\pi r^2}. \tag{6.53}$$



The field's direction n is perpendicular to the apparent plane of the nucleus' rotation (i.e. that of the real rotation of the electron), and hence its vector may be readily expressed via the similarly directed vector L of the electron's angular (orbital) momentum:

$$\boldsymbol{\mathscr{B}} = \frac{\mu_0 Zev}{4\pi r^2}\mathbf{n} = \frac{\mu_0 Ze}{4\pi r^3 m_e}\,m_e r v\,\mathbf{n} = \frac{\mu_0 Ze}{4\pi r^3 m_e}\mathbf{L} = \frac{Ze}{4\pi\varepsilon_0 r^3 m_e c^2}\mathbf{L}, \tag{6.54}$$

where the last transition is due to the basic relation between the SI unit constants, $\varepsilon_0\mu_0 = 1/c^2$.



A more careful (but still classical) analysis of the problem 13 brings both good and bad news. The bad news is that result (54) is wrong by a factor of 2 even for circular motion, because the electron moves with acceleration, and the reference frame bound to it cannot be considered inertial (as was implied in the above reasoning), so that the actual magnetic field felt by the electron is

$$\boldsymbol{\mathscr{B}} = \frac{Ze}{8\pi\varepsilon_0 r^3 m_e c^2}\mathbf{L}. \tag{6.55}$$



The good news is that, so corrected, the result is valid (on average) not only for circular but also for arbitrary (elliptic 14) orbital motion in the Coulomb field U(r). Hence, from the discussion in Secs. 4.1 and 4.4, we may expect that the quantum-mechanical description of the interaction between this apparent magnetic field and the electron's spin magnetic moment (4.116) is given by the following perturbation Hamiltonian



$$\hat H^{(1)} = -\hat{\mathbf{m}}\cdot\boldsymbol{\mathscr{B}} = -\left(-\frac{e}{m_e}\hat{\mathbf{S}}\right)\cdot\frac{Ze}{8\pi\varepsilon_0 r^3 m_e c^2}\hat{\mathbf{L}} = \frac{1}{2m_e^2c^2}\frac{Ze^2}{4\pi\varepsilon_0 r^3}\hat{\mathbf{S}}\cdot\hat{\mathbf{L}}, \tag{6.56a}$$

where the small correction to the value $g_e = 2$ of the electron's g-factor has been ignored, because Eq. (56) is already a small correction. This expression is confirmed by the fully relativistic Dirac theory, to be discussed in Sec. 9.7 below: for an arbitrary central potential U(r), it yields the following Hamiltonian of the spin-orbit coupling:



$$\hat H^{(1)} = \frac{1}{2m_e^2c^2}\frac{1}{r}\frac{dU}{dr}\hat{\mathbf{S}}\cdot\hat{\mathbf{L}}. \tag{6.56b}$$

For the Coulomb potential $U(r) = -Ze^2/4\pi\varepsilon_0 r$, this formula is reduced to Eq. (56a).

As we already know from the discussion in Sec. 5.7, such a Hamiltonian commutes with all operators diagonal in the coupled representation (inside the blue line in Fig. 5.10): $L^2$, $S^2$, $J^2$, and $J_z$. Hence, using Eq. (5.208) to rewrite the spin-orbit Hamiltonian as



12 See, e.g., EM Sec. 5.1, in particular, Eq. (5.24). 

13 See, e.g., R. Harr and L. Curtis, Am. J. Phys. 55, 1044 (1987). 

14 See, e.g., CM Sec. 3.6. 
$$\hat H^{(1)} = \frac{1}{2m_e^2c^2}\frac{Ze^2}{4\pi\varepsilon_0}\frac{1}{r^3}\frac{1}{2}\left(\hat J^2 - \hat L^2 - \hat S^2\right), \tag{6.57}$$



we may conclude that this operator is diagonal in the coupled representation, with fixed quantum numbers l, s, j, and $m_j$. As a result, in this representation, we may again use Eq. (26) for each set {l, j, $m_j$}:

$$E^{(1)}_{l,j,m_j} = \frac{1}{2m_e^2c^2}\frac{Ze^2}{4\pi\varepsilon_0}\left\langle\frac{1}{r^3}\right\rangle_{n,l}\frac{1}{2}\left\langle J^2 - L^2 - S^2\right\rangle_{j,l,s}, \tag{6.58}$$

where the indices irrelevant for each particular term have been dropped. (As a reminder, the spin quantum number s is fixed by the particle's nature; for our case of an electron, s = ½.) Now using the last of Eqs. (3.201), and similar expressions (5.192), (5.197), and (5.203), we get an explicit expression for the spin-orbit corrections: 15



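$$E^{(1)}_{n,l,j} = \frac{1}{2m_e^2c^2}\frac{Ze^2}{4\pi\varepsilon_0}\frac{\hbar^2}{2r_0^3}\,\frac{j(j+1) - l(l+1) - 3/4}{n^3\,l(l+1/2)(l+1)} = \frac{E_n^2}{m_ec^2}\,n\,\frac{j(j+1) - l(l+1) - 3/4}{l(l+1/2)(l+1)}. \tag{6.59}$$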



The last form of its right-hand side shows very clearly that this correction has the same scale as the kinetic correction (52), 16 so that they should be considered together. In the first order of perturbation theory they may simply be added, giving a very simple formula for the net fine structure of level n:



$$E^{(1)}_{\text{fine}} = E^{(1)}_{n,l} + E^{(1)}_{n,l,j} = -\frac{E_n^2}{2m_ec^2}\left(\frac{4n}{j+1/2} - 3\right). \tag{6.60}$$
This simplicity, as well as the independence of the result of the orbital quantum number l, will become less surprising when (in Sec. 9.7) we see that this formula follows in one shot from the Dirac theory, in which the Bohr atom's energy spectrum is numbered only with n and j, but not l.
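The claim that the sum of the corrections (52) and (59) depends only on n and j may be verified with exact fractions. In this sketch all energies are in units of $E_n^2/m_ec^2$, and the three formulas are transcribed as: kinetic $-2[n/(l+1/2) - 3/4]$, spin-orbit $n[j(j+1) - l(l+1) - 3/4]/[l(l+1/2)(l+1)]$, and net $-\frac{1}{2}[4n/(j+1/2) - 3]$. The case l = 0 is left out, since there the spin-orbit formula is a 0/0 limit (see footnote 15).

```python
from fractions import Fraction as F

HALF = F(1, 2)

# All corrections in units of E_n^2 / (m_e c^2)


def kinetic(n, l):                     # Eq. (6.52)
    return -2 * (n / (l + HALF) - F(3, 4))


def spin_orbit(n, l, j):               # Eq. (6.59), used here only for l >= 1
    return n * (j * (j + 1) - l * (l + 1) - F(3, 4)) / (l * (l + HALF) * (l + 1))


def fine(n, j):                        # Eq. (6.60)
    return -HALF * (4 * n / (j + HALF) - 3)


for n in range(2, 8):
    for l in range(1, n):
        for j in (l - HALF, l + HALF):
            assert kinetic(n, l) + spin_orbit(n, l, j) == fine(n, j)
print("kinetic + spin-orbit reproduces Eq. (60) for all tested n, l >= 1, j = l +- 1/2")
```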

Let us recall (see Sec. 5.7) that for an electron (s = ½), the quantum number j may take n positive half-integer values, from ½ to n − ½. With the account of this fact, Eq. (60) shows that the fine structure of the n-th Bohr energy level has n sub-levels - see Fig. 6.
[Figure: energy E of the sub-levels j = 1/2 (l = 0, 1), j = 3/2 (l = 1, 2), j = 5/2, and so on up to j = n − 1/2 (l = n − 1).]

Fig. 6.6. Fine structure of a hydrogen-like atom's level.



15 Factor l in the denominator does not give a divergence at l = 0, because in this case j = s = ½, and the numerator turns into 0 as well. A careful analysis of this case (which may be found, e.g., in G. K. Woodgate, Elementary Atomic Structure, 2nd ed., Oxford, 1983), as well as the exact solution of the Bohr atom problem within the Dirac theory (Chapter 9), shows that the final result (60), which is independent of l, is valid even in this case.

16 This is natural, because the magnetic interaction of charged particles is an essentially relativistic effect, of the same order (~$v^2/c^2$) as the kinetic correction (49a) - see, e.g., EM Sec. 5.1, in particular Eq. (5.3).
Please note that, according to Eq. (5.203), each of these sub-levels is still (2j + 1)-times degenerate in the quantum number $m_j$. This degeneracy is very natural, because in the absence of an external field the system is still isotropic. Moreover, on each fine-structure level besides the highest one (j = n − ½), each of the $m_j$-states is doubly degenerate in the orbital quantum number l = j ± ½ - see the labels of l in Fig. 6. (According to Eq. (5.215), each of these states, with fixed j and $m_j$, may be represented as a linear combination of two states with adjacent values of $m_l$, and hence different electron spin orientations, $m_s = \pm½$, weighted with the Clebsch-Gordan coefficients.)

These details aside, one may crudely say that the relativistic corrections make the total eigenenergy grow with l, contributing to the effect already mentioned in our analysis of the periodic table of elements in Sec. 3.7. The relative scale of this increase may be evaluated from the largest deviation from the unperturbed energy $E_n$, reached for the states with j = ½ (e.g., with l = 0):
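$$\frac{\left|E^{(1)}_{\text{fine}}\right|_{\max}}{|E_n|} = \frac{|E_n|}{2m_ec^2}\left(4n - 3\right) = \frac{(Z\alpha)^2}{4n^2}\left(4n - 3\right), \tag{6.61}$$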



where α is the fine structure constant,

$$\alpha \equiv \frac{e^2}{4\pi\varepsilon_0\hbar c} \approx \frac{1}{137}, \tag{6.62}$$



that was already mentioned in Sec. 4.4. 17 These expressions show that the fine structure is indeed a relatively small correction (~α²) for the hydrogen atom, but it rapidly grows (as Z²) with the nuclear charge (atomic number), and becomes rather substantial for the heaviest atoms, with Z ~ 100.
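To put numbers on these scales, the sketch below evaluates α from SI constants (rounded CODATA 2018 values, typed in by hand) and uses Eq. (60) with $E_n = -(Z\alpha)^2m_ec^2/2n^2$, which follows from $E_n = -Z^2E_H/2n^2$ and $E_H = \alpha^2m_ec^2$. It prints the n = 2 fine-structure splitting of hydrogen and the relative scale $(Z\alpha)^2(4n-3)/4n^2$ for the ground level. For Z ~ 100, where this ratio is no longer small, the first-order result should be read only as an order-of-magnitude estimate.

```python
import math

# Rounded CODATA 2018 values, SI units
e = 1.602176634e-19         # C
hbar = 1.054571817e-34      # J s
c = 2.99792458e8            # m/s
eps0 = 8.8541878128e-12     # F/m
me_c2_eV = 0.51099895e6     # electron rest energy, eV

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)   # Eq. (6.62)
print(f"alpha = {alpha:.6e}, 1/alpha = {1 / alpha:.3f}")


def fine_shift_eV(n, j, Z=1):
    """Eq. (6.60) with E_n = -(Z alpha)^2 m_e c^2 / (2 n^2)."""
    E_n = -(Z * alpha) ** 2 * me_c2_eV / (2 * n**2)
    return -E_n**2 / (2 * me_c2_eV) * (4 * n / (j + 0.5) - 3)


# Hydrogen, n = 2: splitting between the j = 3/2 and j = 1/2 sub-levels
split = fine_shift_eV(2, 1.5) - fine_shift_eV(2, 0.5)
print(f"n = 2 fine-structure splitting: {split:.2e} eV")

# Relative scale for the ground level n = 1
n = 1
for Z in (1, 30, 100):
    rel = (Z * alpha) ** 2 * (4 * n - 3) / (4 * n**2)
    print(f"Z = {Z:3d}: |E_fine|max / |E_1| ~ {rel:.1e}")
```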



6.4. The Zeeman effect 

Now we are ready to review the Zeeman effect - the lifting of atomic level degeneracy by an external magnetic field. 18 Using Eq. (3.26) (with q = −e) for the description of the electron's orbital motion in the field, and Eq. (4.116) for the operator of the electron's magnetic moment due to its spin-½, we see that even for a hydrogen-like (i.e. single-electron) atom, neglecting the relativistic effects, the full Hamiltonian is rather bulky:

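$$\hat H = \frac{1}{2m_e}\left(\hat{\mathbf{p}} + e\hat{\mathbf{A}}\right)^2 - \frac{Ze^2}{4\pi\varepsilon_0 r} + \frac{e}{m_e}\boldsymbol{\mathscr{B}}\cdot\hat{\mathbf{S}}. \tag{6.63}$$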

There are several simplifications we may make. First, let us assume that the external field is spatially uniform on the atomic scale (which is a very good approximation in most cases), so that we can take the vector potential in an axially-symmetric gauge - cf. Eq. (3.132):



17 See the Selected Physical Constants appendix for a more exact value of this constant. Its expression in Gaussian units, $\alpha = e^2/\hbar c$, makes even more evident the fact that α is just the ratio of fundamental constants which characterizes the strength (or rather the weakness :-) of electromagnetic effects in quantum mechanics - the weakness that in particular makes the perturbative quantum electrodynamics possible. The alternative expression $\alpha = (E_H/m_ec^2)^{1/2}$, where $E_H$ is the Hartree energy (1.9), the scale of all $E_n$, is also very revealing.

18 It was discovered experimentally in 1896 by P. Zeeman who, amazingly, was fired from the University of Leiden for an unauthorized use of lab equipment for this work - only to receive a Nobel Prize for it a few years later.
$$\mathbf{A} = \frac{1}{2}\boldsymbol{\mathscr{B}}\times\mathbf{r}. \tag{6.64}$$



Second, let us neglect the terms proportional to $\mathscr{B}^2$, which are small in practical magnetic fields of the order of a few tesla. 19 The remaining term in the effective kinetic energy, describing the interaction with the magnetic field, is linear in the momentum operator, so that we may repeat the standard classical calculation 20 to reduce it to the product of $(-\mathscr{B})$ and the orbital magnetic moment's component $m_z = -eL_z/2m_e$, except that both $m_z$ and $L_z$ should now be understood as operators. As a result, the Hamiltonian reduces to Eq. (1a), $\hat H^{(0)} + \hat H^{(1)}$, where $\hat H^{(0)}$ is that of the atom at $\mathscr{B} = 0$, and



$$\hat H^{(1)} = \frac{e\mathscr{B}}{2m_e}\left(\hat L_z + 2\hat S_z\right). \tag{6.65}$$
The form of the perturbation immediately reveals the major complication with the Zeeman effect description. Namely, in comparison with its contribution (5.198) to the total angular momentum of the electron, the electron's spin-½ produces a twice larger contribution to the magnetic moment, so that the right-hand side of Eq. (65) is not proportional to the total angular momentum. As a result, the effect's description is simple only in two limits.

If the magnetic field is so high that its effects are much stronger than the relativistic (fine-structure) effects discussed in the last section, we may treat the two terms in Eq. (65) as independent perturbations of different (orbital and spin) degrees of freedom. Since in the z-basis each of the perturbation matrices is diagonal, we can again use Eq. (26):
$$E^{(1)}_{m_l,m_s} = \frac{e\mathscr{B}}{2m_e}\left\langle L_z + 2S_z\right\rangle = \mu_B\mathscr{B}\left(m_l + 2m_s\right). \tag{6.66}$$



This result describes the splitting of each 2(2l + 1)-degenerate energy level, with certain n and l ≥ 1, into (2l + 3) levels (Fig. 7), with the adjacent-level splitting $\mu_B\mathscr{B}$, where $\mu_B \sim 10^{-23}$ J/T ~ $10^{-4}$ eV/T. Note that all levels, besides the two top and the two bottom ones, remain doubly degenerate. This limit of the Zeeman effect is sometimes called the Paschen-Back effect, whose simplicity was recognized only in the 1920s, because its observation requires very high magnetic fields.
Fig. 6.7. The Paschen-Back effect.
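The level counting of the Paschen-Back limit may be sketched by simply enumerating the values of $m_l + 2m_s$ (the energy shift in units of $\mu_B\mathscr{B}$) for a given l, together with their degeneracies.

```python
from collections import Counter


def paschen_back_levels(l):
    """Map each shift m_l + 2 m_s (in units of mu_B * B) to its degeneracy."""
    shifts = Counter()
    for m_l in range(-l, l + 1):
        for two_m_s in (-1, +1):          # 2 m_s = -1 or +1 for spin 1/2
            shifts[m_l + two_m_s] += 1
    return dict(sorted(shifts.items()))


for l in range(0, 4):
    levels = paschen_back_levels(l)
    print(f"l = {l}: {len(levels)} levels, degeneracies {levels}")
```

For l = 1, this gives five levels, with degeneracies 1, 1, 2, 1, 1 from the bottom up.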



19 Despite its smallness, the quadratic term is necessary for the description of the negative contribution of the orbital motion to the magnetic susceptibility $\chi_m$ (the so-called orbital diamagnetism, see EM Sec. 5.5), whose analysis, using Eq. (63), is left for the reader's exercise.

20 See, e.g., EM Sec. 5.4, in particular Eqs. (5.95) and (5.100). 
In the opposite limit of a low magnetic field, the Zeeman effect takes place on the background of the fine structure splitting. As was discussed in Sec. 3, at $\mathscr{B} = 0$ each split sub-level has a 2(2j + 1)-fold degeneracy, corresponding to (2j + 1) different values of the half-integer quantum number $m_j$, ranging from −j to +j, and 2 values of the integer l = j ± ½ - see Fig. 6. The magnetic field lifts this degeneracy. 21 Indeed, in the coupled representation discussed in Sec. 5.7, perturbation (65) is described by the matrix with elements
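$$\langle j, m_j, l|\hat H^{(1)}|j', m_j', l'\rangle = \frac{e\mathscr{B}}{2m_e}\langle j, m_j, l|\hat L_z + 2\hat S_z|j', m_j', l'\rangle = \frac{e\mathscr{B}}{2m_e}\left(\hbar m_j\,\delta_{jj'}\delta_{m_jm_j'}\delta_{ll'} + \langle j, m_j, l|\hat S_z|j', m_j', l'\rangle\right). \tag{6.67}$$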



Now plugging into the last term the Clebsch-Gordan expansions (5.216a) for the bra- and ket-vectors, and taking into account that the operator $\hat S_z$ gives non-zero bra-kets only for $m_s = m_s'$, matrix (67) becomes diagonal, and we may again use Eq. (26) to get
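$$E^{(1)}_{l,j,m_j} = \frac{e\mathscr{B}}{2m_e}\left(\hbar m_j \pm \frac{\hbar m_j}{2l+1}\right) = \mu_B\mathscr{B}\,m_j\left(1 \pm \frac{1}{2l+1}\right), \tag{6.68}$$

where the two signs correspond to the two possible values of l = j ∓ ½ - see Fig. 8.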
Fig. 6.8. Anomalous Zeeman effect in a hydrogen-like atom - schematically.



We see that the magnetic field splits each sub-level of the fine structure, with given j and l, into 2j + 1 levels, with the distance between the levels depending on l. At the end of the 1890s, when the Zeeman effect was first observed, there was no notion of spin at all, so that this puzzling result was called the anomalous Zeeman effect. (In this terminology, the normal Zeeman effect is the one without spin splitting, i.e. without the second terms in the parentheses of Eqs. (66)-(68); it may be observed experimentally in atoms with the net spin s = 0.)



21 In almost-hydrogen-like, but more complex atoms (such as those of alkali metals), the degeneracy in l is lifted by the electron-electron interaction even in the absence of the external magnetic field.
The strict quantum-mechanical analysis of the anomalous Zeeman effect for arbitrary s (which is important for applications to multi-electron atoms) is not that complex, but requires explicit expressions for the corresponding Clebsch-Gordan coefficients, which are rather bulky. Let me just cite the unexpectedly simple result of this analysis:

$$\Delta E = \mu_B\mathscr{B}\,m_j\,g, \tag{6.69}$$

where g is the so-called Landé factor, 22

$$g \equiv 1 + \frac{j(j+1) + s(s+1) - l(l+1)}{2j(j+1)}. \tag{6.70}$$

For s = ½ (and hence j = l ± ½), this factor is reduced to the parentheses in the last form of Eq. (68).
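For s = ½, the Landé factor should reduce to $1 \pm 1/(2l+1)$, with the upper sign for j = l + ½. A short exact-arithmetic check with Python's `fractions` module, assuming the standard form $g = 1 + [j(j+1) + s(s+1) - l(l+1)]/[2j(j+1)]$ for Eq. (70):

```python
from fractions import Fraction as F


def lande_g(j, l, s):
    """Lande factor in the standard form assumed for Eq. (6.70)."""
    return 1 + (j * (j + 1) + s * (s + 1) - l * (l + 1)) / (2 * j * (j + 1))


half = F(1, 2)
for l in range(0, 5):
    for sign in (+1, -1):
        j = l + sign * half
        if j < 0:
            continue                    # j = -1/2 does not exist for l = 0
        g = lande_g(j, l, half)
        assert g == 1 + sign * F(1, 2 * l + 1)
        print(f"l = {l}, j = {j}: g = {g}")
```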

It is remarkable that Eqs. (69)-(70) may be readily derived using very plausible classical arguments, similar to those used in Sec. 5.7 - see Fig. 5.11 and its discussion. As we have seen above, in the absence of spin, the quantization of the observable $L_z$ is an extension of the classical torque-induced precession of the corresponding vector (L) about the magnetic field direction, so that the interaction energy, proportional to $\mathscr{B}L_z = \boldsymbol{\mathscr{B}}\cdot\mathbf{L}$, remains constant (Fig. 9a). In the case of spin-orbit interaction without an external magnetic field, the Hamiltonian includes the operator of the product S·L, so that it has to be quantized, i.e. constant, together with $J^2$, $L^2$, and $S^2$. Hence, this system's classical image is a rapid precession of the vectors S and L about the direction of the vector J = L + S, so that the spin-orbit interaction energy, proportional to the product L·S, remains constant (Fig. 9b). On this backdrop, the anomalous Zeeman effect in a relatively weak magnetic field $\boldsymbol{\mathscr{B}} = \mathscr{B}\mathbf{n}_z$ corresponds to a slow precession of the vector J ("dragging" the rapidly rotating vectors L and S with it) about the z-axis.
Fig. 6.9. Classical images of (a) the
orbital angular momentum's quantization 
in external magnetic field and (b) the 
fine-structure level splitting. 



This picture allows us to conjecture that what is important for the slow precession rate are only the vectors L and S averaged over the period of the much faster precession about the vector J - in other words, only their components $\mathbf{L}_J$ and $\mathbf{S}_J$ directed along the vector J. Classically, these components may be calculated as

$$\mathbf{L}_J = \frac{\mathbf{L}\cdot\mathbf{J}}{J^2}\mathbf{J}, \qquad \mathbf{S}_J = \frac{\mathbf{S}\cdot\mathbf{J}}{J^2}\mathbf{J}. \tag{6.71}$$

The scalar products participating in these expressions may be readily expressed via the squared lengths of the vectors, using the following evident formulas:



22 This formula is frequently used with capital letters J, S, and L, which denote the quantum numbers of the atom as a whole.
$$S^2 = (\mathbf{J} - \mathbf{L})^2 = J^2 + L^2 - 2\mathbf{L}\cdot\mathbf{J}, \qquad L^2 = (\mathbf{J} - \mathbf{S})^2 = J^2 + S^2 - 2\mathbf{J}\cdot\mathbf{S}. \tag{6.72}$$



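As a result, we get the following time average:

$$\overline{E} = \frac{e\mathscr{B}}{2m_e}\left(\mathbf{L}_J + 2\mathbf{S}_J\right)_z = \frac{e\mathscr{B}}{2m_e}J_z\left(1 + \frac{J^2 + S^2 - L^2}{2J^2}\right). \tag{6.73}$$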



The last move is to smuggle in some quantum mechanics by using, instead of the squared vector lengths and the z-component $J_z$, their eigenvalues given by Eqs. (5.197), (5.203), and (5.204). As a result, we immediately arrive at the exact result given by Eqs. (69)-(70). This coincidence encourages thinking about the quantum mechanics of angular momenta in classical terms of torque-induced precession, which turns out to be very fruitful in more complex problems of atomic and molecular physics.
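The claim that the classical projection argument reproduces the exact result may be checked against a direct Clebsch-Gordan calculation for s = ½. This SymPy sketch takes the coefficients from SymPy's `CG` class (their squares, which are all that matter here, do not depend on the phase convention), computes $\langle L_z + 2S_z\rangle/\hbar$ in every coupled state $|l, s, j, m_j\rangle$, and compares it with $g\,m_j$, using the standard form of the Landé factor assumed for Eq. (70).

```python
from sympy import Rational, S, simplify
from sympy.physics.quantum.cg import CG

HALF = Rational(1, 2)


def mean_Lz_plus_2Sz(l, j, m_j, s=HALF):
    """<l s j m_j| (L_z + 2 S_z)/hbar |l s j m_j>, summed over the Clebsch-Gordan expansion."""
    total = S(0)
    for m_s in (HALF, -HALF):
        m_l = m_j - m_s
        if abs(m_l) > l:
            continue
        coeff = CG(l, m_l, s, m_s, j, m_j).doit()
        total += coeff**2 * (m_l + 2 * m_s)
    return total


def lande_g(j, l, s=HALF):
    return 1 + (j * (j + 1) + s * (s + 1) - l * (l + 1)) / (2 * j * (j + 1))


for l in range(0, 4):
    for j in (l - HALF, l + HALF):
        if j < 0:
            continue
        m_j = -j
        while m_j <= j:
            assert simplify(mean_Lz_plus_2Sz(l, j, m_j) - lande_g(j, l) * m_j) == 0
            m_j += 1
print("Eqs. (69)-(70) agree with the exact Clebsch-Gordan average for s = 1/2")
```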

The high-field and low-field limits of the Zeeman effect, described respectively by Eqs. (66) and (68), are separated by an intermediate field range in which the Zeeman splitting is of the order of the fine-structure splitting analyzed in Sec. 3. There is no time in this course for a quantitative analysis of this crossover. 23



6.5. Time-dependent perturbations 

Now let us proceed to the case when the perturbation $\hat H^{(1)}$ in Eq. (1a) is a function of time, while $\hat H^{(0)}$ is time-independent. The adequate perturbative approach to this problem, and its results, depend critically on the relation between the characteristic frequency (or the characteristic reciprocal time) ω of the perturbation and the distance between the initial system's energy levels:

$$\hbar\omega \leftrightarrow |E_n - E_{n'}|. \tag{6.74}$$

In the easiest case, when all essential frequencies of a perturbation are very small in the sense of Eq. (74), we are dealing with the so-called adiabatic change of parameters, which may be treated essentially as a time-independent perturbation (see the previous sections of this chapter). The most interesting observation here is that an adiabatic perturbation does not allow any significant transfer of the system's probability from one eigenstate to another. For example, in the WKB limit of the orbital motion, the Bohr-Sommerfeld quantization rule (2.110), and its multi-dimensional generalization, guarantee that the integral

$$\oint_C \mathbf{p}\cdot d\mathbf{r}, \tag{6.75}$$

taken along the particle's classical trajectory C, is an adiabatic invariant, i.e. does not change at a slow change of the system's parameters. (It is curious that classical mechanics also guarantees the invariance of integral (75), but its proof there 24 is much harder than the quantum-mechanical derivation of this fact, carried out in Sec. 2.4.) This is why, even if the perturbation becomes large with time (while changing sufficiently slowly), we can expect the eigenstate and eigenvalue classification to persist.

23 For a more complete discussion of the Stark, Zeeman, and fine-structure effects in atoms, I can recommend, for example, either the monograph by G. Woodgate cited above, or the one by I. Sobelman, Theory of Atomic Spectra, Alpha Science, 2006.

24 See, e.g., CM Sec. 10.2.
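The adiabatic invariance of integral (75) may be illustrated classically. The sketch below (all parameters are arbitrary illustration choices) integrates a harmonic oscillator of unit mass whose frequency is ramped slowly and linearly from ω₀ = 1 to ω₁ = 2, using the leapfrog (velocity Verlet) scheme. For an oscillator, $\oint p\,dx = 2\pi E/\omega$, so the quantity E/ω should stay nearly constant while the energy itself roughly doubles.

```python
def oscillator_action(x, p, omega, m=1.0):
    """J = E / omega = (1 / 2 pi) * loop integral of p dx, for a harmonic oscillator."""
    energy = p * p / (2 * m) + 0.5 * m * omega**2 * x * x
    return energy / omega


def ramp_frequency(omega0, omega1, ramp_time, dt=1e-3, m=1.0):
    """Integrate x'' = -omega(t)^2 x while omega grows linearly from omega0 to omega1."""
    x, p = 1.0, 0.0
    J0 = oscillator_action(x, p, omega0, m)
    n_steps = int(round(ramp_time / dt))
    for k in range(n_steps):
        w = omega0 + (omega1 - omega0) * (k + 0.5) / n_steps   # omega at the step midpoint
        p -= 0.5 * dt * m * w * w * x                          # half kick
        x += dt * p / m                                        # drift
        p -= 0.5 * dt * m * w * w * x                          # half kick
    J1 = oscillator_action(x, p, omega1, m)
    return J0, J1, J0 * omega0, J1 * omega1


J0, J1, E0, E1 = ramp_frequency(1.0, 2.0, ramp_time=200.0)
print(f"energy ratio E1/E0 = {E1 / E0:.4f}, action ratio J1/J0 = {J1 / J0:.5f}")
```

Making the ramp faster (e.g., `ramp_time=5.0`) should visibly break the invariance, since the frequency then changes substantially within one oscillation period.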

Now let us proceed to the more important (and more complex) case when both sides of Eq. (74)
are comparable, and use for its discussion the Schrödinger picture of quantum mechanics given by Eqs.
(4.157) and (4.158). Combining these equations, we get the Schrödinger equation in the form

$$i\hbar\frac{d}{dt}|\alpha(t)\rangle = \left[\hat H^{(0)} + \hat H^{(1)}(t)\right]|\alpha(t)\rangle. \tag{6.76}$$

Very much in the spirit of our treatment of the time-independent case in Sec. 1, let us represent the time- 
dependent ket-vector of the system with its expansion, 

$$|\alpha(t)\rangle = \sum_n |n\rangle\langle n|\alpha(t)\rangle, \tag{6.77}$$

over the full and orthonormal set of the unperturbed, stationary ket-vectors defined by equation 

$$\hat H^{(0)}|n\rangle = E_n|n\rangle, \tag{6.78}$$

where bra-kets $\langle n|\alpha(t)\rangle$ are time-dependent coefficients. Plugging expansion (77), with $n$ replaced with
$n'$, into both sides of Eq. (76), and then inner-multiplying both its parts by bra-vector $\langle n|$ of another
unperturbed (and hence time-independent) state of the system, we get a set of linear, ordinary
differential equations for the expansion coefficients:

$$i\hbar\frac{d}{dt}\langle n|\alpha(t)\rangle = E_n\langle n|\alpha(t)\rangle + \sum_{n'}H^{(1)}_{nn'}(t)\langle n'|\alpha(t)\rangle, \tag{6.79}$$

where the matrix elements of the perturbation in the unperturbed state basis, defined similarly to Eq. (7), 
are now functions of time: 

$$H^{(1)}_{nn'}(t) = \langle n|\hat H^{(1)}(t)|n'\rangle. \tag{6.80}$$

The set of differential equations (79), which are still exact, may be useful for numerical
calculations, because for virtually all practical problems the set of eigenstates $n'$ may be truncated to a finite
number with an acceptable error in the final result. 25 However, Eq. (79) has a certain technical inconvenience, which
becomes clear if we consider its (evident) solution in the absence of perturbation: 26



$$\langle n|\alpha(t)\rangle = \langle n|\alpha(0)\rangle\exp\left\{-i\frac{E_n}{\hbar}t\right\}. \tag{6.81}$$




We see that the solution oscillates very fast, and its numerical modeling may present a challenge for
even the fastest computers. These spurious oscillations (whose frequency, in particular, depends on the
energy reference level) may be partly tamed by looking for the general solution of Eqs. (79) in a form
inspired by Eq. (81):



25 Even if the problem under analysis may be described by the wave-mechanics Schrodinger equation (1.25), a 
direct numerical integration of that partial differential equation is typically less convenient than that of the 
ordinary differential equations (79). 

26 This is of course just a more general form of Eq. (1.61) of wave mechanics of time-independent systems. 
$$\langle n|\alpha(t)\rangle = a_n(t)\exp\left\{-i\frac{E_n}{\hbar}t\right\}. \tag{6.82}$$



Here $a_n(t)$ are some new functions of time (probability amplitudes) that may be used, in particular, to
calculate the time-dependent level occupancies, i.e. the probabilities $W_n$ to find the perturbed system on
the corresponding energy levels of the unperturbed system:



$$W_n(t) = |\langle n|\alpha(t)\rangle|^2 = |a_n(t)|^2. \tag{6.83}$$



Plugging Eq. (82) into Eq. (79), for these functions we readily get a slightly modified system of
equations:

$$i\hbar\dot a_n = \sum_{n'}a_{n'}H^{(1)}_{nn'}(t)\,e^{i\omega_{nn'}t}, \tag{6.84}$$

where factors $\omega_{nn'}$, defined by relation

$$\hbar\omega_{nn'} \equiv E_n - E_{n'}, \tag{6.85}$$



have the physical sense of frequencies of potential quantum transitions between the n-th and n'-th 
energy levels of the unperturbed system. (The conditions when such transitions indeed take place will be 
discussed later in this chapter.) An advantage of Eq. (84) over Eq. (79) for numerical calculations is the 
absence of any dependence on the energy reference selection, and lower frequencies of oscillations of 
the right hand part terms, especially when the energy levels of interest are close to each other. 
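To illustrate the numerical use of Eq. (84), here is a minimal sketch (mine, in units with $\hbar = 1$) that integrates it with a fixed-step fourth-order Runge-Kutta method for an arbitrary, user-supplied matrix $H^{(1)}_{nn'}(t)$. The three-level system, its coupling matrix `COUPLING`, and the drive frequency `DRIVE` are hypothetical and chosen only for illustration. The final lines check the point made above. Shifting all energies $E_n$ by the same constant leaves the amplitudes $a_n(t)$ unchanged, because only the differences $\omega_{nn'}$ enter Eq. (84).

```python
import cmath
import math


def integrate_eq84(energies, h1, a0, t_end, steps, hbar=1.0):
    """Fixed-step RK4 integration of Eq. (84):
        i*hbar * da_n/dt = sum_n' a_n' * H1_nn'(t) * exp(i * omega_nn' * t),
    with hbar * omega_nn' = E_n - E_n' as in Eq. (85).

    energies -- unperturbed eigenvalues E_n
    h1       -- function t -> matrix H1_nn'(t) (list of lists, Hermitian)
    a0       -- initial amplitudes a_n(0)
    """
    n = len(energies)
    omegas = [[(energies[i] - energies[j]) / hbar for j in range(n)] for i in range(n)]

    def rhs(t, a):
        h = h1(t)
        return [sum(a[j] * h[i][j] * cmath.exp(1j * omegas[i][j] * t) for j in range(n))
                / (1j * hbar) for i in range(n)]

    dt = t_end / steps
    a, t = [complex(x) for x in a0], 0.0
    for _ in range(steps):
        k1 = rhs(t, a)
        k2 = rhs(t + dt / 2, [x + dt / 2 * k for x, k in zip(a, k1)])
        k3 = rhs(t + dt / 2, [x + dt / 2 * k for x, k in zip(a, k2)])
        k4 = rhs(t + dt, [x + dt * k for x, k in zip(a, k3)])
        a = [x + dt / 6 * (p + 2 * q + 2 * r + s) for x, p, q, r, s in zip(a, k1, k2, k3, k4)]
        t += dt
    return a


# Hypothetical coupling: H1(t) = C exp(-i w t) + C^dagger exp(+i w t) with real symmetric C.
COUPLING = [[0.0, 0.05, 0.0], [0.05, 0.0, 0.02], [0.0, 0.02, 0.0]]
DRIVE = 0.9


def h1(t):
    return [[2.0 * c * math.cos(DRIVE * t) for c in row] for row in COUPLING]


a = integrate_eq84([0.0, 1.0, 2.5], h1, [1, 0, 0], t_end=10.0, steps=2000)
b = integrate_eq84([100.0, 101.0, 102.5], h1, [1, 0, 0], t_end=10.0, steps=2000)
print("occupancies W_n:", [round(abs(x) ** 2, 6) for x in a])
```

Since $H^{(1)}_{nn'}(t)$ is Hermitian, the total probability $\sum_n|a_n|^2$ should stay equal to 1 up to the small RK4 error, which gives a simple accuracy check.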

In order to continue our analytical treatment, let us restrict ourselves to a particular but very 
important case of a sinusoidal perturbation turned on at some moment - for example, at t = 0: 



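$$\hat H^{(1)}(t) = \begin{cases} 0, & \text{for } t < 0,\\ \hat A\,e^{-i\omega t} + \hat A^\dagger e^{+i\omega t}, & \text{for } t \geq 0, \end{cases} \tag{6.86}$$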




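where the perturbation amplitude operators $\hat A$ and $\hat A^\dagger$, and hence their matrix elements

$$\langle n|\hat A|n'\rangle \equiv A_{nn'}, \qquad \langle n|\hat A^\dagger|n'\rangle = A^*_{n'n}, \tag{6.87}$$

are time-independent. 27 In this case, for $t > 0$, Eq. (84) yields

$$i\hbar\dot a_n = \sum_{n'}a_{n'}\left(A_{nn'}e^{i(\omega_{nn'}-\omega)t} + A^*_{n'n}e^{i(\omega_{nn'}+\omega)t}\right). \tag{6.88}$$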



This is, generally, still a complex system of coupled differential equations; however, it allows 
simple and explicit solutions in two very important cases. First, let us assume that our system is initially 
in one eigenstate $n'$ (say, on the ground energy level), and that the occupancies $W_n$ of all other levels
stay very low all the time. (We will find the corresponding condition a posteriori, from the solution.)
With the corresponding assumption



27 The notation of the amplitude operators in Eq. (86) is justified by the fact that the perturbation Hamiltonian has 
to be self-adjoint (Hermitian), and hence each term in the right-hand part of that relation has to be a Hermitian 
conjugate of its counterpart, which is evidently true only if the amplitude operators are also the Hermitian 
conjugates of each other. Note, however, that each of the amplitude operators is generally not Hermitian. 
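$$a_{n'} = 1, \qquad |a_n| \ll 1 \quad \text{for } n \neq n', \tag{6.89}$$

Eq. (88) may be readily integrated, giving

$$a_n = -\frac{A_{nn'}}{\hbar(\omega_{nn'}-\omega)}\left(e^{i(\omega_{nn'}-\omega)t} - 1\right) - \frac{A^*_{n'n}}{\hbar(\omega_{nn'}+\omega)}\left(e^{i(\omega_{nn'}+\omega)t} - 1\right). \tag{6.90}$$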



We see that the probability $W_n$ (83) of finding the system on each energy level oscillates in
time, and that our assumption (89) is satisfied as long as the excitation amplitude is not too large, 28

$$|A_{nn'}|,\ |A_{n'n}| \ll \hbar\,|\omega \pm \omega_{nn'}|. \tag{6.91}$$
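The first-order result (90) can be checked against a direct numerical solution of the full Eq. (88), with both its resonant and non-resonant terms kept. The sketch below assumes $\hbar = 1$, a hypothetical two-level system with $\omega_{nn'} = 1$, a drive frequency $\omega = 0.7$, and small, made-up amplitudes $A_{nn'}$ and $A_{n'n}$ that satisfy condition (91). Under these assumptions the two results should agree up to corrections of higher order in $|A|$.

```python
import cmath


def integrate_eq88(energies, amp, omega, a0, t_end, steps, hbar=1.0):
    """Fixed-step RK4 integration of Eq. (88), keeping both terms:
        i*hbar*da_n/dt = sum_n' a_n' [A_nn' e^{i(w_nn'-w)t} + A*_n'n e^{i(w_nn'+w)t}],
    where amp[n][n'] = A_nn' are the matrix elements of the amplitude operator."""
    n = len(energies)
    w = [[(energies[i] - energies[j]) / hbar for j in range(n)] for i in range(n)]

    def rhs(t, a):
        return [sum(a[j] * (amp[i][j] * cmath.exp(1j * (w[i][j] - omega) * t)
                            + amp[j][i].conjugate() * cmath.exp(1j * (w[i][j] + omega) * t))
                    for j in range(n)) / (1j * hbar) for i in range(n)]

    dt = t_end / steps
    a, t = [complex(x) for x in a0], 0.0
    for _ in range(steps):
        k1 = rhs(t, a)
        k2 = rhs(t + dt / 2, [x + dt / 2 * k for x, k in zip(a, k1)])
        k3 = rhs(t + dt / 2, [x + dt / 2 * k for x, k in zip(a, k2)])
        k4 = rhs(t + dt, [x + dt * k for x, k in zip(a, k3)])
        a = [x + dt / 6 * (p + 2 * q + 2 * r + s) for x, p, q, r, s in zip(a, k1, k2, k3, k4)]
        t += dt
    return a


def a_n_eq90(t, amp_nnp, amp_npn, w_nnp, omega, hbar=1.0):
    """First-order amplitude of Eq. (90), valid while a_n' stays close to 1."""
    d_minus, d_plus = w_nnp - omega, w_nnp + omega
    return (-amp_nnp * (cmath.exp(1j * d_minus * t) - 1) / (hbar * d_minus)
            - amp_npn.conjugate() * (cmath.exp(1j * d_plus * t) - 1) / (hbar * d_plus))


# Level 0 plays the role of n' (initially occupied), level 1 that of n;
# the diagonal elements are set to zero in this hypothetical example.
AMP = [[0.0 + 0j, 0.004 + 0.003j],   # A_n'n', A_n'n
       [0.010 + 0j, 0.0 + 0j]]       # A_nn',  A_nn
numeric = integrate_eq88([0.0, 1.0], AMP, omega=0.7, a0=[1, 0], t_end=10.0, steps=2000)[1]
first_order = a_n_eq90(10.0, AMP[1][0], AMP[0][1], w_nnp=1.0, omega=0.7)
print(abs(numeric), abs(first_order), abs(numeric - first_order))
```

Increasing the amplitudes toward the limit set by condition (91), or moving $\omega$ closer to $\omega_{nn'}$, should make the discrepancy grow, in line with the remarks that follow.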



Expression (90) also shows that this phenomenon has a clearly resonant character: the maximum
occupancy $W_n$ of a level grows infinitely when the corresponding detuning, 29

$$\Delta_{nn'} \equiv \omega - \omega_{nn'}, \tag{6.92}$$



tends to zero. In this limit, our initial assumption (89) may become a liability; in order to overcome it we 
may perform the following trick - very similar to the one we used for transfer to the degenerate case in 
Sec. 1. Let us assume that for a certain level n, 



$$|\Delta_{nn'}| \ll \omega,\ |\omega \pm \omega_{n''n}|,\ |\omega \pm \omega_{n''n'}|, \quad \text{for all } n'' \neq n, n' \tag{6.93}$$



- the condition illustrated in Fig. 10. Then, according to Eq. (90), we may ignore the occupancy of all
but two levels, $n$ and $n'$, and also the second, non-resonant terms with frequency $\omega_{nn'} + \omega \approx 2\omega \gg |\Delta_{nn'}|$
in Eqs. (88) written for $a_n$ and $a_{n'}$. 30
Fig. 6.10. Resonant excitation of
one of the higher energy levels. 



As a result, in this two-level approximation (that is of course not an approximation at all for two- 
level systems), we get a simple system of two linear equations: 



$$i\hbar\dot a_n = a_{n'}A\,e^{-i\Delta t}, \qquad i\hbar\dot a_{n'} = a_nA^*e^{+i\Delta t}, \tag{6.94}$$



28 Strictly speaking, another condition is that the number of "resonant" levels is also not too high - see Sec. 6. 

29 The notion of detuning is also very useful in the classical theory of oscillations - see, e.g., CM Chapter 4. 

30 Such omission of non-resonant terms is usually called the Rotating Wave Approximation (RWA); it is very
instrumental not only in quantum mechanics, but also in the classical theory of oscillations - see, e.g., CM Secs.
4.3-4.5.



Chapter 6 



Page 22 of 36 



Essential Graduate Physics 



QM: Quantum Mechanics 



where I have used shorthand notation $\Delta \equiv \Delta_{nn'}$ and $A \equiv A_{nn'}$, and will use it for a while, until other
energy levels become involved (in the beginning of the next section). This system of linear differential
equations may be solved exactly by the introduction of a new variable (for one of the levels only!)



$$b_n \equiv a_ne^{i\Delta t}. \tag{6.95}$$

According to this formula,

$$a_n = b_ne^{-i\Delta t}, \qquad \dot a_n = \left(\dot b_n - i\Delta b_n\right)e^{-i\Delta t}. \tag{6.96}$$



Plugging these relations into Eq. (94), we see that both equations of the system lose their explicit time
dependence:

$$i\hbar\left(\dot b_n - i\Delta b_n\right) = a_{n'}A, \qquad i\hbar\dot a_{n'} = b_nA^*, \tag{6.97}$$



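and now may be readily solved by regular methods. For example, we may differentiate the first
equation, and then use the second one to eliminate variable $a_{n'}$:

$$i\hbar\left(\ddot b_n - i\Delta\dot b_n\right) = \dot a_{n'}A = \frac{|A|^2}{i\hbar}b_n, \quad \text{i.e.} \quad \ddot b_n - i\Delta\dot b_n + \frac{|A|^2}{\hbar^2}b_n = 0. \tag{6.98}$$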



From mathematics we know that the resulting linear, second-order differential equation, with 
time-independent coefficients, has the following general solution, 



$$b_n(t) = b_+e^{\lambda_+t} + b_-e^{\lambda_-t}, \tag{6.99}$$



whose characteristic exponents $\lambda$ may be readily found by plugging any of the exponential functions
into Eq. (98). In our case, both roots of the resulting characteristic equation,

$$\lambda^2 - i\Delta\lambda + \frac{|A|^2}{\hbar^2} = 0, \tag{6.100}$$

are purely imaginary: $\lambda_\pm = i(\Delta/2 \pm \Omega)$, where

$$\Omega \equiv \left(\frac{\Delta^2}{4} + \frac{|A|^2}{\hbar^2}\right)^{1/2}. \tag{6.101}$$



Coefficients $b_\pm$ are determined by initial conditions. If, as before, the system was completely on
level $n'$ initially, i.e. $a_{n'}(0) = 1$, $a_n(0) = b_n(0) = 0$, then Eq. (99) immediately yields $b_- = -b_+$, so that

$$b_n(t) = 2ib_+e^{i\Delta t/2}\sin\Omega t, \qquad a_n(t) = 2ib_+e^{-i\Delta t/2}\sin\Omega t, \qquad \dot a_n(0) = 2ib_+\Omega. \tag{6.102}$$



Now coefficient $b_+$ may be readily found from the comparison of the last equality in Eq. (102) with the
first of Eqs. (94), taken for $t = 0$, when $a_{n'} = 1$. This comparison yields $2ib_+\Omega = A/(i\hbar)$, and hence

$$a_n(t) = \frac{A}{i\hbar\Omega}e^{-i\Delta t/2}\sin\Omega t, \tag{6.103}$$



so that the $n$-th level occupancy is

$$W_n(t) = |a_n(t)|^2 = \frac{|A|^2}{\hbar^2\Omega^2}\sin^2\Omega t. \tag{6.104}$$



This is the famous Rabi formula. 31 It shows that an increase of the perturbation amplitude $|A|$
leads not only to an increase of the amplitude of the probability oscillations, but also of their frequency
$2\Omega$ described by Eq. (101) - see Fig. 11.




Fig. 6.11. Rabi oscillations.

Ultimately, at $|A| \gg \hbar|\Delta|$ (for example, at the exact resonance, $\Delta = 0$) Eqs. (101) and (104) give $\Omega = |A|/\hbar$
and $(W_n)_{\max} = 1$, i.e. describe a periodic, full "repumping" of the system from one level to another
and back, with a frequency proportional to the perturbation amplitude. This effect gives a very
convenient tool for manipulating two-level systems (qubits, in the quantum information context). For
example, limiting the external excitation time to $\Delta t = \pi/2\Omega$ (or an odd number of such intervals) we may
completely transfer the system from one eigenstate (say, ↓) to the opposite one (↑). 32 On the Bloch
sphere (Fig. 5.1), this transfer corresponds to the representing point's drive from the South Pole to the
North Pole.
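The Rabi solution (103)-(104) and the switching time $\pi/2\Omega$ are easy to verify by integrating Eqs. (94) numerically. The sketch below assumes $\hbar = 1$ and a hypothetical complex amplitude $A$ with $|A| = 0.1$. It compares the numerical $a_n(t)$ with Eq. (103) at several detunings, and checks that a pulse of duration $\pi/2\Omega$ at exact resonance transfers the system completely to level $n$.

```python
import cmath
import math


def rwa_two_level(amp, detuning, t_end, steps, hbar=1.0):
    """Fixed-step RK4 integration of Eqs. (94),
        i*hbar*da_n/dt = a_n' A e^{-i Delta t},  i*hbar*da_n'/dt = a_n A* e^{+i Delta t},
    from a_n'(0) = 1, a_n(0) = 0. Returns (a_n, a_n') at t_end."""
    def rhs(t, an, anp):
        return (anp * amp * cmath.exp(-1j * detuning * t) / (1j * hbar),
                an * amp.conjugate() * cmath.exp(1j * detuning * t) / (1j * hbar))

    dt = t_end / steps
    an, anp, t = 0j, 1 + 0j, 0.0
    for _ in range(steps):
        k1 = rhs(t, an, anp)
        k2 = rhs(t + dt / 2, an + dt / 2 * k1[0], anp + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, an + dt / 2 * k2[0], anp + dt / 2 * k2[1])
        k4 = rhs(t + dt, an + dt * k3[0], anp + dt * k3[1])
        an += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        anp += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return an, anp


def rabi_frequency(amp, detuning, hbar=1.0):
    """Omega of Eq. (101)."""
    return math.sqrt(detuning ** 2 / 4 + abs(amp) ** 2 / hbar ** 2)


def a_n_eq103(amp, detuning, t, hbar=1.0):
    """Analytical amplitude of Eq. (103)."""
    big_omega = rabi_frequency(amp, detuning, hbar)
    return (amp / (1j * hbar * big_omega) * cmath.exp(-1j * detuning * t / 2)
            * math.sin(big_omega * t))


A = 0.06 + 0.08j  # hypothetical amplitude, |A| = 0.1
for delta in (0.0, 0.15, 0.5):
    an, anp = rwa_two_level(A, delta, t_end=30.0, steps=3000)
    print(delta, abs(an) ** 2, abs(a_n_eq103(A, delta, 30.0)) ** 2)

t_switch = math.pi / (2 * rabi_frequency(A, 0.0))   # Delta t = pi / (2 Omega)
an_pi, _ = rwa_two_level(A, 0.0, t_end=t_switch, steps=2000)
print("W_n after the pulse:", abs(an_pi) ** 2)
```

Because Eqs. (94) are exactly solvable, agreement is limited only by the integration step. Repeating the run with the non-resonant terms of Eqs. (88) restored would show the small deviations that the rotating-wave approximation neglects.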

Note, however, that according to Eq. (90), if the system has energy levels other than $n$ and $n'$,
they also become occupied to some extent. Since the sum of occupancies should be 1, this means that
$(W_n)_{\max}$ may approach 1 only if the excitation amplitude is very small, and hence the state switching
time $\Delta t = \pi/2\Omega = \pi\hbar/2|A|$ is very long. The ultimate limit in this sense is provided by the harmonic
oscillator, where all energy levels are equidistant, and probability repumping between all of them occurs
with the same rate. Hence, in that particular system, the implementation of the full Rabi oscillations is
impossible even at the exact resonance. 33 In the opposite limit, when the detuning is large in comparison
with $|A|/\hbar$, though still small in the sense of Eq. (93), the frequency of Rabi oscillations is completely
determined by the detuning, and their amplitude is small:



31 It was derived in 1952 by I. Rabi, in the context of his group's pioneering experiments with microwave 
excitation of quantum states, using molecular beams in vacuum. 

32 In the quantum information science language, this is just a logic operation NOT performed on a single qubit. 

33 We, of course, already know what happens to the ground state of an oscillator at its external sinusoidal (or any 
other) excitation: it turns into the Glauber state, i.e. a superposition of all Fock states - see Sec. 5.5. 
$$W_n(t) = \frac{4|A|^2}{\hbar^2\Delta^2}\sin^2\frac{\Delta t}{2}, \quad \text{for } |A|^2 \ll (\hbar\Delta)^2. \tag{6.105}$$



However, I would not like these quantitative details to obscure from the reader the most 
important qualitative (OK, maybe semi-quantitative :-) conclusion of this section's analysis: the 
resonant increase of interlevel transition intensity at $\omega \to \omega_{nn'}$. Using the fundamental Kramers-Kronig
dispersion relations, 34 based essentially only on very general causality arguments, it is easy to show
(and hence left for reader's exercise) that in a medium incorporating many similar quantum systems
(e.g., atoms or molecules), this increase of quantum transitions is accompanied by a sharp increase of
external field's absorption. This effect has numerous practical applications including systems based
on the electron paramagnetic resonance (EPR) and nuclear magnetic resonance (NMR) spectroscopies, 
which are broadly used in material science, chemistry, and medicine. Unfortunately, I will not have time 
to discuss the related technical issues (in particular, interesting pulsing spectroscopy techniques) in 
detail, and have to refer the reader to special literature. 35 



6.6. Quantum-mechanical Golden Rule 

The last result of the past section, Eq. (105), may be used to derive one of the most important 
results of quantum mechanics - its so-called Golden Rule. For that, let us consider the case when the 
perturbation causes quantum transitions from a discrete energy level $E_{n'}$ into a group of eigenstates $E_n$
with a dense (virtually continuous) spectrum - see Fig. 12a. If, for all states $n$ of the group, the
following conditions are satisfied



$$|A_{nn'}|^2 \ll (\hbar\Delta_{nn'})^2 \ll (\hbar\omega_{nn'})^2, \tag{6.106}$$



then Eq. (105) coincides with the result that would follow from Eq. (90). This means that we may apply 
Eq. (105), with indices n and n' duly restored, to any level n of our tight group. As a result, the total 
probability of having our system transferred from level n' to that group is 



$$W \equiv \sum_n W_n = \sum_n\frac{4|A_{nn'}|^2}{\hbar^2\Delta_{nn'}^2}\sin^2\frac{\Delta_{nn'}t}{2}. \tag{6.107}$$

Fig. 6.12. Deriving the Golden Rule: (a) the energy level scheme, and (b) the function under integral (108).



34 See, e.g., EM Sec. 7.3, in particular, the correspondence between Eqs. (7.55) and (7.56). 

35 For introductions see, e.g., J. Wertz and J. Bolton, Electron Spin Resonance, 2nd ed., Wiley, 2007; J. Keeler,
Understanding NMR Spectroscopy, 2nd ed., Wiley, 2010.
Now comes the main, absolutely beautiful trick: let us assume that the summation over $n$ will be
limited to a tight group of very similar states for which the matrix elements $A_{nn'}$ are virtually equal (we
will check the validity of this assumption later on), so that we can take them out of the sum (107) and then
replace the sum with the corresponding integral:

$$W = \frac{4|A_{nn'}|^2}{\hbar^2}\int\frac{1}{\Delta_{nn'}^2}\sin^2\frac{\Delta_{nn'}t}{2}\,dn = \frac{4|A_{nn'}|^2}{\hbar}\rho_nt\int\frac{1}{(\Delta_{nn'}t)^2}\sin^2\frac{\Delta_{nn'}t}{2}\,d(-\Delta_{nn'}t), \tag{6.108}$$

where $\rho_n$ is the density of eigenstates $n$ on the energy axis:

$$\rho_n \equiv \frac{dn}{dE_n}. \tag{6.109}$$

(In the second step of Eq. (108) we have used $dn = \rho_n\,dE_n = \hbar\rho_n\,d(-\Delta_{nn'})$, which follows from Eqs. (85) and (92) at fixed $E_{n'}$ and $\omega$.)



This density, as well as the matrix element $A_{nn'}$, have to be evaluated at $\Delta_{nn'} = 0$, i.e. at energy $E_n = E_{n'} + \hbar\omega$,
and are assumed to be constant within the final-state group. At fixed $E_{n'}$, the function under integral
(108) is even and decreases fast at $|\Delta_{nn'}t| \gg 1$ - see Fig. 12b. Hence we may introduce a dimensionless
integration variable $\xi = \Delta_{nn'}t$, and extend integration over this variable formally from $-\infty$ to $+\infty$. Then
Eq. (108) is reduced to a table integral, 36 and yields



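$$W = \frac{4|A_{nn'}|^2}{\hbar}\rho_nt\int_{-\infty}^{+\infty}\frac{1}{\xi^2}\sin^2\frac{\xi}{2}\,d\xi = \frac{4|A_{nn'}|^2}{\hbar}\rho_nt\,\frac{\pi}{2} \equiv \Gamma t, \tag{6.110}$$

where constant

$$\Gamma \equiv \frac{2\pi}{\hbar}|A_{nn'}|^2\rho_n \tag{6.111}$$

is called the transition rate. 37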
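The passage from the sum (107) to the linear growth (110) can be checked numerically. The sketch below uses $\hbar = 1$ and assumes a hypothetical set of final levels with equal matrix elements $A$ and equidistant detunings $\Delta_k = k\,\delta E/\hbar$, so that $\rho_n = 1/\delta E$. It evaluates the sum (107) term by term and compares it with $\Gamma t$ from Eqs. (110)-(111). It also estimates the table integral of $\sin^2(\xi/2)/\xi^2$ by Simpson's rule; the result should be close to $\pi/2$.

```python
import math


def transferred_probability(amp, level_spacing, t, k_max, hbar=1.0):
    """Sum (107) of Eq. (105) over 2*k_max + 1 equidistant levels with detunings
    Delta_k = k * level_spacing / hbar, k = -k_max..k_max."""
    total = (abs(amp) * t / hbar) ** 2          # k = 0: the Delta -> 0 limit of Eq. (105)
    for k in range(1, k_max + 1):
        delta = k * level_spacing / hbar
        # the factor 2 accounts for the pair of levels at +k and -k
        total += 2 * 4 * abs(amp) ** 2 / (hbar * delta) ** 2 * math.sin(delta * t / 2) ** 2
    return total


def golden_rule_rate(amp, density_of_states, hbar=1.0):
    """Eq. (111)."""
    return 2 * math.pi / hbar * abs(amp) ** 2 * density_of_states


def table_integral(limit=1000.0, intervals=200000):
    """Simpson estimate of the integral of sin^2(xi/2)/xi^2 over [-limit, +limit]."""
    h = 2 * limit / intervals

    def f(xi):
        return 0.25 if abs(xi) < 1e-9 else math.sin(xi / 2) ** 2 / xi ** 2

    s = f(-limit) + f(limit)
    for i in range(1, intervals):
        s += (4 if i % 2 else 2) * f(-limit + i * h)
    return s * h / 3


AMP, SPACING = 1e-3, 0.01           # hypothetical |A_nn'| and level spacing
gamma = golden_rule_rate(AMP, 1 / SPACING)
for t in (1.0, 2.0, 5.0):
    print(t, transferred_probability(AMP, SPACING, t, k_max=20000) / (gamma * t))
print(table_integral(), math.pi / 2)
```

For this equidistant model, the infinite sum equals $\Gamma t$ exactly as long as $t \le 2\pi\hbar\rho_n$; the small deficit seen here comes from truncating the sum at $|k| = k_{\max}$. At longer times the discrete ladder rephases and $W(t)$ stops growing, a finite-size effect that is absent for a genuine continuum.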

This is one of the most famous and useful results of quantum mechanics, its Golden Rule 
(sometimes, rather unfairly, called the "Fermi Golden Rule" 38 ), which deserves much discussion. First 
of all, let us reproduce the reasoning already used in Sec. 2.5 to show that the meaning of rate $\Gamma$ is much
deeper than Eq. (110) seems to imply. Indeed, due to the conservation of the total probability, $W_{n'} + W = 1$,
we can rewrite that equation as

$$\dot W_{n'}\Big|_{t=0} = -\Gamma. \tag{6.112}$$



Evidently, this result cannot be true for $t \to \infty$, otherwise probability $W_{n'}$ would become negative. The
reason for that apparent contradiction is that result (110) was obtained in the assumption that initially
the system was completely on level $n'$: $W_{n'}(0) = 1$. Now, if in the initial moment the value of $W_{n'}$ is



36 See, e.g., MA Eq. (6.12).

37 In some texts, the density of states in Eq. (111) is replaced with expression $\sum_n \delta(E_n - E_{n'} - \hbar\omega)$. Indeed, the
integration of this expression over any finite energy interval $\Delta E_n$ gives the same result $\Delta n = (dn/dE_n)\Delta E_n = \rho_n\Delta E_n$
as Eq. (111). Such replacement may be useful in some cases, but should be used with utmost care, and for most
applications the more explicit form (111) is preferable.

38 Actually, this result was developed mostly by the same P. A. M. Dirac in 1927; E. Fermi's role was not much 
more than advertising it, under the name of "Golden Rule No. 2", in his lecture notes on nuclear physics, which 
were published much later, in 1950. (To be fair to Fermi, he has never tried to pose as the Golden Rule's author.) 



different, result (110) has to be multiplied by that number, due to the linear relation (88) between $da_n/dt$
and $a_{n'}$. Hence, instead of Eq. (112) we get a differential equation similar to Eq. (2.159),

$$\dot W_{n'} = -\Gamma W_{n'}, \tag{6.113}$$

which, for time-independent $\Gamma$, has the evident solution,

$$W_{n'}(t) = W_{n'}(0)\,e^{-\Gamma t}, \tag{6.114}$$

describing an exponential decay of the initial state's occupancy, with time constant $\tau = 1/\Gamma$.
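Eq. (114) goes beyond the first-order calculation that produced Eq. (110), because it is supposed to hold even when $W_{n'}$ is no longer close to 1. Here is a rough numerical check, under assumptions that are mine rather than the text's. Keep only the resonant terms of Eqs. (88), as in the two-level case, and couple level $n'$ with the same hypothetical amplitude $A$ to $2K+1$ equidistant levels $n$ with detunings $\Delta_k = k\,\delta E/\hbar$. Then integrate the resulting equations without further approximation (RK4, $\hbar = 1$) and compare $W_{n'}(t)$ with $e^{-\Gamma t}$, where $\Gamma = 2\pi|A|^2/\hbar\,\delta E$ by Eq. (111).

```python
import cmath
import math


def initial_level_occupancy(amp, level_spacing, k_max, t_end, steps, hbar=1.0):
    """RK4 integration of the resonant (rotating-wave) part of Eqs. (88) for one
    level n' coupled with the same amplitude A to 2*k_max + 1 equidistant levels:
        i*hbar*da_k/dt  = a_n' * A  * exp(-i*Delta_k*t),
        i*hbar*da_n'/dt = sum_k a_k * A* * exp(+i*Delta_k*t),
    with Delta_k = k * level_spacing / hbar. Returns (W_n'(t_end), total probability)."""
    deltas = [k * level_spacing / hbar for k in range(-k_max, k_max + 1)]

    def rhs(t, a0, a):
        phases = [cmath.exp(-1j * d * t) for d in deltas]
        da = [a0 * amp * p / (1j * hbar) for p in phases]
        da0 = sum(ak * amp.conjugate() * p.conjugate() for ak, p in zip(a, phases)) / (1j * hbar)
        return da0, da

    dt = t_end / steps
    a0, a, t = 1 + 0j, [0j] * len(deltas), 0.0
    for _ in range(steps):
        k1 = rhs(t, a0, a)
        k2 = rhs(t + dt / 2, a0 + dt / 2 * k1[0], [x + dt / 2 * y for x, y in zip(a, k1[1])])
        k3 = rhs(t + dt / 2, a0 + dt / 2 * k2[0], [x + dt / 2 * y for x, y in zip(a, k2[1])])
        k4 = rhs(t + dt, a0 + dt * k3[0], [x + dt * y for x, y in zip(a, k3[1])])
        a0 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a = [x + dt / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(a, k1[1], k2[1], k3[1], k4[1])]
        t += dt
    return abs(a0) ** 2, abs(a0) ** 2 + sum(abs(x) ** 2 for x in a)


AMP, SPACING, K_MAX = 0.05 + 0j, 0.05, 100      # hypothetical model parameters
GAMMA = 2 * math.pi * abs(AMP) ** 2 / SPACING    # Eq. (111) with rho_n = 1/SPACING, hbar = 1
w_initial, total = initial_level_occupancy(AMP, SPACING, K_MAX, t_end=5.0, steps=500)
print(w_initial, math.exp(-GAMMA * 5.0), total)
```

The residual difference comes mostly from the finite width of the model band. For this equidistant ladder the occupancy would partly return to level $n'$ near $t = 2\pi\hbar/\delta E$. This finite-size recurrence disappears in the continuum limit $\delta E \to 0$, in line with the discussion of irreversibility later in this section.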

I would ask the reader to think again about this fascinating mathematical result: by summation of 
periodic oscillations (105) over many levels n, we have got an exponential evolution (114) of the 
probability. The main trick here is of course that the effective range $\Delta E_n$ of states $E_n$, giving the
dominating contribution to integral (108), shrinks with time: $\Delta E_n \sim \hbar/t$. 39 By the way, since most of the
decay takes place at times $t \sim \tau = 1/\Gamma$, the range of participating final energies may be estimated as

$$\Delta E_n \sim \frac{\hbar}{\tau} = \hbar\Gamma. \tag{6.115}$$

This estimate is very instrumental for the formulation of conditions of validity of the Golden Rule (111).
First, we have assumed that the matrix elements of the perturbation and the density of states do not
depend on energy within interval (115). This gives the following requirement

$$\Delta E_n \sim \hbar\Gamma \ll E_n - E_{n'} \sim \hbar\omega. \tag{6.116}$$

Second, for the transfer from sum (107) to integral (108), we need the number of states within that
energy interval, $\Delta N_n = \rho_n\Delta E_n$, to be much larger than 1. Merging Eq. (116) with Eq. (93) for all energy
levels $n'' \neq n, n'$ not participating in the resonant transfer, we may summarize all conditions of the
Golden Rule validity as

$$\rho_n^{-1} \ll \hbar\Gamma \ll \hbar\,|\omega \pm \omega_{n''n'}|. \tag{6.117}$$



(The reader may ask whether I have forgotten the condition expressed by the first of Eqs. (106).
However, for $\Delta_{nn'} \sim \Delta E_n/\hbar \sim \Gamma$, this condition is just $|A_{nn'}|^2 \ll (\hbar\Gamma)^2$, so that plugging it into Eq. (111),

$$\Gamma \ll \frac{2\pi}{\hbar}(\hbar\Gamma)^2\rho_n, \tag{6.118}$$

and canceling one $\Gamma$ and one $\hbar$, we see that this requirement coincides with the left relation in Eq. (117)
above.)

Let us have a look at whether these conditions may be satisfied in practice, at least in some 
cases. For example, let us consider the optical ionization of an atom, with the released electron confined
in a volume of the order of $1\ \text{cm}^3 = 10^{-6}\ \text{m}^3$. According to Eq. (1.82), with $E$ of the order of the atomic
ionization energy $E_n - E_{n'} = \hbar\omega \sim 1$ eV, the density of electron states in that volume is of the order of
10 7 1/eV. Thus conditions (117) provide an approximately 15-orders-of magnitude range for acceptable 



39 Here we have run again, in a more general context, into the "energy-time uncertainty relation" which was 
already discussed in the end of Sec. 2.5. Let me advise the reader to revisit that important discussion. 
values of $\hbar\Gamma$. This illustration should give the reader a taste of why the Golden Rule is applicable to so
many situations.

Finally, the physical picture of initial state's decay (which will also be the key for our discussion 
of quantum mechanics of "open" systems in the next chapter) is also very important. According to Eq. 
(114), the external excitation transfers the system onto the continuous spectrum of levels n, and it never 
comes back to the initial level $n'$. However, this result was derived from quantum mechanics of Hamiltonian
systems, whose equations are invariant with respect to time reversal. This paradox is a result of the 
generalization (113) of the exact result (112), that breaks the time reversal symmetry, but is absolutely 
adequate for the physics under study. Some gut feeling of the physical sense of this irreversibility may 
be obtained from the following observation. From our wave-mechanics experience, we know that the 
distance between adjacent orbital energy levels tends to zero only if the system size goes to infinity. 
This means that the assumption of continuous energy spectrum of final states $n$ essentially requires
these states to be infinitely extended in space - essentially being free de Broglie waves. The Golden 
Rule approach corresponds to the (physically justified) assumption that in an infinitely large system the 
traveling waves excited by a local source and propagating outward from it, would never come back, and 
even if they do, the unpredictable phase shifts introduced by the uncontrollable perturbations on their 
way would never allow them to sum up in the way necessary to bring the system back into the initial 
state n'. 40 

Maybe the best illustration of this interpretation is given by the following problem - which is a 
toy model of the photoelectric effect that was briefly discussed in Sec. 1.1(iii). A 1D particle is initially
trapped in the ground state of a narrow quantum well, 

$$U(x) = -\mathcal{W}\delta(x). \tag{6.119}$$

Let us use the Golden Rule to find rate $\Gamma$ of particle's "ionization" (i.e. its excitation into an extended,
delocalized state) by a weak classical sinusoidal force of amplitude $F_0$ and frequency $\omega$. As a reminder,
finding the initial, localized state (n') of such particle was the task of Problem 2.9, and its solution was 

$$\psi_{n'}(x) = \kappa^{1/2}e^{-\kappa|x|}, \qquad \kappa = \frac{m\mathcal{W}}{\hbar^2}, \qquad E_{n'} = -\frac{\hbar^2\kappa^2}{2m} = -\frac{m\mathcal{W}^2}{2\hbar^2}. \tag{6.120}$$

Extended states $n$ with continuous spectrum, for this problem, exist only at energies $E_n > 0$, so that the
excitation rate is different from zero only for frequencies

$$\omega > \omega_0 \equiv \frac{|E_{n'}|}{\hbar} = \frac{m\mathcal{W}^2}{2\hbar^3}. \tag{6.121}$$

The weak sinusoidal force may be described by the following perturbation Hamiltonian, 

$$\hat H^{(1)} = -F(t)\hat x = -F_0\hat x\cos\omega t \equiv -\frac{F_0}{2}\hat x\left(e^{i\omega t} + e^{-i\omega t}\right), \quad \text{for } t > 0, \tag{6.122}$$

so that according to Eq. (86), that serves as the amplitude operator definition, in this case 



40 This situation is very much similar to the entropy increase in macroscopic systems, which is postulated in 
thermodynamics, and justified in statistical physics, even though it is based on time-reversible laws of mechanics
- see, e.g., SM Sec. 1.2 and Sec. 2.2. 
$$\hat A = \hat A^\dagger = -\frac{F_0}{2}\hat x. \tag{6.123}$$



Now the matrix elements $A_{nn'}$ that participate in Eq. (111) may be calculated in the coordinate
representation:

$$A_{nn'} = \int\psi_n^*(x)\hat A(x)\psi_{n'}(x)\,dx = -\frac{F_0}{2}\int\psi_n^*(x)\,x\,\psi_{n'}(x)\,dx. \tag{6.124}$$



Since, according to Eq. (120), the initial $\psi_{n'}$ is a symmetric function of $x$, a nonvanishing
contribution to this integral is given only by antisymmetric functions $\psi_n(x)$, proportional to $\sin k_nx$, with
wavenumber $k_n$ related to the final energy by the well-familiar equality (1.77):

$$\frac{\hbar^2k_n^2}{2m} = E_n. \tag{6.125}$$



As we know from Sec. 2.5 (see in particular Eq. (2.124) and its discussion), such antisymmetric functions,
with $\psi_n(0) = 0$, are not affected by the zero-centered delta-functional potential (119), and their density
$\rho_n$ is the same as in a completely free space, and we can use Eq. (1.94). (Actually, since that relation was
derived for traveling waves, it is more prudent to repeat the calculation that has led to that result,
confining the waves on an artificial segment $[-l/2, +l/2]$, so long,

$$k_nl,\ \kappa l \gg 1, \tag{6.126}$$

that it does not affect the initial localized state and the excitation process. Then the confinement
requirement $\psi_n(\pm l/2) = 0$ immediately yields the condition $k_nl/2 = \pi n$, so that Eq. (1.94) is indeed valid,
but only for positive values of $k_n$, because $\sin k_nx$ with $k_n \to -k_n$ does not give an independent
standing-wave eigenstate.) Hence the final-state density is



$$\rho_n \equiv \frac{dn}{dE_n} = \frac{dn}{dk_n}\bigg/\frac{dE_n}{dk_n} = \frac{l}{2\pi}\bigg/\frac{\hbar^2k_n}{m} = \frac{lm}{2\pi\hbar^2k_n}. \tag{6.127}$$



It may look troubling that the density of states depends on the artificial segment's length $l$, but the
same $l$ also participates in the final wavefunction normalization factor, 41

$$\psi_n = \left(\frac{2}{l}\right)^{1/2}\sin k_nx, \tag{6.128}$$



and hence the matrix element (124):

$$A_{nn'} = -\frac{F_0}{2}\left(\frac{2\kappa}{l}\right)^{1/2}\int_{-l/2}^{+l/2}\sin k_nx\,e^{-\kappa|x|}\,x\,dx = -\frac{F_0}{2i}\left(\frac{2\kappa}{l}\right)^{1/2}\left[\int_0^{l/2}e^{(ik_n-\kappa)x}x\,dx - \int_0^{l/2}e^{-(ik_n+\kappa)x}x\,dx\right]. \tag{6.129}$$

These two integrals may be readily worked out by parts. Taking into account that, according to
condition (126), their upper limits may be extended to $\infty$, the result is



41 The normalization to infinite volume, using Eq. (5.55), is also possible, but less convenient in such problems. 






$$A_{nn'} = -F_0\left(\frac{2\kappa}{l}\right)^{1/2}\frac{2\kappa k_n}{(k_n^2 + \kappa^2)^2}, \tag{6.130}$$



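so that finally we get an expression for the rate, which is independent of the artificially introduced $l$:

$$\Gamma = \frac{2\pi}{\hbar}|A_{nn'}|^2\rho_n = \frac{2\pi}{\hbar}F_0^2\,\frac{2\kappa}{l}\,\frac{(2\kappa k_n)^2}{(k_n^2 + \kappa^2)^4}\,\frac{lm}{2\pi\hbar^2k_n} = \frac{8F_0^2m\kappa^3k_n}{\hbar^3(k_n^2 + \kappa^2)^4}. \tag{6.131}$$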



Note that due to the above definitions of $k_n$ and $\kappa$, the expression in parentheses in the
denominator of the last formula does not depend on the quantum well parameter $\mathcal{W}$, and is a function of
only the excitation frequency $\omega$ (and particle's mass):

$$\frac{\hbar^2(k_n^2 + \kappa^2)}{2m} = E_n - E_{n'} = \hbar\omega. \tag{6.132}$$

As a result, Eq. (131) may be recast simply as



$$\hbar\Gamma = \frac{F_0^2\mathcal{W}^3k_n}{2\hbar^4\omega^4}. \tag{6.133}$$

What is still hidden here is that at fixed $E_{n'}$, $k_n$ is a function of frequency, changing as $\omega^{1/2}$ at
$\omega \gg \omega_0$ (so that $\Gamma$ drops as $\omega^{-7/2}$ at $\omega \to \infty$), and as $(\omega - \omega_0)^{1/2}$ as $\omega$ approaches the "red boundary" $\omega_0$
of the ionization effect, so that $\Gamma \propto (\omega - \omega_0)^{1/2} \to 0$ in that limit as well. We see that this toy model does
describe the main feature of the photoelectric effect, whose explanation by Einstein was essentially the
starting point of quantum mechanics - see Sec. 1.1.
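The chain of Eqs. (127)-(133) can be checked numerically. The sketch below assumes units with $\hbar = m = \mathcal{W} = F_0 = 1$, so that $\kappa = 1$ and $\omega_0 = 1/2$. It first compares the integral in Eq. (129), evaluated by Simpson's rule with the upper limit pushed far out, with the closed form behind Eq. (130). It then confirms that the rate (131) does not depend on the artificial length $l$ and coincides with Eq. (133). Finally, it estimates the logarithmic slopes of $\Gamma(\omega)$ far above the threshold and just above it.

```python
import math

# Units assumed in this sketch: hbar = m = W = F0 = 1 (W is the well strength in Eq. (119)).
HBAR = MASS = W_WELL = F0 = 1.0
KAPPA = MASS * W_WELL / HBAR ** 2                 # Eq. (120)
OMEGA_0 = MASS * W_WELL ** 2 / (2 * HBAR ** 3)    # Eq. (121): the "red boundary"


def k_final(omega):
    """Final-state wavenumber from Eqs. (125) and (132); needs omega > OMEGA_0."""
    return math.sqrt(2 * MASS * (omega - OMEGA_0) / HBAR)


def overlap_integral(k, x_max=40.0, intervals=4000):
    """Simpson estimate of the integral over the whole x axis of
    sin(kx) * exp(-kappa*|x|) * x; the integrand is even, hence the factor 2."""
    h = x_max / intervals

    def f(x):
        return x * math.sin(k * x) * math.exp(-KAPPA * x)

    s = f(0.0) + f(x_max) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, intervals))
    return 2 * s * h / 3


def rate_via_golden_rule(omega, length):
    """Gamma = (2 pi/hbar)|A_nn'|^2 rho_n, with A_nn' from Eq. (130), rho_n from Eq. (127)."""
    k = k_final(omega)
    a_nnp = -F0 * math.sqrt(2 * KAPPA / length) * 2 * KAPPA * k / (k ** 2 + KAPPA ** 2) ** 2
    rho = length * MASS / (2 * math.pi * HBAR ** 2 * k)
    return 2 * math.pi / HBAR * a_nnp ** 2 * rho


def rate_eq133(omega):
    """Gamma from Eq. (133)."""
    return F0 ** 2 * W_WELL ** 3 * k_final(omega) / (2 * HBAR ** 5 * omega ** 4)


def log_slope(x1, x2, y1, y2):
    return math.log(y2 / y1) / math.log(x2 / x1)


k_test = 1.3
print(overlap_integral(k_test), 4 * KAPPA * k_test / (k_test ** 2 + KAPPA ** 2) ** 2)
print(rate_via_golden_rule(1.2, 1e3), rate_via_golden_rule(1.2, 1e6), rate_eq133(1.2))
high = log_slope(1e4, 2e4, rate_eq133(1e4), rate_eq133(2e4))                        # near -7/2
low = log_slope(1e-6, 2e-6, rate_eq133(OMEGA_0 + 1e-6), rate_eq133(OMEGA_0 + 2e-6))  # near +1/2
print(high, low)
```

The two slopes reproduce the $\omega^{-7/2}$ high-frequency tail and the $(\omega - \omega_0)^{1/2}$ onset at the red boundary.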



6.7. Golden Rule for step-like perturbations



Now let us reuse some of our results for a perturbation being turned on at t = 0, but after that 
time-independent: 




$$\hat H^{(1)}(t) = \begin{cases} 0, & \text{for } t < 0,\\ \text{const}, & \text{for } t \geq 0. \end{cases} \tag{6.134}$$



A superficial comparison of this equation and our former Eq. (86) seems to indicate that we may use all
our previous results, taking $\omega = 0$. However, that conclusion does not take into account the fact that
analyzing both the two-level approximation and the Golden Rule for continuous spectrum, we have
neglected the second (non-resonant) term in Eq. (90). This is why it is more prudent to use the general Eq.
(84),

$$i\hbar\dot a_n = \sum_{n'}a_{n'}H^{(1)}_{nn'}e^{i\omega_{nn'}t}, \tag{6.135}$$



in which the matrix element of the perturbation is now time-independent. We see that it is formally 
equivalent to Eq. (88) with only the first (resonant) term kept, if we make the following replacements: 



$$A_{nn'} \to H^{(1)}_{nn'}, \qquad \Delta_{nn'} \equiv \omega - \omega_{nn'} \to -\omega_{nn'}. \tag{6.136}$$
As a sanity check, let us revisit a two-level system such as two quantum wells coupled by
tunneling - see Fig. 13a. It is convenient to include the energy difference $E_n - E_{n'}$ between the two levels
into the unperturbed Hamiltonian, so that perturbation (134) describes only the localized state coupling 
due to tunneling through the energy barrier separating the wells. (The turning on of the coupling, 
described by Eq. (134), may be achieved, for example, by a rapid lowering of the barrier at t = 0.) Then, 
after replacements (136), we are getting an analog of Eq. (104): 



$$W_n = |a_n|^2 = \frac{|H^{(1)}_{nn'}|^2}{\hbar^2\Omega^2}\sin^2\Omega t, \tag{6.137}$$



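where frequency $\Omega$ of the periodic "probability repumping" between levels $n'$ and $n$ is now described,
instead of Eq. (101), by relation

$$2\Omega = \left(\omega_{nn'}^2 + 4\frac{|H^{(1)}_{nn'}|^2}{\hbar^2}\right)^{1/2} = \frac{1}{\hbar}\left[(E_n - E_{n'})^2 + 4|H^{(1)}_{nn'}|^2\right]^{1/2}. \tag{6.138}$$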



But these are exactly the quantum oscillations that have already been discussed in Sec. 2.6 - now
derived for arbitrary quantum wells and tunnel barrier shape.




Fig. 6.13. Quantum-well implementation of coupling of a discrete-energy state n' to (a) another 
discrete-energy state, and (b) a state continuum, due to tunneling through a potential barrier. 



The similarity of Eqs. (104) and (137) shows that the Rabi oscillations and the "usual" quantum
oscillations have essentially the same physical nature, besides that in the former case the external rf
signal quantum $\hbar\omega$ bridges over the state energy difference. We may also compare result (138) with our
analysis of a two-level system, with a similar time-independent perturbation, in Sec. 1. According to Eq.
(29), its eigenenergies differ by

$$E_+ - E_- = \left[(H_{11} - H_{22})^2 + 4H_{12}H_{21}\right]^{1/2}. \tag{6.139}$$

But this is exactly the result given by Eq. (138), provided that we consider $(H_{11} - H_{22})$ as the difference
$(E_n - E_{n'})$ of unperturbed state energies rather than as a perturbation, as we certainly have a right to do.
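A numerical illustration of this sanity check: the sketch below ($\hbar = 1$, with hypothetical values of $E_n - E_{n'}$ and of the coupling $H^{(1)}_{nn'}$) integrates the Schrödinger equation for two levels coupled by a constant matrix element switched on at $t = 0$. It compares the resulting $W_n(t)$ with Eqs. (137)-(138), and checks that $2\hbar\Omega$ equals the eigenvalue splitting (139). Here the splitting is computed independently, from the trace and determinant of the $2\times 2$ Hamiltonian.

```python
import math


def simulate_two_wells(e_np, e_n, h1, t_end, steps, hbar=1.0):
    """RK4 integration of the Schrodinger equation in the basis {n', n}:
        i*hbar*dc_n'/dt = E_n' c_n' + H1* c_n,   i*hbar*dc_n/dt = H1 c_n' + E_n c_n,
    starting from c_n' = 1 (coupling H1 = H1_nn' switched on at t = 0). Returns W_n(t_end)."""
    def rhs(cnp, cn):
        return ((e_np * cnp + h1.conjugate() * cn) / (1j * hbar),
                (h1 * cnp + e_n * cn) / (1j * hbar))

    dt = t_end / steps
    cnp, cn = 1 + 0j, 0j
    for _ in range(steps):
        k1 = rhs(cnp, cn)
        k2 = rhs(cnp + dt / 2 * k1[0], cn + dt / 2 * k1[1])
        k3 = rhs(cnp + dt / 2 * k2[0], cn + dt / 2 * k2[1])
        k4 = rhs(cnp + dt * k3[0], cn + dt * k3[1])
        cnp += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        cn += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return abs(cn) ** 2


def big_omega_eq138(e_np, e_n, h1, hbar=1.0):
    return math.sqrt((e_n - e_np) ** 2 + 4 * abs(h1) ** 2) / (2 * hbar)


def w_n_eq137(e_np, e_n, h1, t, hbar=1.0):
    om = big_omega_eq138(e_np, e_n, h1, hbar)
    return abs(h1) ** 2 / (hbar * om) ** 2 * math.sin(om * t) ** 2


def splitting_from_trace_and_det(h11, h22, h12):
    """E_+ - E_- of the Hermitian matrix [[h11, h12], [h12*, h22]], from the roots
    of lambda^2 - tr*lambda + det = 0."""
    tr, det = h11 + h22, h11 * h22 - abs(h12) ** 2
    root = math.sqrt(tr * tr / 4 - det)
    return (tr / 2 + root) - (tr / 2 - root)


E_NP, E_N, H1 = 0.0, 0.4, 0.15 + 0.1j      # hypothetical parameters
for t in (3.0, 12.0):
    print(t, simulate_two_wells(E_NP, E_N, H1, t, 3000), w_n_eq137(E_NP, E_N, H1, t))
print(2 * big_omega_eq138(E_NP, E_N, H1), splitting_from_trace_and_det(E_NP, E_N, H1))
```

Working in the Schrödinger picture here (rather than with Eq. (135)) gives an independent check, since $|c_n|^2 = |a_n|^2$ in either picture.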

Now let us consider the effect of perturbation (134) in the case when it creates coupling between 
the initial (discrete) energy level and a dense group of states with a quasi-continuum spectrum, in the 
same energy range. Figure 13b shows an example of such a system: a quantum well separated by a 
penetrable tunnel barrier from an extended region with a quasi-continuous energy spectrum. Making 
replacements (136) in Eq. (111), we may present the Golden Rule for this case as

$$\Gamma = \frac{2\pi}{\hbar}|H^{(1)}_{nn'}|^2\rho_n, \tag{6.140}$$

where states $n$ and $n'$ now have the same energy. 42



It is very informative to compare this result with Eq. (138) for a symmetric ($E_n = E_{n'}$) double
quantum well using the same tunnel barrier - see Fig. 13. For the latter case, Eq. (138) yields

$$\Omega = \frac{|H^{(1)}_{\rm con}|}{\hbar}. \tag{6.141}$$



Here I have used index "con" (from "confinement") to emphasize that this matrix element is rather
different from the one participating in Eq. (140). Indeed, in the double-well case, the matrix element,

$$H^{(1)}_{\rm con} = \langle n|\hat H^{(1)}|n'\rangle = \int\psi_n^*\hat H^{(1)}\psi_{n'}\,dx, \tag{6.142}$$

has to be calculated for two similar wavefunctions $\psi_n$ and $\psi_{n'}$ confined to spatial intervals of the same
scale $l_{\rm con}$, while in Eq. (140), wavefunctions $\psi_n$ are extended to a much larger distance $l \gg l_{\rm con}$ - see
Fig. 13. As Eq. (129) tells us, in the 1D model we are considering now, this means an additional small
factor of the order of $(l_{\rm con}/l)^{1/2}$. Now using Eq. (128) as a crude but suitable model for the final-state
wavefunctions, we arrive at the following estimate:



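$$\hbar\Gamma \sim 2\pi|H^{(1)}_{nn'}|^2\rho_n \sim 2\pi|H^{(1)}_{\rm con}|^2\,\frac{l_{\rm con}}{l}\,\frac{lm}{2\pi\hbar^2k_n} \sim \frac{|H^{(1)}_{\rm con}|^2}{\Delta E_{n'}} = \frac{(\hbar\Omega)^2}{\Delta E_{n'}}, \tag{6.143}$$

where $\Delta E_{n'} \sim \hbar^2/ml_{\rm con}^2$ is the scale of the differences between eigenenergies of the particle in an
unperturbed quantum well (in the middle steps we have taken $k_n \sim 1/l_{\rm con}$, natural for final states with $E_n = E_{n'}$).
Since the condition of validity of the perturbative formula (140) is $\hbar\Omega \ll \Delta E_{n'}$, we see that 43

$$\hbar\Gamma \sim \hbar\Omega\,\frac{\hbar\Omega}{\Delta E_{n'}} \ll \hbar\Omega. \tag{6.144}$$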



Hence the rate of (irreversible) quantum tunneling into continuum is always much lower than the
frequency of (reversible) quantum oscillations between states separated by the same potential barrier -
at least for the case when both are much lower than $\Delta E_{n'}/\hbar$, so that the perturbation theory is valid. A
handwaving interpretation of this result is that the confined particle wanders beyond the barrier and
back many times before finally "deciding" to perform an irreversible transition into unconfined
continuum. 44

Let me conclude this section (and this chapter) with the application of Eq. (140) to an important 
case, which will provide us with a smooth transition to the next chapter's topics. Consider a composite 
system consisting of two parts, a and b, with the energy spectra sketched in Fig. 14. 



42 The condition of its validity is again given by Eq. (117), but with $\omega \to 0$ in the upper limit. 

43 It is straightforward to show that in this form, the estimate is valid for a similar problem of any spatial 
dimensionality, not just the 1D case we have analyzed. 

44 This qualitative picture may be verified, for example, using the experimentally observable effects of a dispersive 
electromagnetic environment on electron tunneling - see P. Delsing et al., Phys. Rev. Lett. 63, 1180 (1989). 
[Figure: energy spectra of system a and system b, coupled by the interaction $\hat H^{(1)} = A(a)B(b)$, with the energy quantum $\hbar\omega$ transferred from a to b.]

Fig. 6.14. Energy relaxation in 
system a due to its coupling with 
system b (which serves as the 
environment of a). 



Let the systems be completely independent initially. The independence means that in the absence 
of perturbation, the total Hamiltonian of the system at t < 0 may be presented as a sum 



$$\hat H^{(0)} = \hat H_a(a) + \hat H_b(b), \qquad (6.145)$$



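where arguments a and b symbolize the non-overlapping sets of variables of the two systems. Then 
eigenkets of the system may be naturally factored as 45 

$$|n\rangle = |n_a\rangle\otimes|n_b\rangle, \qquad (6.146)$$

while its eigenenergies separate into a sum, just as the Hamiltonian (145) does: 

$$\begin{aligned}\hat H^{(0)}|n\rangle &= (\hat H_a + \hat H_b)|n_a\rangle\otimes|n_b\rangle = (\hat H_a|n_a\rangle)\otimes|n_b\rangle + (\hat H_b|n_b\rangle)\otimes|n_a\rangle\\ &= (E_{n_a}|n_a\rangle)\otimes|n_b\rangle + (E_{n_b}|n_b\rangle)\otimes|n_a\rangle = (E_{n_a} + E_{n_b})|n\rangle.\end{aligned} \qquad (6.147)$$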
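The factorization (146) and the energy additivity (147) are easy to check numerically. The sketch below assumes two small, randomly generated Hermitian matrices as hypothetical stand-ins for $\hat H_a$ and $\hat H_b$ (they are not physical models); it builds $\hat H^{(0)}$ on the joint space with Kronecker products and verifies that its eigenvalues are exactly the pairwise sums $E_{n_a} + E_{n_b}$, and that a product ket $|n_a\rangle\otimes|n_b\rangle$ is an eigenket.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n, rng):
    """Hypothetical toy Hamiltonian: a random n x n Hermitian matrix."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2

H_a = random_hermitian(3, rng)   # stand-in for system a
H_b = random_hermitian(4, rng)   # stand-in for system b

# H^(0) = H_a (x) I_b + I_a (x) H_b on the joint space, cf. Eq. (145)
H0 = np.kron(H_a, np.eye(4)) + np.kron(np.eye(3), H_b)

E_a, V_a = np.linalg.eigh(H_a)   # eigenvalues in ascending order
E_b, V_b = np.linalg.eigh(H_b)

# Eq. (147): the joint spectrum consists of all sums E_na + E_nb
E_sums = np.sort(np.add.outer(E_a, E_b).ravel())
E_joint = np.linalg.eigvalsh(H0)
print(np.allclose(E_joint, E_sums))

# Eq. (146): a product ket |n_a> (x) |n_b> is an eigenket of H^(0)
ket = np.kron(V_a[:, 0], V_b[:, 1])
print(np.allclose(H0 @ ket, (E_a[0] + E_b[1]) * ket))
```

The Kronecker-product ordering (system a as the "major" index) is an arbitrary choice; as footnote 45 notes, the order of factors in such a product does not matter physically.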



Analysis of such a composite system is much easier when the interaction of its components may 
be presented as a product of two Hermitian operators, each depending only on the degrees of freedom of 
one component system: 

$$\hat H^{(1)} = \hat A(a)\hat B(b). \qquad (6.148)$$



A typical example of such a bilinear interaction Hamiltonian is the electric-dipole interaction between 
an atomic-scale electron system (with a size of the order of the Bohr radius $r_B \sim 10^{-10}$ m) and the 
electromagnetic field at optical frequencies $\omega \sim 10^{16}$ s$^{-1}$, with wavelength $\lambda = 2\pi c/\omega \sim 10^{-6}$ m $\gg r_B$: 46 

$$\hat H^{(1)} = -\hat{\mathbf d}\cdot\hat{\boldsymbol{\mathcal E}}, \quad\text{with}\quad \hat{\mathbf d} = \sum_k q_k\hat{\mathbf r}_k, \qquad (6.149)$$

where the electric dipole moment $\mathbf d$ depends only on the positions $\mathbf r_k$ of charged particles (numbered with 
index k), while the electric field $\boldsymbol{\mathcal E}$ is a function of only the electromagnetic field's degrees of 
freedom - see Chapter 9 below. 

Returning to the general situation shown in Fig. 14, if the component system a was initially in an 
excited state $n'_a$, interaction (148) may bring it to another discrete state $n_a$ of a lower energy - for 



45 The sign ⊗ is used to denote the formation of a joint ket-vector from kets of independent systems ("belonging to 
different Hilbert spaces"). Evidently, the order of operands in such a "product" may be changed at will. 

46 See, e.g., EM Sec. 3.1, in particular Eq. (3.16), in which letter p is used for the electric dipole moment. 
example, the ground state. In the process of this transition, the released energy, in the form of an energy 
quantum 

$$\hbar\omega = E_{n'_a} - E_{n_a}, \qquad (6.150)$$

is picked up by system b: 

$$E_{n_b} - E_{n'_b} = \hbar\omega. \qquad (6.151)$$

(In typical applications, though not always, the initial state $n'_b$ of that system is its ground state.) If the 
final state of system b is inside a state group with a quasi-continuous energy spectrum (Fig. 14), the 
process has the exponential character (114) 47 and may be interpreted as the effect of energy relaxation 
of system a, with the released energy quantum $\hbar\omega$ absorbed by system b. Note that since a quasi- 
continuous spectrum essentially requires a system of large spatial size, such a model is very convenient 
for the description of the environment of system a. (In physics, the "environment" typically means all the 
Universe less the system under consideration.) 

The relaxation rate Γ may be described by the Golden Rule. Since perturbation (148) does not 
depend on time explicitly, and the total energy of the composite system does not change, we may use 
Eq. (140) that, with the account of Eqs. (146) and (148), takes the form 

$$\Gamma = \frac{2\pi}{\hbar}\left|A_{nn'}\right|^2\left|B_{nn'}\right|^2\rho_n, \quad\text{where}\quad A_{nn'} = \langle n_a|\hat A|n'_a\rangle, \quad B_{nn'} = \langle n_b|\hat B|n'_b\rangle, \qquad (6.152)$$

with $\rho_n$ being the density of the final states of system b at the relevant energy $E_{n_b} = E_{n'_b} + \hbar\omega 
= E_{n'_b} + (E_{n'_a} - E_{n_a})$. In particular, Eq. (152), with the dipole Hamiltonian (149), enables a very simple 
calculation of the natural linewidth of atomic electric dipole transitions. However, such a calculation has 
to be postponed until Chapter 9, in which we will discuss the electromagnetic field quantization - i.e., the 
exact nature of states $n_b$ and $n'_b$ for this problem. Instead, I will proceed to a discussion of the effects of 
interaction of quantum systems with their environment, toward which the situation shown in Fig. 14 
provides a clear path. 



6.8. Exercise problems 
6.1 . Use Eq. (13) to prove the so-called Hellmann-Feynman theorem 48 

$$\frac{\partial E_n}{\partial\lambda} = \left\langle n\left|\frac{\partial\hat H}{\partial\lambda}\right|n\right\rangle,$$

where λ is an arbitrary c-number parameter, and use this theorem to prove the first of Eqs. (3.201). 

6.2 . Use the first order of the perturbation theory to calculate the ground-state energy of an 
anharmonic 3D oscillator with 



47 The process is evidently spontaneous, i.e. does not require any external agent, and starts as soon as either the 
interaction (127) has been turned on, or (if it is always on) as soon as system a is placed into the excited state $n'_a$. 

48 After H. Hellmann, who published this result in 1937, and R. Feynman, who rediscovered it in 1939. This 
formula is very convenient for some applications, for example, for the calculation of intermolecular forces (taking the 
molecule spacing for λ). 
6.3 . A 2D quantum particle is confined in a square-shaped quantum well with infinitely high 
walls and a slightly skewed floor: 

$$U = \begin{cases}\mu xy, & \text{for } 0 < x < L \text{ and } 0 < y < L,\\ +\infty, & \text{otherwise.}\end{cases}$$

In the first order in the small parameter μ, find the energies of the ground state and the lowest excited state 
of the system. Formulate the conditions of validity of your result. 

Hint: To save the reader's time on a straightforward but longish integration by parts, I can offer the 
following integral: 

$$\int_0^1\sin(\pi\xi)\sin(2\pi\xi)\,\xi\,d\xi = -\frac{8}{9\pi^2}.$$
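The hint's integral may be cross-checked numerically; the sketch below uses a plain composite Simpson rule (a choice made here for self-containment, not a method from the text) and compares the result with $-8/9\pi^2$. Analytically, the product of sines equals $\frac{1}{2}[\cos(\pi\xi) - \cos(3\pi\xi)]$, and $\int_0^1\xi\cos(n\pi\xi)\,d\xi = (\cos n\pi - 1)/(n\pi)^2$, which gives the same value.

```python
import math

def integrand(xi):
    return math.sin(math.pi * xi) * math.sin(2 * math.pi * xi) * xi

def simpson(f, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

numeric = simpson(integrand, 0.0, 1.0)
exact = -8 / (9 * math.pi ** 2)
print(numeric, exact)   # both approximately -0.0900633
```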



6.4 . Prove that the relativistic correction operator (50) indeed has only diagonal matrix elements 
in the basis of unperturbed Bohr atom states (3.190). 



6.5 . Use the perturbation theory to evaluate the magnetic susceptibility of a dilute medium, due to 
the orbital motion of electrons. 49 



6.6 . In a certain quantum system, distances between the three lowest energy levels are slightly 
different - see Fig. on the right ($|\xi| \ll \omega_{1,2}$). Find the time necessary to populate the first excited 
level almost completely (with a given precision $\varepsilon \ll 1$), using the Rabi oscillation effect, if at t = 0 the 
system is completely in its ground state. 

[Figure: the three lowest energy levels, with spacings $\hbar\omega_1$ and $\hbar\omega_2 = \hbar(\omega_1 + \xi)$.]

Hint: Assume that all matrix elements of the perturbation Hamiltonian 
are known, and are all proportional to the external rf field amplitude. 



6.7 . Use the single-particle approximation to find the complex dielectric constant $\varepsilon(\omega)$ of a dilute 
gas of similar atoms, due to their induced electric polarization by a weak external ac field, for a field 
frequency ω very close to one of the quantum transition frequencies $\omega_{nn'}$ defined by Eq. (6.85) of the lecture 
notes. 

Hint: In the single-particle approximation, the atom is treated as a set of non-interacting 
electrons moving in an effective static potential of the nuclei and other electrons. 



49 For the susceptibility definition and its classical calculation see, e.g., EM Sec. 5.5. 
6.8 . Use the solution of the previous problem to generalize the expression for the London 
dispersion force between two electroneutral molecules (calculated in Problems 3.7 and 5.9 for the 
harmonic oscillator model) to the single-particle model with arbitrary excitation energy spectrum. 

6.9 . Find the rate of ionization of a hydrogen atom, initially in its ground state, by a classical, 
linearly polarized electromagnetic wave with the electric field amplitude $\mathcal E_0$ and frequency ω within the 
range 

$$\frac{\hbar}{m_e r_B^2} \ll \omega \ll \frac{c}{r_B},$$

where $r_B$ is the Bohr radius. Recast your result in terms of the cross-section of this electromagnetic wave 
absorption process. Discuss semi-quantitatively what changes would be necessary in the theory if either 
of the above conditions had been violated. 

6.10 . For the system of two weakly coupled quantum wells (see Fig. 13a), write the system of 
differential equations for complex amplitudes $a_n$, defined by Eq. (2.201), and in particular prove Eqs. 
(2.201) - which were just guessed in Chapter 2. 

6.11 . Use the Golden Rule to derive the general expression for the electric current I through a 
weak tunnel junction between two conductors, biased with dc voltage V. 

Hints: 

(i) The electric current flowing through a typical tunnel junction is so low that its perturbation of 
the electron states inside each conductor is negligible. 

(ii) A very reasonable description of most conductors may be achieved by treating the set of their 
conduction electrons as a Fermi gas, in which the electron-electron interaction is limited to Pauli's 
exclusion principle - see Sec. 3.7. 

6.12 . Generalize the result of Problem 11 to the case when a weak tunnel junction is biased with 
voltage $V(t) = V + A\cos\omega t$. 

6.13 . Use the Golden Rule to derive the Landau-Zener formula (2.266). 
Chapter 7. Open Quantum Systems 

This chapter discusses the effects of interaction of a quantum system with its environment, and in 
particular, with the instruments used for measurements. Some part of this material is on the fine line 
between quantum mechanics and (quantum) statistical physics. Here I will only cover its aspects that 
are of key importance for the basic goals of this course. 1 

7.1. Open systems and the density matrix 

All the way until the very end of the previous chapter, we have discussed quantum systems 
isolated from their environment. Indeed, from the very beginning we have assumed that we are dealing 
with statistical ensembles of systems as similar to each other as only allowed by the laws of quantum 
mechanics. Each member of such an ensemble, called pure or coherent, may be described by the same 
quantum state α - in the wave mechanics case, by the same wavefunction $\Psi_\alpha$. Even our discussion of the 
Golden Rule at the end of the last chapter, in particular its form in which one component system (in Fig. 
6.14, system b) may be used as a model of the environment of another component (a), was still based on 
the assumption of a pure initial state (6.146) of the system. Since the interaction of the two component 
systems was described by a certain Hamiltonian (the one given by Eq. (6.148), for example), for the state 
α of the system as a whole at an arbitrary instant we might write 

$$|\alpha\rangle = \sum_n\alpha_n|n\rangle = \sum_n\alpha_n|n_a\rangle\otimes|n_b\rangle, \qquad (7.1)$$

with a unique correspondence between eigenstates $n_a$ and $n_b$. 

However, in many important cases our knowledge of the quantum system's state is incomplete. This 
is especially unavoidable 2 when a relatively simple quantum system s of our interest (say, an electron or 
an atom) is in contact with an environment e - here understood in the most general sense, say, as the 
whole Universe less system s - see Fig. 1. Then there is virtually no chance of making two or more 
experiments with exactly the same composite system, because it would imply a repeated preparation of 
the whole environment (including the experimenter :-) in a certain quantum state - a rather challenging 
task, to put it mildly. In this case, it makes much more sense to consider a statistical ensemble of another 
kind, with random quantum states of the environment, though possibly with known macroscopic 
parameters (e.g., temperature, pressure, etc.). 

In classical physics, such mixed ensembles are the subject of statistical (classical) mechanics. 3 
Let us see how they may be described in quantum mechanics. To begin with, we need to assume 
again that the coupling between the system of interest and its environment is weak in the sense accepted 



1 For a broader discussion of statistical mechanics and physical kinetics, including those of quantum systems, the 
reader is referred to the SM part of this lecture note series. 

2 Most of the mixed ensemble analysis in this chapter will also pertain to the cases when the systems of interest 
are not in contact with the environment currently, and our knowledge about them is incomplete for some other 
reason - for example, if they had been in such a contact at some time between their perfect preparation (in a 
certain quantum state) and the observation, or if such a perfect preparation is impossible (or impracticable, or 
undesirable :-). 

3 See, e.g., SM Sec. 2.1. 
in the perturbation theory. 4 In this case we can still use the bra- and ket-vectors of unperturbed states, 
which depend on different sets of variables (again, "belonging to different Hilbert spaces"). Then the most 
general quantum state of the whole Universe, still assumed to be pure, 5 may be described as the 
following linear superposition: 

$$|\alpha\rangle = \sum_{j,k}\alpha_{jk}|s_j\rangle\otimes|e_k\rangle. \qquad (7.2)$$



The "only" difference between the description of such an entangled state and the superposition 
of separable states, described by Eq. (1), is that coefficients $\alpha_{jk}$ in the right-hand part of Eq. (2) are 
numbered with two indices: index j listing the quantum states of system s, and k numbering the 
(enormously large) set of quantum states of the environment. So, in a mixed ensemble a certain state $s_j$ 
of the system of interest may coexist with different states of its environment. 6 Of course, the enormity of 
the Hilbert space of the environment, i.e. the number of k-components in sum (2), strips us of any 
opportunity to make direct calculations using that sum. For example, according to the basic Eq. (4.125), 
in order to find the expectation value of an arbitrary observable A in state (2), we would need to 
calculate 

$$\langle A\rangle = \langle\alpha|\hat A|\alpha\rangle = \sum_{j,j';\,k,k'}\alpha^*_{jk}\alpha_{j'k'}\,\langle e_k|\otimes\langle s_j|\hat A|s_{j'}\rangle\otimes|e_{k'}\rangle. \qquad (7.3)$$

Even if we assume that {s} and {e} are sets of the basis states of, respectively, the system and the 
environment, and that each is full and orthonormal, Eq. (3) still includes a double sum over the 
enormous basis state set of the environment! 




Fig. 7.1. Quantum system and its environment 
(schematically :-). 



However, let us consider a limited but most important subset of operators - those of intrinsic 
observables, which depend only on the degrees of freedom of the system of interest (s). These operators 
do not act on the environment's degrees of freedom, and hence in Eq. (3) we may move the 
environment bra-vector $\langle e_k|$ all the way over to the ket-vector $|e_{k'}\rangle$. Assuming, again, that the set of 
environmental eigenstates is full and orthonormal, Eq. (3) is now reduced to 



4 In the opposite case, the very partition of the Universe into the system and the environment is impossible. 

5 Whether this assumption is true is an interesting issue, still being debated (more by philosophers than by 
physicists), but it is widely believed that its resolution is not critical for the validity of the results of this approach. 
In Sec. 6, I will offer a strong argument for this opinion - albeit not its strict proof. 

6 Actually, such coexistence has been implied (but well hidden :-) in the derivation of the quantum-mechanical 
Golden Rule, which, in all fairness, also belongs to the open systems class. 
$$\langle A\rangle = \sum_{j,j',k}\alpha^*_{jk}\alpha_{j'k}\langle s_j|\hat A|s_{j'}\rangle\langle e_k|e_k\rangle = \sum_{j,j'}A_{jj'}\sum_k\alpha^*_{jk}\alpha_{j'k}. \qquad (7.4)$$

This is already some relief, because we have "only" a single sum over k, but the main trick 7 is 
still ahead. After the summation over k, the second sum in the last form of Eq. (4) is some function w of 
indices j and j', so that, according to Eq. (4.96), this relation may be presented as 

$$\langle A\rangle = \sum_{j,j'}A_{jj'}w_{j'j} = \mathrm{Tr}(Aw), \qquad (7.5)$$

where matrix w, with elements 

$$w_{j'j} \equiv \sum_k\alpha_{j'k}\alpha^*_{jk}, \quad\text{i.e.}\quad w_{jj'} = \sum_k\alpha_{jk}\alpha^*_{j'k}, \qquad (7.6)$$



is called the density matrix of the system. Most importantly, Eq. (5) shows that knowledge of this 
matrix allows the calculation of the expectation value of any intrinsic observable A (and, according to 
Eqs. (1.33)-(1.34), its r.m.s. fluctuation as well, if necessary), even for the very general statistical 
ensemble of states (2). This is why we should have a very good look at the density matrix. 
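The economy offered by Eqs. (5)-(6) may be illustrated with a toy "Universe". The sketch below assumes small, randomly generated sizes and coefficients (3 system states, 50 environment states, and a random Hermitian matrix as a hypothetical intrinsic observable). It computes $\langle A\rangle$ once by brute force on the full product space, as in Eq. (3), and once via $w_{jj'} = \sum_k\alpha_{jk}\alpha^*_{j'k}$ and $\mathrm{Tr}(Aw)$; in matrix form, $w = \alpha\alpha^\dagger$.

```python
import numpy as np

rng = np.random.default_rng(1)
N_s, N_e = 3, 50          # toy sizes: system s and a (much larger) "environment" e

# Coefficients alpha_jk of the pure Universe state (2), normalized
alpha = rng.normal(size=(N_s, N_e)) + 1j * rng.normal(size=(N_s, N_e))
alpha /= np.linalg.norm(alpha)

# Hypothetical intrinsic observable: a Hermitian N_s x N_s matrix
A = rng.normal(size=(N_s, N_s)) + 1j * rng.normal(size=(N_s, N_s))
A = (A + A.conj().T) / 2

# Brute force, Eq. (3): <alpha| A (x) I_e |alpha> on the full product space
psi = alpha.reshape(-1)                     # index j*N_e + k  <->  |s_j> (x) |e_k>
A_full = np.kron(A, np.eye(N_e))
brute = np.vdot(psi, A_full @ psi).real     # vdot conjugates its first argument

# Eq. (6): w_jj' = sum_k alpha_jk alpha*_j'k, then Eq. (5): <A> = Tr(A w)
w = alpha @ alpha.conj().T
via_w = np.trace(A @ w).real

print(np.isclose(brute, via_w))             # the two routes agree
print(np.isclose(np.trace(w).real, 1.0))    # normalization of the state
print(np.allclose(w, w.conj().T))           # Hermiticity of w
```

Note that the density-matrix route never touches the $N_sN_e$-dimensional product space after $w$ is formed; for a real environment, that is the whole point.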

First of all, as we know very well by now, the expansion coefficients in superpositions of the 
type (2) may always be expressed as bra-kets; in our current case, we may write 

$$\alpha_{jk} = \left(\langle e_k|\otimes\langle s_j|\right)|\alpha\rangle. \qquad (7.7)$$



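Plugging this expression into Eq. (6), we get 

$$w_{jj'} = \sum_k\alpha_{jk}\alpha^*_{j'k} = \langle s_j|\left(\sum_k\langle e_k|\alpha\rangle\langle\alpha|e_k\rangle\right)|s_{j'}\rangle \equiv \langle s_j|\hat w|s_{j'}\rangle. \qquad (7.8)$$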



We see that from the point of view of our system (i.e. in its Hilbert space whose basis states may be numbered 
by indices j only), the density matrix is indeed just the matrix of some construct, 8 

$$\hat w \equiv \sum_k\langle e_k|\alpha\rangle\langle\alpha|e_k\rangle, \qquad (7.9)$$



that is called the statistical (or "density") operator. As evident from its definition (9), in contrast to the 
density matrix, this operator does not depend on the choice of a particular basis $s_j$ - just as all previous 
operators considered in this course; but in contrast to them, the statistical operator does depend on the 
composite system's state α, including the state of system s as well. However, in the j-space it is 
mathematically still an operator whose matrix elements obey all formulas of the bra-ket formalism. 

In particular, due to its definition (6), the density operator is Hermitian: 

$$w_{jj'} = w^*_{j'j}, \qquad (7.10)$$



7 First suggested in 1927 by J. von Neumann. 

8 Of course, the "bra-kets" in this expression are not c-numbers, because state α is defined in a larger Hilbert space 
(of the environment plus the system of interest) than the basis states $e_k$ (of the environment only). 
so that according to the general analysis of Sec. 4.3, there should be a certain basis {w} in which the 
matrix of this operator is diagonal: 

$$w_{jj'}\big|_{\text{in }w} = w_j\delta_{jj'}. \qquad (7.11)$$

Since any operator, in any basis, may be presented in form (4.59), in basis {w} we may write 

$$\hat w = \sum_j|w_j\rangle w_j\langle w_j|. \qquad (7.12)$$



This expression resembles, but is not equivalent to, Eq. (4.44) for the identity operator, which has been used 
so many times in this course, and in the basis $w_j$ has the form 

$$\hat I = \sum_j|w_j\rangle\langle w_j|. \qquad (7.13)$$



In order to comprehend the meaning of coefficients $w_j$ participating in Eq. (12), let us use Eq. (5) 
to calculate the expectation value of any observable A whose eigenstates coincide with those of the 
special basis set {w}: 

$$\langle A\rangle = \mathrm{Tr}(Aw) = \sum_{j,j'}A_{jj'}w_j\delta_{j'j} = \sum_jA_jw_j, \qquad (7.14)$$

where $A_j$ is just the expectation value of observable A in state $w_j$. Hence, in order to comply with the 
general Eq. (1.37), real c-numbers $w_j$ must have the physical sense of probabilities $W_j$ of finding the 
system in state j. As a result, we can rewrite Eq. (12) in the form 

$$\hat w = \sum_j|w_j\rangle W_j\langle w_j|. \qquad (7.15)$$



In one ultimate case when only one of the probabilities (say, $W_{j''}$) is different from zero, 

$$W_j = \delta_{jj''}, \qquad (7.16)$$

the system is evidently in a coherent (pure) state $w_{j''}$. Indeed, it is fully described by one ket-vector $|w_{j''}\rangle$, 
and we can use the general rule (4.86) to present it in another (arbitrary) basis {s} as a coherent 
superposition 

$$|w_{j''}\rangle = \sum_jU^\dagger_{jj''}|s_j\rangle = \sum_jU^*_{j''j}|s_j\rangle, \qquad (7.17)$$

where U is the unitary matrix of transform from basis {w} to basis {s}. According to Eqs. (11) and (16), 
in such a pure state the density matrix is diagonal in the {w} basis, 

$$w_{jj'}\big|_{\text{in }w} = \delta_{jj''}\delta_{j'j''}, \qquad (7.18a)$$

but not in an arbitrary basis. Indeed, using the general rule (4.92), we get 

$$w_{jj'}\big|_{\text{in }s} = \sum_{l,l'}U^\dagger_{jl}\,w_{ll'}\big|_{\text{in }w}\,U_{l'j'} = U^\dagger_{jj''}U_{j''j'} = U^*_{j''j}U_{j''j'}. \qquad (7.18b)$$

To make this result more transparent, let us denote matrix elements $U_{j''j} \equiv \langle w_{j''}|s_j\rangle$ (that, for fixed 
j'', depend on just one index j) by $\alpha_j$; then 
$$w_{jj'}\big|_{\text{in }s} = \alpha^*_j\alpha_{j'}, \qquad (7.19)$$

so that all $N^2$ elements of the whole N×N matrix are determined by just one string of N c-numbers $\alpha_j$. For 
example, for a two-level system (N = 2), 

$$w = \begin{pmatrix}\alpha_1^*\alpha_1 & \alpha_1^*\alpha_2\\ \alpha_2^*\alpha_1 & \alpha_2^*\alpha_2\end{pmatrix}. \qquad (7.20)$$

We see that the off-diagonal terms are, colloquially, "as large as the diagonal ones", in the following 
sense: 

$$w_{12}w_{21} = w_{11}w_{22}. \qquad (7.21)$$

Since the diagonal terms have the sense of probabilities $W_{1,2}$ to find the system in the corresponding 
state, we may present Eq. (20) in the form 

$$w = \begin{pmatrix}W_1 & (W_1W_2)^{1/2}e^{i\varphi}\\ (W_1W_2)^{1/2}e^{-i\varphi} & W_2\end{pmatrix}. \qquad (7.22)$$



The physical sense of the (real) constant φ is the phase shift between the coefficients in the linear 
superposition (17) that presents the pure state $w_{j''}$ in basis $s_{1,2}$. 
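A short numerical check of Eqs. (19)-(22) for a two-level pure state follows. The probabilities and phase are hypothetical values chosen only for illustration; the amplitudes $\alpha_j$ are picked so that $w_{jj'} = \alpha_j^*\alpha_{j'}$ reproduces the form (22).

```python
import numpy as np

# Hypothetical pure-state parameters: probabilities W1, W2 and relative phase phi
W1, W2, phi = 0.3, 0.7, 1.1

# Amplitudes alpha_j such that w_jj' = alpha*_j alpha_j' (Eq. 19) matches Eq. (22)
alpha = np.array([np.sqrt(W1), np.sqrt(W2) * np.exp(1j * phi)])
w = np.outer(alpha.conj(), alpha)           # w[j, j'] = alpha*_j alpha_j'

w_22form = np.array([[W1, np.sqrt(W1 * W2) * np.exp(1j * phi)],
                     [np.sqrt(W1 * W2) * np.exp(-1j * phi), W2]])

print(np.allclose(w, w_22form))                           # same matrix as Eq. (22)
print(np.isclose(w[0, 1] * w[1, 0], w[0, 0] * w[1, 1]))   # Eq. (21)
```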

Now let us consider a different statistical ensemble of two-level systems, that includes member 
states identical in all aspects (including similar probabilities $W_{1,2}$ in the same basis $s_{1,2}$), except that the 
phase shifts φ are random, with the phase probability uniformly distributed over the trigonometric circle. 
Then the ensemble averaging is equivalent to averaging over φ from 0 to 2π, so that it kills the off- 
diagonal terms of the density matrix (22), and the matrix becomes diagonal. For a system with a time- 
independent Hamiltonian, such averaging is especially plausible in the basis of stationary states n of the 
system, in which phase φ is just the difference of integration constants in Eq. (4.158), and randomness is 
naturally produced by minor fluctuations of the energy difference $E_1 - E_2$. (In Sec. 3 we will study the 
dynamics of such a dephasing process.) The mixed statistical ensemble of systems with the density matrix 
diagonal in the stationary state basis is called the classical mixture, and presents the limit opposite to the 
pure (coherent) state. 
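The phase averaging just described may be sketched directly: take the pure-state matrix (22) with fixed, hypothetical probabilities $W_{1,2}$, average it over φ sampled uniformly on $[0, 2\pi)$, and the off-diagonal terms vanish, leaving the classical mixture $\mathrm{diag}(W_1, W_2)$. A uniform grid of phases stands in for the ensemble here; this is a deterministic stand-in for random sampling, chosen so that the average is exact to rounding.

```python
import numpy as np

W1, W2 = 0.3, 0.7                                          # hypothetical fixed probabilities
phis = np.linspace(0.0, 2 * np.pi, 1000, endpoint=False)   # uniform phases on the circle

def w_pure(phi):
    """Pure-state density matrix in the form of Eq. (22)."""
    off = np.sqrt(W1 * W2) * np.exp(1j * phi)
    return np.array([[W1, off], [np.conj(off), W2]])

# Ensemble average over the random phase
w_avg = np.mean([w_pure(p) for p in phis], axis=0)

print(np.allclose(w_avg, np.diag([W1, W2])))   # only the diagonal survives
```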

After that example, the reader should not be much shocked by the main claim 9 of statistical 
mechanics that any large ensemble of similar systems in thermodynamic (or "thermal") equilibrium is 
exactly such a classical mixture. Moreover, for systems in thermal equilibrium with a much larger 
environment of fixed temperature T (such an environment is usually called a heat bath or a thermostat), 
statistical physics gives 10 a very simple expression, called the Gibbs distribution, for probabilities $W_n$: 

$$W_n = \frac{1}{Z}\exp\left\{-\frac{E_n}{k_BT}\right\}, \qquad (7.23a)$$



9 This is essentially an alternative formulation of the basic postulate of statistical physics, called the 
microcanonical distribution - see, e.g., SM Sec. 2.2. 

10 See, e.g., SM Sec. 2.4. The Boltzmann constant $k_B$ is only needed if the temperature is measured in non-energy 
units, say in kelvins. 
where $E_n$ is the eigenenergy of the corresponding stationary state, and Z is the normalization coefficient 
called the statistical sum: 

$$Z = \sum_n\exp\left\{-\frac{E_n}{k_BT}\right\}. \qquad (7.23b)$$
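As a computational aside, Eqs. (23a)-(23b) translate directly into code. The sketch below assumes a hypothetical four-level toy spectrum in units with $k_B = 1$; subtracting the lowest energy before exponentiation is a standard numerical precaution (the shift cancels between the numerator and Z), not part of the physics.

```python
import numpy as np

def gibbs_probabilities(E, kT):
    """W_n = exp(-E_n/kT) / Z, Eqs. (23a)-(23b).

    Energies are shifted by their minimum before exponentiation; the shift
    cancels between numerator and Z but avoids overflow/underflow.
    """
    E = np.asarray(E, dtype=float)
    boltz = np.exp(-(E - E.min()) / kT)
    return boltz / boltz.sum()

# Hypothetical toy spectrum (one doubly degenerate level), units where k_B = 1
E = np.array([0.0, 1.0, 1.0, 2.5])
W = gibbs_probabilities(E, kT=0.8)

print(np.isclose(W.sum(), 1.0))     # normalized by construction
print(np.isclose(W[1], W[2]))       # degenerate levels are equally populated
w = np.diag(W)                      # classical mixture in the stationary-state basis
```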



A detailed analysis of classical and quantum ensembles in thermodynamic equilibrium is the 
focus of statistical physics courses (such as my SM) rather than this course of quantum mechanics. 
However, I would still like to attract the reader's attention to the key fact that, in contrast with the 
similarly-looking Boltzmann distribution for single particles, 11 the Gibbs distribution is absolutely general and is 
not limited to classical statistics. In particular, for quantum gases of indistinguishable particles, it is 
absolutely compatible with quantum statistics (such as the Bose-Einstein or Fermi-Dirac distributions) 
of the component particles. For example, if we use Eq. (23) to calculate the average energy of a 1D 
harmonic oscillator of frequency $\omega_0$ in thermal equilibrium, we easily get 12 



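$$W_n = \exp\left\{-\frac{n\hbar\omega_0}{k_BT}\right\}\left(1 - \exp\left\{-\frac{\hbar\omega_0}{k_BT}\right\}\right), \qquad Z = \exp\left\{-\frac{\hbar\omega_0}{2k_BT}\right\}\sum_{n=0}^{\infty}\exp\left\{-\frac{n\hbar\omega_0}{k_BT}\right\} = \frac{\exp\{-\hbar\omega_0/2k_BT\}}{1 - \exp\{-\hbar\omega_0/k_BT\}}, \qquad (7.24)$$

$$\langle E\rangle = \sum_{n=0}^{\infty}W_nE_n = \frac{\hbar\omega_0}{2}\coth\frac{\hbar\omega_0}{2k_BT} \equiv \frac{\hbar\omega_0}{2} + \frac{\hbar\omega_0}{\exp\{\hbar\omega_0/k_BT\} - 1}. \qquad (7.25)$$

An alternative way to present the last result is to write 

$$\langle E\rangle = \frac{\hbar\omega_0}{2} + \hbar\omega_0\langle n\rangle, \qquad (7.26a)$$

$$\text{with}\quad\langle n\rangle = \frac{1}{\exp\{\hbar\omega_0/k_BT\} - 1}, \qquad (7.26b)$$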


and to interpret it as the fact that in addition to the so-called zero-point energy $\hbar\omega_0/2$ of the ground state, 
the oscillator (on average) has $\langle n\rangle$ thermally-induced excitations, with energy $\hbar\omega_0$ each. In the 
harmonic oscillator, whose energy levels are equidistant, such a language is completely appropriate, 
because the transfer from any level to the one just above it adds the same amount of energy, $\hbar\omega_0$, to the 
system. The above expression for $\langle n\rangle$ is actually the Bose-Einstein distribution (for the particular case of 
zero chemical potential); 13 we see that it not only does not contradict the Gibbs distribution (for the total 
energy of the system), but immediately follows from it. 14 
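The oscillator result can be confirmed by brute-force summation of the Gibbs distribution over the levels $E_n = \hbar\omega_0(n + 1/2)$ and compared with both closed forms above. The sketch assumes arbitrary illustrative values of $\hbar\omega_0$ and $k_BT$ in common units, and truncates the level list where the Boltzmann factors are utterly negligible.

```python
import numpy as np

hw, kT = 1.0, 0.7                 # hbar*omega_0 and k_B*T, same (arbitrary) units
n = np.arange(2000)               # truncated level list; higher terms underflow to zero
E_n = hw * (n + 0.5)

W_n = np.exp(-E_n / kT)
W_n /= W_n.sum()                  # Gibbs distribution (23)

E_avg_sum = np.sum(W_n * E_n)                    # direct summation over levels
n_avg = 1.0 / np.expm1(hw / kT)                  # Bose-Einstein <n>, Eq. (26b)
E_avg_closed = hw / 2 + hw * n_avg               # Eq. (26a)
E_avg_coth = (hw / 2) / np.tanh(hw / (2 * kT))   # the coth form of the same result

print(np.isclose(E_avg_sum, E_avg_closed), np.isclose(E_avg_closed, E_avg_coth))
```

Raising `kT` well above `hw` in this sketch shows the approach of $\langle E\rangle$ to the classical value $k_BT$ discussed in footnote 14.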



11 See, e.g., SM Sec. 2.8. 

12 See, e.g., SM Sec. 2.5 - but mind a different energy reference level, E 0 = Tico, used in Eqs. (2.68)-(2.69), 
affecting the expression for Z. Actually, the calculation is so straightforward (just the summation of a geometric 
progression for the enumeration of Z) that it is highly recommended to the reader as a simple exercise. 

13 See, e.g., SM Sec. 2.8. 

14 Because of the fundamental importance of Eq. (26) for many fields of physics, let me remind the reader of its 
main properties. At low temperatures, $k_BT \ll \hbar\omega_0$, there are virtually no thermal excitations, $\langle n\rangle \to 0$, and the 
average energy of the oscillator is dominated by that of its ground state. In the opposite limit of high temperatures, 
$\langle n\rangle \to k_BT/\hbar\omega_0 \gg 1$, and $\langle E\rangle$ approaches the classical value $k_BT$ (following, for example, from the equipartition 
theorem, which assigns energy $k_BT/2$ to each quadratic contribution to the system's energy - in the 1D oscillator case, 
to one potential and one kinetic energy term). 
7.2. Coordinate representation and the Wigner function 

For many applications of the density matrix to wave mechanics, its coordinate representation is 
convenient. (I will only discuss it for the 1D case; the generalization to the multi-dimensional case is 
straightforward.) Following Eq. (4.47), it is natural to define the following function of two arguments 
(frequently also called the density matrix): 

$$w(x,x') \equiv \langle x|\hat w|x'\rangle. \qquad (7.27)$$

Inserting, into the right-hand part of this definition, two closure conditions (4.44) for an arbitrary (full 
and orthonormal) basis {s}, and then using Eq. (5.19), we get 15 

$$w(x,x') = \sum_{j,j'}\langle x|s_j\rangle\langle s_j|\hat w|s_{j'}\rangle\langle s_{j'}|x'\rangle = \sum_{j,j'}\psi_j(x)\,w_{jj'}\,\psi^*_{j'}(x'). \qquad (7.28)$$

In the special basis {w}, in which the density matrix is diagonal, this expression is reduced to 

$$w(x,x') = \sum_j\psi_j(x)\,W_j\,\psi^*_j(x'). \qquad (7.29)$$



Let us discuss the properties of this function. At coinciding arguments, x = x', this is just the 
probability density: 16 

$$w(x,x) = \sum_j\psi_j(x)W_j\psi^*_j(x) = \sum_jw_j(x)W_j = w(x). \qquad (7.30)$$

However, the density matrix gives more information about the system than just the probability density. 
As the simplest example, let us consider a pure quantum state, with $W_j = \delta_{jj'}$, so that $\psi(x) = \psi_{j'}(x)$, and 

$$w(x,x') = \psi_{j'}(x)\psi^*_{j'}(x') = \psi(x)\psi^*(x'). \qquad (7.31)$$

We see that the density matrix carries the information not only about the modulus but also about the phase of 
the wavefunction. (Of course, one may argue rather convincingly that in this ultimate limit the density-matrix 
description is redundant, because all this information is contained in the wavefunction itself.) 

How may the density matrix be interpreted? In the simple case (31), we can write 

$$|w(x,x')|^2 \equiv w(x,x')w^*(x,x') = \psi(x)\psi^*(x)\psi(x')\psi^*(x') = w(x)w(x'), \qquad (7.32)$$

so that the modulus squared of the density matrix may be interpreted as the joint probability density to find the 
system at point x and point x'. For example, for a simple wave packet with the spatial extent δx, w(x,x') 
is appreciable only if both points are not farther than δx from the packet center, and hence from each 
other. The interpretation becomes more complex if we deal with an incoherent mixture of several 
wavefunctions, for example the classical mixture describing the thermodynamic equilibrium. In this 
case, we can use Eq. (23) to rewrite Eq. (29) as follows: 



15 For now, I will focus on a fixed time instant (say, t = 0), and hence write $\psi(x)$ instead of $\Psi(x, t)$. 

16 This fact is the historic origin of the density matrix's name. 
$$w(x,x') = \sum_n\psi_n(x)W_n\psi^*_n(x') = \frac{1}{Z}\sum_n\psi_n(x)\exp\left\{-\frac{E_n}{k_BT}\right\}\psi^*_n(x'). \qquad (7.33)$$
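To make Eq. (33) concrete, the sketch below builds $w(x,x')$ on a grid for a hypothetical example not treated in the text: a particle in a 1D box $0 < x < 1$ with hard walls, in units $\hbar = m = 1$, so that $\psi_n(x) = \sqrt2\sin(n\pi x)$ and $E_n = (n\pi)^2/2$. It then checks two properties discussed above: the diagonal $w(x,x)$ is a normalized probability density, Eq. (30), and the matrix is Hermitian (here real and symmetric).

```python
import numpy as np

# Hypothetical example: particle in a 1D box 0 < x < 1, units hbar = m = 1
x = np.linspace(0.0, 1.0, 401)
dx = x[1] - x[0]
n = np.arange(1, 60)                                   # enough levels for the chosen kT
E = (n * np.pi) ** 2 / 2
psi = np.sqrt(2.0) * np.sin(np.outer(n, np.pi * x))    # psi[n-1, i] = psi_n(x_i), real

kT = 20.0
W = np.exp(-(E - E[0]) / kT)
W /= W.sum()                                           # Gibbs probabilities, Eq. (23)

# Eq. (33): w(x, x') = sum_n psi_n(x) W_n psi_n*(x')
w = psi.T @ (W[:, None] * psi.conj())

prob = np.diag(w).real                                 # Eq. (30): w(x, x) is the probability density
print(np.isclose(prob.sum() * dx, 1.0))                # normalized (wavefunctions vanish at the walls)
print(np.allclose(w, w.conj().T))                      # Hermitian in (x, x')
```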



As the simplest example, let us find the density matrix of a free (1D) particle in 
thermal equilibrium. As we know very well, in this case the set of energies $E_p = p^2/2m$ of stationary 
states (monochromatic waves) forms a continuum, so that we need to replace sum (33) with an integral, 
taking "delta-normalized" traveling wavefunctions (5.59) as eigenstates: 

$$w(x,x') = \frac{1}{2\pi\hbar Z}\int_{-\infty}^{+\infty}\exp\left\{-\frac{p^2}{2mk_BT}\right\}\exp\left\{\frac{ip(x-x')}{\hbar}\right\}dp. \qquad (7.34)$$



This is a usual Gaussian integral, and may be worked out, as we have done repeatedly in Chapter 2 and 
beyond, by complementing the exponent to the full square of momentum plus a constant. The statistical 
sum Z may also be readily calculated: 17 

$$Z = (2\pi mk_BT)^{1/2}. \qquad (7.35)$$

However, for what follows it is more useful to write the result for the product wZ (the so-called un- 
normalized density matrix): 

$$w(x,x')Z = \left(\frac{mk_BT}{2\pi\hbar^2}\right)^{1/2}\exp\left\{-\frac{mk_BT(x-x')^2}{2\hbar^2}\right\}. \qquad (7.36)$$



This is a very interesting result: the density matrix depends only on the difference of its 
arguments, dropping to zero fast as the distance between points x and x' exceeds the following 
characteristic scale (called the correlation length): 

$$x_c \equiv \langle(x-x')^2\rangle^{1/2} = \frac{\hbar}{(mk_BT)^{1/2}}. \qquad (7.37)$$

This length may be interpreted in the following way. It is straightforward to use Eq. (23) to verify that 
the average energy $\langle E_p\rangle = \langle p^2/2m\rangle$ of a particle in thermal equilibrium, i.e. in the classical mixture (33), 
equals $k_BT/2$ - this is just one more manifestation of the equipartition theorem. Hence the average 
momentum magnitude may be estimated as 

$$p_c \equiv \langle p^2\rangle^{1/2} = (2m\langle E_p\rangle)^{1/2} = (mk_BT)^{1/2}, \qquad (7.38)$$

so that $x_c$ is of the order of the minimal length allowed by the Heisenberg-like "uncertainty relation": 

$$x_c = \frac{\hbar}{p_c}. \qquad (7.39)$$
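The step from the momentum integral (34) to the Gaussian (36) can be checked numerically. The sketch below assumes units $\hbar = m = 1$ and an arbitrary temperature, evaluates the integral by a simple Riemann sum on a wide, fine momentum grid (adequate here because the integrand is smooth and decays fast), and compares it with the closed form. It also confirms that at $|x - x'| = x_c$ the un-normalized density matrix has fallen by the factor $e^{-1/2}$.

```python
import numpy as np

hbar = m = 1.0          # hypothetical units
kT = 2.0
p = np.linspace(-40.0, 40.0, 20001)
dp = p[1] - p[0]

def wZ_numeric(delta):
    """Un-normalized density matrix from the momentum integral (34), delta = x - x'."""
    integrand = np.exp(-p**2 / (2 * m * kT)) * np.exp(1j * p * delta / hbar)
    return (integrand.sum() * dp / (2 * np.pi * hbar)).real

def wZ_closed(delta):
    """Gaussian result (36)."""
    return np.sqrt(m * kT / (2 * np.pi * hbar**2)) * np.exp(-m * kT * delta**2 / (2 * hbar**2))

deltas = np.linspace(-3.0, 3.0, 13)
print(np.allclose([wZ_numeric(d) for d in deltas], wZ_closed(deltas)))

x_c = hbar / np.sqrt(m * kT)    # correlation length (37)
print(np.isclose(wZ_closed(x_c) / wZ_closed(0.0), np.exp(-0.5)))
```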



17 Due to the delta-normalization of the eigenfunctions, the density matrix for the free particle (and any system 
with a continuous eigenvalue spectrum) is normalized as 

$$\int_{-\infty}^{+\infty}w(x,x')Z\,dx' = \int_{-\infty}^{+\infty}w(x,x')Z\,dx = 1.$$
Notice that with the growth of temperature, the correlation length (37) goes to zero, and the 
density matrix (36) tends to the δ-function: 

$$w(x,x')Z\big|_{T\to\infty} \to \delta(x-x'). \qquad (7.40)$$



Since in this limit the average kinetic energy of the particle is larger than its potential energy in any fixed potential profile, Eq. (40) expresses a general property of the density matrix (33) at high temperatures.

Let us discuss the following curious feature of Eq. (36): if we replace k_BT with ħ/i(t - t_0), and x' with x_0, the un-normalized density matrix wZ for a free particle turns into the particle's propagator - see Eq. (2.49). This is not just an accidental coincidence. Indeed, in Chapter 2 we saw that the propagator of a system with an arbitrary stationary Hamiltonian may be expressed via the stationary eigenfunctions as

$$G(x,t;x_0,t_0) = \sum_n\psi_n(x)\exp\left\{-i\frac{E_n}{\hbar}(t-t_0)\right\}\psi_n^*(x_0). \qquad (7.41)$$



Comparing this expression with Eq. (33), we see that the replacements

$$\frac{i(t-t_0)}{\hbar} \to \frac{1}{k_BT}, \qquad x_0 \to x' \qquad (7.42)$$



turn the pure-state propagator G into the un-normalized density matrix wZ of the same system in thermodynamic equilibrium. This important fact, rooted in the formal similarity of the Gibbs distribution (23) with the Schrödinger equation's solution (1.67), enables a theoretical technique of the so-called thermodynamic Green's functions, which is especially productive in condensed matter physics. 18

For our purposes, we can use Eq. (42) to recycle some wave-mechanics results, in particular the following formula for the harmonic oscillator's propagator



$$G(x,t;x_0,t_0) = \left(\frac{m\omega_0}{2\pi i\hbar\sin[\omega_0(t-t_0)]}\right)^{1/2}\exp\left\{-\frac{m\omega_0\left[(x^2+x_0^2)\cos[\omega_0(t-t_0)]-2xx_0\right]}{2i\hbar\sin[\omega_0(t-t_0)]}\right\}, \qquad (7.43)$$

that may be readily proved to satisfy the Schrödinger equation for Hamiltonian (5.95), with the appropriate initial condition, G(x, t_0; x_0, t_0) = δ(x - x_0). Making substitution (42), we immediately get



$$w(x,x')Z = \left(\frac{m\omega_0}{2\pi\hbar\sinh[\hbar\omega_0/k_BT]}\right)^{1/2}\exp\left\{-\frac{m\omega_0\left[(x^2+x'^2)\cosh[\hbar\omega_0/k_BT]-2xx'\right]}{2\hbar\sinh[\hbar\omega_0/k_BT]}\right\}. \qquad (7.44)$$

As a sanity check, at very low temperatures, k_BT ≪ ħω_0, both hyperbolic functions participating in this expression are very large and nearly equal, and Eq. (44) yields



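$$w(x,x')Z \underset{T\to 0}{\longrightarrow} \left[\left(\frac{m\omega_0}{\pi\hbar}\right)^{1/4}\exp\left\{-\frac{m\omega_0x^2}{2\hbar}\right\}\right]\times\exp\left\{-\frac{\hbar\omega_0}{2k_BT}\right\}\times\left[\left(\frac{m\omega_0}{\pi\hbar}\right)^{1/4}\exp\left\{-\frac{m\omega_0x'^2}{2\hbar}\right\}\right]. \qquad (7.45)$$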



18 I will have no time to discuss this technique, and have to refer the interested reader to special literature. 
Probably, the most famous text of that field is A. Abrikosov, L. Gor'kov, and I. Dzyaloshinski, Methods of 
Quantum Field Theory in Statistical Physics, Prentice-Hall, 1963. (Later reprintings are available from Dover.) 
In each of the square brackets we can readily recognize the ground state's wavefunction (2.269), while the middle exponent is just the statistical sum (24) in the low-temperature limit, when it is dominated by the ground-level contribution:

$$Z \underset{T\to 0}{\longrightarrow} \exp\left\{-\frac{\hbar\omega_0}{2k_BT}\right\}. \qquad (7.46)$$



As a result, Z in both parts of Eq. (45) may be cancelled, and the density matrix in this limit is described by Eq. (31), with the ground state as the only state of the system. This is natural, because the temperature is too low for the excitation of any other state.
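A rough numerical check of this substitution trick: the sketch below (an illustration only, assuming units ħ = m = ω_0 = 1, so that ħω_0/k_BT = 1/kT, and truncating the eigenstate sum at 80 terms) evaluates the sum (33), multiplied by Z, using oscillator eigenfunctions generated by the standard Hermite-function recurrence, and compares it with the closed form (44). At kT = 0.1 the result should also be close to the ground-state product of Eq. (45).

```python
import numpy as np

# Units hbar = m = omega_0 = 1 (assumption), so that hbar*omega_0/k_BT = 1/kT.
def hermite_functions(n_max, x):
    """Rows psi_0(x) ... psi_{n_max-1}(x) of oscillator eigenfunctions, from the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros((n_max, x.size))
    psi[0] = np.pi**-0.25*np.exp(-x**2/2)
    if n_max > 1:
        psi[1] = np.sqrt(2.0)*x*psi[0]
    for n in range(1, n_max - 1):
        psi[n + 1] = np.sqrt(2.0/(n + 1))*x*psi[n] - np.sqrt(n/(n + 1))*psi[n - 1]
    return psi

def wZ_sum(x, xp, kT, n_max=80):
    """Eigenstate sum (33) times Z: sum over n of psi_n(x) exp(-E_n/kT) psi_n(x')."""
    psi_x = hermite_functions(n_max, [x])[:, 0]
    psi_xp = hermite_functions(n_max, [xp])[:, 0]
    E = np.arange(n_max) + 0.5                     # E_n = hbar*omega_0*(n + 1/2)
    return np.sum(psi_x*psi_xp*np.exp(-E/kT))

def wZ_closed(x, xp, kT):
    """Closed form (44), obtained from the propagator (43) by substitution (42)."""
    s, c = np.sinh(1/kT), np.cosh(1/kT)
    return np.sqrt(1/(2*np.pi*s))*np.exp(-((x**2 + xp**2)*c - 2*x*xp)/(2*s))

for kT in (1.0, 0.1):
    print(f"kT = {kT}: sum {wZ_sum(0.3, -0.5, kT):.10e}, Eq. (44) {wZ_closed(0.3, -0.5, kT):.10e}")
```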

Returning to arbitrary temperatures, Eq. (44) at coinciding arguments gives the following expression for the probability density: 19

$$w(x,x)Z \equiv w(x)Z = \left(\frac{m\omega_0}{2\pi\hbar\sinh[\hbar\omega_0/k_BT]}\right)^{1/2}\exp\left\{-\frac{m\omega_0x^2}{\hbar}\tanh\frac{\hbar\omega_0}{2k_BT}\right\}. \qquad (7.47)$$

This is just a Gaussian function of x, with the following variance:

$$\langle x^2\rangle = \frac{\hbar}{2m\omega_0}\coth\frac{\hbar\omega_0}{2k_BT}. \qquad (7.48)$$



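In order to compare this result with our earlier ones, it is useful to recast it as

$$\langle U\rangle \equiv \left\langle\frac{m\omega_0^2x^2}{2}\right\rangle = \frac{\hbar\omega_0}{4}\coth\frac{\hbar\omega_0}{2k_BT}. \qquad (7.49)$$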



Comparing this expression with Eq. (26), we see that the average value of potential energy is exactly 
one half of the total energy - the other half being the average kinetic energy. This is what we could 
expect, because according to Eqs. (5.129)-(5.130), such relation holds for each Fock state and hence 
should also hold for their classical mixture. 
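The statement that ⟨U⟩ is exactly one half of the average total energy may be checked numerically. In the sketch below (units ħ = m = ω_0 = 1 and the sample temperatures are arbitrary choices), ⟨x²⟩ is obtained by direct numerical integration of the Gaussian (47), while ⟨E⟩ comes from the Gibbs distribution over the levels E_n = ħω_0(n + 1/2); the two calculations share no intermediate formulas.

```python
import numpy as np

# Units hbar = m = omega_0 = 1 (assumption): E_n = n + 1/2 and U = x**2/2.
def gibbs_energy(kT, n_max=4000):
    """<E> from the Gibbs distribution (23) over the Fock levels."""
    E = np.arange(n_max) + 0.5
    W = np.exp(-(E - E[0])/kT)       # shifted exponent avoids underflow of all weights
    W /= W.sum()
    return np.sum(W*E)

def potential_energy(kT, x_max=60.0, n_x=60001):
    """<U> = <x^2>/2, computed from the Gaussian density (47) by direct numerical integration."""
    x = np.linspace(-x_max, x_max, n_x)
    w = np.exp(-x**2*np.tanh(1/(2*kT)))   # Eq. (47) without its x-independent prefactor
    return 0.5*np.sum(x**2*w)/np.sum(w)

for kT in (0.1, 1.0, 10.0):
    E_avg, U_avg = gibbs_energy(kT), potential_energy(kT)
    print(f"kT = {kT}: <E> = {E_avg:.6f}, <U> = {U_avg:.6f}, Eq. (49): {0.25/np.tanh(1/(2*kT)):.6f}")
```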

Unfortunately, besides the trivial case (30) of coinciding arguments, it is hard to give a straightforward interpretation of the density matrix in terms of system measurements. This is a fundamental difficulty, which has been well explored in terms of the Wigner function (sometimes called the "Wigner-Ville distribution"), 20 defined as

$$W(X,P) \equiv \frac{1}{2\pi\hbar}\int w\left(X+\frac{\tilde X}{2}, X-\frac{\tilde X}{2}\right)\exp\left\{-\frac{iP\tilde X}{\hbar}\right\}d\tilde X. \qquad (7.50)$$

19 I have to confess that this notation is imperfect, because from the point of view of rigorous mathematics, w(x, x') and w(x) are different functions, and so are w(p, p') and w(p) used below. In the perfect world, I would use different letters for them all, but I desperately want to stay with "w" for all the probability densities, and there are not so many good different fonts for this letter. Let me hope that the difference between these functions is clear from their arguments and from the context.

20 It was introduced in 1932 by E. Wigner on the basis of a general (Weyl-Wigner) transform suggested by H. Weyl in 1927, and re-derived in 1948 by J. Ville on a different mathematical basis.
From the mathematical standpoint, this is just the Fourier transform of the density matrix in one of two new coordinates (Fig. 2) defined by the relations

$$x \equiv X + \frac{\tilde X}{2}, \qquad x' \equiv X - \frac{\tilde X}{2}. \qquad (7.51)$$



Physically, the new argument X = (x + x')/2 may be understood as the average position of the particle during the time interval (t - t'), while X̃ = x - x' is the distance passed by the particle during that time interval, so that P may be interpreted as the characteristic momentum of the particle during that motion. As a result, the Wigner function is a construct intended to characterize the system's spread simultaneously in the coordinate and momentum space - for 1D systems, on the phase plane [X, P] that we considered before - see Fig. 5.6. Let us see how fruitful these intentions are.



Fig. 7.2. Coordinates X and X̃ employed in the Weyl-Wigner transform (50). They differ from the coordinates obtained by the rotation of the reference frame by angle π/4 only by coefficients √2, describing scale stretching.



First of all, we may write the Fourier transform reciprocal to Eq. (50):

$$w\left(X+\frac{\tilde X}{2}, X-\frac{\tilde X}{2}\right) = \int W(X,P)\exp\left\{+\frac{iP\tilde X}{\hbar}\right\}dP. \qquad (7.52)$$



For the particular case X̃ = 0, this relation yields

$$w(X) \equiv w(X,X) = \int W(X,P)\,dP. \qquad (7.53)$$

Hence the integral of the Wigner function over momentum P gives the probability density to find the system at point X.

Actually, the function has the same property for integration over X. To prove that, we should first introduce the momentum representation of the density matrix, in full analogy with its coordinate representation (27):

$$w(p,p') \equiv \langle p|\hat w|p'\rangle. \qquad (7.54)$$

Inserting, as usual, two identity operators, in the form given by Eq. (5.21), into the right-hand part of this equality, we get the following relation between the momentum and coordinate representations:

$$w(p,p') = \langle p|\hat w|p'\rangle = \iint dx\,dx'\,\langle p|x\rangle\langle x|\hat w|x'\rangle\langle x'|p'\rangle = \frac{1}{2\pi\hbar}\iint dx\,dx'\exp\left\{-\frac{ipx}{\hbar}\right\}w(x,x')\exp\left\{\frac{ip'x'}{\hbar}\right\}. \qquad (7.55)$$
This is of course nothing else than the unitary transform of an operator from the x-basis to the p-basis, and is similar to the first form of Eq. (5.67). 21 For coinciding arguments, p = p', Eq. (55) is reduced to

$$w(p) \equiv w(p,p) = \frac{1}{2\pi\hbar}\iint dx\,dx'\,w(x,x')\exp\left\{-\frac{ip(x-x')}{\hbar}\right\}. \qquad (7.56)$$

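Using Eq. (29) and then Eq. (5.60), this function may be presented as

$$w(p) = \sum_j W_j\varphi_j(p)\varphi_j^*(p) \equiv \sum_j W_j\left|\varphi_j(p)\right|^2, \qquad (7.57)$$
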
and hence interpreted as the probability density of the particle's momentum at point p. Now, in variables 
(51), Eq. (56) has the form

$$w(p) = \frac{1}{2\pi\hbar}\iint w\left(X+\frac{\tilde X}{2}, X-\frac{\tilde X}{2}\right)\exp\left\{-\frac{ip\tilde X}{\hbar}\right\}dX\,d\tilde X. \qquad (7.58)$$

Comparing this equality with definition (50) of the Wigner function, we see that

$$w(P) = \int W(X,P)\,dX. \qquad (7.59)$$

Thus, according to Eqs. (53) and (59), the integrals of the Wigner function over either the coordinate or the momentum give the probability density of the other variable. This is of course the main requirement for any candidate joint probability density p(X, P) to find the classical representation point of a stochastic system on the phase plane [X, P]. 22

Let us see what the Wigner function looks like for the simplest systems in thermodynamic equilibrium. For a free 1D particle, we can use Eq. (34), in its integrated form (36), ignoring for simplicity the normalization issues:

$$W(X,P) \propto \int\exp\left\{-\frac{mk_BT\tilde X^2}{2\hbar^2}\right\}\exp\left\{-\frac{iP\tilde X}{\hbar}\right\}d\tilde X. \qquad (7.60)$$

The usual Gaussian integration yields:

$$W(X,P) = \text{const}\times\exp\left\{-\frac{P^2}{2mk_BT}\right\}. \qquad (7.61)$$



We see that the function is independent of X (as it should be for this translation-invariant system), and coincides with the Gibbs distribution (23). We could get the same result directly from classical statistics. This is natural because, as we know from Sec. 2.2, free motion is essentially not quantized - at least in terms of its energy and momentum.

Now let us consider a substantially quantum system, the harmonic oscillator. Plugging Eq. (44) into Eq. (50), for that system in thermal equilibrium we also get a two-dimensional Gaussian function,



21 Note that the last line of Eq. (5.67) is invalid for the density operator ŵ, because it is not local!

22 Such density, which would express the probability dW to find the system in a small area of the phase plane as dW = p(X, P)dXdP, is the basic notion of (1D) classical statistics - see, e.g., SM Sec. 2.1.
$$W(X,P) = \text{const}\times\exp\left\{-C\left(\frac{P^2}{2m} + \frac{m\omega_0^2X^2}{2}\right)\right\}, \qquad (7.62)$$

though coefficient C is now different from 1/k_BT, and tends to that limit only at high temperatures, k_BT ≫ ħω_0. Moreover, for the Glauber state it also gives a very plausible result - a Gaussian distribution similar to Eq. (62), but shifted to the central point of the state - see Sec. 5.5. 23

Unfortunately, for some other possible states of the harmonic oscillator, e.g., any pure Fock state 
with n > 0, the Wigner function takes negative values in some regions of the [X, P] plane - Fig. 3. 24 





Fig. 7.3. The Wigner function of several Fock states of a 
harmonic oscillator: (a) n = 0, (b) n = 1; (c) n = 5. Adapted 
from http://en.wikipedia.org/wiki/Wigner_function . 



Hence it cannot be used in the role of the classical probability density p(X, P); otherwise we would get a negative probability for measurement in certain intervals dXdP, a notion hard to interpret. The same is true for most other quantum systems. Indeed, this fact could be predicted just by looking at definition (50) applied to a pure quantum state, in which the density matrix may be factored - see Eq. (31):

$$W(X,P) = \frac{1}{2\pi\hbar}\int\psi\left(X+\frac{\tilde X}{2}\right)\psi^*\left(X-\frac{\tilde X}{2}\right)\exp\left\{-\frac{iP\tilde X}{\hbar}\right\}d\tilde X. \qquad (7.63)$$

23 Please note that in the notation of that section, arguments {X, P} of the Wigner function should be replaced with {x, p}, and capital letters saved for the Cartesian coordinates of the central point (5.133), i.e. the classical complex amplitude of the oscillations.

24 Spectacular experimental measurements of this function (for n = 0 and n = 1) were carried out recently by E. Bimbard et al., Phys. Rev. Lett. 112, 033601 (2014).

Changing argument P (say, at fixed X), we are essentially changing the spatial "frequency" (wavenumber) of the Fourier component of the wavefunction product that we are calculating, and we know that Fourier images typically change sign as the frequency is changed. Hence the wavefunctions should have some high-symmetry properties to avoid this effect. Indeed, the Gaussian functions (describing, for example, the Glauber states and, as a particular case, the ground state of the harmonic oscillator) have such a symmetry, but many other functions do not.

To summarize, all attempts to use the Wigner function (or any other function) in quantum mechanics to play the role of the classical probability density p fail. However, the Wigner function is still used for a semi-quantitative interpretation of states of open quantum systems.
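The negativity of the Wigner function of Fock states with odd n can be reproduced by evaluating the pure-state formula W(X,P) = (1/2πħ)∫ψ(X + X̃/2)ψ*(X - X̃/2)exp{-iPX̃/ħ}dX̃ numerically. This sketch assumes units ħ = m = ω_0 = 1, real oscillator eigenfunctions from the standard Hermite-function recurrence, and arbitrary grid parameters. At the origin it should give W = (-1)^n/πħ, and integrating over P should recover |ψ_n(X)|², as required by Eq. (53).

```python
import numpy as np

# Units hbar = m = omega_0 = 1 (assumption); Fock-state wavefunctions are real.
def psi(n, x):
    """Oscillator eigenfunction psi_n(x) from the three-term Hermite-function recurrence."""
    x = np.asarray(x, dtype=float)
    prev, cur = np.zeros_like(x), np.pi**-0.25*np.exp(-x**2/2)
    for k in range(n):
        prev, cur = cur, np.sqrt(2.0/(k + 1))*x*cur - np.sqrt(k/(k + 1))*prev
    return cur

def wigner(n, X, P, s_max=16.0, n_s=1601):
    """Pure-state Wigner function of Fock state n at point X, for an array of momenta P."""
    s = np.linspace(-s_max, s_max, n_s)          # grid for the variable X-tilde
    ds = s[1] - s[0]
    prod = psi(n, X + s/2)*psi(n, X - s/2)
    phase = np.exp(-1j*np.outer(np.atleast_1d(P), s))
    return (phase @ prod).real*ds/(2*np.pi)

for n in (0, 1, 5):
    print(f"n = {n}: W(0, 0) = {wigner(n, 0.0, 0.0)[0]:+.6f}, (-1)^n/pi = {(-1)**n/np.pi:+.6f}")

# Marginal, Eq. (53): integrating W over P should reproduce |psi_n(X)|^2
X0 = 0.7
P = np.linspace(-8.0, 8.0, 801)
marginal = np.sum(wigner(1, X0, P))*(P[1] - P[0])
print(f"integral of W dP at X = {X0}: {marginal:.8f}, |psi_1|^2 = {float(psi(1, X0))**2:.8f}")
```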



7.3. Open system dynamics: Dephasing 

So far we have discussed the density matrix as something given. Now let us discuss the evolution of the matrix in time, starting from the simplest case, when the system is in state (15) with time-independent probabilities W_j. In the Schrödinger picture we can rewrite Eq. (15) as

$$\hat w(t) = \sum_j\left|w_j(t)\right\rangle W_j\left\langle w_j(t)\right|. \qquad (7.64)$$



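Differentiating this equation by parts, and using Eqs. (4.157)-(4.158) with account of the Hermitian nature of the Hamiltonian operator, we get

$$i\hbar\dot{\hat w} = i\hbar\sum_j\left[\left|\dot w_j(t)\right\rangle W_j\left\langle w_j(t)\right| + \left|w_j(t)\right\rangle W_j\left\langle\dot w_j(t)\right|\right] = \sum_j\left[\hat H\left|w_j(t)\right\rangle W_j\left\langle w_j(t)\right| - \left|w_j(t)\right\rangle W_j\left\langle w_j(t)\right|\hat H\right]$$
$$= \hat H\sum_j\left|w_j(t)\right\rangle W_j\left\langle w_j(t)\right| - \sum_j\left|w_j(t)\right\rangle W_j\left\langle w_j(t)\right|\hat H. \qquad (7.65)$$

Now using Eq. (64) again (twice), we get the so-called von Neumann equation 25

$$i\hbar\dot{\hat w} = \left[\hat H, \hat w\right]. \qquad (7.66)$$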



This equation is similar in structure to Eq. (4.199), describing the time evolution of the Heisenberg-picture operators:

$$i\hbar\dot{\hat A} = \left[\hat A, \hat H\right], \qquad (7.67)$$

except for the operator order in the commutator, i.e., the sign of the right-hand part. This is quite natural, because Eq. (66) belongs to the Schrödinger picture, while Eq. (67) belongs to the Heisenberg picture of quantum dynamics.



25 In many texts, it is called the "Liouville equation", due to the philosophical proximity to the classical Liouville theorem for the distribution function p(X, P) or its multi-dimensional analog - see, e.g., SM Sec. 6.1, in particular Eq. (6.5).
In the general case when a system, initially out of equilibrium, comes into contact with the environment, probabilities W_j change, and the dynamics is described by equations more complex than Eq. (66). However, we still can use this equation to discuss, using a simple model, the second (after energy relaxation) major effect of the environment, dephasing (also called "decoherence"). 26 Let us consider the following model of a system interacting (weakly!) with its environment: 27

$$\hat H = \hat H_s + \hat H_e\{\lambda\} + \hat H_{\rm int}. \qquad (7.68)$$



Let us consider the simplest, two-level system, taking its Hamiltonian in the form

$$\hat H_s = a_z\hat\sigma_z \qquad (7.69)$$

(as we know from Sec. 4.6, such a Hamiltonian is sufficient to avoid the energy level degeneracy), and a factorable (bilinear) interaction - cf. Eq. (6.148) and its discussion:

$$\hat H_{\rm int} = -\hat f\{\lambda\}\hat\sigma_z. \qquad (7.70)$$

Here f̂ is a Hermitian operator depending only on the set {λ} of environmental degrees of freedom ("coordinates"). These coordinates belong to a Hilbert space different from that of the two-level system, and hence operators f̂{λ} and Ĥ_e{λ} (which describes the environment) commute with σ̂_z and any other intrinsic operator of the two-level system. Of course, any realistic Ĥ_e{λ} is very complex, so that it may be surprising how much we will be able to achieve without specifying it.

Before we proceed to the solution, let me remind the reader of the important two-level systems that may be described by this model. The first example is an electron in an external magnetic field of a fixed direction (taken for axis z), which includes both an average component ⟨B_z⟩ and a random (fluctuating) component B̃_z. As follows from the discussion in Chapter 4, it may be described by Eqs. (68)-(70) with

$$a_z = \mu_B\langle\mathcal B_z\rangle, \qquad -\hat f = \mu_B\tilde{\mathcal B}_z. \qquad (7.71)$$



The second important example is a particle in a double-quantum-well potential (Fig. 4), with a barrier between the wells sufficiently high to be impenetrable, and an additional force F(t) exerted by the environment. If the force is sufficiently weak, we can neglect its effects on the shape of the quantum wells, and hence on the localized wavefunctions ψ_{L,R}, so that the force effect is reduced to the variation of the difference E_L - E_R = F(t)Δx between the well eigenenergies. As a result, it may be described by Eqs. (68)-(70) with

$$a_z \approx \frac{\langle F\rangle\Delta x}{2}, \qquad -\hat f \approx \frac{\tilde F\Delta x}{2}. \qquad (7.72)$$



26 Another example when Wj are constant in time, and hence Eq. (66) is valid, is the thermodynamic equilibrium. 
However, in this case the statistical operator is diagonal in the stationary state basis and hence commutes with the 
Hamiltonian. Hence the right-hand part of Eq. (66) vanishes, and it shows that the density matrix does not evolve 
in time at all - as it should. 

27 Though this model works very well in many cases (see the examples given below), it is not adequate for a 
particle interacting with the environment of similar particles. In this case the methods discussed in the next 
chapter are more relevant. 
Returning to the general model (68)-(70), let us start its analysis by writing the usual equation of motion for the Heisenberg operator σ̂_z: 28

$$i\hbar\dot{\hat\sigma}_z = \left[\hat\sigma_z, \hat H\right] = \left(a_z - \hat f\right)\left[\hat\sigma_z, \hat\sigma_z\right] = 0, \qquad (7.73)$$



so that operator σ̂_z does not evolve in time. What does this mean for the observables? For an arbitrary density matrix of the two-level system,

$$\mathrm w = \begin{pmatrix} w_{11} & w_{12}\\ w_{21} & w_{22}\end{pmatrix}, \qquad (7.74)$$



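we can readily calculate the trace of operator σ̂_z ŵ (since operator traces are basis-independent, we can do this in any basis, in particular in the usual z-basis):

$$\mathrm{Tr}\left(\hat\sigma_z\hat w\right) = \mathrm{Tr}\left(\sigma_z\mathrm w\right) = \mathrm{Tr}\left[\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix} w_{11} & w_{12}\\ w_{21} & w_{22}\end{pmatrix}\right] = w_{11} - w_{22}. \qquad (7.75)$$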



Hence, according to Eq. (5), σ̂_z may be considered the operator for observable W_1 - W_2, so that in case (73) the difference W_1 - W_2 does not depend on time, and since the sum of the probabilities is also fixed, W_1 + W_2 = 1, both of them are constant. (The physics of this result is especially clear for the model shown in Fig. 4: since the potential barrier separating the quantum wells is so high that tunneling through it is negligible, the interaction with the environment cannot move the system from one well into the other. It may look like nothing interesting may happen in such a situation, but in a minute we will see that this is not true.) Hence, we may use the von Neumann equation (66) for the density matrix evolution (in the Schrödinger picture). In the usual z-basis:



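$$i\hbar\dot{\mathrm w} \equiv i\hbar\begin{pmatrix}\dot w_{11} & \dot w_{12}\\ \dot w_{21} & \dot w_{22}\end{pmatrix} = \left[\mathrm H, \mathrm w\right] = \left(a_z - f\right)\left[\sigma_z, \mathrm w\right] = \left(a_z - f\right)\left[\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix} w_{11} & w_{12}\\ w_{21} & w_{22}\end{pmatrix} - \begin{pmatrix} w_{11} & w_{12}\\ w_{21} & w_{22}\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\right] = \left(a_z - f\right)\begin{pmatrix}0 & 2w_{12}\\ -2w_{21} & 0\end{pmatrix}. \qquad (7.76)$$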



28 This can be done because we may consider the whole system, including the environment, as a Hamiltonian one 
- see Eq. (68). 
This means that though the diagonal elements, i.e., the probabilities of the states, do not evolve in time (as we already know), the off-diagonal coefficients do change; for example,

$$i\hbar\dot w_{12} = 2\left(a_z - \hat f\right)w_{12}, \qquad (7.77)$$



with a similar but complex-conjugate equation for w_21. The solution of the linear differential equation (77) is straightforward, and yields

$$w_{12}(t) = w_{12}(0)\exp\left\{-i\frac{2a_z}{\hbar}t\right\}\exp\left\{i\frac{2}{\hbar}\int_0^t\hat f(t')\,dt'\right\}. \qquad (7.78)$$



The first exponent is a deterministic c-number factor, while in the second one f̂(t) ≡ f̂{λ(t)} is still an operator in the Hilbert space of the environment and, from the point of view of the system of our interest, a random function of time.
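For a single classical realization of f(t), Eq. (78) can be checked against a direct numerical integration of the von Neumann equation (66). In the sketch below, the value of a_z, the function f(t), the initial density matrix, and the units ħ = 1 are all hypothetical; the classical fourth-order Runge-Kutta scheme is used only as a generic integrator. The diagonal elements should stay constant, while w_12 should follow Eq. (78).

```python
import numpy as np

hbar = 1.0      # units with hbar = 1 (assumption)
a_z = 0.7       # hypothetical value of a_z in Eq. (69)
sz = np.diag([1.0, -1.0]).astype(complex)

def f(t):
    """One hypothetical classical realization f(t) of the environment's 'force'."""
    return 0.3*np.sin(2.0*t) + 0.1*np.cos(5.0*t)

def f_integral(t):
    """Exact integral of f from 0 to t, entering the second exponent of Eq. (78)."""
    return 0.15*(1.0 - np.cos(2.0*t)) + 0.02*np.sin(5.0*t)

def rhs(t, w):
    """von Neumann equation (66), i*hbar*dw/dt = [H, w], with H = (a_z - f) sigma_z."""
    H = (a_z - f(t))*sz
    return (H @ w - w @ H)/(1j*hbar)

w0 = np.array([[0.6, 0.3 + 0.2j], [0.3 - 0.2j, 0.4]])   # hypothetical initial density matrix
dt, n_steps = 1e-3, 3000
w = w0.copy()
for k in range(n_steps):          # classical 4th-order Runge-Kutta steps
    t = k*dt
    k1 = rhs(t, w)
    k2 = rhs(t + dt/2, w + dt/2*k1)
    k3 = rhs(t + dt/2, w + dt/2*k2)
    k4 = rhs(t + dt, w + dt*k3)
    w = w + dt/6*(k1 + 2*k2 + 2*k3 + k4)

t_end = n_steps*dt
w12_eq78 = w0[0, 1]*np.exp(-2j*a_z*t_end/hbar)*np.exp(2j*f_integral(t_end)/hbar)
print(abs(w[0, 0] - w0[0, 0]), abs(w[0, 1] - w12_eq78))
```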

Let us start from the limit when the environment behaves classically. 29 In this case, the operator in Eq. (78) may be considered as a classical random function of time f(t), provided that we average the result over the ensemble of many functions f(t) describing many (macroscopically similar) experiments. For a small time interval t = dt → 0, we can use the Taylor expansion of the exponent, truncating it after the quadratic term:

$$\left\langle\exp\left\{i\frac{2}{\hbar}\int_0^{dt}f(t')\,dt'\right\}\right\rangle \approx 1 + i\frac{2}{\hbar}\int_0^{dt}\langle f(t')\rangle\,dt' - \frac{2}{\hbar^2}\int_0^{dt}dt'\int_0^{dt}dt''\langle f(t')f(t'')\rangle = 1 - \frac{2}{\hbar^2}\int_0^{dt}dt'\int_0^{dt}dt''K_f(t'-t''). \qquad (7.79)$$

Here we have used the fact that the first average is equal to zero (it is evident from Eqs. (69)-(70) that if f had any average component, it could be included into parameter a_z), while the second average, called the correlation function, in a statistically (i.e. macroscopically) stationary state of the environment may only depend on the time difference τ ≡ t' - t'':



$$\langle f(t')f(t'')\rangle = K_f(t'-t'') \equiv K_f(\tau). \qquad (7.80)$$



If this difference is much larger than some time scale τ_c, called the correlation time of the random force, the values f(t') and f(t'') are completely independent (uncorrelated), as illustrated in Fig. 5a, so that the correlation function has to tend to zero. On the other hand, at τ = 0, i.e. t' = t'', the correlation function is just the variance of f:

$$K_f(0) = \langle f^2\rangle, \qquad (7.81)$$

and has to be positive. As a result, the function looks (qualitatively) like the sketch in Fig. 5b.



29 This assumption is not in any contradiction with the quantum treatment of the two-level system, because a typical environment has a very dense energy spectrum, so that the distances between adjacent levels may be readily bridged by thermal excitations of energies ~k_BT ≪ 2a_z, often making it essentially classical.
Fig. 7.5. (a) Typical random process and (b) its correlation function - schematically.



Hence, if we are only interested in time differences τ much longer than τ_c, we may approximate K_f(τ) with a delta-function. Let us take it in the following convenient form:

$$K_f(\tau) \approx \hbar^2D_\varphi\delta(\tau), \qquad (7.82)$$

where D_φ is a positive constant called the phase diffusion coefficient. The origin of this term stems from the very similar effect of diffusion of atoms or small solid particles in real space - the so-called Brownian motion. 30 Indeed, if a small classical particle moves in a highly viscous medium, its velocity is approximately proportional to the external force. Hence, if the random hits of a 1D particle by the molecules may be described by a force which obeys a law similar to Eq. (82), the velocity (along any Cartesian coordinate) is also delta-correlated:

$$\langle v(t)\rangle = 0, \qquad \langle v(t')v(t'')\rangle = 2D\delta(t'-t''). \qquad (7.83)$$



Now we can integrate the kinematic equation ẋ = v to calculate the particle's deviation from the initial position,

$$x(t) - x(0) = \int_0^t v(t')\,dt', \qquad (7.84)$$

and its variance:

$$\left\langle\left[x(t)-x(0)\right]^2\right\rangle = \left\langle\int_0^t v(t')\,dt'\int_0^t v(t'')\,dt''\right\rangle = \int_0^t dt'\int_0^t dt''\langle v(t')v(t'')\rangle = \int_0^t dt'\int_0^t dt''\,2D\delta(t'-t'') = 2Dt. \qquad (7.85)$$

This is the famous law of diffusion, showing that the r.m.s. deviation of the particle from the initial point grows with time as (2Dt)^{1/2}, where constant D is called the diffusion coefficient.
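The diffusion law (85) is easy to reproduce with a crude Monte Carlo simulation. The sketch below assumes a discretized white-noise velocity (independent Gaussian values with variance 2D/dt at each time step, the discrete analog of Eq. (83)); the values of D, dt, and the ensemble size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
D, dt = 0.5, 0.01                 # hypothetical diffusion coefficient and time step
n_steps, n_traj = 400, 4000       # 4000 independent trajectories, each of duration 4.0

# Discretized delta-correlated velocity, the analog of Eq. (83): independent Gaussian values
# with variance 2D/dt at each step, so that <v_k v_l>*dt = 2D for k = l and 0 otherwise.
v = rng.normal(0.0, np.sqrt(2*D/dt), size=(n_traj, n_steps))
x = np.cumsum(v*dt, axis=1)       # x(t) - x(0) at t = dt, 2dt, ..., Eq. (84)
t = dt*np.arange(1, n_steps + 1)
msd = np.mean(x**2, axis=0)       # ensemble average of [x(t) - x(0)]^2

for k in (99, 199, 399):
    print(f"t = {t[k]:.2f}: <[x(t)-x(0)]^2> = {msd[k]:.3f}, 2Dt = {2*D*t[k]:.3f}")
```

The agreement is only statistical: with 4000 trajectories the relative scatter of the estimated variance is a few percent.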

Returning to the diffusion of the quantum-mechanical phase, using Eq. (82), the last double integral in Eq. (79) yields ħ²D_φ dt, so that

$$\langle w_{12}(dt)\rangle = w_{12}(0)\exp\left\{-i\frac{2a_z}{\hbar}dt\right\}\left(1 - 2D_\varphi dt\right). \qquad (7.86)$$



Applying this formula to sequential time intervals, 



30 The theory of this phenomenon, first observed experimentally by biologist R. Brown in the early 1800s, was 
pioneered by A. Einstein in 1905 (see in particular Eq. (206) below) and developed in detail by M. Smoluchowski 
in 1906-1907, and A. Fokker in 1913. 
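$$\langle w_{12}(2dt)\rangle = \langle w_{12}(dt)\rangle\exp\left\{-i\frac{2a_z}{\hbar}dt\right\}\left(1 - 2D_\varphi dt\right) = w_{12}(0)\exp\left\{-i\frac{2a_z}{\hbar}2dt\right\}\left(1 - 2D_\varphi dt\right)^2, \qquad (7.87)$$

etc., for a finite time t = N dt, in the limit N → ∞ and dt → 0 (at fixed t), we get 31

$$\langle w_{12}(t)\rangle = w_{12}(0)\exp\left\{-i\frac{2a_z}{\hbar}t\right\}\lim_{N\to\infty}\left(1 - \frac{2D_\varphi t}{N}\right)^N. \qquad (7.88a)$$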



By the definition of the natural logarithm base e, 32 this limit is just exp{-2D_φt}, so that, finally:

$$\langle w_{12}(t)\rangle = w_{12}(0)\exp\left\{-i\frac{2a_z}{\hbar}t\right\}\exp\left\{-2D_\varphi t\right\} \equiv w_{12}(0)\exp\left\{-i\frac{2a_z}{\hbar}t\right\}\exp\left\{-\frac{t}{T_2}\right\}. \qquad (7.88b)$$



So, due to coupling to the environment, the off-diagonal elements of the density matrix decay with the characteristic dephasing time T_2 = 1/2D_φ, providing a natural evolution from the density matrix (22) of a pure state to the diagonal matrix

$$\mathrm w = \begin{pmatrix} W_1 & 0\\ 0 & W_2\end{pmatrix}, \qquad (7.89)$$

with the same probabilities W_{1,2}, describing a fully dephased (incoherent) classical mixture.
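The exponential decay (88b) may likewise be illustrated by a Monte Carlo sketch. Here the classical force f(t) is modeled as discretized white noise with variance ħ²D_φ/dt per step (the discrete analog of Eq. (82)), and the second exponent of Eq. (78) is averaged over the ensemble. All parameter values are hypothetical; the ensemble average should follow exp{-t/T_2} with T_2 = 1/2D_φ.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
hbar, D_phi = 1.0, 0.25           # hypothetical units and phase diffusion coefficient
dt, n_steps, n_real = 0.01, 200, 5000

# Discretized delta-correlated force, the analog of Eq. (82): independent Gaussian values
# with variance hbar^2 * D_phi / dt at each step.
f = rng.normal(0.0, hbar*np.sqrt(D_phi/dt), size=(n_real, n_steps))
phase = np.cumsum(2*f*dt/hbar, axis=1)        # (2/hbar) * integral of f(t') dt'
t = dt*np.arange(1, n_steps + 1)
decay = np.mean(np.exp(1j*phase), axis=0)     # ensemble average of the second exponent of Eq. (78)
T2 = 1/(2*D_phi)                               # dephasing time

for k in (49, 99, 199):
    print(f"t/T2 = {t[k]/T2:.2f}: |<exp(i*phase)>| = {abs(decay[k]):.3f}, exp(-t/T2) = {np.exp(-t[k]/T2):.3f}")
```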

Our simple model offers a very clear look at the nature of decoherence: "force" f(t), exerted by the environment, "shakes" the energy difference between two eigenstates of the system, and hence the instant velocity 2(a_z - f)/ħ of their mutual phase shift φ(t) - cf. Eq. (24). Due to randomness of the force, φ(t) performs a random walk around the trigonometric circle, so that eventually the averaging of its trigonometric functions exp{±iφ} over the possible states of the environment yields zero, killing the off-diagonal elements of the density matrix. Our analysis, however, has left open two important issues:

(i) Is this approach valid for a quantum description of a typical environment?

(ii) If yes, what is D_φ?



7.4. Fluctuation-dissipation theorem

Similar questions may be asked about a more general situation, when the Hamiltonian Ĥ_s of the system of interest (s), in the composite Hamiltonian (68), is not specified at all, but the interaction between that system and its environment still has the bilinear form similar to Eqs. (70) and (6.130):

$$\hat H_{\rm int} = -\hat F\{\lambda\}\hat x, \qquad (7.90)$$



31 This result is valid only if approximation (82) may be applied at time interval dt which, in turn, should be much smaller than T_2, i.e. if the dephasing time is much longer than the environment's correlation time τ_c. This requirement is usually well satisfied, because in most environments τ_c is very short. For example, in the original Brownian motion experiments with few-μm ink particles in water, it is of the order of the average interval between sequential molecular impacts, of the order of 10^-21 s.

32 See, e.g., MA Eq. (1.2a). 
where x̂ is some observable of the subsystem s (say, a generalized coordinate or a generalized
momentum). It may look incredible that in this very general situation one may make a very simple and 
powerful statement about the statistical properties of the generalized external force F, under only two 
(interrelated) conditions - which are satisfied in a huge number of cases of interest: 

(i) the coupling of system s of interest to environment e is weak - in the sense of the perturbation 
theory (see Chapter 6), and 

(ii) the environment may be considered as staying in thermodynamic equilibrium, with certain 
temperature T, regardless of the process in the system of interest. 33 

This famous statement is called the fluctuation-dissipation theorem (FDT). 34 Due to the 
importance of this fundamental result, let me derive it. 35 

33 The most frequent example of violation of these conditions is the environment's overheating by the energy flow from the subsystem. I leave it to the reader to estimate the overheating of a standard physical laboratory room by a typical dissipative quantum process - the emission of an optical photon by an atom. (Hint: extremely small.)

34 The FDT was first derived by H. Callen and T. Welton in 1951, on the background of an earlier derivation of its classical limit by H. Nyquist in 1928, and the pioneering 1905 work by A. Einstein - see below.

35 The FDT may be proved in several ways which are different from, and shorter than, the one given in this section - see, e.g., either SM Secs. 5.5 and 5.6 (based on H. Nyquist's arguments), or the original paper by H. Callen and T. Welton, Phys. Rev. 83, 34 (1951) - wonderful in its clarity. The longer approach I describe here, besides giving an important byproduct relation (109), is a very useful exercise in operator manipulation and in the perturbation theory in its integral form, different from the differential form used in Chapter 6. If the reader is not interested in this exercise, he or she may skip the derivation and jump directly to the result expressed by Eq. (134), which uses the notions defined by Eqs. (114) and (123).

36 We can always do that if the local environment is large enough, so that the processes in our subsystem would not depend on the type of boundary between it and the external environment; in particular, we may assume the total system closed, i.e. Hamiltonian.

37 For usual ("ergodic") environments, without intrinsic long-term memories, this statistical averaging over an ensemble of environments is equivalent to averaging over relatively short times - much longer than the correlation time τ_c of the environment, but still much shorter than the characteristic time of evolution of the system under analysis, such as the dephasing time T_2 and the energy relaxation time T_1 (still to be calculated). As was already mentioned, in most practical environments τ_c is very short. Thus, for relatively "massive" (inertial) systems of interest, the separation of the averaging into two steps is well justified.

Since by writing Eq. (68) we treat the whole system (s + e) as a Hamiltonian one, 36 we may use the Heisenberg equation (4.199) to write

$$i\hbar\dot{\hat F} = \left[\hat F, \hat H\right] = \left[\hat F, \hat H_e\right], \qquad (7.91)$$

because, as was discussed in the last section, operator F̂{λ} commutes with operators Ĥ_s and x̂. Generally, very little may be done with this equation, because the time evolution of the environment's Hamiltonian depends, in turn, on that of the force. This is where the perturbation theory becomes indispensable. Let us decompose the external force's operator into the following sum:

$$\hat F = \langle\hat F\rangle + \tilde F(t), \qquad (7.92)$$

where (until further notice) sign ⟨...⟩ means the statistical averaging over the environment alone. 37 From the point of view of system s, the first term of the sum (still an operator!) describes the average response
of the environment to the system dynamics (possibly including such irreversible effects as friction or viscosity), and has to be calculated with account of their interaction - as we will do later in this section. On the other hand, the second term in Eq. (92) represents fluctuations of the environment, which exist even in the absence of system s. Hence, in the first nonvanishing approximation in the interaction strength, the fluctuation part may be calculated ignoring the interaction, i.e. treating the environment as being in thermodynamic equilibrium: 38



$$i\hbar\dot{\tilde F} = \left[\tilde F, \hat H_e\big|_{\rm eq}\right]. \qquad (7.93)$$



Since in this approximation the environment's Hamiltonian does not have an explicit dependence on time, the solution of this equation may be written by combining Eqs. (4.175) and (4.190):

$$\tilde F(t) = \exp\left\{+\frac{i}{\hbar}\hat H_e\big|_{\rm eq}t\right\}\tilde F(0)\exp\left\{-\frac{i}{\hbar}\hat H_e\big|_{\rm eq}t\right\}. \qquad (7.94)$$



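Let us use this relation to calculate the correlation function of fluctuations, defined similarly to Eq. (80), but paying close attention to the order of the time arguments (very soon we will see why):

$$\left\langle\tilde F(t)\tilde F(t')\right\rangle = \left\langle\exp\left\{\frac{i}{\hbar}\hat H_e t\right\}\tilde F(0)\exp\left\{-\frac{i}{\hbar}\hat H_e t\right\}\exp\left\{\frac{i}{\hbar}\hat H_e t'\right\}\tilde F(0)\exp\left\{-\frac{i}{\hbar}\hat H_e t'\right\}\right\rangle, \qquad (7.95)$$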

where the thermal equilibrium of the environment is implied. We are free to calculate this expectation value in any basis, and the best choice is evident: in the environment's stationary-state basis, its Hamiltonian, the exponents in Eq. (95), and the density operator of the environment are all represented by diagonal matrices. Using Eq. (5), the correlation function becomes



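$$\left\langle\tilde F(t)\tilde F(t')\right\rangle = \mathrm{Tr}\left[\mathrm w\exp\left\{\frac{i}{\hbar}\hat H_e t\right\}\tilde F(0)\exp\left\{-\frac{i}{\hbar}\hat H_e t\right\}\exp\left\{\frac{i}{\hbar}\hat H_e t'\right\}\tilde F(0)\exp\left\{-\frac{i}{\hbar}\hat H_e t'\right\}\right]$$
$$= \sum_{n,n'}W_n\exp\left\{\frac{iE_nt}{\hbar}\right\}F_{nn'}\exp\left\{-\frac{iE_{n'}t}{\hbar}\right\}\exp\left\{\frac{iE_{n'}t'}{\hbar}\right\}F_{n'n}\exp\left\{-\frac{iE_nt'}{\hbar}\right\} = \sum_{n,n'}W_n\left|F_{nn'}\right|^2\exp\left\{\frac{i\tilde E(t-t')}{\hbar}\right\}, \quad\text{where } \tilde E \equiv E_n - E_{n'}. \qquad (7.96)$$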

Here $W_n$ are the Gibbs distribution probabilities, given by Eq. (23) with the environment's temperature $T$, and $F_{nn'}$ are the Schrödinger-picture matrix elements of the interaction force operator.

We see that correlator (96) is a function of the difference $\tau \equiv t - t'$ only (as it should be for fluctuations in a macroscopically stationary system), but may depend on the order of the operands. Let us therefore denote this particular correlation function by the upper index "+",



38 Here we assume that in equilibrium, the force $F$ of Eq. (92) has zero average, because if this is not so, this average part of the force may always be included into the Hamiltonian of subsystem s.
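$$ K_F^+(\tau) \equiv \left\langle\tilde F(t)\tilde F(t')\right\rangle = \sum_{n,n'}W_n\left|F_{nn'}\right|^2\exp\left\{+i\frac{\tilde E\tau}{\hbar}\right\}, \qquad \tau \equiv t-t', \tag{7.97} $$

and its counterpart by upper index "-",

$$ K_F^-(\tau) \equiv \left\langle\tilde F(t')\tilde F(t)\right\rangle = \sum_{n,n'}W_n\left|F_{nn'}\right|^2\exp\left\{-i\frac{\tilde E\tau}{\hbar}\right\} = K_F^+(-\tau). \tag{7.98} $$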



So, in contrast with classical processes, in quantum mechanics the correlation function of fluctuations 
F is not necessarily time-symmetric: 

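$$ K_F^+(\tau)-K_F^-(\tau) \equiv K_F^+(\tau)-K_F^+(-\tau) = \left\langle\left[\tilde F(t),\tilde F(t')\right]\right\rangle = 2i\sum_{n,n'}W_n\left|F_{nn'}\right|^2\sin\frac{\tilde E\tau}{\hbar} \neq 0, \tag{7.99} $$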

so that $\tilde F(t)$ gives a good example of a Heisenberg-picture operator whose "values", taken at different moments of time, generally do not commute - the opportunity already mentioned in Sec. 4.6. 39
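Relations (96)-(99) are easy to check numerically on a small model. The sketch below assumes a hypothetical "environment" with a handful of random energy levels and a random Hermitian force matrix (units $\hbar = k_B = 1$; the seed, number of levels and temperature are arbitrary choices, not taken from the text). It builds $\tilde F(t)$ via Eq. (94), evaluates the correlators directly as traces with $w = {\rm diag}(W_n)$, and compares them with the closed forms.

```python
import numpy as np

# Hypothetical toy environment (hbar = k_B = 1): N levels with random energies E_n
# and a random Hermitian force matrix F_nn' written in the energy eigenbasis.
rng = np.random.default_rng(1)
N, T = 6, 0.7
E = np.sort(rng.uniform(0.0, 3.0, N))
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
F = (A + A.conj().T) / 2
W = np.exp(-E / T)
W /= W.sum()                                   # Gibbs probabilities W_n

def heisenberg(t):
    """F(t) = exp(+iHt) F(0) exp(-iHt), Eq. (94), with H diagonal in this basis."""
    U = np.diag(np.exp(1j * E * t))
    return U @ F @ U.conj().T

def correlator(t1, t2):
    """<F(t1) F(t2)> = Tr[w F(t1) F(t2)], with w = diag(W_n)."""
    return np.trace(np.diag(W) @ heisenberg(t1) @ heisenberg(t2))

E_tilde = E[:, None] - E[None, :]              # E~ = E_n - E_n'
weights = W[:, None] * np.abs(F) ** 2          # W_n |F_nn'|^2

def K_plus(tau):
    """Closed form of Eqs. (96)-(97)."""
    return np.sum(weights * np.exp(1j * E_tilde * tau))

t, t_prime = 1.3, 0.4
tau = t - t_prime
commutator = correlator(t, t_prime) - correlator(t_prime, t)       # <[F(t), F(t')]>
print(correlator(t, t_prime), K_plus(tau))                         # Eq. (96)
print(correlator(t_prime, t), K_plus(-tau))                        # Eq. (98)
print(commutator, 2j * np.sum(weights * np.sin(E_tilde * tau)))    # Eq. (99)
```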

Now let us return to the force decomposition (92), and calculate the first (average) component of 
the force. In order to do that, let us write the formal solution of Eq. (91) as follows: 



$$ F(t) = \frac{1}{i\hbar}\int_{-\infty}^{t}\left[F(t'), H_e(t')\right]dt'. \tag{7.100} $$



In the right-hand part of this relation, we cannot treat the Hamiltonian of the environment as an 
unperturbed (equilibrium) one, because the result would have zero statistical average. Hence, we should 
make one more step in our perturbative treatment, and take into account (in the first nonvanishing 
approximation) the effect of our system of interest (s) on the environment. To do this, let us write the (so 
far, exact) Heisenberg equation of motion for the environment's Hamiltonian, 



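$$ i\hbar\dot H_e = \left[H_e, H\right] = \left[H_e, H_{\rm int}\right] = -\left[H_e, F\right]x, \tag{7.101} $$

and its formal solution, similar to Eq. (100), but for an arbitrary time $t'$ rather than $t$:

$$ H_e(t') = -\frac{1}{i\hbar}\int_{-\infty}^{t'}x(t'')\left[H_e(t''), F(t'')\right]dt''. \tag{7.102} $$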



Plugging this equality into the right-hand part of Eq. (100), and averaging the result (again, over the 
environment only!), we get 



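$$ \langle F(t)\rangle = \frac{1}{\hbar^2}\int_{-\infty}^{t}dt'\int_{-\infty}^{t'}dt''\,x(t'')\left\langle\left[F(t'),\left[H_e(t''),F(t'')\right]\right]\right\rangle. \tag{7.103} $$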



As we will see shortly, this expression gives a nonvanishing result even if the right-hand-part averaging is carried out over the unperturbed (thermal-equilibrium) environment, so that unless we are interested in higher-order corrections, there is no need to refine the result any further. This fact enables us to calculate the average in the right-hand part of Eq. (103) in exactly the same way as in Eq. (96), using Eq. (94):



39 A good sanity check here is that at $\tau = 0$, the difference (99) between $K_F^+(\tau)$ and $K_F^+(-\tau)$ vanishes.
$$ \begin{aligned}
\left\langle\left[F(t'),\left[H_e(t''),F(t'')\right]\right]\right\rangle &= {\rm Tr}\left\{w\left[F(t'),\left[H_e,F(t'')\right]\right]\right\}\\
&= {\rm Tr}\left\{w\left[F(t')H_eF(t'')-F(t')F(t'')H_e-H_eF(t'')F(t')+F(t'')H_eF(t')\right]\right\}\\
&= \sum_{n,n'}W_n\left[F_{nn'}(t')E_{n'}F_{n'n}(t'')-F_{nn'}(t')F_{n'n}(t'')E_n-E_nF_{nn'}(t'')F_{n'n}(t')+F_{nn'}(t'')E_{n'}F_{n'n}(t')\right]\\
&= -\sum_{n,n'}W_n\tilde E\left|F_{nn'}\right|^2\left[\exp\left\{i\frac{\tilde E}{\hbar}(t'-t'')\right\}+{\rm c.c.}\right].
\end{aligned} \tag{7.104} $$

Now, if we try to integrate each term of this sum, as Eq. (103) seems to require, we will see that the lower-limit substitution (at $t', t'' \to -\infty$) is uncertain, because the exponents oscillate without decay. This technical difficulty may be overcome by the following reasoning. As illustrated by the example considered in the previous section, coupling to a disordered environment makes the "memory horizon" of the subsystem of our interest (s) finite: its current state does not depend on its history beyond a certain time scale - in that example, the dephasing time $T_2$. (Actually, this is true for virtually all real physical systems, in contrast to idealized models such as a dissipation-free pendulum that swings forever with the same amplitude.) As a result, the functions under the integrals of Eq. (103), i.e. the sum (104), should self-average at a certain finite time. One simple technique for expressing this fact mathematically is just dropping the lower-limit substitution; this would give the correct result for Eq. (103). However, a better (mathematically more acceptable) trick is to first multiply the function under each integral by, respectively, $\exp\{\varepsilon(t'-t)\}$ and $\exp\{\varepsilon(t''-t')\}$, where $\varepsilon$ is a very small positive constant, then carry out the integration, and after that take the limit $\varepsilon \to 0$. The physical justification of this procedure may be provided by saying that the system's behavior should not be affected if its interaction with the environment was not kept constant but was turned on gradually - say, exponentially with an infinitesimal rate $\varepsilon$. With this modification, Eq. (103) becomes

$$ \langle F(t)\rangle = -\frac{1}{\hbar^2}\sum_{n,n'}W_n\tilde E\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\int_{-\infty}^{t}dt'\int_{-\infty}^{t'}dt''\,x(t'')\left[\exp\left\{i\frac{\tilde E}{\hbar}(t'-t'')+\varepsilon(t''-t)\right\}+{\rm c.c.}\right]. \tag{7.105} $$

This double integration is over the area shaded in Fig. 6, so that the order of integration may be changed to the opposite one as

$$ \int_{-\infty}^{t}dt'\int_{-\infty}^{t'}dt''\ldots = \int_{-\infty}^{t}dt''\int_{t''}^{t}dt'\ldots = \int_{-\infty}^{t}dt''\int_{0}^{\tau}d\tau'\ldots, \tag{7.106} $$

where $\tau' \equiv t-t'$ and $\tau \equiv t-t''$.

Fig. 7.6. 2D integration area in Eqs. (105) and (106).
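As a result, Eq. (105) may be rewritten as a single integral,

$$ \langle F(t)\rangle = \int_0^{\infty}G(\tau)\,x(t-\tau)\,d\tau, \tag{7.107} $$

whose kernel,

$$ \begin{aligned}
G(\tau) &= -\frac{1}{\hbar^2}\sum_{n,n'}W_n\tilde E\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\int_0^{\tau}\left[\exp\left\{i\frac{\tilde E}{\hbar}(\tau-\tau')-\varepsilon\tau\right\}+{\rm c.c.}\right]d\tau'\\
&= -\lim_{\varepsilon\to0}\frac{2}{\hbar}\sum_{n,n'}W_n\left|F_{nn'}\right|^2\sin\frac{\tilde E\tau}{\hbar}\,e^{-\varepsilon\tau} = -\frac{2}{\hbar}\sum_{n,n'}W_n\left|F_{nn'}\right|^2\sin\frac{\tilde E\tau}{\hbar},
\end{aligned} \tag{7.108} $$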



does not depend on the particular law of evolution of the subsystem (s) under study, i.e. provides a 
general characterization of its coupling to the environment. 
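The $\tau'$-integration in Eq. (108), and the relation (109) between $G(\tau)$ and the force commutator, can be checked on the same kind of toy model. In this sketch ($\hbar = k_B = 1$; hypothetical random levels and matrix elements; $\varepsilon$ kept small but finite) the first form of Eq. (108) is integrated numerically with the trapezoid rule and compared with the closed form, and then with $(i/\hbar)[K_F^+(\tau) - K_F^-(\tau)]$.

```python
import numpy as np

# Hypothetical toy environment (hbar = k_B = 1) with a small but finite epsilon.
rng = np.random.default_rng(2)
N, T, eps = 5, 0.5, 1e-3
E = np.sort(rng.uniform(0.0, 2.0, N))
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
F = (A + A.conj().T) / 2
W = np.exp(-E / T)
W /= W.sum()
E_tilde = E[:, None] - E[None, :]              # E~ = E_n - E_n'
weights = W[:, None] * np.abs(F) ** 2          # W_n |F_nn'|^2

def G_integral(tau, n_pts=20001):
    """First form of Eq. (108): the tau'-integral done numerically (trapezoid rule)."""
    tp = np.linspace(0.0, tau, n_pts)
    phase = np.exp(1j * E_tilde[..., None] * (tau - tp) - eps * tau)
    integrand = 2 * phase.real                 # [exp{...} + c.c.]
    integral = np.sum((integrand[..., 1:] + integrand[..., :-1]) * np.diff(tp), axis=-1) / 2
    return -np.sum(weights * E_tilde * integral)

def G_closed(tau):
    """Closed form of Eq. (108), keeping the factor exp(-eps*tau)."""
    return -2 * np.sum(weights * np.sin(E_tilde * tau)) * np.exp(-eps * tau)

def G_commutator(tau):
    """Eq. (109): G = (i/hbar) [K+(tau) - K-(tau)]."""
    K_p = np.sum(weights * np.exp(1j * E_tilde * tau))
    K_m = np.sum(weights * np.exp(-1j * E_tilde * tau))
    return (1j * (K_p - K_m)).real

for tau in (0.3, 1.7, 4.0):
    print(tau, G_integral(tau), G_closed(tau), G_commutator(tau))
```

The last two columns differ only by the factor $e^{-\varepsilon\tau}$, which tends to 1 in the limit $\varepsilon \to 0$.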

In Eq. (107) we may readily recognize the most general form of the linear response of a system (in our case, the environment), taking into account the causality principle, where $G(\tau)$ is the response function (also called the "temporal Green's function") of the environment. 40 Comparing Eq. (108) with Eq. (99), we get a wonderfully simple universal relation, 41





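$$ G(\tau) = \frac{i}{\hbar}\left[K_F^+(\tau)-K_F^-(\tau)\right] \equiv \frac{i}{\hbar}\left\langle\left[\tilde F(t),\tilde F(t-\tau)\right]\right\rangle, \tag{7.109} $$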



that emphasizes once again the quantum nature of the correlation function's time asymmetry. However, the relation between $G(\tau)$ and the force anticommutator,



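$$ \left\langle\left\{\tilde F(t+\tau),\tilde F(t)\right\}\right\rangle \equiv \left\langle\tilde F(t+\tau)\tilde F(t)+\tilde F(t)\tilde F(t+\tau)\right\rangle = K_F^+(\tau)+K_F^-(\tau), \tag{7.110} $$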



is much more important, for the following reason. Relations (97)-(98) show that the so-called symmetrized correlation function,



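$$ K_F(\tau) \equiv \frac{K_F^+(\tau)+K_F^-(\tau)}{2} = \frac{1}{2}\left\langle\left\{\tilde F(t+\tau),\tilde F(t)\right\}\right\rangle = \sum_{n,n'}W_n\left|F_{nn'}\right|^2\cos\frac{\tilde E\tau}{\hbar}, \tag{7.111} $$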

that is evidently an even function of the time difference $\tau$, looks very similar to the response function (108), "only" with another trigonometric function under the sum. This similarity may be used to obtain an exact algebraic relation between the Fourier images of these two functions of $\tau$. Indeed, function (111) may be represented as the Fourier transform 42



40 For a more detailed discussion of this function and the causality principle, see, e.g., CM Sec. 4.1. 

41 This relation does not come up in the easier derivations of the FDT, discussed in the beginning of this section. 

42 Due to their practical importance, and certain mathematical issues with their justification for random functions, Eqs. (112)-(113) have their own grand name, the Wiener-Khinchin theorem, though, mathematical rigor aside, they are just a straightforward corollary of the Fourier integral transform (115) - see, e.g., SM Sec. 5.4.
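$$ K_F(\tau) = \int_{-\infty}^{+\infty}S_F(\omega)e^{-i\omega\tau}d\omega = 2\int_0^{+\infty}S_F(\omega)\cos\omega\tau\,d\omega, \tag{7.112} $$

with the reciprocal transform

$$ S_F(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}K_F(\tau)e^{i\omega\tau}d\tau = \frac{1}{\pi}\int_0^{+\infty}K_F(\tau)\cos\omega\tau\,d\tau, \tag{7.113} $$

of the symmetrized spectral density of variable $F$, defined as

$$ S_F(\omega)\,\delta(\omega-\omega') \equiv \frac{1}{2}\left\langle F_\omega F_{-\omega'}+F_{-\omega'}F_\omega\right\rangle \equiv \frac{1}{2}\left\langle\left\{F_\omega,F_{-\omega'}\right\}\right\rangle, \tag{7.114} $$

where $F_\omega$ (also an operator rather than a c-number!) is defined as

$$ F_\omega \equiv \frac{1}{2\pi}\int_{-\infty}^{+\infty}\tilde F(t)e^{i\omega t}dt, \qquad\text{so that}\qquad \tilde F(t) = \int_{-\infty}^{+\infty}F_\omega e^{-i\omega t}d\omega. \tag{7.115} $$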



The physical meaning of function $S_F(\omega)$ becomes evident if we write Eq. (112) for the particular case $\tau = 0$:

$$ K_F(0) = \left\langle\tilde F^2\right\rangle = \int_{-\infty}^{+\infty}S_F(\omega)d\omega = 2\int_0^{+\infty}S_F(\omega)d\omega. \tag{7.116} $$



This formula implies that if we pass function $\tilde F(t)$ through a linear filter cutting from its frequency spectrum a narrow band $d\omega$ of real (positive) frequencies, then the variance $\langle\tilde F_f^2\rangle$ of the filtered signal $\tilde F_f(t)$ would be equal to $2S_F(\omega)d\omega$ - hence the name "spectral density". 43
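As a sanity check of the Wiener-Khinchin pair (112)-(113) and of Eq. (116), one may take a classical correlation function with a known transform. The sketch assumes, purely for illustration, $K_F(\tau) = \langle\tilde F^2\rangle\exp\{-|\tau|/\tau_c\}$, whose spectral density is the Lorentzian $(\langle\tilde F^2\rangle\tau_c/\pi)/(1+\omega^2\tau_c^2)$; the values of $\langle\tilde F^2\rangle$ and $\tau_c$ are arbitrary.

```python
import numpy as np

# Hypothetical classical example (arbitrary units): exponentially correlated
# fluctuations K_F(tau) = <F^2> exp(-|tau|/tau_c), whose spectral density is a Lorentzian.
var_F, tau_c = 1.0, 1.0

def trapz(y, x):
    """Plain trapezoid rule (avoids depending on a particular NumPy version)."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

tau = np.linspace(0.0, 60.0 * tau_c, 60001)    # K_F is negligible beyond ~60 tau_c
K = var_F * np.exp(-tau / tau_c)

def S_numeric(omega):
    """Second form of Eq. (113): (1/pi) * integral_0^inf K_F(tau) cos(omega*tau) dtau."""
    return trapz(K * np.cos(omega * tau), tau) / np.pi

def S_exact(omega):
    return (var_F * tau_c / np.pi) / (1.0 + (omega * tau_c) ** 2)

for w in (0.0, 0.5, 2.0, 10.0):
    print(w, S_numeric(w), S_exact(w))

# Eq. (116): 2 * integral_0^inf S_F d(omega) should reproduce K_F(0) = <F^2>
omega = np.concatenate(([0.0], np.geomspace(1e-4, 1e4, 20001)))
print(2 * trapz(S_exact(omega), omega))
```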

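Let us use Eqs. (111) and (113) to calculate the spectral density for our model:

$$ \begin{aligned}
S_F(\omega) &= \frac{1}{2\pi}\sum_{n,n'}W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\int_{-\infty}^{+\infty}\cos\frac{\tilde E\tau}{\hbar}\,e^{-\varepsilon|\tau|}e^{i\omega\tau}d\tau\\
&= \frac{1}{2\pi}\sum_{n,n'}W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}{\rm Re}\int_0^{+\infty}\left[\exp\left\{i\frac{\tilde E\tau}{\hbar}\right\}+{\rm c.c.}\right]e^{i\omega\tau-\varepsilon\tau}d\tau\\
&= -\frac{1}{2\pi}\sum_{n,n'}W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}{\rm Re}\left[\frac{1}{i(\tilde E/\hbar+\omega)-\varepsilon}+\frac{1}{i(-\tilde E/\hbar+\omega)-\varepsilon}\right].
\end{aligned} \tag{7.117} $$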



Now it is a convenient time to recall that each of the two summations here is over the eigenenergy spectrum of the environment, which is virtually continuous because of its large size, so that we may transform each sum into an integral, just as was done in Sec. 6.6:

$$ \sum_n\ldots \;\to\; \int\ldots dn = \int\ldots\rho(E_n)\,dE_n. \tag{7.118} $$



43 An alternative popular measure of spectral density is $\mathcal{S}_F(\nu) \equiv \langle\tilde F_f^2\rangle/d\nu = 4\pi S_F(\omega)$, where $\nu = \omega/2\pi$ is the "cyclic" frequency (measured in Hz).
where $\rho(E)$ is the density of environment's states at a given energy. This transformation yields

$$ S_F(\omega) = -\frac{1}{2\pi}\lim_{\varepsilon\to0}\int dE_n\,W(E_n)\rho(E_n)\int dE_{n'}\,\rho(E_{n'})\left|F_{nn'}\right|^2{\rm Re}\left[\frac{1}{i(\tilde E/\hbar+\omega)-\varepsilon}+\frac{1}{i(-\tilde E/\hbar+\omega)-\varepsilon}\right]. \tag{7.119} $$



Since the square bracket depends only on a specific linear combination of two energies, $\tilde E \equiv E_n - E_{n'}$, it is convenient to introduce also another, linearly-independent combination of the energies, for example, the average energy $\bar E \equiv (E_n + E_{n'})/2$, so that the state energies may be represented as

$$ E_n = \bar E + \frac{\tilde E}{2}, \qquad E_{n'} = \bar E - \frac{\tilde E}{2}. \tag{7.120} $$



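With this notation, Eq. (119) becomes

$$ S_F(\omega) = -\frac{\hbar}{2\pi}\lim_{\varepsilon\to0}\int d\bar E\int d\tilde E\,\rho\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E-\frac{\tilde E}{2}\right)W\!\left(\bar E+\frac{\tilde E}{2}\right)\left|F_{nn'}\right|^2{\rm Re}\left[\frac{1}{i(\tilde E+\hbar\omega)-\hbar\varepsilon}+\frac{1}{i(-\tilde E+\hbar\omega)-\hbar\varepsilon}\right]. \tag{7.121} $$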



Due to the smallness of parameter $\hbar\varepsilon$ (which should be much less than all real energies, including $k_BT$, $\hbar\omega$, $E_n$, and $E_{n'}$), each of the internal integrals is dominated by an infinitesimal vicinity of one point, $\tilde E_\pm = \pm\hbar\omega$, in which the density of states, the matrix elements, and the Gibbs probabilities do not change considerably and may be taken out of the integral, so that the integrals may be worked out explicitly: 44



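$$ \begin{aligned}
S_F(\omega) &= \frac{\hbar}{2\pi}\lim_{\varepsilon\to0}\int d\bar E\,\rho_+\rho_-\left[W_+\left|F_+\right|^2\int_{-\infty}^{+\infty}\frac{\hbar\varepsilon\,d\tilde E}{(\tilde E-\hbar\omega)^2+(\hbar\varepsilon)^2}+W_-\left|F_-\right|^2\int_{-\infty}^{+\infty}\frac{\hbar\varepsilon\,d\tilde E}{(\tilde E+\hbar\omega)^2+(\hbar\varepsilon)^2}\right]\\
&= \frac{\hbar}{2}\int\rho_+\rho_-\left[W_+\left|F_+\right|^2+W_-\left|F_-\right|^2\right]d\bar E,
\end{aligned} \tag{7.122} $$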



where indices ± mark function values at the special points $\tilde E_\pm = \pm\hbar\omega$, i.e. $E_{n'} = E_n \mp \hbar\omega$ (note that the product $\rho_+\rho_- \equiv \rho(\bar E+\hbar\omega/2)\,\rho(\bar E-\hbar\omega/2)$ is the same at both points). The physics of these points becomes simple if we interpret state $n$, that is the argument of the equilibrium Gibbs distribution function $W_n$, as the initial state of the environment, and $n'$ as its final state. Then the top-sign point corresponds to $E_{n'} = E_n - \hbar\omega$, i.e. to the emission of one energy quantum $\hbar\omega$ of the "observation" frequency $\omega$ by the environment into subsystem s of interest, while the bottom-sign point, $E_{n'} = E_n + \hbar\omega$, corresponds to the absorption of such a quantum by the environment. As Eq. (122) shows, both processes give similar positive contributions to the force fluctuations.
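The key step from Eq. (121) to Eq. (122) is that the Lorentzian $\hbar\varepsilon/[(\tilde E\mp\hbar\omega)^2+(\hbar\varepsilon)^2]$ acts, for $\varepsilon \to 0$, as $\pi$ times a delta function that picks the value of the smooth factors at $\tilde E = \pm\hbar\omega$. A minimal numerical illustration follows; the Gaussian profile standing in for the product $\rho_+\rho_-W|F|^2$, the peak position and the values of $\hbar\varepsilon$ are all hypothetical.

```python
import numpy as np

def lorentz_average(f, a, hbar_eps, half_width=1e4, n=400001):
    """(1/pi) * integral of f(E) * hbar_eps / ((E - a)^2 + hbar_eps^2) dE.

    The grid E = a + hbar_eps * sinh(u), with u uniform, is dense near the peak
    and sparse in the tails; the window |E - a| <= half_width is assumed wide enough.
    """
    u_max = np.arcsinh(half_width / hbar_eps)
    E = a + hbar_eps * np.sinh(np.linspace(-u_max, u_max, n))
    y = f(E) * hbar_eps / ((E - a) ** 2 + hbar_eps ** 2)
    return np.sum((y[1:] + y[:-1]) * np.diff(E)) / 2 / np.pi

profile = lambda E: np.exp(-(E - 0.3) ** 2)    # hypothetical smooth stand-in for rho*W*|F|^2
a = 1.0                                        # plays the role of hbar*omega
for hbar_eps in (1e-1, 1e-2, 1e-3):
    print(hbar_eps, lorentz_average(profile, a, hbar_eps), profile(a))
print(lorentz_average(np.ones_like, a, 1e-3))  # normalization: (1/pi) * integral -> 1
```

The deviation from `profile(a)` shrinks roughly in proportion to $\hbar\varepsilon$, which is the sense in which the smooth factors "may be taken out of the integrals".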



44 Using, e.g., MA Eq. (6.5a). (The imaginary parts of the integrals vanish, because integration in infinite limits may always be re-centered to the finite points $\pm\hbar\omega$.) A mathematically enlightened reader may have noticed that the integrals might be taken without the introduction of small $\varepsilon$, using the Cauchy theorem - see MA Eq. (15.1).
The situation is different for the Fourier image of the response function $G(\tau)$, 45

$$ \chi(\omega) \equiv \int_0^{+\infty}G(\tau)e^{i\omega\tau}d\tau, \tag{7.123} $$

that is frequently called either the generalized susceptibility or the response function - in our case, of the environment. Its physical meaning is that the complex function $\chi(\omega) = \chi'(\omega) + i\chi''(\omega)$ relates the Fourier amplitudes of the generalized coordinate and generalized force: 46

$$ \langle F_\omega\rangle = \chi(\omega)\,x_\omega. \tag{7.124} $$



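The physics of its imaginary part $\chi''(\omega)$ is especially clear. Indeed, if both $F_\omega$ and $x_\omega$ represent a sinusoidal classical process, say

$$ x(t) = x_0\cos\omega t = \frac{x_0}{2}e^{-i\omega t}+\frac{x_0}{2}e^{+i\omega t}, \qquad\text{i.e.}\qquad x_\omega = x_{-\omega} = \frac{x_0}{2}. \tag{7.125} $$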



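Then, in accordance with the correspondence principle, Eq. (124) should hold for the c-number complex amplitudes $F_\omega$ and $x_\omega$, enabling us to calculate the time dependence of the force (taking into account that $\chi(-\omega) = \chi^*(\omega)$, because $G(\tau)$ is real):

$$ \begin{aligned}
F(t) &= F_\omega e^{-i\omega t}+F_{-\omega}e^{+i\omega t} = \chi(\omega)x_\omega e^{-i\omega t}+\chi(-\omega)x_{-\omega}e^{+i\omega t} = \frac{x_0}{2}\left[\chi(\omega)e^{-i\omega t}+\chi(-\omega)e^{+i\omega t}\right]\\
&= \frac{x_0}{2}\left[(\chi'+i\chi'')e^{-i\omega t}+(\chi'-i\chi'')e^{+i\omega t}\right] = x_0\left[\chi'(\omega)\cos\omega t+\chi''(\omega)\sin\omega t\right].
\end{aligned} \tag{7.126} $$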



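We see that $\chi''(\omega)$ scales the part of the force that is $\pi/2$-shifted from the coordinate oscillations, i.e. oscillates in antiphase with the velocity $\dot x = -\omega x_0\sin\omega t$ (just as a friction force does), and hence characterizes the time-average power flow from the system into the environment, i.e. the energy dissipation rate: 47

$$ \mathcal P = -\overline{F(t)\dot x(t)} = -\overline{x_0\left[\chi'(\omega)\cos\omega t+\chi''(\omega)\sin\omega t\right]\left(-\omega x_0\sin\omega t\right)} = \frac{x_0^2}{2}\,\omega\chi''(\omega). \tag{7.127} $$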
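The phase bookkeeping in Eqs. (126)-(127) can be verified by direct time averaging over one period. The frequency, amplitude, and value of $\chi$ below are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical drive: frequency, amplitude, and susceptibility chi = chi' + i*chi''.
omega, x0 = 2.0, 0.3
chi = 0.7 + 0.25j

t = np.linspace(0.0, 2 * np.pi / omega, 100001)    # exactly one period
x_dot = -omega * x0 * np.sin(omega * t)

# Eq. (126): force from the complex amplitudes, using chi(-omega) = conj(chi(omega))
F_complex = (chi * (x0 / 2) * np.exp(-1j * omega * t)
             + np.conj(chi) * (x0 / 2) * np.exp(1j * omega * t))
F_real = x0 * (chi.real * np.cos(omega * t) + chi.imag * np.sin(omega * t))

def period_average(y):
    return np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2 / (t[-1] - t[0])

P_numeric = -period_average(F_real * x_dot)        # power flowing into the environment
P_formula = 0.5 * x0 ** 2 * omega * chi.imag       # Eq. (127)
print(P_numeric, P_formula)
```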



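Let us calculate this function from Eqs. (108) and (123), just as we have done for the spectral density of fluctuations:

$$ \begin{aligned}
\chi''(\omega) &= {\rm Im}\int_0^{+\infty}G(\tau)e^{i\omega\tau}d\tau = -\frac{2}{\hbar}\sum_{n,n'}W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}{\rm Im}\int_0^{+\infty}\sin\frac{\tilde E\tau}{\hbar}\,e^{i\omega\tau-\varepsilon\tau}d\tau\\
&= \sum_{n,n'}W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}{\rm Im}\left[\frac{1}{-\tilde E-\hbar\omega-i\hbar\varepsilon}-\frac{1}{\tilde E-\hbar\omega-i\hbar\varepsilon}\right]\\
&= \sum_{n,n'}W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\left[\frac{\hbar\varepsilon}{(\tilde E+\hbar\omega)^2+(\hbar\varepsilon)^2}-\frac{\hbar\varepsilon}{(\tilde E-\hbar\omega)^2+(\hbar\varepsilon)^2}\right].
\end{aligned} \tag{7.128} $$

45 Integration in Eq. (123) may be extended to the whole time axis, $-\infty < \tau < +\infty$, if we complement definition (107) of $G(\tau)$ for $\tau > 0$ with its definition as $G(\tau) = 0$ for $\tau < 0$, in correspondence with the causality principle.

46 In order to prove this relation, it is sufficient to plug expression $x_s(t) = x_\omega e^{-i\omega t}$, or any sum of such exponents, into Eq. (107) and then use definition (123). This simple exercise is highly recommended to the reader.

47 The expression $F\dot x = Fv$ for the instant power delivered by the force $F$ to the system (so that $-F\dot x$ is the power flowing into the environment, hence the minus sign in Eq. (127)) is evident if $x$ is the usual Cartesian coordinate of a mechanical system. According to analytical mechanics (see, e.g., CM Chapters 2 and 10), it is valid for any generalized coordinate - generalized force pair which forms the interaction Hamiltonian (90).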



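Making the transition (118) from the double sum to the double integral, and then the change of integration variables (120), we get

$$ \chi''(\omega) = \lim_{\varepsilon\to0}\int d\bar E\int d\tilde E\,\rho\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E-\frac{\tilde E}{2}\right)W\!\left(\bar E+\frac{\tilde E}{2}\right)\left|F_{nn'}\right|^2\left[\frac{\hbar\varepsilon}{(\tilde E+\hbar\omega)^2+(\hbar\varepsilon)^2}-\frac{\hbar\varepsilon}{(\tilde E-\hbar\omega)^2+(\hbar\varepsilon)^2}\right]. \tag{7.129} $$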



Now using the same argument about the smallness of parameter $\varepsilon$ as above, we may take the densities of states, the matrix elements of force, and the Gibbs probabilities out of the integrals, and work out the integrals, getting a result very similar to Eq. (122):

$$ \chi''(\omega) = \pi\int\rho_+\rho_-\left[W_-\left|F_-\right|^2-W_+\left|F_+\right|^2\right]d\bar E. \tag{7.130} $$



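In order to relate these results, it is sufficient to notice that according to Eq. (23), the Gibbs probabilities $W_\pm$ are related by coefficients dependent only on the temperature $T$ and the observation frequency $\omega$:

$$ W_\pm \equiv W\!\left(\bar E\pm\frac{\hbar\omega}{2}\right) = \frac{1}{Z}\exp\left\{-\frac{\bar E\pm\hbar\omega/2}{k_BT}\right\} = W(\bar E)\exp\left\{\mp\frac{\hbar\omega}{2k_BT}\right\}, \tag{7.131} $$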



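so that both the spectral density and the dissipative part of the susceptibility may be expressed via the same integral over environment energies (here we have also used the fact that $|F_+| = |F_-|$: the two special points correspond to the same pair of states taken in the opposite order, and for the Hermitian operator $F$, $|F_{n'n}| = |F_{nn'}|$):

$$ S_F(\omega) = \frac{\hbar}{2}\cosh\left(\frac{\hbar\omega}{2k_BT}\right)\int\rho_+\rho_-W(\bar E)\left[\left|F_+\right|^2+\left|F_-\right|^2\right]d\bar E, \tag{7.132} $$

$$ \chi''(\omega) = \pi\sinh\left(\frac{\hbar\omega}{2k_BT}\right)\int\rho_+\rho_-W(\bar E)\left[\left|F_+\right|^2+\left|F_-\right|^2\right]d\bar E, \tag{7.133} $$

and hence are universally related as

$$ S_F(\omega) = \frac{\hbar}{2\pi}\chi''(\omega)\coth\frac{\hbar\omega}{2k_BT}. \tag{7.134} $$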



This is the Callen-Welton fluctuation-dissipation theorem (FDT). It reveals the fundamental, intimate relation between dissipation and the fluctuations induced by the environment ("no dissipation without fluctuations") - hence the name. 48 In the classical limit, $\hbar\omega \ll k_BT$, the FDT is reduced to



48 A curious feature of the FDT is that Eq. (134) includes exactly the same function of temperature as the average energy (26) of a quantum oscillator of frequency $\omega$, though, as the reader could witness, the notion of the
$$ S_F(\omega) = \frac{\hbar}{2\pi}\chi''(\omega)\frac{2k_BT}{\hbar\omega} = \frac{k_BT}{\pi}\frac{{\rm Im}\,\chi(\omega)}{\omega}. \tag{7.135} $$



In most systems of interest the last fraction tends to a finite (positive) constant in a substantial range of 
relatively low frequencies. Indeed, expanding Eq. (123) in the Taylor series in small co, we get 



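$$ \chi(\omega) = \chi(0)+i\omega\eta+\ldots, \qquad\text{with}\qquad \chi(0) = \int_0^{\infty}G(\tau)\,d\tau \quad\text{and}\quad \eta = \int_0^{\infty}G(\tau)\,\tau\,d\tau. \tag{7.136} $$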



Since the temporal Green's function is real by definition, the Taylor expansion of $\chi''(\omega) \equiv {\rm Im}\,\chi(\omega)$ starts with the linear term $\omega\eta$, where $\eta$ is a certain real coefficient, and, unless $\eta = 0$, is dominated by this term at small $\omega$. (The physical sense of constant $\eta$ becomes clear if we consider an environment that provides viscous friction with the simple law



$$ \langle F\rangle = -\eta\dot x, \qquad \eta > 0. \tag{7.137} $$



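For the Fourier images of coordinate and force this gives the relation $F_\omega = i\omega\eta x_\omega$, so that according to Eq. (124)

$$ \chi(\omega) = i\omega\eta, \qquad\text{i.e.}\qquad \frac{\chi''(\omega)}{\omega} \equiv \frac{{\rm Im}\,\chi(\omega)}{\omega} = \eta > 0. \tag{7.138} $$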



Hence, even in the general case, coefficient $\eta$ may be considered as an effective low-speed viscosity provided by the environment.)
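The low-frequency expansion (136) can be illustrated with a hypothetical causal kernel $G(\tau) = \kappa\exp\{-\tau/\tau_c\}$ for $\tau > 0$, for which $\chi(\omega) = \kappa\tau_c/(1-i\omega\tau_c)$ and $\eta = \kappa\tau_c^2$. The sketch computes $\chi(\omega)$ from Eq. (123) by quadrature and shows ${\rm Im}\,\chi(\omega)/\omega$ approaching $\eta$ as $\omega \to 0$; the values of $\kappa$ and $\tau_c$ are arbitrary.

```python
import numpy as np

# Hypothetical exponential memory kernel G(tau) = kappa * exp(-tau/tau_c), tau > 0
# (G = 0 for tau < 0 by causality). Arbitrary units.
kappa, tau_c = 3.0, 0.5
tau = np.linspace(0.0, 40 * tau_c, 400001)
G = kappa * np.exp(-tau / tau_c)

def trapz(y, x):
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def chi(omega):
    """Eq. (123): chi(omega) = integral_0^inf G(tau) exp(i*omega*tau) dtau."""
    return trapz(G * np.exp(1j * omega * tau), tau)

chi0 = trapz(G, tau)           # chi(0), Eq. (136)
eta = trapz(G * tau, tau)      # eta = integral G(tau) tau dtau, Eq. (136)
print(chi0, eta)
for omega in (1e-1, 1e-2, 1e-3):
    print(omega, chi(omega).imag / omega)      # tends to eta as omega -> 0
```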

In this case Eq. (134) turns into the Nyquist formula: 49 



$$ S_F(\omega) = \frac{k_BT}{\pi}\eta, \qquad\text{i.e.}\qquad \left\langle\tilde F_f^2\right\rangle = 4k_BT\eta\,d\nu. \tag{7.139} $$



According to Eq. (112), if such a constant spectral density 50 persisted at all frequencies, it would correspond to a delta-correlated process $\tilde F(t)$, with

$$ K_F(\tau) = 2\pi S_F(0)\,\delta(\tau) = 2k_BT\eta\,\delta(\tau), \tag{7.140} $$

similar to the one already discussed above - see Eq. (82).
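For a feel of the numbers in the Nyquist formula (139), here is a small calculation for a resistor (identifying $\eta$ with $R$, as in footnote 49), together with the factor by which the full FDT (134) exceeds its classical limit (135) for Ohmic dissipation. The resistance, temperature, bandwidth, and frequencies are illustrative values only; the constants are the SI values of $k_B$ and $\hbar$.

```python
import numpy as np

k_B = 1.380649e-23       # J/K
hbar = 1.054571817e-34   # J*s

def johnson_vrms(R, T, bandwidth):
    """Classical Nyquist formula (139) for an open-circuited resistor: sqrt(4 k_B T R dnu)."""
    return np.sqrt(4 * k_B * T * R * bandwidth)

def quantum_factor(f, T):
    """Ratio of the FDT (134) to its classical limit (135) for chi'' = eta*omega:
    (hbar*omega / 2k_BT) * coth(hbar*omega / 2k_BT), with omega = 2*pi*f."""
    u = hbar * 2 * np.pi * f / (2 * k_B * T)
    return u / np.tanh(u)

print(johnson_vrms(R=1e3, T=300.0, bandwidth=1e6))   # about 4.07e-6 V
print(quantum_factor(f=1e9, T=300.0))                # ~1: classical regime
print(quantum_factor(f=10e9, T=0.05))                # well above 1: quantum noise dominates
```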



oscillator was by no means used in its derivation. As we will see in the next section, this fact leads to rather interesting consequences and even conceptual opportunities.

49 Actually, the 1928 work by H. Nyquist was about electronic noise in resistors, just discovered experimentally by his Bell Labs colleague J. Johnson. For an Ohmic resistor, as a dissipative "environment" of the electric circuit it is connected with, Eq. (137) is just Ohm's law, and may be recast as either $\langle V\rangle = -R(dQ/dt) = -RI$, or $\langle I\rangle = -G(d\Phi/dt) = -GV$. Thus for voltage $V$ in an open circuit, $\eta$ corresponds to resistance $R$, while for current $I$ in the short circuit, to conductance $G = 1/R$. In this case, the fluctuations described by Eq. (139) are referred to as the Johnson-Nyquist noise. (Because of this important application, any model leading to Eqs. (136)-(137) is frequently referred to as Ohmic dissipation, even if the physical nature of variables $x$ and $F$ is quite different.) Another note: the Nyquist formula (139) should not be confused with the Nyquist-Shannon theorem describing the minimum sampling rate of an analog signal.

50 A random process whose properties may be reasonably approximated by a constant spectral density is frequently called the white noise, because it is then a random mixture of all possible sinusoidal components with equal weights, reminiscent of the composition of natural white light.



Chapter 7 



Page 29 of 56 



Essential Graduate Physics 



QM: Quantum Mechanics 



Since in the classical limit the right-hand part of Eq. (109) is negligible, and the correlation function may be considered an even function of time, the symmetrized function under the integral in Eq. (113) may be rewritten just as $\langle\tilde F(\tau)\tilde F(0)\rangle$. In the limit of low observation frequencies (in the sense that $\omega$ is much smaller than not only the quantum frontier $k_BT/\hbar$, but also the frequency scale of function $\chi''(\omega)/\omega$), Eq. (138) may be used to recast Eq. (135) in the form

$$ \eta = \frac{1}{k_BT}\int_0^{\infty}\left\langle\tilde F(\tau)\tilde F(0)\right\rangle d\tau. \tag{7.141} $$

In some fields (especially in physical kinetics and chemical physics), this particular limit of the Nyquist 
formula is better known as the Green-Kubo (or just "Kubo") formula. 51 
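The Green-Kubo formula (141) can be tried out on a simulated classical force. The sketch below assumes (hypothetically) an Ornstein-Uhlenbeck process with correlation function $K(\tau) = (k_BT\eta/\tau_c)\exp\{-|\tau|/\tau_c\}$, generated with its exact discrete update, estimates the autocorrelation with an FFT, and integrates it as in Eq. (141). Units ($k_B = 1$) and all parameter values are arbitrary, and the estimate is statistical, so it agrees with $\eta$ only to within a few percent.

```python
import numpy as np

# Hypothetical parameters (k_B = 1, arbitrary units). The OU force has
# K(tau) = (T*eta/tau_c) * exp(-|tau|/tau_c), so Eq. (141) should return eta.
T, eta, tau_c = 2.0, 0.8, 1.0
dt, n_steps = 0.05, 2 ** 20
rng = np.random.default_rng(0)

var = T * eta / tau_c                          # K(0), the stationary variance
a = np.exp(-dt / tau_c)                        # exact one-step decay factor of the process
kicks = rng.normal(size=n_steps) * np.sqrt(var * (1 - a ** 2))
F = np.empty(n_steps)
F[0] = rng.normal() * np.sqrt(var)             # start in the stationary state
for i in range(1, n_steps):
    F[i] = a * F[i - 1] + kicks[i]

# Unbiased autocorrelation estimate via a zero-padded FFT
dF = F - F.mean()
spec = np.fft.rfft(dF, n=2 * n_steps)
acf = np.fft.irfft(spec * np.conj(spec), n=2 * n_steps)[:n_steps] / np.arange(n_steps, 0, -1)

n_lags = int(10 * tau_c / dt)                  # integrate over ~10 correlation times
K_est = acf[:n_lags + 1]
integral = dt * (K_est.sum() - 0.5 * (K_est[0] + K_est[-1]))   # trapezoid rule
eta_est = integral / T                         # Eq. (141)
print(eta_est, eta)
```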

To conclude this section, let me return for a minute to the questions formulated in our earlier discussion of dephasing in the two-level model. In that problem, the dephasing time scale is $T_2 = 1/2D_\varphi$. Hence the classical approach to the environment, used in Sec. 3, is adequate if $\hbar D_\varphi \ll k_BT$. Next, we may identify operators $f$ and $\sigma_z$ participating in Eq. (70) with, respectively, operators $F$ and $x$ of the general Eq. (90). Then the comparison of Eqs. (82), (88) and (140) yields

$$ \frac{1}{T_2} = 2D_\varphi = \frac{4k_BT}{\hbar^2}\eta, \tag{7.142} $$

so that, for the model described by Eq. (137) with temperature-independent viscosity $\eta$, the dephasing rate is proportional to temperature.
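The FDT (134) can also be checked "line by line" on a finite model environment. In the limit $\varepsilon \to 0$, Eqs. (117) and (128) reduce to sums of delta functions, $(1/2)\sum W_n|F_{nn'}|^2[\delta(\omega+\tilde E/\hbar)+\delta(\omega-\tilde E/\hbar)]$ and $(\pi/\hbar)\sum W_n|F_{nn'}|^2[\delta(\omega+\tilde E/\hbar)-\delta(\omega-\tilde E/\hbar)]$, respectively, so for each pair of levels the weights of the two spectral lines at $\omega > 0$ must obey Eq. (134) exactly. The sketch uses hypothetical random levels and matrix elements ($\hbar = k_B = 1$).

```python
import numpy as np

# Hypothetical finite environment (hbar = k_B = 1). For each pair of levels lo < up,
# at omega = E_up - E_lo > 0 the line weights are
#   S_F:    (1/2) |F|^2 (W_lo + W_up)      (absorption + emission)
#   chi'':  pi    |F|^2 (W_lo - W_up)      (absorption - emission)
rng = np.random.default_rng(3)
N, T = 7, 0.4
E = np.sort(rng.uniform(0.0, 3.0, N))
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
F = (A + A.conj().T) / 2
W = np.exp(-E / T)
W /= W.sum()

ratios = []
for lo in range(N):
    for up in range(lo + 1, N):
        omega = E[up] - E[lo]
        s_weight = 0.5 * abs(F[lo, up]) ** 2 * (W[lo] + W[up])
        chi2_weight = np.pi * abs(F[lo, up]) ** 2 * (W[lo] - W[up])
        fdt = chi2_weight / (2 * np.pi) / np.tanh(omega / (2 * T))   # right-hand side of Eq. (134)
        ratios.append(s_weight / fdt)
print(np.allclose(ratios, 1.0))
```

The identity behind it is just $(W_{\rm lo}+W_{\rm up})/(W_{\rm lo}-W_{\rm up}) = \coth(\hbar\omega/2k_BT)$ for Gibbs probabilities, which is the content of Eq. (131).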



7.5. The Heisenberg-Langevin approach 

The fluctuation-dissipation theorem opens a very simple and efficient way for the analysis of the system of interest (s in Fig. 1). It consists in writing the Heisenberg equations of motion (4.199) for the relevant operators, which now include the environmental force operator, and exploring these equations using the Fourier transform and the Wiener-Khinchin theorem (112)-(113). Such an approach to classical equations of motion is commonly associated with the name of Langevin, 52 so that its extension to the dynamics of Heisenberg-picture operators is frequently referred to as the Heisenberg-Langevin (or "quantum Langevin") approach to open system analysis. 53

Perhaps the best way to describe this method is to demonstrate how it works for the very important case of a 1D harmonic oscillator, so that the generalized coordinate x of Sec. 4 is just the oscillator's coordinate. For the sake of simplicity, let us assume that the environment provides the simple Ohmic dissipation described by Eq. (137) - which is a good approximation in many cases. As we



51 Named after M. Green and R. Kubo whose analyses (published, respectively, in 1954 and 1957) followed and 
acknowledged the pioneering contributions by Nyquist and Callen and Welton, but were based on different 
approaches (closer to the one used in this section), and as a result revealed important Eq. (109). 

52 After P. Langevin, whose 1908 work was the first systematic development of Einstein's ideas (1905) of the Brownian motion theory in the random force language, as an alternative to M. Smoluchowski's approach using the probability density language - see Sec. 6 below.

53 Perhaps the largest credit for this extension belongs to M. Lax whose work, in the early 1960s, was motivated mostly by quantum electronics applications - see, e.g., his monograph M. Lax, Fluctuation and Coherent Phenomena in Classical and Quantum Physics, Gordon and Breach, 1968, and references therein.






already know from Chapter 5, the Heisenberg equations of motion for the operators of coordinate and momentum of the oscillator, in the presence of an external force, are

$$ \dot x = \frac{p}{m}, \qquad \dot p = -m\omega_0^2x + F, \tag{7.143} $$

so that using Eqs. (92) and (137), we get

$$ \dot x = \frac{p}{m}, \qquad \dot p = -m\omega_0^2x - \eta\dot x + \tilde F(t). \tag{7.144} $$

Combining Eqs. (144), we may write their system as a single differential equation

$$ m\ddot x + \eta\dot x + m\omega_0^2x = \tilde F(t), \tag{7.145} $$

that is absolutely similar to the classical equation of motion. 54 (In view of Eqs. (5.42) and (5.48), of which the Ehrenfest theorem (5.49) is a corollary, this should by no means be surprising.) For the Fourier images of the operators, defined similarly to Eq. (115), Eq. (145) gives the following relation,

$$ x_\omega = \frac{F_\omega}{m(\omega_0^2-\omega^2)-i\eta\omega}, \tag{7.146} $$

that should also be well known to the reader from the classical theory of forced oscillations. However, since the Fourier components are still Heisenberg-picture operators, and their "values" for different $\omega$ do not commute, we have to tread carefully. The best way to proceed is to write a copy of Eq. (146) for frequency $(-\omega')$, and then combine these equations to form a symmetrical combination similar to that used in Eq. (114). The result is

$$ \frac{1}{2}\left\langle x_\omega x_{-\omega'}+x_{-\omega'}x_\omega\right\rangle = \frac{1}{\left[m(\omega_0^2-\omega^2)-i\eta\omega\right]\left[m(\omega_0^2-\omega'^2)+i\eta\omega'\right]}\;\frac{1}{2}\left\langle F_\omega F_{-\omega'}+F_{-\omega'}F_\omega\right\rangle. \tag{7.147} $$



Since the spectral density definition similar to Eq. (114) is valid for any observable, in particular for $x$, Eq. (147) allows us to relate the symmetrized spectral densities of coordinate and force:

$$ S_x(\omega) = \frac{S_F(\omega)}{\left|m(\omega_0^2-\omega^2)-i\eta\omega\right|^2} = \frac{S_F(\omega)}{m^2(\omega_0^2-\omega^2)^2+(\eta\omega)^2}. \tag{7.148} $$

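Now using an analog of Eq. (116) for $x$, we can calculate the coordinate's variance:

$$ \left\langle x^2\right\rangle = K_x(0) = \int_{-\infty}^{+\infty}S_x(\omega)\,d\omega = 2\int_0^{+\infty}\frac{S_F(\omega)\,d\omega}{m^2(\omega_0^2-\omega^2)^2+(\eta\omega)^2}, \tag{7.149} $$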

where now, in contrast to the notation used in Sec. 4, the sign $\langle\ldots\rangle$ means the averaging over the usual statistical ensemble of many systems of interest - in our current case, of many harmonic oscillators.

If the coupling to the environment is so weak that viscosity $\eta$ is small (in the sense that the oscillator's dimensionless Q-factor is large, $Q \equiv m\omega_0/\eta \gg 1$), this integral is dominated by the



54 See, e.g., CM Sec. 4.1. 
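resonance peak in a narrow vicinity, $|\omega-\omega_0| \equiv |\xi| \ll \omega_0$, of its resonance frequency, and we can take the relatively smooth function $S_F(\omega)$ out of the integral, thus reducing it to a table integral: 55

$$ \begin{aligned}
\left\langle x^2\right\rangle &\approx 2S_F(\omega_0)\int_0^{+\infty}\frac{d\omega}{m^2(\omega_0^2-\omega^2)^2+(\eta\omega)^2} \approx 2S_F(\omega_0)\int_{-\infty}^{+\infty}\frac{d\xi}{(2m\omega_0\xi)^2+(\eta\omega_0)^2}\\
&= 2S_F(\omega_0)\frac{1}{(\eta\omega_0)^2}\int_{-\infty}^{+\infty}\frac{d\xi}{1+(2m\xi/\eta)^2} = 2S_F(\omega_0)\frac{1}{(\eta\omega_0)^2}\frac{\pi\eta}{2m} = \frac{\pi}{m\eta\omega_0^2}S_F(\omega_0).
\end{aligned} \tag{7.150} $$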

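Taking into account the FDT (134) and Eq. (138), this gives

$$ \left\langle x^2\right\rangle = \frac{\pi}{m\eta\omega_0^2}\,\frac{\hbar}{2\pi}\eta\omega_0\coth\frac{\hbar\omega_0}{2k_BT} = \frac{\hbar}{2m\omega_0}\coth\frac{\hbar\omega_0}{2k_BT}. \tag{7.151} $$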

But this is exactly Eq. (48), which was obtained from the Gibbs distribution, without any explicit account of the environment - though keeping it in mind by using the notion of the thermally-equilibrium ensemble. 56 (Notice that the viscosity coefficient $\eta$, which characterizes the oscillator-to-environment interaction strength, has cancelled!) Does this mean that we have toiled in vain?

By no means. First of all, the FDT result has an important conceptual value. For example, let us consider the low-temperature limit $k_BT \ll \hbar\omega_0$, when Eq. (151) is reduced to

$$ \left\langle x^2\right\rangle = \frac{\hbar}{2m\omega_0}. \tag{7.152} $$



Let us ask a naive question: What exactly is the origin of this coordinate uncertainty? From the point of view of the usual quantum mechanics of closed (Hamiltonian) systems, there is no doubt: this nonvanishing variance of coordinate is the result of the finite spatial extension of the ground-state wave function, reflecting Heisenberg's uncertainty relation (which in turn results from the fact that the operators of coordinate and momentum do not commute) - see Eq. (2.271). However, from the point of view of the Heisenberg-Langevin equation (145), variance (152) is an inalienable part of the oscillator's response to the fluctuation force $\tilde F(t)$ exerted by the environment at frequencies $\omega \approx \omega_0$. Though it is impossible to refute the former, absolutely legitimate point of view, in many applications it is much easier to subscribe to the latter standpoint, and treat the coordinate uncertainty as the result of the so-called quantum noise of the environment. This notion has received numerous confirmations in experiments that did not include any oscillators with eigenfrequencies $\omega_0$ close to the noise measurement frequency $\omega$. 57
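Eq. (149) can be integrated numerically, with the FDT (134) and the Ohmic $\chi''(\omega) = \eta\omega$ of Eq. (138), to see how closely the high-Q result (151) is reproduced at a finite damping. The units $\hbar = m = \omega_0 = k_B = 1$ and the values $Q = 100$, $k_BT = \hbar\omega_0/2$ are hypothetical choices for this sketch.

```python
import numpy as np

# Dimensionless units: hbar = m = omega_0 = k_B = 1 (hypothetical choice).
# Ohmic environment, chi''(omega) = eta*omega, so the FDT (134) gives
# S_F(omega) = (eta*omega/2pi) * coth(omega/2T).
eta, T = 0.01, 0.5                          # Q = m*omega_0/eta = 100

def trapz(y, x):
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

omega = np.concatenate([np.linspace(1e-6, 0.95, 20001),
                        np.linspace(0.95, 1.05, 200001)[1:],   # resolve the resonance peak
                        np.linspace(1.05, 50.0, 50001)[1:]])
S_F = eta * omega / (2 * np.pi) / np.tanh(omega / (2 * T))
x2_numeric = 2 * trapz(S_F / ((1 - omega ** 2) ** 2 + (eta * omega) ** 2), omega)   # Eq. (149)
x2_formula = 1 / (2 * np.tanh(1 / (2 * T)))                                          # Eq. (151)
print(x2_numeric, x2_formula)
```

Decreasing `eta` (raising Q) brings the two numbers closer, in line with the approximation made in Eq. (150).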

The advantage of the Heisenberg-Langevin approach is that for any $\eta > 0$ it is possible to calculate the (experimentally measurable!) distribution $S_x(\omega)$, i.e. decompose the fluctuations into spectral components. This procedure is not restricted to the limit of small $\eta$ (large Q factors); for any damping we may just plug the FDT (134) into Eq. (149) and integrate. As an example, let us have a look



55 See, e.g., MA Eq. (6.5a). 

56 By the way, the simplest way to calculate $S_F(\omega)$, i.e. to derive the FDT, is to require that Eqs. (48) and (150) give the same result for an oscillator with any eigenfrequency $\omega_0$. This is exactly the approach used by H. Nyquist (for the classical case) - see also SM Sec. 5.5.

57 See, for example, R. Koch et al., Phys. Rev. B 26, 74 (1982).
at the so-called quantum diffusion. A free 1D particle may be considered as the particular case of a 1D harmonic oscillator with $\omega_0 = 0$, so that combining Eqs. (134) and (149), we get

$$ \left\langle x^2\right\rangle = 2\int_0^{+\infty}\frac{S_F(\omega)\,d\omega}{(m\omega^2)^2+(\eta\omega)^2} = 2\eta\int_0^{+\infty}\frac{1}{(m\omega^2)^2+(\eta\omega)^2}\,\frac{\hbar\omega}{2\pi}\coth\frac{\hbar\omega}{2k_BT}\,d\omega. \tag{7.153} $$



This integral has two divergences. The first one, of the type $\int d\omega/\omega^2$ at the lower limit, is just a classical effect: according to Eq. (85), the particle's displacement variance grows with time, so it cannot have a finite time-independent value that Eq. (153) tries to calculate. However, we still can use that result to single out the quantum noise effect on diffusion - say, by comparing it with a similar but purely classical case. These effects are prominent at high frequencies, especially if the quantum noise overcomes the thermal noise before the dynamic cut-off, i.e. if

$$ \frac{k_BT}{\hbar} \ll \frac{\eta}{m}. \tag{7.154} $$

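In this case there is a broad range of frequencies where the quantum noise gives a substantial contribution to the integral:

$$ \left\langle x^2\right\rangle_{\rm Q} \approx 2\eta\int_{k_BT/\hbar}^{\eta/m}\frac{1}{(\eta\omega)^2}\,\frac{\hbar\omega}{2\pi}\,d\omega = \frac{\hbar}{\pi\eta}\int_{k_BT/\hbar}^{\eta/m}\frac{d\omega}{\omega} = \frac{\hbar}{\pi\eta}\ln\frac{\hbar\eta}{mk_BT}. \tag{7.155} $$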



Formally, this contribution diverges at either $m \to 0$ or $T \to 0$, but this logarithmic (i.e. extremely weak) divergence is readily quenched by almost any change of the environment model at very high frequencies, where the "Ohmic" approximation given by Eq. (136) becomes unrealistic.
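The logarithmic estimate (155) can be compared with a direct numerical evaluation of the quantum part of the integral (153), i.e. of the integrand with the classical term $2k_BT/\hbar\omega$ subtracted from the coth. The parameters ($\hbar = k_B = 1$, $m = \eta = 1$, $k_BT = 10^{-6}$) are hypothetical and chosen to satisfy Eq. (154); agreement is expected only with logarithmic accuracy (here, to roughly 10%).

```python
import numpy as np

# Dimensionless units hbar = k_B = 1; hypothetical parameters satisfying Eq. (154): T << eta/m.
m, eta, T = 1.0, 1.0, 1e-6

omega = np.geomspace(1e-9, 1e4, 400001)
u = omega / (2 * T)
# coth(u) - 1/u: the quantum excess of the FDT factor; small-u series avoids round-off
excess = np.where(u < 1e-3, u / 3 - u ** 3 / 45, 1 / np.tanh(u) - 1 / u)
integrand = 2 * eta * (omega / (2 * np.pi)) * excess / ((m * omega ** 2) ** 2 + (eta * omega) ** 2)
x2_quantum = np.sum((integrand[1:] + integrand[:-1]) * np.diff(omega)) / 2
x2_log = np.log(eta / (m * T)) / (np.pi * eta)       # Eq. (155) with hbar = k_B = 1
print(x2_quantum, x2_log, x2_quantum / x2_log)
```

The residual difference is an O(1) constant added to the large logarithm, so the ratio creeps toward 1 only slowly as $T$ is lowered.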

The Heisenberg-Langevin approach is extremely simple and powerful, 58 but it has its limitations. The main one is that if the equations of motion for the Heisenberg operators are not linear, there is no linear relation, such as Eq. (146), between the Fourier images of the generalized force and generalized coordinate, and as a result there is no simple relation, such as Eq. (148), between their spectral densities. In other words, if the Heisenberg equations of motion are nonlinear, there is no regular simple way to use them to calculate statistical properties of the observables. For example, let us return to the dephasing problem described by Eqs. (68)-(70), and assume that the generalized force is characterized by relations similar to (93) and (134). Now writing the Heisenberg equations of motion for the two remaining spin operators, and using the commutation relations between them, we get



$$ \dot\sigma_x = -2\frac{c_z-f(t)}{\hbar}\sigma_y, \qquad \dot\sigma_y = 2\frac{c_z-f(t)}{\hbar}\sigma_x. \tag{7.156} $$



These equations do not provide a linear relation between the Pauli operators and the fluctuation force, so even if we know the spectral properties of the latter from the FDT, this does not help much - unless we return to the approximate, classical approach described in Sec. 3 above. 59
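By contrast, the classical approach of Sec. 3 handles such equations easily if $f(t)$ is treated as a c-number random process. The sketch below assumes that Eqs. (156) have the form $\dot\sigma_x = -2[c_z-f(t)]\sigma_y/\hbar$, $\dot\sigma_y = 2[c_z-f(t)]\sigma_x/\hbar$, takes $\hbar = 1$, and models $f(t)$ (hypothetically) as Gaussian white noise with $\langle f(t)f(t')\rangle = A\delta(t-t')$. For this model the accumulated phase is Gaussian, so the ensemble average of $\sigma_x$ should follow $\cos(2c_zt)\exp\{-2At\}$; the values of $c_z$, $A$, and the ensemble size are arbitrary.

```python
import numpy as np

# Hypothetical parameters, hbar = 1; f(t) is classical Gaussian white noise, <f(t) f(t')> = A delta(t - t').
c_z, A = 1.0, 0.25
dt, n_steps, n_runs = 0.01, 400, 20000
rng = np.random.default_rng(4)

sx = np.ones(n_runs)            # every ensemble member starts with sigma_x = 1, sigma_y = 0
sy = np.zeros(n_runs)
mean_sx = [sx.mean()]
for _ in range(n_steps):
    f_dt = rng.normal(size=n_runs) * np.sqrt(A * dt)   # integral of f(t) over one time step
    angle = 2 * (c_z * dt - f_dt)                       # rotation angle in the (sigma_x, sigma_y) plane
    sx, sy = sx * np.cos(angle) - sy * np.sin(angle), sx * np.sin(angle) + sy * np.cos(angle)
    mean_sx.append(sx.mean())

t = dt * np.arange(n_steps + 1)
theory = np.cos(2 * c_z * t) * np.exp(-2 * A * t)       # Gaussian average of cos(phase)
print(np.max(np.abs(np.array(mean_sx) - theory)))
```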



58 Its natural generalizations enable analyses of fluctuations in arbitrary linear systems, i.e. the systems described 
by linear differential (or integro-differential) equations of motion, including those with many degrees of freedom, 
and distributed systems (continua). 

59 For some calculations, this problem may be avoided by linearization: if we are only interested in small 
fluctuations, the Heisenberg equations of motion may be linearized about their expectation values (see, e.g., CM 
7.6. Density matrix approach

The main alternative approach, which is essentially a generalization of the one used in Sec. 2, is to extract the final results from the dynamics of the density matrix of our subsystem s of interest (which, from this point on, will be called $w_s$). I will discuss this approach in detail, 60 cutting just a few technical corners, in each case referring the reader to special literature.

We already know that the density matrix allows the calculation of the expectation value of any 
observable of system s - see Eq. (5). However, our initial recipe (6) for the density matrix calculation, 
which requires the knowledge of the exact state (2) of the whole Universe, is not too practicable, while 
the von Neumann equation (66) for the density matrix evolution is limited to cases in which 
probabilities Wj of the system states are fixed - thus excluding such important effects as the energy 
relaxation. However, such effects may be analyzed using a different assumption - that the system of 
interest interacts only with some local environment (say, with the lab room) that is in the thermally- 
equilibrium state described by a diagonal density matrix - see Eqs. (15) and (23). 

This calculation is facilitated by the following observation. Let us number the basis states of the 
full local system (the system of our interest plus its local environment) by index /, and apply Eq. (5) to 
write 



A) = Tv(Aw) = X A w w n = X (/ \A\ /'>(/' H / 

/,/' 



(7.157) 



/,/■ 



where w is the statistical operator of this full composite system. At weak interaction between the 
system s and local environment e, their variables reside in different Hilbert spaces, so that we can write 



K)=h)®h 



(7.158) 



and if observable A depends only on the coordinates of system s, Eq. (157) yields 

( A ) = Z ( e * I ® ( s j 1% ) ® I e » ){ e * I ® ( s r H s j > ® I e k > 



jj 

k,k' 



= X (sj \A\ s r )8 kk , (e k , | ® ( Sf \w\ *,)®h) = Z A jf { s r \ Z ® («* H ^ ) ® 

w 

k,k' 

where w, is defined as 



7,7 



V k 



(7.159) 

s j) = Tr j (Aw s ), 

/ 



w. 



s Zfe H e t>= Tr **- 



(7.160) 



Since Eq. (159) is similar to Eq. (5), w s may serve as the statistical operator defined in the Hilbert space 

of the system of our interest. The huge advantage of Eqs. (159)-(160) is that they are valid for an 
arbitrary state of the local environment, including the case when it is in the thermodynamic equilibrium. 



Sec. 4.2), and the linear equations for variations solved either as has been shown above, or (if the expectation 
values evolve in time) by their Fourier expansions. 

60 As in Sec. 4, the reader not interested in the derivation of the basic equation (181) for the density matrix 
evolution may immediately jump to the discussion of this equation and its applications. 
By the way, the similarity of Eqs. (5) and (159) may serve as a strong argument, promised in Sec. 1,
for the validity of the former relation even if the Universe as a whole is not in a pure state. (The
argument is, however, imperfect, because the latter relation has been derived from the former one.)
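Eqs. (157)-(160) are easy to check numerically. The sketch below is only an illustration under assumed inputs: a random density matrix of a composite system with 2 states of s and 3 states of e (a random, unit-trace, positive matrix, not a Gibbs state), an arbitrary Hermitian observable A of system s, and the product basis |s_j⟩⊗|e_k⟩ flattened as j·n_e + k. All helper names are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def kron_with_identity(a, n_e):
    """Matrix of A (x) I_e in the basis |s_j> (x) |e_k>, flattened as j*n_e + k."""
    n_s = len(a)
    return [[a[j][jp] if k == kp else 0j for jp in range(n_s) for kp in range(n_e)]
            for j in range(n_s) for k in range(n_e)]

def partial_trace_env(w, n_s, n_e):
    """Eq. (160): (w_s)_{jj'} = sum_k <s_j, e_k| w |s_j', e_k>."""
    return [[sum(w[j * n_e + k][jp * n_e + k] for k in range(n_e)) for jp in range(n_s)]
            for j in range(n_s)]

def random_density_matrix(n, rng):
    m = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
    w = matmul(m, dagger(m))                    # Hermitian and positive semidefinite
    tr = trace(w)
    return [[v / tr for v in row] for row in w]  # normalized to unit trace

rng = random.Random(0)
n_s, n_e = 2, 3
w = random_density_matrix(n_s * n_e, rng)
a = [[0.3 + 0j, 1 - 2j], [1 + 2j, -0.7 + 0j]]    # an arbitrary Hermitian observable of system s

full = trace(matmul(kron_with_identity(a, n_e), w))          # Eq. (157)
reduced = trace(matmul(a, partial_trace_env(w, n_s, n_e)))   # Eq. (159)
print(full, reduced)
```

Both traces coincide to rounding accuracy for any state w, which is the numerical counterpart of the remark that Eqs. (159)-(160) hold for an arbitrary state of the environment.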

Now, since at a sufficiently large size of the local environment e, the composite system (s + e)
may be considered Hamiltonian, with fixed probabilities of its states, for the description of the time
evolution of its statistical operator w (again, in contrast to that, w_s, of the system of our interest) we
may use the von Neumann equation (66). Partitioning its right-hand part in accordance with Eq. (68), we
get:

$$i\hbar\dot{\hat w} = [\hat H_s,\hat w] + [\hat H_e,\hat w] + [\hat H_{\rm int},\hat w]. \qquad (7.161)$$

The next step is to use perturbation theory to solve this equation in the lowest order in H_int that
yields nonvanishing results due to the interaction. For that, Eq. (161) is not very convenient, because its
right-hand part contains two other terms, which are much larger than the interaction Hamiltonian. To
mitigate this technical difficulty, the interaction picture (which was discussed at the end of Sec. 4.6) is
very handy, though not absolutely necessary.

As a reminder, in that picture (whose entities will be marked with index I, with the unmarked
operators assumed to be in the Schrödinger picture), both the operators and the state vectors (and hence
the density matrix) depend on time. However, the time evolution of the operator of any observable A is
described by Eq. (67) with the unperturbed part of the Hamiltonian only - see Eq. (4.214). In our
current case (68), this means

$$i\hbar\dot{\hat A}_I = [\hat A_I,\hat H_0], \qquad (7.162)$$

where the unperturbed Hamiltonian consists of two independent parts:

$$\hat H_0 = \hat H_s + \hat H_e. \qquad (7.163)$$

On the other hand, the state vector evolution is governed by the interaction evolution operator u_I that
obeys Eq. (4.215). Since this equation, using the interaction-picture Hamiltonian (4.216),

$$\hat H_I \equiv \hat u_0^\dagger\hat H_{\rm int}\hat u_0, \qquad (7.164)$$

is absolutely similar to the ordinary Schrödinger equation using the full Hamiltonian, we may repeat all
arguments given in the beginning of Sec. 3 to conclude that the dynamics of the density matrix in the
interaction picture of a Hamiltonian system is governed by the following analog of the von Neumann
equation (66):

$$i\hbar\dot{\hat w}_I = [\hat H_I,\hat w_I]. \qquad (7.165)$$

Since this equation is similar in structure (with the opposite sign) to the Heisenberg equation (67), we
may use the solution (4.190) of the latter equation to write its analog: 61



61 Notice the opposite order of the unitary operators, which results from the already mentioned sign difference.
Note also that we could write a similar expression in the Schrödinger picture: w(t) = u w(0) u†, where u is the full
time-evolution operator.
$$\hat w_I(t) = \hat u_I(t,0)\,\hat w(0)\,\hat u_I^\dagger(t,0). \qquad (7.166)$$

It is also straightforward to verify that in this picture, the expectation value of any observable A may be
found from an expression similar to the basic Eq. (5):

$$\langle A\rangle = \mathrm{Tr}(\hat A_I\hat w_I), \qquad (7.167)$$

so that the interaction and Schrödinger pictures give the same final results.

In the most frequent case of the bilinear interaction (90), 62 Eq. (162) is readily simplified, in
different ways, for both operators participating in the product. In particular, for A = x, it yields

$$i\hbar\dot{\hat x}_I = [\hat x_I,\hat H_0] = [\hat x_I,\hat H_s] + [\hat x_I,\hat H_e]. \qquad (7.168)$$

Since the coordinate operator is defined in the Hilbert space of system s, it commutes with the
Hamiltonian of the environment, so that we finally get

$$i\hbar\dot{\hat x}_I = [\hat x_I,\hat H_s]. \qquad (7.169)$$



On the other hand, taking A = F, we should take into account that this operator is defined in the
Hilbert space of the environment, and commutes with the Hamiltonian of the unperturbed system s. As a
result, we get

$$i\hbar\dot{\hat F}_I = [\hat F_I,\hat H_e]. \qquad (7.170)$$

This means that with our time-independent unperturbed Hamiltonians H_s and H_e, the time evolution of
the interaction-picture operators is rather simple. In particular, the analogy between Eq. (170) and Eq.
(93) allows us to immediately write the following analog of Eq. (94):

$$\hat F_I(t) = \exp\Big\{+\frac{i}{\hbar}\hat H_e t\Big\}\,\hat F(0)\,\exp\Big\{-\frac{i}{\hbar}\hat H_e t\Big\}, \qquad (7.171)$$

so that in the stationary (eigenstate) basis of the environment,

$$F_{I\,nn'}(t) = \exp\Big\{\frac{iE_n t}{\hbar}\Big\}F_{nn'}(0)\exp\Big\{-\frac{iE_{n'}t}{\hbar}\Big\} = F_{nn'}(0)\exp\Big\{i\frac{E_n - E_{n'}}{\hbar}t\Big\}, \qquad (7.172)$$



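and similarly (but in the basis of the eigenstates of system s) for operator x. As a result, Eq. (164) may
also be factored:

$$\hat H_I(t) = -\exp\Big\{\frac{i}{\hbar}\hat H_s t\Big\}\hat x\exp\Big\{-\frac{i}{\hbar}\hat H_s t\Big\}\,\exp\Big\{\frac{i}{\hbar}\hat H_e t\Big\}\hat F\exp\Big\{-\frac{i}{\hbar}\hat H_e t\Big\} = -\hat x_I(t)\hat F_I(t). \qquad (7.173)$$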
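The factorization (173) can be verified directly in the eigenbases of H_s and H_e, where Eq. (172) reduces the interaction-picture evolution to phase factors. The sketch assumes ħ = 1, hypothetical spectra for s (2 levels) and e (3 levels), and random complex matrices standing in for x and F; the helper names are illustrative only.

```python
import cmath
import random

def evolve_diag(op, energies, t, hbar=1.0):
    """Interaction-picture matrix elements in an eigenbasis, as in Eq. (172):
    op_ab(t) = op_ab(0) * exp{i (E_a - E_b) t / hbar}."""
    size = len(energies)
    return [[op[a][b] * cmath.exp(1j * (energies[a] - energies[b]) * t / hbar) for b in range(size)]
            for a in range(size)]

def kron(a, b):
    """Tensor product in the basis |s_j> (x) |e_k>, flattened as j*len(b) + k."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def random_matrix(size, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(size)] for _ in range(size)]

rng = random.Random(11)
e_s = [0.0, 1.3]           # hypothetical eigenenergies of system s
e_e = [0.0, 0.4, 2.1]      # hypothetical eigenenergies of the environment
x, f = random_matrix(2, rng), random_matrix(3, rng)
t = 0.9

h_int = [[-v for v in row] for row in kron(x, f)]        # bilinear coupling H_int = -x F
e_total = [es + ee for es in e_s for ee in e_e]          # H_0 = H_s + H_e is diagonal in this basis
h_interaction = evolve_diag(h_int, e_total, t)           # Eq. (164) with u_0 = exp{-i H_0 t/hbar}
factored = kron(evolve_diag(x, e_s, t), evolve_diag(f, e_e, t))  # x_I(t) F_I(t)
max_err = max(abs(h_interaction[i][j] + factored[i][j])  # Eq. (173): H_I(t) = -x_I(t) F_I(t)
              for i in range(6) for j in range(6))
print(max_err)
```

The check works because the phase exp{i(E_j + E_k - E_j' - E_k')t/ħ} of the composite system splits into a product of the s and e phases.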



62 A similar analysis of a more general case, when the interaction with the environment may be represented as a sum
of products of the type (90), may be found in the monograph by K. Blum, Density Matrix Theory and Applications,
3rd ed., Springer, 2012.
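Now, as in Sec. 4, we may rewrite Eq. (165) in the integral form:

$$\hat w_I(t) = \frac{1}{i\hbar}\int_{-\infty}^{t}[\hat H_I(t'),\hat w_I(t')]\,dt'; \qquad (7.174)$$

plugging this result into the right-hand part of Eq. (165), we get

$$\dot{\hat w}_I(t) = -\frac{1}{\hbar^2}\int_{-\infty}^{t}\big[\hat H_I(t),[\hat H_I(t'),\hat w_I(t')]\big]dt' = -\frac{1}{\hbar^2}\int_{-\infty}^{t}\big[\hat x(t)\hat F(t),[\hat x(t')\hat F(t'),\hat w_I(t')]\big]dt', \qquad (7.175)$$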



where, for the notation brevity, from this point on I will strip operators x and F of their index I. (Their 
time dependence indicates the interaction picture clearly enough.) 

So far, this equation is exact (and in general cannot be solved analytically), but this is the right time to
notice that even if we take the density matrix in its right-hand part equal to its unperturbed value
(corresponding to no interaction between system s and its thermally-equilibrium environment e),

$$\hat w_I(t') \to \hat w_s(t')\,\hat w_e, \quad\text{with}\quad \langle e_n|\hat w_e|e_{n'}\rangle = W_n\delta_{nn'}, \qquad (7.176)$$

where e_n are the stationary states of the environment and W_n are the Gibbs probabilities (23), Eq. (175)
would still provide some nonvanishing time evolution of the density operator. This is exactly the first
nonvanishing perturbation we have been looking for. Now using Eq. (160), we find the equation of
evolution of the density operator of our system of interest:

$$\dot{\hat w}_s(t) = -\frac{1}{\hbar^2}\int_{-\infty}^{t}\mathrm{Tr}_n\big[\hat x(t)\hat F(t),[\hat x(t')\hat F(t'),\hat w_s(t')\hat w_e]\big]dt', \qquad (7.177)$$

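where the trace is over the stationary states of the environment. In order to spell out the right-hand part
of Eq. (177), note again that the coordinate and force operators commute with each other (but not with
themselves at different time moments!) and hence may be swapped, so that we may write

$$\mathrm{Tr}_n[\dots,[\dots,\dots]] = \hat x(t)\hat x(t')\hat w_s(t')\,\mathrm{Tr}_n[\hat F(t)\hat F(t')\hat w_e] - \hat x(t)\hat w_s(t')\hat x(t')\,\mathrm{Tr}_n[\hat F(t)\hat w_e\hat F(t')]$$
$$\qquad - \hat x(t')\hat w_s(t')\hat x(t)\,\mathrm{Tr}_n[\hat F(t')\hat w_e\hat F(t)] + \hat w_s(t')\hat x(t')\hat x(t)\,\mathrm{Tr}_n[\hat w_e\hat F(t')\hat F(t)]$$
$$= \hat x(t)\hat x(t')\hat w_s(t')\sum_{n,n'}W_nF_{nn'}(t)F_{n'n}(t') - \hat x(t)\hat w_s(t')\hat x(t')\sum_{n,n'}F_{nn'}(t)W_{n'}F_{n'n}(t')$$
$$\qquad - \hat x(t')\hat w_s(t')\hat x(t)\sum_{n,n'}F_{nn'}(t')W_{n'}F_{n'n}(t) + \hat w_s(t')\hat x(t')\hat x(t)\sum_{n,n'}W_nF_{nn'}(t')F_{n'n}(t). \qquad (7.178)$$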

Since the summation over both indices n and n' in this expression runs over the same set of energy levels (of all
eigenstates of the environment), we may swap the indices in any of the sums. Doing that in the terms
with factors W_n', we turn them into W_n, so that this factor becomes common:

$$\mathrm{Tr}_n[\dots,[\dots,\dots]] = \sum_{n,n'}W_n\big[\hat x(t)\hat x(t')\hat w_s(t')F_{nn'}(t)F_{n'n}(t') - \hat x(t)\hat w_s(t')\hat x(t')F_{n'n}(t)F_{nn'}(t')$$
$$\qquad - \hat x(t')\hat w_s(t')\hat x(t)F_{n'n}(t')F_{nn'}(t) + \hat w_s(t')\hat x(t')\hat x(t)F_{nn'}(t')F_{n'n}(t)\big]. \qquad (7.179)$$



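Now using Eq. (172), we get

$$\mathrm{Tr}_n[\dots,[\dots,\dots]] = \sum_{n,n'}W_n|F_{nn'}|^2\Big[\hat x(t)\hat x(t')\hat w_s(t')\exp\Big\{\frac{i(E_n-E_{n'})(t-t')}{\hbar}\Big\} - \hat x(t)\hat w_s(t')\hat x(t')\exp\Big\{-\frac{i(E_n-E_{n'})(t-t')}{\hbar}\Big\}$$
$$\qquad - \hat x(t')\hat w_s(t')\hat x(t)\exp\Big\{\frac{i(E_n-E_{n'})(t-t')}{\hbar}\Big\} + \hat w_s(t')\hat x(t')\hat x(t)\exp\Big\{-\frac{i(E_n-E_{n'})(t-t')}{\hbar}\Big\}\Big]$$
$$= \sum_{n,n'}W_n|F_{nn'}|^2\cos\frac{(E_n-E_{n'})(t-t')}{\hbar}\,\big[\hat x(t),[\hat x(t'),\hat w_s(t')]\big] + i\sum_{n,n'}W_n|F_{nn'}|^2\sin\frac{(E_n-E_{n'})(t-t')}{\hbar}\,\big[\hat x(t),\{\hat x(t'),\hat w_s(t')\}\big], \qquad (7.180)$$

where F_nn' ≡ F_nn'(0), and {...,...} means the anticommutator - see Eq. (4.34). Comparing the two double sums
participating in this expression with Eqs. (108) and (111), we see that they are nothing else than,
respectively, the symmetrized correlation function and the Green's function (multiplied by ħ/2) of the
time-difference argument τ = t - t' ≥ 0. As a result, Eq. (177) takes a very simple form: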



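$$\dot{\hat w}_s(t) = -\frac{1}{\hbar^2}\int_{-\infty}^{t}K_F(t-t')\big[\hat x(t),[\hat x(t'),\hat w_s(t')]\big]dt' - \frac{i}{2\hbar}\int_{-\infty}^{t}G(t-t')\big[\hat x(t),\{\hat x(t'),\hat w_s(t')\}\big]dt'. \qquad (7.181)$$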



Let me hope that the reader enjoys this beautiful result as much as I do, and that it is a sufficient
intellectual reward for his or her effort of following its derivation. It gives a self-sufficient equation for the
time evolution of the density matrix of the system of our interest (s), with the effects of its environment
represented only by two real functions of τ: one (K_F) describing the environment's fluctuations,
and another one (G) representing its average response to the system's dynamics. And most
spectacularly, these are exactly the same functions that participate in the Heisenberg-Langevin approach
to the problem, and hence are related to each other by the fluctuation-dissipation theorem (134).
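The step from Eq. (179) to Eq. (180) rests on an operator identity that is easy to mistype: e^{iθ}(x x' w - x' w x) + e^{-iθ}(w x' x - x w x') = cos θ [x,[x',w]] + i sin θ [x,{x',w}]. The sketch below checks it with random complex 3×3 matrices standing in for x(t), x(t') and w_s(t'), and an arbitrary θ playing the role of (E_n - E_n')(t - t')/ħ; the matrix size, seed, θ, and helper names are all assumptions of the example.

```python
import cmath
import math
import random

def mm(a, b):
    """Matrix product of square matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def combine(*terms):
    """Linear combination sum_i c_i * M_i of (c_i, M_i) pairs."""
    size = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(size)] for i in range(size)]

def comm(a, b):
    return combine((1, mm(a, b)), (-1, mm(b, a)))

def anticomm(a, b):
    return combine((1, mm(a, b)), (1, mm(b, a)))

def random_matrix(size, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(size)] for _ in range(size)]

rng = random.Random(42)
x_t, x_tp, w_s = (random_matrix(3, rng) for _ in range(3))  # stand-ins for x(t), x(t'), w_s(t')
theta = 0.7                                                 # stands for (E_n - E_n')(t - t')/hbar
ep, em = cmath.exp(1j * theta), cmath.exp(-1j * theta)

# The four terms of the first form of Eq. (180)
lhs = combine((ep, mm(mm(x_t, x_tp), w_s)), (-em, mm(mm(x_t, w_s), x_tp)),
              (-ep, mm(mm(x_tp, w_s), x_t)), (em, mm(mm(w_s, x_tp), x_t)))
# Its second form: cos(theta)*[x, [x', w]] + i*sin(theta)*[x, {x', w}]
rhs = combine((math.cos(theta), comm(x_t, comm(x_tp, w_s))),
              (1j * math.sin(theta), comm(x_t, anticomm(x_tp, w_s))))
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_err)
```

Since the identity is purely algebraic, it holds for arbitrary (not necessarily Hermitian) matrices, which is why random ones suffice here.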

After a short celebration, let us acknowledge that Eq. (181) is still an integro-differential
equation that needs to be solved together with Eq. (169). Such equations do not allow explicit analytical
solutions except for very simple (and not very interesting) cases. For most applications, further
simplifications should be made. One of them is based on the fact (which was already discussed in Sec.
3) that both environmental functions participating in Eq. (181) tend to zero when their argument τ
becomes larger than a certain environment correlation time τ_c, which is frequently much shorter than the
time scales T_nn' of the evolution of the density matrix elements. Moreover, the characteristic time scale
of the coordinate operator evolution may also be short on the scale of T_nn'. In this limit, all arguments t'
of the density operator giving substantial contributions to the right-hand part of Eq. (181) are so close to
t that it does not matter whether its argument is t' or just t. This simplification (t' → t) is known as the
Markov approximation. 63 However, this approximation alone is still insufficient for finding the general
solution of Eq. (181). Substantial further progress is possible in two important cases.

The most important of them is when the intrinsic Hamiltonian H_s of our system of interest is
time-independent and has a discrete eigenenergy spectrum E_n, 64 with well-separated levels:



63 Named after A. Markov (1856-1922; in older literature, "Markoff"), because the result of this approximation is
a particular case of a Markov process, whose future development is completely determined by its present state.

64 Rather reluctantly, I will use this standard notation, E_n, for the eigenenergies of our system of interest (s), in
the hope that the reader will not confuse these discrete energy levels with the quasi-continuous energy levels of its
environment, participating in particular in Eqs. (108) and (111). As a reminder, by this stage of our calculations
the environment levels have disappeared, leaving behind their "trace functions" K_F(τ) and G(τ).
$$|E_n - E_{n'}| \gg \frac{\hbar}{T_{nn'}}. \qquad (7.182)$$

Let us see what this condition yields for Eq. (181) rewritten for the matrix elements in the stationary-state
basis (from this point on, I will drop index s for brevity):

$$\dot w_{nn'} = -\frac{1}{\hbar^2}\int_{-\infty}^{t}K_F(t-t')\big[\hat x(t),[\hat x(t'),\hat w]\big]_{nn'}dt' - \frac{i}{2\hbar}\int_{-\infty}^{t}G(t-t')\big[\hat x(t),\{\hat x(t'),\hat w\}\big]_{nn'}dt'. \qquad (7.183)$$

After the commutators are spelled out, the right-hand part includes four operator products, which differ "only" by the operator
order. Let us have a good look at the first product,

$$\big(\hat x(t)\hat x(t')\hat w\big)_{nn'} = \sum_{m,m'}x_{nm}(t)\,x_{mm'}(t')\,w_{m'n'}, \qquad (7.184)$$

where indices m and m' run over the same set of eigenenergies of the system s of our interest as indices
n and n'. According to Eq. (169) with a time-independent H_s, matrix elements x_nn' (in the stationary-state
basis) oscillate in time as exp{iω_nn' t}, so that

$$\big(\hat x(t)\hat x(t')\hat w\big)_{nn'} = \sum_{m,m'}x_{nm}x_{mm'}\exp\{i(\omega_{nm}t + \omega_{mm'}t')\}\,w_{m'n'}, \qquad (7.185)$$

where the coordinate matrix elements are in the Schrödinger picture now, and I have used the natural
notation (6.85) for the quantum transition frequencies:

$$\hbar\omega_{nn'} \equiv E_n - E_{n'}. \qquad (7.186)$$

According to condition (182), frequencies ω_nn' with n ≠ n' are much higher than the speed of evolution
of the density matrix elements (in the interaction picture!), in both the left-hand and right-hand parts of
Eq. (183). As we already know from Sec. 6.5, this means that in the right-hand part of Eq. (183) we may
keep only the terms that do not oscillate with frequencies ω_nn', because the oscillating terms would give a negligible
contribution to the density matrix dynamics. 65 For that, in the double sum (185) we may keep only the
terms that depend only on the difference (t - t'), because they will give (after integration over t') a slowly
changing contribution to the right-hand part. 66 These terms should have ω_nm + ω_mm' = 0, i.e. (E_n - E_m) +
(E_m - E_m') ≡ E_n - E_m' = 0. For a non-degenerate energy spectrum, this requirement means m' = n; as a
result, the double sum is reduced to a single one:

$$\big(\hat x(t)\hat x(t')\hat w\big)_{nn'} \approx \sum_m x_{nm}x_{mn}\exp\{i\omega_{nm}(t-t')\}\,w_{nn'} = \sum_m|x_{nm}|^2\exp\{i\omega_{nm}(t-t')\}\,w_{nn'}. \qquad (7.187)$$

Another product, (w x(t')x(t))_nn', that appears in the right-hand part of Eq. (183), may be simplified
absolutely similarly, giving

$$\big(\hat w\hat x(t')\hat x(t)\big)_{nn'} \approx \sum_m|x_{n'm}|^2\exp\{i\omega_{n'm}(t'-t)\}\,w_{nn'}. \qquad (7.188)$$



65 This is essentially the same Rotating Wave Approximation (RWA) that is so instrumental in many fields of not
only quantum mechanics but also classical physics - see, e.g., CM Secs. 4.2-4.5.

66 As was already discussed in Sec. 4, the lower-limit substitution (t' = -∞) in integrals (174) gives zero, due to
the finite-time "memory" of the system, expressed by the decay of the correlation and response functions at large
values of the time delay τ = t - t'.
These expressions hold true whether n and n' are equal or not. The situation is different for the two
other products in the right-hand part of Eq. (183), with w sandwiched between x(t) and x(t'). For example,

$$\big(\hat x(t)\hat w\hat x(t')\big)_{nn'} = \sum_{m,m'}x_{nm}(t)\,w_{mm'}\,x_{m'n'}(t') = \sum_{m,m'}x_{nm}w_{mm'}x_{m'n'}\exp\{i(\omega_{nm}t + \omega_{m'n'}t')\}. \qquad (7.189)$$

For this term, the same requirement of having a function of (t - t') only yields a different
condition: ω_nm + ω_m'n' = 0, i.e.

$$(E_n - E_m) + (E_{m'} - E_{n'}) = 0. \qquad (7.190)$$



Here the double sum reduction is possible only if we make an additional assumption that all interlevel
energy distances are unique, i.e. our system of interest has no equidistant levels (such as those of the harmonic
oscillator). For diagonal elements (n = n'), the RWA requirement is reduced to m = m', giving sums
over all diagonal elements of the density matrix:

$$\big(\hat x(t)\hat w\hat x(t')\big)_{nn} = \sum_m|x_{nm}|^2\exp\{i\omega_{nm}(t-t')\}\,w_{mm}. \qquad (7.191)$$



(In this diagonal case, the other similar term, (x(t')wx(t))_nn, is just the complex conjugate of Eq. (191).) For
off-diagonal matrix elements (n ≠ n'), the situation is different: Eq. (190) may be satisfied only if m = n
and also m' = n', so that the double sum is reduced to just one, non-oscillating term:

$$\big(\hat x(t)\hat w\hat x(t')\big)_{nn'} = x_{nn}\,w_{nn'}\,x_{n'n'}, \quad\text{for } n \neq n'. \qquad (7.192)$$

The second similar term, (x(t')wx(t))_nn', is exactly the same, so that in one of the integrals of Eq. (183),
these terms add up, while in the other one, they cancel.
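The index selection behind Eqs. (190)-(192), and the role of the "no equidistant levels" assumption, can be illustrated by brute-force enumeration. The sketch lists all pairs (m, m') satisfying Eq. (190) for two hypothetical spectra: an anharmonic one with all interlevel distances distinct, and an equidistant one similar to that of a harmonic oscillator. The function name and the spectra are assumptions of the example.

```python
from itertools import product

def surviving_pairs(energies, n, n_prime, tol=1e-9):
    """Pairs (m, m') obeying Eq. (190), i.e. keeping x_nm w_mm' x_m'n' non-oscillating."""
    levels = range(len(energies))
    return [(m, mp) for m, mp in product(levels, levels)
            if abs((energies[n] - energies[m]) + (energies[mp] - energies[n_prime])) < tol]

anharmonic = [0.0, 1.0, 2.7, 4.9]   # hypothetical spectrum, all interlevel distances distinct
harmonic = [0.0, 1.0, 2.0, 3.0]     # equidistant levels, as in a harmonic oscillator

print(surviving_pairs(anharmonic, 0, 1))  # only (n, n') itself, as assumed in Eq. (192)
print(surviving_pairs(anharmonic, 2, 2))  # all diagonal pairs (m, m), as in Eq. (191)
print(surviving_pairs(harmonic, 0, 1))    # extra pairs survive, so the reduction fails
```

For the equidistant spectrum, pairs such as (1, 2) and (2, 3) also satisfy Eq. (190), coupling w_01 to other off-diagonal elements; this is why that case has to be treated separately.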

This is why the final equations of evolution look different for diagonal and off-diagonal
elements of the density matrix. For the former case (n = n'), Eq. (183) is reduced to the so-called master
equation 67 relating the diagonal elements w_nn of the density matrix, i.e. the energy level occupancies W_n: 68

$$\dot W_n = -\frac{1}{\hbar^2}\sum_m|x_{nm}|^2\int_0^\infty\Big[K_F(\tau)(W_n - W_m)\big(e^{i\omega_{nm}\tau} + e^{-i\omega_{nm}\tau}\big) + \frac{i\hbar}{2}G(\tau)(W_n + W_m)\big(e^{i\omega_{nm}\tau} - e^{-i\omega_{nm}\tau}\big)\Big]d\tau, \qquad (7.193)$$

where τ = t - t'. Changing the summation index notation from m to n', we may rewrite the master
equation in its canonical form

$$\dot W_n = \sum_{n'\neq n}\big(\Gamma_{n'\to n}W_{n'} - \Gamma_{n\to n'}W_n\big), \qquad (7.194)$$

where the coefficients

$$\Gamma_{n\to n'} = \frac{2}{\hbar^2}|x_{nn'}|^2\int_0^\infty\Big[K_F(\tau)\cos\omega_{nn'}\tau - \frac{\hbar}{2}G(\tau)\sin\omega_{nn'}\tau\Big]d\tau \qquad (7.195)$$

are called the interlevel transition rates. 69 Equation (194) has a very clear physical meaning of the level
occupancy dynamics (i.e. the balance of probability flows ΓW) due to the quantum transitions between
the energy levels (Fig. 6), in our current case caused by the interaction between the system of our
interest and its environment. 70

67 The master equations, first introduced to quantum mechanics in 1928 by W. Pauli, are sometimes called the
"Pauli master equations", or "kinetic equations", or "rate equations".

68 As Eq. (193) shows, the term with m = n would vanish, and thus may be legitimately excluded from the sum.



[Figure: higher and lower energy levels, with occupancies W_n, connected by the transition rates Γ_{n→n'}.]

Fig. 7.6. Probability flows between the energy
levels, described by the master equation (194).



The Fourier transforms (113) and (123) enable us to express the two integrals in Eq. (195) via,
respectively, the symmetrized spectral density S_F(ω) of the environment force fluctuations and the
imaginary part χ''(ω) of the generalized susceptibility, both at frequency ω = ω_nn'. After that we may use
the fluctuation-dissipation theorem (134) to exclude the former function, getting finally

$$\Gamma_{n\to n'} = \frac{2}{\hbar}|x_{nn'}|^2\,\frac{\chi''(\omega_{nn'})}{1 - \exp\{-\hbar\omega_{nn'}/k_BT\}}. \qquad (7.196)$$



Note that since the imaginary part of the generalized susceptibility is an odd function of
frequency, Eq. (196) is in compliance with the Gibbs distribution for arbitrary temperature. Indeed,
according to this equation, the ratio of "up" and "down" rates for each pair of levels equals

$$\frac{\Gamma_{n\to n'}}{\Gamma_{n'\to n}} = \frac{\exp\{(E_n - E_{n'})/k_BT\} - 1}{1 - \exp\{(E_{n'} - E_n)/k_BT\}} = \exp\Big\{\frac{E_n - E_{n'}}{k_BT}\Big\}. \qquad (7.197)$$



On the other hand, according to the Gibbs distribution (23), in thermal equilibrium the level populations
should be in the same proportion, satisfying the so-called detailed balance equations,

$$W_n\Gamma_{n\to n'} = W_{n'}\Gamma_{n'\to n}, \qquad (7.198)$$

for each pair {n, n'}, so that the right-hand parts of all Eqs. (194) vanish, as they should. Thus,
the stationary solution of the master equations indeed describes the thermal equilibrium.
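A minimal numerical sketch of Eqs. (194)-(198), under loudly stated assumptions: ħ = k_B = 1, an Ohmic environment with χ''(ω) = ηω (cf. Eq. (138)), all |x_nn'|² = 1, and an arbitrary hypothetical three-level spectrum. The code builds the rates from Eq. (196), integrates the master equation (194) with a fourth-order Runge-Kutta step starting from a fully occupied top level, and compares the long-time occupancies with the Gibbs distribution; the function names are illustrative only.

```python
import math

HBAR = K_B = 1.0  # assumed units

def rate(e_from, e_to, temp, eta=0.1, x2=1.0):
    """Eq. (196) with the assumed Ohmic susceptibility chi''(omega) = eta*omega and |x_nn'|^2 = x2."""
    omega = (e_from - e_to) / HBAR
    return (2.0 / HBAR) * x2 * eta * omega / (1.0 - math.exp(-HBAR * omega / (K_B * temp)))

def master_rhs(w, energies, temp):
    """Right-hand part of the master equation (194)."""
    n_lev = len(energies)
    return [sum(rate(energies[m], energies[n], temp) * w[m] - rate(energies[n], energies[m], temp) * w[n]
                for m in range(n_lev) if m != n)
            for n in range(n_lev)]

def evolve(w, energies, temp, dt=0.05, n_steps=4000):
    """Fourth-order Runge-Kutta integration of Eq. (194)."""
    for _ in range(n_steps):
        k1 = master_rhs(w, energies, temp)
        k2 = master_rhs([wi + 0.5 * dt * ki for wi, ki in zip(w, k1)], energies, temp)
        k3 = master_rhs([wi + 0.5 * dt * ki for wi, ki in zip(w, k2)], energies, temp)
        k4 = master_rhs([wi + dt * ki for wi, ki in zip(w, k3)], energies, temp)
        w = [wi + dt / 6.0 * (a + 2 * b + 2 * c + d) for wi, a, b, c, d in zip(w, k1, k2, k3, k4)]
    return w

energies = [0.0, 1.0, 2.5]  # hypothetical level energies
temp = 0.8
w_final = evolve([0.0, 0.0, 1.0], energies, temp)  # start with the top level fully occupied
z = sum(math.exp(-e / (K_B * temp)) for e in energies)
gibbs = [math.exp(-e / (K_B * temp)) / z for e in energies]
print([round(v, 4) for v in w_final], [round(v, 4) for v in gibbs])
```

Because the rates built this way obey Eq. (197) for every pair of levels, the Gibbs occupancies are an exact stationary point of Eq. (194), and the integration converges to them regardless of the initial state.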

The closed system of master equations (194), sometimes complemented by additional right-hand-part
terms that describe interlevel transitions due to other factors (e.g., by an external ac force with
a frequency close to one of ω_nn'), is the key starting point for practical analyses of many quantum
systems, including quantum generators (masers and lasers). It is important to remember that it is strictly
valid only in the rotating-wave approximation, i.e. if Eq. (182) is well satisfied for all n and n'.

69 As Eq. (193) shows, the result for Γ_{n'→n} is described by Eq. (195) as well, provided that indices n and n' are
swapped in all components of its right-hand part, including the swap ω_nn' → ω_n'n = -ω_nn'.

70 It is straightforward to show that at relatively low temperatures (k_BT ≪ |E_n' - E_n|), Eq. (196) gives the same
result as the Golden Rule formula (6.134) - see Exercise 2. (The low-temperature limit is necessary to ensure that
the initial occupancy of the excited level is negligible, as was assumed in the derivation of Eq. (6.134).)

For a particular (but very important) case of a two-level system (say, with E_1 > E_2) in the low-temperature
limit k_BT ≪ ħω_12 = E_1 - E_2, the rate Γ_{1→2} defines the characteristic time T_1 = 1/Γ_{1→2}
of the energy relaxation process that brings the diagonal elements of the density matrix to their
thermally-equilibrium values (23). For the Ohmic dissipation described by Eq. (138), Eq. (196) yields a
simple expression

$$\frac{1}{T_1} = \Gamma_{1\to2} = \frac{2}{\hbar}|x_{12}|^2\,\eta\,\omega_{12} = \frac{2\eta}{\hbar^2}|x_{12}|^2(E_1 - E_2). \qquad (7.199)$$

Of course, time T_1 should not be confused with the characteristic time T_2 of relaxation of the off-diagonal
elements, i.e. dephasing, which was already discussed in Sec. 3. By the way, let us see what
Eqs. (183) say about the dephasing rate. Taking into account our intermediate results (187)-(192), and
merging the non-oscillating components (with m = n and m = n') of sums (187) and (188) with the
terms (192), which also do not oscillate in time, we get the following equation:

$$\dot w_{nn'} = -\bigg\{\frac{1}{\hbar^2}\int_0^\infty K_F(\tau)\Big[\sum_{m\neq n}|x_{nm}|^2e^{i\omega_{nm}\tau} + \sum_{m\neq n'}|x_{n'm}|^2e^{-i\omega_{n'm}\tau} + (x_{nn} - x_{n'n'})^2\Big]d\tau$$
$$\qquad + \frac{i}{2\hbar}\int_0^\infty G(\tau)\Big[\sum_m|x_{nm}|^2e^{i\omega_{nm}\tau} - \sum_m|x_{n'm}|^2e^{-i\omega_{n'm}\tau}\Big]d\tau\bigg\}\,w_{nn'}, \quad\text{for } n\neq n'. \qquad (7.200)$$

In contrast with Eq. (194), the right-hand part of this equation includes both a real and an imaginary
part, and hence it may be presented as

$$\dot w_{nn'} = -\Big(\frac{1}{T_{nn'}} + i\Delta_{nn'}\Big)w_{nn'}, \qquad (7.201)$$

with

$$\frac{1}{T_{nn'}} = \frac{1}{\hbar^2}\int_0^\infty K_F(\tau)\Big[\sum_{m\neq n}|x_{nm}|^2\cos\omega_{nm}\tau + \sum_{m\neq n'}|x_{n'm}|^2\cos\omega_{n'm}\tau + (x_{nn} - x_{n'n'})^2\Big]d\tau$$
$$\qquad - \frac{1}{2\hbar}\int_0^\infty G(\tau)\Big[\sum_{m\neq n}|x_{nm}|^2\sin\omega_{nm}\tau + \sum_{m\neq n'}|x_{n'm}|^2\sin\omega_{n'm}\tau\Big]d\tau, \qquad (7.202)$$

where both factors 1/T_nn' and Δ_nn' are real. 71 As should be clear from Eq. (201), the second term in the
right-hand part of this equation causes slow oscillations of the matrix elements w_nn' that, after returning
to the Schrödinger picture, add just small corrections 72 to the unperturbed frequencies (186) of their
oscillations, and hence are not important for most applications. More important is the first term,
because it describes an effect absent without the environment: an exponential decay of the off-diagonal
matrix elements, i.e. dephasing. Comparing the first two terms of Eq. (202) with Eq. (195), we see that the
dephasing rates may be described by a very simple formula:

$$\frac{1}{T_{nn'}} = \frac{1}{2}\Big(\sum_{m\neq n}\Gamma_{n\to m} + \sum_{m\neq n'}\Gamma_{n'\to m}\Big) + \frac{\pi}{\hbar^2}S_F(0)(x_{nn} - x_{n'n'})^2 \equiv \frac{1}{2}\Big(\sum_{m\neq n}\Gamma_{n\to m} + \sum_{m\neq n'}\Gamma_{n'\to m}\Big) + \frac{\eta k_BT}{\hbar^2}(x_{nn} - x_{n'n'})^2, \qquad (7.203)$$

where the low-frequency viscosity coefficient η is again defined as lim_{ω→0} χ''(ω)/ω - see Eq. (138).

71 Sometimes Eq. (200) (in any of its numerous alternative forms) is called the Redfield equation, after the 1965
work by A. Redfield. Note, however, that several other authors, notably including (in alphabetical order) H.
Haken, W. Lamb, M. Lax, W. Louisell, and M. Scully, also made key contributions to the very fast
development of the density-matrix approach to open quantum systems in the mid-1960s.

72 This correction is frequently called the Lamb shift, because it was first observed experimentally in 1947
by W. Lamb and R. Retherford, as a minor, ~1 GHz shift between the energy levels of the 2s and 2p states of hydrogen.
(These levels are equal not only in the non-relativistic theory (Sec. 3.6), but also in the relativistic, Dirac theory
(Sec. 9.7), if the electromagnetic environment is ignored.) The explanation of the shift by H. Bethe, in the same
year 1947, launched the whole field of quantum electrodynamics - to be briefly discussed in Chapter 9.

This result shows that two effects yield independent contributions to dephasing. The first of
them may be interpreted as a result of the "virtual" transitions of the system to other energy levels m;
according to Eq. (187), it is proportional to the strength of coupling to the environment at relatively high
frequencies ω_nm and ω_n'm. (If the energy quanta ħω of these frequencies are much larger than the thermal
fluctuation scale k_BT, only the lower levels, with E_m < max[E_n, E_n'], are important.) In contrast, the
second contribution is due to low-frequency, essentially classical fluctuations of the environment, and
hence to the low-frequency dissipative susceptibility. If the susceptibility (more exactly, the ratio η =
χ''(ω)/ω) is frequency-independent, both contributions are of the same order, but their exact relation
depends on the relation between the matrix elements x_nn' of a particular system.
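The two contributions in Eq. (203) can be compared with a short sketch. It assumes ħ = k_B = 1, an Ohmic environment with χ''(ω) = ηω, rates taken from Eq. (196), and arbitrary η and T; the coupling matrices are hypothetical examples. For x = σ_z (the model of Sec. 3), only the low-frequency term survives and reproduces 4ηk_BT/ħ², while for a purely transverse coupling x = σ_x at low temperature only the transition term survives, giving 1/T_2 ≈ Γ_{1→2}/2, i.e. the familiar T_2 = 2T_1.

```python
import math

HBAR = K_B = 1.0  # assumed units

def transition_rate(e_from, e_to, x2, eta, temp):
    """Eq. (196) for an assumed Ohmic environment, chi''(omega) = eta * omega."""
    omega = (e_from - e_to) / HBAR
    return (2.0 / HBAR) * x2 * eta * omega / (1.0 - math.exp(-HBAR * omega / (K_B * temp)))

def dephasing_rate(n, n2, energies, x, eta, temp):
    """Second form of Eq. (203): half the sum of the outgoing transition rates of levels n and n2,
    plus eta*k_B*T*(x_nn - x_n2n2)^2/hbar^2."""
    def out_rate(k):
        return sum(transition_rate(energies[k], energies[m], abs(x[k][m]) ** 2, eta, temp)
                   for m in range(len(energies)) if m != k)
    transitions = 0.5 * (out_rate(n) + out_rate(n2))
    low_frequency = eta * K_B * temp * (x[n][n] - x[n2][n2]) ** 2 / HBAR ** 2
    return transitions + low_frequency

eta, temp = 0.05, 0.02
energies = [1.0, 0.0]                 # E_1 > E_2, as in the text

sigma_z = [[1.0, 0.0], [0.0, -1.0]]   # the dephasing-only model (70): x = sigma_z
sigma_x = [[0.0, 1.0], [1.0, 0.0]]    # a hypothetical purely transverse coupling
rate_z = dephasing_rate(0, 1, energies, sigma_z, eta, temp)
rate_x = dephasing_rate(0, 1, energies, sigma_x, eta, temp)
print(rate_z, 4 * eta * temp)                                    # Eq. (204)
print(rate_x, 0.5 * transition_rate(1.0, 0.0, 1.0, eta, temp))   # Gamma_{1->2}/2 at low T
```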

Returning again to the two-level system discussed in Sec. 3, the high-frequency contributions
vanish because of the absence of transitions between its energy levels, while the low-frequency
contribution yields

$$\frac{1}{T_2} = \frac{\pi}{\hbar^2}S_F(0)(x_{11} - x_{22})^2 = \frac{4\pi}{\hbar^2}S_F(0) = \frac{4\eta k_BT}{\hbar^2}, \qquad (7.204)$$

thus exactly reproducing the result (142) of the Heisenberg-Langevin approach. 73 Note also that Eq.
(204) for 1/T_2 is very close in structure to Eq. (199) for 1/T_1. For our simple interaction model (70), the off-diagonal
elements of operator x = σ_z in the stationary-state z-basis vanish, so that T_1 → ∞. For the two-well
implementation of the model (see Fig. 4 and its discussion), this result corresponds to a very high
energy barrier between the wells, which inhibits tunneling, and hence any change of the well occupancies W_l
and W_r. However, T_1 may become finite, and comparable with T_2, if tunneling between the wells is
substantial. 74

Now let us briefly discuss dissipative systems with a continuous spectrum. Unfortunately, for them
the only (relatively :-) simple results that may be obtained from Eq. (181) are essentially classical in
nature. As an illustration, let us consider the simplest example of a 1D particle that interacts with a
thermally-equilibrium environment, but otherwise is free to move (unconfined). As we know from
Chapters 2 and 5, in this case the most convenient basis is that of the momentum eigenstates p. In the
momentum representation, the density matrix is just the c-number function w(p, p'), defined by Eq. (54),
that has already been discussed in brief in Sec. 2. On the other hand, the coordinate operator, which also
participates in the right-hand part of Eq. (181), has the form given by the first of Eqs. (5.64),

$$\hat x = i\hbar\frac{\partial}{\partial p}, \qquad (7.205)$$

dual to the coordinate-representation formula (5.29). As we already know, such operators are local - see,
e.g., Eq. (5.28b). Due to this locality, the whole right-hand part of Eq. (181) is local as well, and hence
(within the framework of our perturbative treatment) the interaction with the environment affects essentially
only the diagonal values w(p, p) of the density matrix, i.e. the momentum probability density w(p). Let
us find the equation governing the evolution of this function in time.

73 The first form of Eq. (203), as well as the analysis of Sec. 3, implies that low-frequency fluctuations of any other
origin, not taken into account in our current calculations (say, unintentional noise from experimental equipment),
may also cause dephasing; such "technical fluctuations" are indeed a serious challenge in the experimental
implementation of coherent qubit systems - see Sec. 8.5 below.

74 The tunneling may be described without altering Eq. (70), by adding a term proportional to either σ_x or σ_y (or
both) to the unperturbed Hamiltonian (69) - see Exercise 6.

Generally, in the interaction picture, the matrix elements of operators x and w acquire some time
dependence, but in the limit p' → p, this dynamics lacks the high frequencies (186) that have been so
helpful for the derivation of the master equations. As a result, the only serious simplification of Eq. (181) is
possible in the Markov approximation, when the time scale of the density matrix evolution is much
longer than the correlation time τ_c of the environment, i.e. the time scale of functions K_F(τ) and G(τ). In
this approximation, we may take the matrix elements out of the first integral of Eq. (181),

$$-\frac{1}{\hbar^2}\int_{-\infty}^{t}K_F(t-t')\,dt'\,\big[\hat x(t),[\hat x(t'),\hat w(t')]\big] \approx -\frac{1}{\hbar^2}\int_0^\infty K_F(\tau)\,d\tau\,[\hat x,[\hat x,\hat w]] = -\frac{\pi}{\hbar^2}S_F(0)[\hat x,[\hat x,\hat w]] = -\frac{\eta k_BT}{\hbar^2}[\hat x,[\hat x,\hat w]], \qquad (7.206)$$



and calculate the double commutator in the Schrödinger picture. This may be done either using an
explicit expression for the matrix elements of the coordinate operator, dual to Eq. (5.28b), or in a
simpler way, using the same trick as in the derivation of the Ehrenfest theorem in Sec. 5.2. Namely,
expanding an arbitrary function f(p) into the Taylor series in one of its arguments (say, p),

$$f(p) = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{\partial^k f}{\partial p^k}\Big|_{p=0}p^k, \qquad (7.207)$$



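and applying Eq. (205) to each term, we can prove the following simple commutation relation:

$$[\hat x, f] = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{\partial^k f}{\partial p^k}\Big|_{p=0}[\hat x, p^k] = i\hbar\sum_{k=1}^{\infty}\frac{1}{(k-1)!}\frac{\partial^k f}{\partial p^k}\Big|_{p=0}p^{k-1} = i\hbar\frac{\partial f}{\partial p}. \qquad (7.208)$$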



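Now applying this result sequentially, first to w(p, p') and then to the resulting commutator, we get

$$[\hat x,[\hat x,\hat w]] = \Big[\hat x, i\hbar\frac{\partial w}{\partial p}\Big] = i\hbar\frac{\partial}{\partial p}\Big(i\hbar\frac{\partial w}{\partial p}\Big) = -\hbar^2\frac{\partial^2 w}{\partial p^2}. \qquad (7.209)$$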
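Eqs. (208) and (209) can also be checked numerically in the momentum representation, without any Taylor series. The sketch below is an illustration under stated assumptions: ħ = 1, the derivative in Eq. (205) is approximated by a central difference with a small step, a Gaussian stands in for w(p), and an arbitrary smooth test function ψ(p) is used to apply the operators. The helper names are hypothetical.

```python
import math

HBAR = 1.0   # assumed units
STEP = 1e-3  # finite-difference step in p (a choice of this sketch)

def x_op(psi):
    """Eq. (205): x = i*hbar*d/dp, with the derivative approximated by a central difference."""
    return lambda p: 1j * HBAR * (psi(p + STEP) - psi(p - STEP)) / (2 * STEP)

def times(f):
    """Operator of multiplication by a c-number function f(p)."""
    return lambda psi: (lambda p: f(p) * psi(p))

def commutator(op_a, op_b):
    """The commutator [A, B] as an operator acting on test functions psi(p)."""
    return lambda psi: (lambda p: op_a(op_b(psi))(p) - op_b(op_a(psi))(p))

def w(p):
    return math.exp(-p ** 2 / 2)        # a smooth stand-in for the density w(p)

def psi(p):
    return math.cos(0.3 * p) + 0.5 * p  # an arbitrary test function

p0 = 0.7
single = commutator(x_op, times(w))(psi)(p0)
expected_single = 1j * HBAR * (-p0 * w(p0)) * psi(p0)            # Eq. (208): i*hbar*(dw/dp)
double = commutator(x_op, commutator(x_op, times(w)))(psi)(p0)
expected_double = -HBAR ** 2 * (p0 ** 2 - 1) * w(p0) * psi(p0)   # Eq. (209): -hbar^2*(d2w/dp2)
print(abs(single - expected_single), abs(double - expected_double))
```

The residual differences are set by the finite-difference step, not by the identities themselves, which hold for any smooth w(p).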



It might seem that the second integral in Eq. (181) could be simplified similarly. However, it
vanishes at p' → p and t' → t, so that in order to calculate the first nonvanishing contribution from that
integral for p = p', we have to take into account the small difference τ = t - t' ~ τ_c between the
arguments of the coordinate operators under that integral. This may be done using Eq. (169) with the
free-particle Hamiltonian consisting of the kinetic-energy contribution alone:
$$\hat x(t') - \hat x(t) \approx -\tau\dot{\hat x} = -\tau\frac{[\hat x,\hat H_s]}{i\hbar} = -\frac{\tau}{i\hbar}\Big[\hat x,\frac{\hat p^2}{2m}\Big] = -\tau\frac{\hat p}{m}, \qquad (7.210)$$



where the exact argument of the operator in the right-hand part is already unimportant, and may be taken
as t. As a result, we may use the last of Eqs. (136) to reduce the second term in the right-hand part of
Eq. (181) to

$$-\frac{i}{2\hbar}\int_{-\infty}^{t}G(t-t')\big[\hat x(t),\{\hat x(t'),\hat w(t')\}\big]dt' \approx \frac{\eta}{2i\hbar}\Big[\hat x,\Big\{\frac{\hat p}{m},\hat w\Big\}\Big]. \qquad (7.211)$$



In the momentum representation, the momentum operator and the density matrix w are just c-numbers
and commute, so that, applying Eq. (208) to the product pw, we get



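$$\left[\hat x,\left\{\frac{\hat p}{m},\hat w\right\}\right] = \left[\hat x, 2\frac{p}{m}w\right] = 2i\hbar\frac{\partial}{\partial p}\left(\frac{p}{m}w\right), \tag{7.212}$$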



and may finally reduce the integro-differential equation (181) to a much simpler partial differential
equation:

$$\frac{\partial w}{\partial t} = \eta\frac{\partial}{\partial p}\left(\frac{p}{m}w\right) + \eta k_B T\frac{\partial^2 w}{\partial p^2}. \tag{7.213}$$



This is the 1D form of the famous Fokker-Planck equation describing the classical statistics of
motion of a free 1D particle in a medium with linear viscosity η. The first, drift term in the right-hand
part of Eq. (213) describes the particle's deceleration due to the average viscous force (137), ⟨F⟩ = -ηv =
-ηp/m, provided by the environment, while the second, diffusion term describes the effect of fluctuations:
the particle's random walk that obeys Eq. (85) with the diffusion coefficient



$$D = \eta k_B T. \tag{7.214}$$



This fundamental Einstein relation 75 shows again the intimate connection between dissipation
(viscosity) and fluctuations, in this classical limit represented by their thermal energy scale k_BT.
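Here is a quick consistency check of Eqs. (7.213) and (7.214). With D = ηk_BT, the Maxwell distribution w ∝ exp{-p²/2mk_BT} makes the drift and diffusion terms cancel exactly, so it is a stationary solution, as thermal equilibrium requires. The finite-difference sketch below uses arbitrary illustrative values of m, η and k_BT in arbitrary units. A deliberately wrong value, D = 2ηk_BT, is included only for contrast.

```python
import math

m, eta, kT = 1.3, 0.7, 2.0   # illustrative values in arbitrary units (assumptions)
STEP = 1e-3                  # finite-difference step in p

def maxwell(p):
    """Unnormalized Maxwell distribution, the expected equilibrium solution."""
    return math.exp(-p**2 / (2 * m * kT))

def fokker_planck_rhs(w, p, D):
    """Right-hand part of Eq. (7.213): eta*d(p*w/m)/dp + D*d2w/dp2."""
    drift = eta * ((p + STEP) * w(p + STEP) - (p - STEP) * w(p - STEP)) / (2 * STEP * m)
    diffusion = D * (w(p + STEP) - 2 * w(p) + w(p - STEP)) / STEP**2
    return drift + diffusion

grid = [-3.0, -1.0, 0.0, 0.5, 2.0]
residual_einstein = max(abs(fokker_planck_rhs(maxwell, p, eta * kT)) for p in grid)
residual_wrong = max(abs(fokker_planck_rhs(maxwell, p, 2 * eta * kT)) for p in grid)
print(residual_einstein, residual_wrong)
```

Only the Einstein value of D leaves a residual at the level of the discretization error. With the doubled D, the right-hand part stays of order η/m.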

For the reader's reference, let me note that the Fokker-Planck equation (213) may be readily
generalized to the 3D motion of a particle under the effect of an additional external force F_ext(r, t): 76



75 It was the main result of A. Einstein's pioneering analysis of such Brownian motion in 1905. (The development
of this analysis in 1906-1908 by M. Smoluchowski led in 1912 to the Fokker-Planck theory - see, e.g., SM
Secs. 5.6-5.7.) Note that this classical relation may be derived in several other ways, including some much
simpler than the one used above. For example, since the Brownian particle's motion may be described by a linear
Langevin equation, Eq. (214) may be readily obtained from the Nyquist formula (139) - see, e.g., SM Sec. 5.5.

76 Moreover, Eq. (213) may be generalized to the motion in an additional periodic potential U(r). In this case, an
analog of Eq. (215) for the probability density of quasi-momentum ħq (rather than the genuine momentum p)
includes an additional energy band index (say, n), an additional force F_n = -∇E_n (where E_n(q) is the energy band
structure that was discussed in Secs. 2.7 and 3.4), and an additional term similar to the right-hand part of Eq.
(194), describing interband transitions with quasi-momentum-dependent rates Γ_{n→n'}(q). These rates are still
expressed by Eq. (196), but with the matrix elements x_{nn'} replaced by those of the vector operator Ω̂ = r̂ - i∇ of
interband transitions, which was discussed in Chapter 5. For details and a particular example of a sinusoidal
potential see, e.g., K. Likharev and A. Zorin, J. Low Temp. Phys. 59, 347 (1985).
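$$\frac{\partial w}{\partial t} + \frac{\mathbf p}{m}\cdot\nabla w + \mathbf F_{\rm ext}\cdot\nabla_{\mathbf p}w = \eta\,\nabla_{\mathbf p}\cdot\left(\frac{\mathbf p}{m}w\right) + \eta k_B T\,\nabla_{\mathbf p}^2 w, \tag{7.215}$$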



where w = w(r, p, t) is the time-dependent probability density in the 6D phase space, and ∇_p is the
nabla/del operator of differentiation over the momentum components, defined similarly to its coordinate
counterpart ∇. The Fokker-Planck equation in this form is the basis for many important applications;
however, due to its classical character, its discussion is left for the SM part of my lecture notes. 77

To summarize the discussion of the two alternative approaches to the analysis of quantum
systems interacting with an environment in thermal equilibrium, described in the last three sections, let
me emphasize that they give descriptions of the same phenomena, and are characterized by the same two
functions G(τ) and K_F(τ), but from two different points of view. Namely, in the Heisenberg-Langevin
approach we describe the system by operators that change (fluctuate) in time, even in thermal
equilibrium, while in the density-matrix approach the system is described by non-fluctuating probability
functions, such as W_n(t) or w(p), that are stationary in equilibrium. In the (relatively rare) cases when a
problem may be solved by either method, they give identical results for all observables.



7.7. Quantum measurements 

Now we have a sufficient quantum mechanics background for a brief discussion of quantum
measurements. 78 Let me start by reminding the reader of the only postulate of quantum mechanics that
relates this theory to experiment. In Chapter 4 it was formulated for a pure state described with the
ket-vector

$$|\alpha\rangle = \sum_j \alpha_j\,|a_j\rangle, \tag{7.216}$$

where a_j and A_j are, respectively, the eigenstates and eigenvalues of the operator of observable A, defined by
Eq. (4.68). According to the postulate, the outcome of each particular measurement of observable A may be
uncertain, 79 but is restricted to the set of eigenvalues A_j, with the probability of outcome A_j equal to

$$W_j = |\alpha_j|^2. \tag{7.217}$$

Since we know now that the state of the system (or rather of the statistical ensemble of similar systems 
we are using for measurements) is generally not pure, this postulate should be re-worded as follows: 
even if the system is in the least uncertain state (216), 80 the measurement outcomes are still 
probabilistic, and obey Eq. (133). 
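A two-level example helps separate the random single-shot outcomes from their statistics. The sketch below computes W_j = |α_j|² from Eq. (7.217) and draws simulated single-shot outcomes, whose frequencies approach W_j only statistically. Following footnote 80, it also compares the entropy of the pure state (7.216) with that of its dephased (diagonal) version, using the eigenvalues of the 2×2 density matrix. The amplitudes, the random seed and the number of shots are arbitrary illustrative choices.

```python
import cmath
import math
import random

def born_probabilities(amplitudes):
    """Outcome probabilities W_j = |alpha_j|^2, Eq. (7.217); amplitudes assumed normalized."""
    return [abs(a) ** 2 for a in amplitudes]

def eigenvalues_2x2(rho):
    """Eigenvalues of a 2x2 Hermitian matrix [[a, b], [conj(b), d]]."""
    a, b, d = rho[0][0].real, rho[0][1], rho[1][1].real
    disc = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    return [(a + d + disc) / 2, (a + d - disc) / 2]

def entropy(weights):
    """S = -sum_j W_j ln W_j, skipping (numerically) zero weights."""
    return -sum(w * math.log(w) for w in weights if w > 1e-12)

# Illustrative pure state of a two-level system
alpha = [math.sqrt(0.3), math.sqrt(0.7) * cmath.exp(0.4j)]
W = born_probabilities(alpha)

# Simulated single-shot outcomes: each one is random; only their statistics follow W
rng = random.Random(1)
shots = [0 if rng.random() < W[0] else 1 for _ in range(20000)]
freq0 = shots.count(0) / len(shots)

# Density matrix of the pure state and its dephased (diagonal) counterpart
pure = [[alpha[i] * alpha[j].conjugate() for j in range(2)] for i in range(2)]
dephased = [[pure[0][0], 0j], [0j, pure[1][1]]]

S_pure = entropy(eigenvalues_2x2(pure))          # zero up to rounding
S_dephased = entropy(eigenvalues_2x2(dephased))  # positive
print(W, freq0, S_pure, S_dephased)
```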



77 See SM Chapter 7. For a more detailed analysis and several examples of quantum effects in dissipative systems
with continuous spectra see, e.g., U. Weiss, Quantum Dissipative Systems, 2nd ed., World Scientific, 1999, or
H.-P. Breuer and F. Petruccione, The Theory of Open Quantum Systems, Oxford U. Press, 2007.

78 "Quantum measurements" is a very unfortunate term; it would be more sensible to speak about "measurements 
of quantum mechanical observables". However, the former term is so common and compact that I will use it. 

79 Besides the trivial case α_j = δ_jj' (so that W_j = δ_jj'), when the system is in a certain eigenstate (a_j') of operator A.

80 This property of a pure state follows from Eq. (16): in the special basis w_j, only a pure state has W_j = δ_jj'. The
reader still in doubt is invited to compare the entropy S = -Σ_j W_j ln W_j, the measure of the system's disorder (see, e.g., SM
Sec. 2.2), of the pure state (S = 0) with that of any state with several nonvanishing values of W_j (S > 0).
Quantum measurement may be understood as a procedure of transferring the "microscopic"
information contained in the coefficients α_j into "macroscopically" available information about the
outcomes of particular experiments, which may be recorded and reliably stored - say, on paper, in a
computer, or in our minds. If we believe that such a transfer may always be done well enough, and do not
worry too much about how exactly, we are subscribing to the mathematical notion of measurement, which was
(rather reluctantly) used in these notes up to this point. However, every physicist should understand
that measurements are performed by physical devices that should also obey the laws of quantum
mechanics, and it is important to understand the basic laws of their operation.

The founding fathers of quantum mechanics did not pay much attention to these issues,
probably for the following two reasons. First, at that time it looked like the experimental
instruments (at least the best of them :-) were doing exactly what postulate (217) was telling. For
example, didn't the z-oriented Stern-Gerlach experiment turn two complex coefficients α_↑ and α_↓,
describing the incoming electron beam, into particle counter clicks with rates proportional to,
respectively, |α_↑|² and |α_↓|²? Second, the crude internal nature of these instruments made more detailed
questions unnatural. For example, electron rate counting with a Geiger counter involves an effective
disappearance of each incoming electron inside a zillion-particle electric discharge avalanche. Thinking
about such devices, it was hard to even imagine measurements that would not disturb the quantum state
of the particle being measured.

However, since that time the experimental techniques, notably including high vacuum, low 
temperatures, and low-noise electronics, have much improved, and eventually more inquisitive 
questions started to look not so hopeless. In my scheme of things, these questions may be grouped as 
follows: 

(i) What are the main laws of a quantum measurement as a physical process? In particular, 
should it always involve time irreversibility? a human/intelligent observer? (The last question is not as 
laughable as it may look - see below.) 

(ii) What is the state of the measured system just after a single-shot measurement - meaning a
measurement process limited to a time interval much shorter than the time scale of the measured system's
evolution? This question is naturally related to the issues of repeated measurements and continuous
monitoring of the system's state.

(iii) If a measurement of observable A produced a certain outcome Aj, can we believe that the 
system had been in the corresponding state aj just before the measurement? 

The last question is most closely related to various interpretations of quantum mechanics, and
will be discussed in the concluding Chapter 10; now let me provide some input on the first two
groups of issues.

First of all, I am happy to report that there is a virtual consensus of physicists on the first two
questions of group (i). According to this consensus, any quantum measurement needs to result in a
certain, distinguishable state of a macroscopic output component of the measurement instrument - see
Fig. 7. (Traditionally, this component is called a pointer, though its role may be played by a printer or a
plotter, an electronic circuit sending out the result as a number, etc.)

This requirement implies that the measurement process should have the following features: 

- be time-irreversible,

- provide a large "signal gain", i.e. map the quantum process with its ħ-scale of action (i.e. of
the energy-by-time product) onto a macroscopic motion of the pointer with a much larger action scale,
and

- if we want high measurement fidelity, introduce as little additional uncertainty as permitted by
the laws of physics.




[Figure: block diagram with the labels "quantum system", "necessary interaction", "back action", "macroscopic pointer", and "to human observer".]

Fig. 7.7. General scheme of quantum measurement.



All these requirements are fulfilled in a good Stern-Gerlach experiment. However, since the
internal physics of the particle detector in this measurement is rather complex, let me give an example of
a different, simpler single-shot scheme 81 capable of measuring the instant state of a typical two-level
system, for example, a particle in a double-quantum-well potential (Fig. 8). 82 Let the system be, at
t = 0, in a pure quantum state described by the ket-vector

$$|\alpha\rangle = \alpha_\rightarrow|\rightarrow\rangle + \alpha_\leftarrow|\leftarrow\rangle, \tag{7.218}$$

where the component states → and ← may be described by wavefunctions localized near the potential
well bottoms at x_s ≈ ±x_0 - see the blue lines in Fig. 8b. Let us rapidly change the potential profile of the
system at t = 0, so that at t > 0, and near the origin, it may be well approximated by an inverted parabola
(see the red line in Fig. 8b):

$$U(x_s) \approx -\frac{m\lambda^2}{2}x_s^2, \qquad \text{at } t > 0, \text{ for } |x_s| \ll x_f. \tag{7.219}$$

It is straightforward to verify that the Heisenberg equations of motion in such an inverted potential
describe an exponential growth of the operator x_s in time (proportional to exp{λt}), and hence a similar
growth of the expectation value ⟨x_s⟩ and of its r.m.s. uncertainty δx_s. 83 At this "inflation" stage, the



81 This scheme may be implemented, for example, using a simple Josephson-junction circuit called the balanced
comparator - see, e.g., T. Walls et al., IEEE Trans. on Appl. Supercond. 17, 136 (2007), and references therein.
Experiments by V. Semenov et al., IEEE Trans. Appl. Supercond. 7, 3617 (1997) have demonstrated that this
system may have measurement accuracy dominated by quantum-mechanical uncertainty at relatively modest
cooling (to ~1 K). One of the advantages of such an implementation of this measurement scheme is that it is based on
externally shunted Josephson junctions - devices whose quantum-mechanical model is in quantitative
agreement with experiment - see, e.g., D. Schwartz et al., Phys. Rev. Lett. 55, 1547 (1985). Colloquially, the
balanced comparator is an instrument with a "well-documented Hamiltonian", including its part describing
coupling to the environment.

82 As a reminder, dynamics of this system was discussed in Sec. 2.6 and then again in Sec. 6.1. 

83 Somewhat counter-intuitively, the latter growth plays a positive role for measurement fidelity. Indeed, it does
not affect the intrinsic "signal-to-noise ratio" δx_s/⟨x_s⟩, while making the intrinsic (say, quantum-mechanical)
uncertainty much larger than the possible noise contributions of the later measurement stage(s).
coherence between the two component states → and ← is still preserved, i.e. the time evolution is
reversible.
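The claim of footnote 83 can be illustrated with the linear solution of the equation of motion in the inverted parabola (7.219), ẍ_s = λ²x_s, i.e. x_s(t) = x_s(0) cosh λt + [p(0)/mλ] sinh λt. This solution holds for the Heisenberg operators as well as for classical variables. The minimal sketch below assumes illustrative initial expectation values, uncorrelated initial uncertainties of x and p, and a hypothetical fixed r.m.s. noise of a later classical stage. Once λt ≫ 1, the ratio δx_s/⟨x_s⟩ settles to a constant, while the intrinsic spread quickly dwarfs the later-stage noise.

```python
import math

lam, m = 1.0, 1.0               # growth rate lambda of Eq. (7.219) and mass, illustrative units
x0_mean, p0_mean = 0.10, 0.05   # initial expectation values (illustrative)
x0_std, p0_std = 0.03, 0.02     # initial uncertainties, assumed uncorrelated
noise_later = 1.0               # hypothetical r.m.s. noise of a later (classical) stage

def inflate(t):
    """Mean and r.m.s. spread of x(t) = x(0) cosh(lam t) + p(0) sinh(lam t)/(m lam).

    The map is linear, so the mean follows the classical trajectory, and the
    variance transforms with the squares of the same coefficients.
    """
    c = math.cosh(lam * t)
    s = math.sinh(lam * t) / (m * lam)
    mean = x0_mean * c + p0_mean * s
    std = math.hypot(x0_std * c, p0_std * s)
    return mean, std

for t in (0.0, 5.0, 10.0, 15.0):
    mean, std = inflate(t)
    print(f"t={t:4.1f}  std/mean={std / mean:.4f}  std/noise_later={std / noise_later:.3e}")
```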

Now let the system be weakly coupled to a dissipative (e.g., Ohmic) environment. As we already
know, the environment performs two functions. First, it provides the motion with viscosity η (141), so that
the system eventually comes to rest at one of the relatively distant minima, ±x_f, of the inverted
potential (Fig. 8a). Second, the dissipative environment ensures the state's dephasing on some time scale T_2.
If we select the measurement system parameters in such a way that

$$x_0 \ll x_0\exp\{\lambda T_2\} \ll x_f, \tag{7.220}$$

then the process, after the potential inversion, consists of the following stages, well separated in time: 

- the "inflation" stage, preserving the component state coherence but providing an exponential 
increase of its energy, 

- the dephasing stage, at which the coherence is suppressed, and the density matrix of the system 
is reduced to a diagonal form describing the classical mixture of the probability packets propagating to 
the left and to the right, and 

- the stage of settling to a new stationary state - a classical mixture of two states located near
points x_s = ±x_f, with probabilities (217) equal to, respectively, W_→ = |α_→|² and W_← = |α_←|² = 1 - |α_→|².




Fig. 7.8. Potential inversion on (a) "macroscopic" and (b) "microscopic" scales of coordinate x. 

If the final states are macroscopically distinguishable (i.e. may play the role of a bistable
pointer), as they are in the balanced-comparator implementation, there is absolutely no need, at any of
these stages, to involve any mysterious "another mechanism of wavefunction change" (different from
the regular, Schrödinger evolution) for the measurement process description.

This may be the only appropriate time to mention, very briefly, the famous (or rather infamous)
Schrödinger cat paradox, so much overplayed in the popular press. (The only good aspect of this popularity
is that the formulation of this paradox is certainly so well known to the reader that I do not need to
repeat it.) In this thought experiment, there is no need to discuss the (rather complex :-) physics of the
cat. As soon as the charged particle, produced at the radioactive decay, reaches the Geiger counter, the
process rapidly becomes irreversible, so that the coherent state of the system is reduced to a classical
mixture of two possible states, "decay" and "no decay", leading, correspondingly, to the "cat dead" and "cat
alive" states. So, despite attempts by numerous authors, typically without proper physics background, to
present this situation as a mystery whose discussion needs the involvement of professional philosophers,
hopefully by this point the reader knows enough about dephasing not to pay them any attention. Let me, however,
note two non-trivial features of this gedanken experiment that are met in most real experiments as
well, including the one with the potential inversion (Fig. 8).

First, the role of the measured coordinate of the system under observation (s) may be played not
by a coordinate of a single fundamental particle, but by a certain combination of coordinates of many
microscopic components of a macroscopic body. In particular, in Josephson junction systems such as the
balanced comparator we essentially measure the persistent electric current ("supercurrent") - a certain
linear combination of Cartesian components of the momenta of the electrons that constitute the
Bose-Einstein condensate of Cooper pairs. At that, the role of the local environment (which contributes
significantly to dissipative phenomena) is played by the same electrons, with other linear combinations
of electron momenta playing the role of environmental degrees of freedom - which were called {λ} in
the last few sections. This makes the coupling to the environment somewhat less apparent (at least for
people who do not know what a linear combination is :-).

Second, one may argue that even after the balanced comparator (in our first example) or the cat 
(in the second example) has reached its final macroscopic state, human observer's realization that in this 
particular experiment the bistable pointer is in a certain state instantly decreases the probability (for the 
same observer!) of its being in the opposite state to zero. However, as was already discussed in Sec. 2.5, 
this is a very classical problem of the statistical ensemble redefinition that may (or may not) be
performed at the observer's will. Such redefinition, if performed, is the only possible role of a human (or
otherwise intelligent :-) observer in the measurement process; if we are only interested in an objective 
recording of results of a pre-fixed series of similar experiments, there is no need to include such 
observer into any discussion. 

The ensemble redefinition at measurement leads to several other paradoxes, of which the
so-called quantum Zeno paradox is perhaps the most spectacular. 84 Let us return to a two-level system with the
unperturbed Hamiltonian given by Eq. (4.166), with 1/Ω much larger than the single-shot measurement
time, and the system initially (at t = 0) in one of the quantum wells. Then, as we know from Secs. 2.6
and 4.6, before the first measurement, the probability to find the system in the initial state at time t is

$$W(t) = \cos^2\Omega t. \tag{7.221}$$

If the time is small enough (t = dt ≪ 1/Ω), we may use the Taylor expansion to write



84 This name, coined by B. Misra and E. Sudarshan in 1977 (though the paradox had been discussed in detail by
A. Turing in 1954), is due to the apparent similarity of this paradox to the classical paradoxes of the ancient Greek
philosopher Zeno of Elea. By the way, just to have a minute of fun, let us have a look at what happens when Mother
Nature is discussed by people who do not understand math and physics. The most famous of the classical Zeno
paradoxes is the Achilles and Tortoise case: a fast runner Achilles can apparently never overtake a slower
Tortoise, because (in the words of Aristotle) "the pursuer must first reach the point whence the pursued started, so
that the slower must always hold a lead". For a physicist, the paradox has a trivial resolution, but let us listen to what
a philosopher (D. Burton) writes about it - not in some year BC, but in 2010 AD: "Given the history of 'final
resolutions', from Aristotle onwards, it's probably foolhardy to think we've reached the end." For me, this is a sad
symbol of modern philosophy.
$$W(dt) \approx 1 - \Omega^2 dt^2. \tag{7.222}$$



Now, let us return the two-level system, after its measurement, into the same quantum well, and
let it evolve with the same Hamiltonian. Since the occupation of the opposite state is very small, the
evolution of W will closely follow the same law as in Eq. (221), but with the initial value given by Eq.
(222). Thus, when the system is measured again at time 2dt,

$$W(2dt) \approx W(dt)\left(1 - \Omega^2 dt^2\right) \approx \left(1 - \Omega^2 dt^2\right)^2. \tag{7.223}$$

After repeating this cycle N times (with the total time t = N dt still much less than N^{1/2}/Ω), the probability
that the system is still in the initial state is



$$W(N\,dt) \approx \left(1 - \Omega^2 dt^2\right)^N \approx 1 - N\Omega^2 dt^2 = 1 - \frac{\Omega^2 t^2}{N}. \tag{7.224}$$



Comparing this result with Eq. (222) (which, with dt replaced by t, gives W ≈ 1 - Ω²t² in the absence of
intermediate measurements), we see that the process of system transfer to the opposite quantum
well has been slowed down rather dramatically, and in the limit N → ∞ (at fixed t), its evolution is
completely stopped by the measurement process. There is of course nothing mysterious here; the
evolution slowdown is due to the statistical ensemble's redefinition.
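The Zeno slowdown (7.224) can be reproduced with a few lines of arithmetic. The minimal sketch below assumes ideal, instantaneous projective measurements that restart the evolution (7.221) each time. Under that assumption, the survival probability after N measurements within time t is [cos²(Ωt/N)]^N. The values of Ω and t are illustrative, chosen with t ≪ 1/Ω.

```python
import math

def zeno_survival(omega, t, n_measurements):
    """Probability of staying in the initial well after N equally spaced
    projective measurements during time t, with W(t) = cos^2(omega t), Eq. (7.221).

    Each measurement redefines the ensemble, so the evolution restarts and the
    single-interval survival probability cos^2(omega dt) is raised to the Nth power.
    """
    dt = t / n_measurements
    return math.cos(omega * dt) ** (2 * n_measurements)

omega, t = 1.0, 0.3   # illustrative values with t << 1/omega
for n in (1, 10, 100, 1000):
    exact = zeno_survival(omega, t, n)
    approx = 1 - (omega * t) ** 2 / n   # Eq. (7.224)
    print(f"N={n:5d}  W={exact:.6f}  1-(Omega t)^2/N={approx:.6f}")
```

The survival probability grows monotonically toward 1 as N increases, in agreement with the 1 - Ω²t²/N law.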

Now let me proceed to question group (ii), in particular to the general issue of the back action of 
the instrument upon the system under measurement (symbolized with the back arrow in Fig. 7). In 
instruments like the Geiger counter or the balanced comparator, such back action is very large, because 
the instrument essentially destroys ("demolishes") the initial state of the system under measurement. 
However, in the 1970s it was realized that this is not really necessary. In Sec. 3, we have already
discussed an example of a two-level system coupled with an environment (in our current context, with a
measurement instrument) and described by the Hamiltonian



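$$\hat H = \hat H_s + \hat H_{\rm int} + \hat H_e\{\lambda\}, \qquad \text{with}\quad \hat H_s = a\hat\sigma_z, \quad \hat H_{\rm int} = -\hat f\{\lambda\}\hat\sigma_z, \tag{7.225}$$

so that

$$\left[\hat H_s,\hat H_{\rm int}\right] = 0. \tag{7.226}$$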



Comparing this equality with Eq. (67), we see that in the Heisenberg picture, the Hamiltonian operator
(and hence the energy) of the system of our interest does not change with time. On the other hand, the
interaction can change the state of the instrument, so it may be used to measure the system's energy - or another
observable whose operator commutes with the interaction Hamiltonian. Such a trick is called either
quantum non-demolition (QND) or back-action-evading (BAE) measurement. 85 Let me present a fine
example of a real measurement of this kind - see Fig. 9. 86



85 For a detailed survey of this field see, e.g., either V. Braginsky and F. Khalili, Quantum Measurements,
Cambridge U. Press, 1992, or H. Wiseman and G. Milburn, Quantum Measurement and Control, Cambridge U.
Press, 2010.

86 S. Peil and G. Gabrielse, Phys. Rev. Lett. 83, 1287 (1999). 
Fig. 7.9. QND measurement of single electron's energy by Peil and Gabrielse: (a) the core of experimental
setup, and (b) a record of the thermal excitation and spontaneous relaxation of Fock states. © AIP. 



In this experiment, a single electron is captured in a Penning trap - a combination of a (virtually)
uniform magnetic field B and a quadrupole electric field. 87 Such an electric field stabilizes cyclotron orbits
but does not have any noticeable effect on electron motion in the plane perpendicular to the magnetic
field, and hence on its Landau level energies (see Sec. 3.2):



$$E_n = \hbar\omega_c\left(n + \frac{1}{2}\right), \qquad \text{with}\quad \omega_c = \frac{eB}{m_e}. \tag{7.227}$$



(In the cited work, at B ≈ 5.3 T, the cyclic frequency ω_c/2π was about 147 GHz, so that the level
splitting ħω_c was close to 10^-22 J, i.e. corresponded to a temperature ~10 K, while the physical
temperature of the system might be reduced well below that, down to ~80 mK.) Now note that the
analogy between a particle on Landau levels and a harmonic oscillator goes beyond the energy
spectrum. Indeed, since the Hamiltonian of a 2D particle in a perpendicular magnetic field may be
reduced to that of a 1D oscillator, we may repeat all procedures of Sec. 5.4 and rewrite it in terms of
creation-annihilation operators:



$$\hat H_c = \hbar\omega_c\left(\hat a^\dagger\hat a + \frac{1}{2}\right). \tag{7.228}$$
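The orders of magnitude quoted above are easy to reproduce from Eq. (7.227). The sketch below uses CODATA values of the physical constants, together with the field B = 5.3 T and the temperature ~80 mK quoted in the text. The one-minute interval used to count cyclotron periods is an illustrative stand-in for "the scale of minutes" mentioned below. The computed cyclotron frequency comes out close to the ~147 GHz quoted for the experiment; the small difference reflects the approximate value of B.

```python
import math

# CODATA values of the constants (SI units)
e = 1.602176634e-19       # elementary charge, C
m_e = 9.1093837015e-31    # electron mass, kg
hbar = 1.054571817e-34    # reduced Planck constant, J s
k_B = 1.380649e-23        # Boltzmann constant, J/K

B = 5.3                   # magnetic field quoted in the text, T
T_phys = 0.08             # physical temperature quoted in the text (~80 mK), K

omega_c = e * B / m_e                  # cyclotron frequency, Eq. (7.227), rad/s
f_c = omega_c / (2 * math.pi)          # cyclic frequency, Hz
splitting = hbar * omega_c             # Landau level spacing, J
n_thermal = 1 / math.expm1(splitting / (k_B * T_phys))  # mean thermal excitation number
periods_per_minute = 60 * f_c          # illustrative one-minute monitoring interval

print(f"omega_c/2pi = {f_c / 1e9:.0f} GHz")
print(f"hbar*omega_c = {splitting:.2e} J, i.e. {splitting / k_B:.1f} K")
print(f"thermal occupation at 80 mK: {n_thermal:.1e}")
print(f"cyclotron periods per minute: {periods_per_minute:.1e}")
```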



In the Peil and Gabrielse experiment, the electron had one more degree of freedom - along the
magnetic field. The electric field of the Penning trap creates a soft confining potential along this
direction (vertical in Fig. 9a; let us take it for axis z), so that small electron oscillations along that axis
could be well described as a 1D harmonic oscillator of a much lower eigenfrequency, in that particular
experiment with ω_z/2π ≈ 64 MHz. This frequency could be measured very accurately (with an error of ~1 Hz)
by sensitive electronics whose electric field affects the z-motion of the electron, but not its motion in the
perpendicular plane. In an exactly uniform magnetic field, the two modes of electron motion would be



87 Similar to the one discussed in EM Sec. 2.4 (see in particular Eq. (2.77) and Fig. 2.7), but with additional 
rotation about one of the axes - either x or y. 
completely uncoupled. However, the experimental setup included two special superconducting rings
made of niobium (Fig. 9a), which slightly distorted the magnetic field and created an interaction
between the modes, which might be well approximated by the Hamiltonian 88




$$\hat H_{\rm int} = {\rm const}\times\left(\hat a^\dagger\hat a + \frac{1}{2}\right)\hat z^2, \tag{7.228}$$



so that the main condition (226) of a QND measurement was well satisfied. At the same time, coupling
(228) ensured that a change of the Fock state number n by 1 changed the z-oscillation eigenfrequency by
~12.4 Hz. Since this shift was substantially larger than the electronics noise, spontaneous changes of n (due
to an uncontrolled coupling of the electron to its environment) could be readily observed - moreover,
continuously monitored - see Fig. 9b. (These data imply that there is virtually no effect of the measuring
instrument on the statistics of n - at least on the scale of minutes, i.e. as many as ~10^13 cyclotron orbit
periods.) Of course, no measurement - QND or not - can avoid the Heisenberg uncertainty
relations; in this particular case, a permanent monitoring of the Fock state number n keeps its quantum
phase fully uncertain.

It is natural to wonder whether the QND measurement concept may be extended from quadratic
forms like the energy to "usual" observables such as coordinates and momenta, whose uncertainties are
bound by the fundamental Heisenberg relation. The answer is yes, but the required methods are a bit
more tricky. For example, let us place an electrically charged particle into a uniform electric field ℰ =
n_x ℰ(t) of the instrument, so that their interaction Hamiltonian is

$$\hat H_{\rm int} = -q\mathcal{E}(t)\,\hat x. \tag{7.229}$$

Such an interaction certainly passes the information on the time evolution of coordinate x to the instrument.
However, Eq. (226) is not satisfied, at least for the kinetic-energy part of the system's Hamiltonian;
as a result, the interaction simultaneously distorts the time evolution of the particle's momentum. Indeed,
writing the Heisenberg equation of motion (4.199) for the x-component of momentum, we get



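$$\dot{\hat p} - \dot{\hat p}\Big|_{\mathcal{E}=0} = \frac{1}{i\hbar}\left[\hat p,\hat H_{\rm int}\right] = q\mathcal{E}(t). \tag{7.230}$$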



Integrating Eq. (5.169) for the coordinate operator evolution, 89 we get the expression

$$\hat x(t) = \hat x(0) + \frac{1}{m}\int_0^t \hat p(t')\,dt', \tag{7.231}$$

that shows that the perturbations (230) of the momentum would eventually find their way to the 
coordinate evolution. 

However, for such an important particular system as a harmonic oscillator, the following trick is 
possible. For this system, Eqs. (5.170) and (230) may be readily combined to give a second-order 



88 I am simplifying the real situation a bit. Actually, in the experiment there was also an electron spin's contribution to
the interaction Hamiltonian, but since the large magnetic field polarized the spins quite reliably, their only
role was a constant shift of frequency ω_z.

89 This simple equation is limited to 1D systems with Hamiltonians of the type (2.50), but the reader should agree
that this is a pretty general form.
differential equation for the coordinate operator, that is absolutely similar to the classical equation of
motion, and has a similar solution: 90

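$$\hat x(t) = \hat x_0(t) + \frac{q}{m\omega_0}\int_0^t \mathcal{E}(t')\sin\omega_0(t-t')\,dt', \tag{7.232}$$

where x̂_0(t) is the solution in the absence of the field ℰ(t).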

This formula confirms that generally the external field ℰ(t) (in our case, the sensing field of the
measurement instrument) affects the time evolution law. Note, however, that if the field is applied only
at moments t'_n separated by intervals T/2, where T = 2π/ω_0 is the oscillation period, its effect on the
coordinate vanishes at similarly spaced observation instants t_n = t'_n + (m + 1/2)T, because for all such
instants the difference t_n - t'_n' is a multiple of T/2, so that sin ω_0(t_n - t'_n') = 0. This is the idea of
stroboscopic QND measurements. Of course, according to Eq. (230), even such a measurement strongly
perturbs the oscillator momentum, so that even if the values x_n are measured with high accuracy,
Heisenberg's uncertainty relation is not violated.
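The cancellation behind stroboscopic QND measurements can be checked directly from Eq. (7.232). The minimal sketch below assumes that each pulse of the field is short enough to be treated as a delta function delivering a momentum impulse J_k = q∫ℰ(t')dt' at time t'_k. The pulse strengths are arbitrary illustrative numbers. The field-induced displacement vanishes, to rounding accuracy, at the instants t'_k + (m + 1/2)T, but not at a generic instant.

```python
import math

mass, omega0 = 1.0, 2 * math.pi        # illustrative units: period T = 1
T = 2 * math.pi / omega0

def field_induced_displacement(t, pulses):
    """Second term of Eq. (7.232) for a train of short pulses.

    Each pulse at time tk delivers a momentum impulse J (delta-function
    approximation of q*E(t')), contributing J/(m*omega0)*sin(omega0*(t - tk)).
    """
    return sum(J / (mass * omega0) * math.sin(omega0 * (t - tk))
               for tk, J in pulses if tk < t)

# Pulses separated by T/2, with arbitrary illustrative strengths
strengths = [0.3, -0.7, 1.2, 0.4, -0.5, 0.9]
pulses = [(k * T / 2, J) for k, J in enumerate(strengths)]
t_last = pulses[-1][0]

stroboscopic = [field_induced_displacement(t_last + (j + 0.5) * T, pulses) for j in range(3)]
generic = field_induced_displacement(t_last + 0.3 * T, pulses)
print(stroboscopic, generic)
```

The momentum impulses themselves, of course, do not cancel. This is consistent with the remark above that the oscillator momentum is strongly perturbed.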

Experimental implementation of such measurements is not simple (and to the best of my
knowledge they have never been successfully demonstrated), but this initial idea has opened a way to
more practicable solutions. For example, it is straightforward to use the Heisenberg equations of motion to
show that if the coupling of two harmonic oscillators, with coordinates x and X, and unperturbed
eigenfrequencies ω and Ω, is modulated in time as

$$\hat H_{\rm int} \propto \hat x\hat X\cos\omega t\cos\Omega t, \tag{7.233}$$

then the process in one of the oscillators (say, that with frequency Ω) does not affect the dynamics of one of the
quadrature components of the other oscillator, defined by the relations 91

$$\hat x_1 = \hat x\cos\omega t - \frac{\hat p}{m\omega}\sin\omega t, \qquad \hat x_2 = \hat x\sin\omega t + \frac{\hat p}{m\omega}\cos\omega t, \tag{7.234}$$

while this component's motion does affect the dynamics of one of the quadrature components of the
counterpart oscillator. (For the counterpart pair of quadrature components, the information transfer
goes in the opposite direction.) This scheme has been successfully used for QND measurements in the 
optical range, with coupling (233) provided by the optical Kerr effect. 92 
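The defining property of the quadrature components (7.234) is that they stay constant during unperturbed oscillations at frequency ω. This can be verified on the classical solution x = A cos(ωt - φ), p = mẋ. The minimal sketch below uses illustrative values of m, ω, A and φ.

```python
import math

mass, omega = 1.0, 3.0     # illustrative oscillator parameters
A, phase = 0.8, 0.4        # illustrative amplitude and phase of free oscillations

def free_motion(t):
    """Classical free oscillation x = A cos(omega t - phase) and p = m dx/dt."""
    x = A * math.cos(omega * t - phase)
    p = -mass * omega * A * math.sin(omega * t - phase)
    return x, p

def quadratures(x, p, t):
    """Quadrature components x1, x2 of Eq. (7.234)."""
    x1 = x * math.cos(omega * t) - p / (mass * omega) * math.sin(omega * t)
    x2 = x * math.sin(omega * t) + p / (mass * omega) * math.cos(omega * t)
    return x1, x2

samples = [quadratures(*free_motion(t), t) for t in (0.0, 0.37, 1.9, 5.2)]
# Up to rounding, every pair equals (A cos(phase), A sin(phase))
print(samples)
```

In the rotating frame defined by Eq. (7.234), the free oscillation is a fixed point (A cos φ, A sin φ), which is the picture described in footnote 91.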

Please note that the last two QND measurement examples are based on the idea of modulation of
a certain parameter in time - either in a short-pulse or sinusoidal form. So, the reader should not be
surprised that if the only role of a QND measurement is a sensitive measurement of a weak classical
force acting on a quantum probe system, 93 i.e. a 1D oscillator of eigenfrequency ω_0, it may be



90 See, e.g., CM Sec. 4.1. Note in particular that the function sin ω_0τ (with τ = t - t') under the integral, divided by ω_0,
is nothing more than the temporal Green's function G(τ) of a loss-free harmonic oscillator.

91 The physical sense of these relations should be clear from Fig. 5.6: they define a system of coordinates rotating
clockwise with angular velocity ω, so that the point representing unperturbed classical oscillations with that
frequency is at rest in that rotating frame. (The "probability cloud" representing a Glauber state is also stationary in
coordinates [x_1, x_2].) The reader familiar with the classical theory of oscillations may notice that x_1 and x_2 are
essentially the RWA variables u and v, i.e. the Poincaré plane coordinates - see, e.g., CM Secs. 4.3-4.6, and
especially Fig. 4.9.

92 See, e.g., P. Grangier et al., Nature 396, 537 (1998), and references therein. This was, however, not the first 
QND implementation in optics - for a review see J. Roch et al., Appl. Phys. B 55, 291 (1992). 

93 As it is, for example, for gravitational wave detectors - see the discussion and references in Sec. 2.10. 
implemented much more simply - just by modulating the oscillator parameter with frequency ω ≈ 2ω_0. From
classical dynamics, we know that if the depth of such modulation exceeds a certain threshold value, it
results in excitation of the so-called parametric oscillations with frequency ω/2, and one of two opposite
phases. 94 In the language of Eq. (234), parametric excitation means an exponential growth of one of the
quadrature components, with the sign depending on initial conditions, while the counterpart component
is suppressed. Close to, but below the excitation threshold, the parameter modulation boosts all
perturbations of the almost-excited component (including its quantum-mechanical uncertainty), and
suppresses (squeezes) those of the counterpart component. The result is a squeezed state, already
discussed in Sec. 5.5 above (see in particular Fig. 5.6), that allows one to notice the effect of an external
force on the oscillator against the backdrop of a quantum uncertainty smaller than the standard quantum limit
- see the first of Eqs. (5.174).

In electrical engineering, this fact may be conveniently formulated in terms of the noise parameter 
Θ_N of a linear amplifier - the instrument for continuous monitoring of an input "signal", e.g., a 
microwave or optical waveform. 95 Namely, for "usual" (say, transistor or maser) amplifiers, which are 
equally sensitive to both quadrature components of the signal, Θ_N has a minimum value of ħω/2, due to the 
quantum uncertainty pertinent to the quantum state of the amplifier itself (which therefore plays the role 
of its "quantum noise"). 96 On the other hand, a degenerate parametric amplifier, sensitive to just one 
quadrature component, may have Θ_N well below ħω/2, due to the squeezing of its ground state. 97 

Finally, let me note that the parameter-modulation schemes of QND measurements are not 
limited to harmonic oscillators, and may be applied to other important quantum systems, notably 
including two-level (i.e. spin-½-like) systems. 98 

7.8. Exercise problems 

7.1. Find the Wigner function of a harmonic oscillator in: 

(i) the thermodynamic equilibrium at temperature T, and 

(ii) the Glauber state with dimensionless complex amplitude α. 

Discuss the relation between the former result and the Gibbs distribution. 

7.2. Show that the quantum-mechanical Golden Rule (6.111) and the master equation (196) give 
the same results for the rate of spontaneous quantum transitions n' → n in a system with a discrete energy 
spectrum, weakly coupled to a low-temperature heat bath (k_BT ≪ ħω_{n'n}). 



94 See, e.g., CM Sec. 4.5. 

95 For the exact definition of the latter parameter, suitable for the quantum sensitivity range (Θ_N ~ ħω) as well, 
see, e.g., I. Devyatov et al., J. Appl. Phys. 60, 1808 (1986). In the classical noise limit (Θ_N ≫ ħω), it coincides 
with k_BT_N, where T_N is a more popular measure of electronics noise, called the noise temperature. 

96 This fact was recognized very early - see, e.g., H. Haus and J. Mullen, Phys. Rev. 128, 2407 (1962). 

97 See, e.g., the spectacular experiments by B. Yurke et al., Phys. Rev. Lett. 60, 764 (1988). 

98 See, e.g., D. Averin, Phys. Rev. Lett. 88, 207901 (2002). 
Hint: Try to establish a relation between the function Im χ(ω_{nn'}) that participates in Eq. (196), and 
the density of states ρ_n that participates in the Golden Rule formulas, by considering a particular case of 
sinusoidal oscillations in the system of interest. 

7.3. A harmonic oscillator is weakly coupled to an Ohmic environment. 

(i) Use the rotating-wave approximation to write equations of motion for the Heisenberg 
operators of the complex amplitude of oscillations. 

(ii) Calculate the expectation values of the correlators of the fluctuation force operators 
participating in these equations, and express them via the average number ⟨n⟩ of thermally-induced 
excitations in equilibrium, given by the second of Eqs. (26b). 

7.4. A 1D harmonic oscillator with weak Ohmic damping (η ≪ mω0) is initially in the second 
excited Fock state (n = 2). Assuming that the temperature is low (k_BT ≪ ħω0), find the time evolution of the 
expectation value ⟨E⟩ of its energy, using: 

(i) the density-matrix approach to quantum dynamics, in the form of master equation (194), and 

(ii) the Heisenberg-Langevin approach discussed in Sec. 7.5 of the lecture notes. 

Compare the results. 

7.5. Derive Eq. (209) in an alternative way, using an expression dual to Eq. (5.28b). 

7.6. A particle in a system of two coupled quantum wells (see, e.g., Fig. 4) is weakly coupled to 
an Ohmic environment. Find the time evolution of the probability W_L(t) of finding the particle in one of the 
wells, after it was placed there at t = 0 and left to evolve. Assume that the energy level splitting is much 
larger than k_BT. 
Chapter 8. Multiparticle Systems 

This chapter is a brief introduction to the quantum mechanics of systems of similar particles, with special 
attention to the case when they are indistinguishable. For such systems, the theory predicts (and 
experiment confirms) very specific effects even in the case of negligible explicit ("direct") interaction 
between the particles. These effects notably include the Bose-Einstein condensation of bosons, and the 
Pauli exclusion principle and exchange interaction for fermions. 



8.1. Distinguishable and indistinguishable particles 

The importance of quantum systems of many similar particles is probably self-evident; just the 
very fact that most atoms include several (or many) electrons is sufficient to attract our attention. There 
are also important systems where the number of electrons is much higher than in one atom; for example, 
a cubic centimeter of a typical metal features ~10^23 conduction electrons that cannot be attributed to 
particular atoms, and have to be considered as common (and interacting!) parts of the system as a whole. 
Though quantum mechanics offers virtually no exact analytical solutions for systems of strongly 
interacting particles, 1 it reveals very important new effects even in the simplest case when particles do 
not interact, at least explicitly (directly). 

If non-interacting particles are either different from each other by their nature (say, an electron 
and a proton), or physically similar but still distinguishable for other reasons (say, because of 
their reliable spatial separation), everything is simple - at least, conceptually. Then, as was already 
discussed in Sec. 6.7, a system of two particles, 1 and 2, each in a pure quantum state, may be described 
by a ket-vector 

$$|\alpha\rangle = |\beta\rangle \otimes |\beta'\rangle, \tag{8.1a}$$

where the single-particle states β and β' are defined in different Hilbert spaces. (Below, I will frequently 
use the following convenient shorthand, 

$$|\beta\rangle \otimes |\beta'\rangle \equiv |\beta\beta'\rangle, \tag{8.1b}$$

in which the state position within a vector codes the particle number.) Hence the permuted state 

$$\hat{P}|\beta\beta'\rangle \equiv |\beta'\beta\rangle = |\beta'\rangle \otimes |\beta\rangle, \tag{8.2}$$

where P̂ is the permutation operator, is clearly different from the initial one. 



1 An important conceptual question is why not treat one particle of such a collection as an open quantum system, 
and apply to it the powerful methods discussed in the last chapter, based on the separation of the whole Nature 
into the "system of our interest" and the "environment" - see Fig. 7.1. Such separation is very natural and works 
very well in cases when one, relatively "massive" (inertial) particle, or a specific collective degree of freedom (also 
relatively inertial), is surrounded by a sea of "lighter particles", which serve the role of an environment - 
frequently in or close to thermal equilibrium. On the other hand, in most systems of identical particles, such 
separation is more artificial and may lead to errors, because the quantum state of the "particle of interest" may be 
substantially correlated (in particular, entangled) with that of similar particles of its "environment" - see the 
discussion later in this section. 
Again, such a description is valid even for identical particles if they are still distinguishable by their 
spatial separation. (The separation does not preclude particles from interacting with each other, e.g., 
electrostatically.) Such systems of similar but clearly distinguishable particles (or subsystems) are 
broadly discussed nowadays, for example in the context of quantum computing and encryption - see 
Sec. 8.5 below. This is why it is unfortunate that the term "identical particles" is frequently used in the 
sense of indistinguishable particles. I will try to avoid this confusion by using the latter term, despite it 
being rather unpleasant grammatically. 

Now comes the most important experimental fact: identical elementary particles, 2 if they are not 
reliably separated, are genuinely indistinguishable, i.e. their Hilbert spaces are not separable. Hence, 
instead of Eq. (1), for a set of two particles, we need to use a linear combination of products like |ββ'⟩ 
and |β'β⟩ for the construction of genuine quantum states. 3 In order to comprehend what exactly linear 
combinations should be used, it is convenient to discuss the properties of the permutation operator defined 
by the first of Eqs. (2). 

Let us consider an observable A, and a system of eigenstates of its operator: 

$$\hat{A}|a_j\rangle = A_j|a_j\rangle. \tag{8.3}$$

If the particles are indeed indistinguishable, the observable's expectation value should not be affected by 
their permutation. Hence the operators Â and P̂ have to commute, and share their eigenstates. This is why 
the eigenstates of operator P̂ are so important: in particular, they are also eigenstates of the Hamiltonian, 
i.e. the stationary states of the system of particles. 

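Now let us have a look at the operation described by the square of the permutation operator, acting on 
an elementary ket-vector product: 

$$\hat{P}^2|\beta\beta'\rangle = \hat{P}\left(\hat{P}|\beta\beta'\rangle\right) = \hat{P}|\beta'\beta\rangle = |\beta\beta'\rangle, \tag{8.4}$$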



2 Here by "elementary particles" I mean any of the following two options: 

(i) particles like electrons, which (at least at this stage of development of physics) are considered as 
structure-less entities; 

(ii) any object (e.g., a hadron or meson) which may be considered as a system of "more elementary" 
particles (e.g., quarks), but still may be reliably placed in a definite (say, ground) quantum state. 

From that point of view, even complex atoms or molecules of a certain chemical element, each in its 
ground state, may be considered on the same footing as elementary particles. 

3 A very legitimate question is why, in this situation, we need to introduce particle numbers to start with. A 
partial answer is that in this approach it is much simpler to derive (or guess) problem Hamiltonians from the 
correspondence principle. For example, for a system of two spinless particles, each in an external potential U(r), 
and with the interaction energy U_int(|r1 - r2|), the correct Hamiltonian is 

$$\hat{H} = \sum_{k=1,2}\left[\frac{\hat{p}_k^2}{2m} + U(\mathbf{r}_k)\right] + U_{\rm int}(|\mathbf{r}_1 - \mathbf{r}_2|).$$

Later in this chapter, we will discuss an alternative approach (the so-called "second quantization") in which 
tracing a certain particle is avoided. While for indistinguishable particles this is more logical, in that approach 
writing adequate Hamiltonians (which, in particular, would avoid spurious self-interaction of the particles) is 
much more challenging - see Sec. 3 below. 




i.e. P̂ brings the state back to its original form. Since any pure state of a two-particle system may be 
represented as a linear combination of such products, this result does not depend on the state, and may 
be represented as an operator relation: 

$$\hat{P}^2 = \hat{I}. \tag{8.5}$$

Now let us find the possible eigenvalues P_j of the permutation operator. Acting with both sides of 
Eq. (5) on any of the eigenstates |α_j⟩ of the permutation operator, we get a very simple equation for its 
eigenvalues: 

$$P_j^2 = 1, \tag{8.6}$$

with two possible solutions: 

$$P_j = \pm 1. \tag{8.7}$$

Let us find the eigenstates of the permutation operator in the simplest case when each of the 
component particles can be only in two single-particle states - say, β and β'. Evidently, none of the 
simple products |ββ'⟩ and |β'β⟩, taken alone, qualifies as an eigenstate - unless the states β and β' are 
identical. Let us try their linear combination 

$$|\alpha_j\rangle = a|\beta\beta'\rangle + b|\beta'\beta\rangle, \tag{8.8}$$

so that 

$$\hat{P}|\alpha_j\rangle = P_j|\alpha_j\rangle = a|\beta'\beta\rangle + b|\beta\beta'\rangle. \tag{8.9}$$

For the case P_j = +1 we have to require states (8) and (9) to be the same, so that a = b. Assuming also 
that the single-particle states β and β' are normalized, and requiring the same for the composite state α, 
we get the so-called symmetric eigenstate, 4 

$$|\alpha_+\rangle = \frac{1}{\sqrt{2}}\left(|\beta\beta'\rangle + |\beta'\beta\rangle\right). \tag{8.10}$$

Similarly, for P_j = -1 we get a = -b, and the antisymmetric eigenstate 

$$|\alpha_-\rangle = \frac{1}{\sqrt{2}}\left(|\beta\beta'\rangle - |\beta'\beta\rangle\right), \tag{8.11}$$

where the front coefficients guarantee the orthonormality of the two-particle states, provided that the 
single-particle states are orthonormal. These are typical examples of entangled states, defined as 
multiparticle states whose state vectors cannot be factored into a product of single-particle vectors. 
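The algebra of Eqs. (5)-(11) can be checked numerically in the simplest setting. The sketch below is a minimal illustration, assuming a two-dimensional single-particle space spanned by two orthonormal vectors standing in for β and β' (the NumPy representation and the helper names `ket` and `swap_operator` are illustrative, not taken from the text). It builds the permutation operator on the product space, confirms that P̂² = Î with eigenvalues ±1, and checks that the states (10) and (11) are eigenstates that cannot be factored.

```python
import numpy as np

D = 2  # dimension of the single-particle space (assumed: just the two states beta, beta')

def ket(a, b):
    """Product |a b> = |a> (x) |b>; the position in the product codes the particle number."""
    return np.kron(a, b)

def swap_operator(d):
    """Permutation operator P|i j> = |j i> on the d*d-dimensional product space."""
    P = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            P[j * d + i, i * d + j] = 1.0
    return P

beta, beta_p = np.eye(D)          # two orthonormal single-particle states
P = swap_operator(D)

eigvals = np.linalg.eigvalsh(P)   # P is real symmetric, so eigvalsh applies
alpha_plus = (ket(beta, beta_p) + ket(beta_p, beta)) / np.sqrt(2)   # Eq. (8.10)
alpha_minus = (ket(beta, beta_p) - ket(beta_p, beta)) / np.sqrt(2)  # Eq. (8.11)

# A two-particle state is a product state iff its coefficient matrix has rank 1.
schmidt_rank = np.linalg.matrix_rank(alpha_plus.reshape(D, D))

# Pauli principle: the antisymmetric combination with beta = beta' is the null vector.
pauli_null = ket(beta, beta) - ket(beta, beta)

print("P^2 = I:", np.allclose(P @ P, np.eye(D * D)))
print("eigenvalues of P:", eigvals)
print("Schmidt rank of alpha_+:", schmidt_rank)
```

Representing the kets as Kronecker products keeps the "position codes the particle number" convention of Eq. (1b) explicit in the array indices.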

So far, our math does not preclude either sign of P_j, in particular the possibility that the sign 
depends on the state (i.e. on the index j). Here, however, comes in another crucial experimental fact: all 
elementary particles fall into two groups: 5 



4 As in many situations we met before, kets (10) and (11) may be multiplied by exp{iφ} with an arbitrary real 
phase φ. However, until we discuss coherent superpositions of various states α, there is no good motivation for 
taking the phase different from 0; that would only clutter the notation. 
(i) bosons, particles with integer spin s, for which P_j = +1 for any j, and 

(ii) fermions, particles with half-integer spin, with P_j = -1, also for any j. 

In the non-relativistic theory we are discussing now, this key fact should be considered as an 
experimental one. (The relativistic quantum theory, to be discussed in Chapter 9, offers a proof that 
half-integer-spin particles cannot be bosons and integer-spin ones cannot be fermions, but not more than 
that.) However, our discussion of spin in Sec. 5.7 allows the following interpretation of the 
fermion-boson difference. In free space, the permutation of particles 1 and 2 may be viewed as a result of 
rotation of this pair by angle ±π about a certain axis. As we have seen in Sec. 5.7, at a rotation by such 
an angle, the state vector of a particle with quantum number m_s (which ranges from -s to +s, and hence 
may take only integer values for integer s, and only half-integer values for half-integer s) changes by the 
factor exp{±im_sπ}, so that the state product |ββ'⟩ changes by exp{±2im_sπ}, i.e. by the factor +1 for integer 
s, and by the factor (-1) for half-integer s. 

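Since the eigenvalues P_j do not depend on the particular state of the system, we can write explicit 
expressions for the permutation operator: 

$$\hat{P} = \begin{cases} +\hat{I}, & \text{for bosons},\\ -\hat{I}, & \text{for fermions}. \end{cases} \tag{8.12}$$
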

The most impressive corollaries of Eqs. (10) and (11) are for the case when the partial states of 
the two particles are the same: β = β'. The corresponding Bose state α+ is possible; in particular, at 
sufficiently low temperatures, a set of non-interacting Bose particles condenses on the ground state of 
each of them - the so-called Bose-Einstein condensate ("BEC"). 6 Its examples include superfluids 
like helium, the Cooper-pair condensate in superconductors, and the BEC of weakly interacting atoms. 
Perhaps the most fascinating feature of a multiparticle Bose-Einstein condensate is that the dynamics of its 
observables is governed by the laws of quantum mechanics, while the observables themselves may (for nearly 
all purposes) be treated as c-numbers - see, e.g., Eqs. (2.54)-(2.55). 7 

On the other hand, if we take β = β' in Eq. (11), we see that the state α_- vanishes, i.e. cannot exist at 
all. This is the mathematical expression of the Pauli exclusion principle: two indistinguishable fermions 
cannot be in the same quantum state. 8 (As will be discussed below, this is true for systems with more 
than two fermions as well.) Probably, the key importance of this principle is self-evident: if it were not 
valid for electrons (which are fermions), all electrons of each atom would condense on its ground (1s) 
level, and all the usual chemistry (and biochemistry, and biology, including dear us!) would not exist. 
The Pauli principle effectively makes fermions interact even if they do not interact directly, in the 
usual sense of this word. 



5 Traditionally, people speak about two different "statistics": the Bose-Einstein statistics of bosons, and Fermi- 
Dirac statistics of fermions, because their statistical distributions in thermal equilibrium are indeed different - see, 
e.g., SM Sec. 2.8. However, as evident from the above discussion, their difference is deeper, and actually we are 
dealing with two different quantum mechanics. 

6 For a quantitative discussion of the Bose-Einstein condensation see, e.g., SM Sec. 3.4. 

7 Such a possibility follows from the fact that for the Bose-Einstein condensate of N ≫ 1 particles, the Heisenberg 
uncertainty relation may be reduced to δN δφ ≳ 1, where φ is the condensate wavefunction's phase, so that it may 
have δN/⟨N⟩ ≪ 1 and δφ ≪ 1 simultaneously. 

8 It was formulated by W. Pauli in 1925, on the basis of less general rules suggested by G. Lewis (1916), I. 
Langmuir (1919), N. Bohr (1922), and E. Stoner (1924) for the explanation of experimental spectroscopic data. 







Chapter 8 



Page 4 of 46 



Essential Graduate Physics 



QM: Quantum Mechanics 



8.2. Singlets, triplets, and the exchange interaction 

Now let us discuss possible approaches to the analysis of identical particles on a simple but very 
important example of a pair of spin-½ particles (say, electrons) whose interaction with either each other 
or the external world does not involve spin. Then the ket-vector of the total state is factorable as 

$$|\alpha\rangle = |o_{12}\rangle \otimes |s_{12}\rangle, \tag{8.13}$$

with the orbital function |o12⟩ and the spin function |s12⟩ (which depends on the states of both spins of the 
pair) belonging to different Hilbert spaces. It is frequently convenient to use the coordinate 
representation of such a state, sometimes called the spinor: 

$$\langle \mathbf{r}_1, \mathbf{r}_2|\alpha\rangle = \langle \mathbf{r}_1, \mathbf{r}_2|o_{12}\rangle \otimes |s_{12}\rangle \equiv \psi(\mathbf{r}_1, \mathbf{r}_2)|s_{12}\rangle. \tag{8.14}$$

Since spin-½ particles are fermions, the particle permutation, 

$$\hat{P}\,\psi(\mathbf{r}_1, \mathbf{r}_2)|s_{12}\rangle = \psi(\mathbf{r}_2, \mathbf{r}_1)|s_{21}\rangle = -\psi(\mathbf{r}_1, \mathbf{r}_2)|s_{12}\rangle, \tag{8.15}$$

has to change the sign of either the spin part or the orbital factor of the spinor. In the case of a 
symmetric orbital factor, 

$$\psi(\mathbf{r}_2, \mathbf{r}_1) = \psi(\mathbf{r}_1, \mathbf{r}_2), \tag{8.16}$$

the spin factor has to obey the relation 

$$|s_{21}\rangle = -|s_{12}\rangle. \tag{8.17}$$

Let us use the ordinary z-basis (where z, in the absence of an external magnetic field, is an arbitrary 
spatial axis) for each of the spins. In this basis, any ket-vector of the spin orientation of two particles 
may be represented as a linear combination of four basis product vectors: 

$$|\uparrow\uparrow\rangle, \quad |\downarrow\downarrow\rangle, \quad |\uparrow\downarrow\rangle, \quad \text{and} \quad |\downarrow\uparrow\rangle. \tag{8.18}$$

The first two kets evidently do not satisfy Eq. (17), and cannot participate in the state. Applying to the 
remaining kets the same argumentation as has resulted in Eq. (11), we get 

$$|s_{12}\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right). \tag{8.19}$$

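Such an orbital-symmetric and spin-antisymmetric state is called the singlet. The origin of this name 
becomes clear from the analysis of the opposite (orbital-antisymmetric and spin-symmetric) case: 

$$\psi(\mathbf{r}_2, \mathbf{r}_1) = -\psi(\mathbf{r}_1, \mathbf{r}_2), \qquad |s_{21}\rangle = |s_{12}\rangle. \tag{8.20}$$

For the composition of such a symmetric spin state, the first two kets of Eq. (18) are completely 
acceptable (with arbitrary weights), and so is a specific symmetric combination of the two last kets, similar 
to Eq. (10): 

$$|s_{12}\rangle = c_+|\uparrow\uparrow\rangle + c_-|\downarrow\downarrow\rangle + c_0\frac{1}{\sqrt{2}}\left(|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle\right). \tag{8.21}$$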
We may use this composite state with any values of the coefficients c (satisfying the normalization 
condition), because they correspond to the same orbital wavefunction and hence the same energy. 
However, each of these three states has a specific value of the z-component of the net spin (respectively, 
+ħ, -ħ, and 0). 9 Because of this, even a small external magnetic field lifts their degeneracy, splitting the 
energy level into three, and giving it the natural name of triplet. 
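Footnote 9 below states that all three triplet components behave as a single spin s = 1. A minimal numerical check, assuming units with ħ = 1 and representing each spin by Pauli matrices (the helper name `expect` is illustrative), builds the total-spin operators of the pair and evaluates ⟨S_z⟩ and ⟨S²⟩ for the singlet (19) and for the three triplet components of Eq. (21).

```python
import numpy as np

# Single-spin operators s = sigma / 2 (assumed units: hbar = 1).
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
I2 = np.eye(2)

# Total spin of the pair: S_k = s_k (x) 1 + 1 (x) s_k, and S^2 = sum_k S_k^2.
S = [np.kron(s, I2) + np.kron(I2, s) for s in (sx, sy, sz)]
S2 = sum(Sk @ Sk for Sk in S)
Sz = S[2]

up = np.array([1, 0], dtype=complex)
dn = np.array([0, 1], dtype=complex)
k = np.kron

singlet = (k(up, dn) - k(dn, up)) / np.sqrt(2)      # Eq. (8.19)
triplet = {
    "+": k(up, up),                                  # c_+ component of Eq. (8.21)
    "-": k(dn, dn),                                  # c_- component
    "0": (k(up, dn) + k(dn, up)) / np.sqrt(2),       # c_0 component
}

def expect(op, psi):
    """Expectation value <psi|op|psi> for a normalized state vector psi."""
    return np.vdot(psi, op @ psi).real

print(f"singlet: <S^2> = {expect(S2, singlet):.1f}, <S_z> = {expect(Sz, singlet):+.1f}")
for name, psi in triplet.items():
    print(f"triplet {name}: <S^2> = {expect(S2, psi):.1f}, <S_z> = {expect(Sz, psi):+.1f}")
```

The value ⟨S²⟩ = 2 (in units of ħ²) for every triplet component, including the c_0 one, is just s(s + 1) with s = 1.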

In the particular case when the particles do not interact at all, for example 10 

$$\hat{H} = \hat{h}_1 + \hat{h}_2, \qquad \hat{h}_k = \frac{\hat{p}_k^2}{2m} + U(\mathbf{r}_k), \quad k = 1, 2, \tag{8.22}$$

the 2-particle Schrödinger equation for the symmetrical orbital wavefunction (16) is obviously satisfied 
by the simple product, 

$$\psi(\mathbf{r}_1, \mathbf{r}_2) = \psi_n(\mathbf{r}_1)\psi_{n'}(\mathbf{r}_2), \tag{8.23}$$

of single-particle eigenfunctions, with arbitrary sets n, n' of quantum numbers. For the particular (but 
very important!) case n = n', this means that the eigenenergy of the singlet state, 

$$\psi_n(\mathbf{r}_1)\psi_n(\mathbf{r}_2)\frac{1}{\sqrt{2}}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right), \tag{8.24}$$

is just 2ε_n, where ε_n is the single-particle energy level. It may be proved that the lowest energy of the 
triplet state is always higher than that. Hence, for the limited (but extremely important!) goal of finding 
ground-state energies of multi-electron systems, we may ignore the actual singlet structure of the spinor 
(24), and reduce the Pauli exclusion principle to the semi-qualitative picture of single-particle levels, 
each "occupied" by 2 independent particles. 

As a very simple example, let us find the ground energy of a deep, cubic-shaped, 3D quantum 
well with side a, filled with 5 spin-½ fermions (say, electrons), ignoring their direct interaction. From the 
solution of the single-particle Schrödinger equation in Sec. 1.5, we know the single-particle energy spectrum 
of the system: 

$$\varepsilon_{n_x,n_y,n_z} = \varepsilon_0\left(n_x^2 + n_y^2 + n_z^2\right), \quad \text{with } \varepsilon_0 = \frac{\pi^2\hbar^2}{2ma^2}, \quad n_x, n_y, n_z = 1, 2, \ldots \tag{8.25}$$

so that the lowest-energy orbital states are: 

- one ground state with {n_x, n_y, n_z} = {1, 1, 1}, and energy ε_111 = (1² + 1² + 1²)ε0 = 3ε0, and 

- three excited states, with {n_x, n_y, n_z} equal to {2, 1, 1}, {1, 2, 1}, and {1, 1, 2}, with equal energies 
ε_211 = ε_121 = ε_112 = (2² + 1² + 1²)ε0 = 6ε0. 

According to the Pauli principle, each of these orbital states can accommodate up to 2 
electrons. Hence the lowest-energy (ground) state of the 5-electron system is achieved by placing 2 of 



9 Note that in the sense of Eq. (5.197), all three triplet states of a two-electron system behave as a single integer 
spin with s = 1; for example, S² equals 2ħ², rather than 0 as one could expect for the last component of Eq. (21) - 
see Problem 1. 

10 In this chapter, I try to use lower-case letters for observables of single particles (in particular, ε for their 
energies), in order to distinguish them as clearly as possible from the system's variables, including the total energy E 
of the system, typeset in capital letters. 
them on the ground level ε_111 = 3ε0, and the remaining 3 particles in any of the degenerate "excited" 
states of energy 6ε0. Hence the ground energy of the system is 

$$E = 2\times 3\varepsilon_0 + 3\times 6\varepsilon_0 = 24\varepsilon_0 \equiv \frac{12\pi^2\hbar^2}{ma^2}. \tag{8.26}$$
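The level-filling argument leading to Eq. (26) is easy to automate. The sketch below is a minimal illustration, assuming non-interacting spin-½ fermions and the spectrum (25) measured in units of ε0; it enumerates the orbital states {n_x, n_y, n_z}, sorts them by energy, and fills them bottom-up with at most two particles each. The function name and the cutoff `n_max` are illustrative choices, not part of the text.

```python
from itertools import product

def well_ground_energy(n_particles, n_max=4, per_orbital=2):
    """Ground energy, in units of eps0 = pi^2 hbar^2 / (2 m a^2), of non-interacting
    spin-1/2 fermions in a cubic well with spectrum (8.25): orbitals are filled
    bottom-up, at most `per_orbital` particles in each (the Pauli principle)."""
    levels = sorted(nx * nx + ny * ny + nz * nz
                    for nx, ny, nz in product(range(1, n_max + 1), repeat=3))
    lowest_omitted = (n_max + 1) ** 2 + 2   # lowest orbital energy beyond the n_max cutoff
    total, left = 0, n_particles
    for e in levels:
        if left == 0:
            break
        if e > lowest_omitted:
            raise ValueError("increase n_max: omitted orbitals could lie lower")
        take = min(per_orbital, left)
        total += take * e
        left -= take
    if left > 0:
        raise ValueError("increase n_max: not enough orbitals")
    return total

print(well_ground_energy(5))   # 2*3 + 2*6 + 1*6 = 24, i.e. E = 24 eps0 as in Eq. (8.26)
```

Because the three "excited" orbitals are degenerate, it does not matter which of them receives the fifth particle; only the sorted list of orbital energies enters the sum.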

In many cases of relatively weak interaction between particles, the interaction does not destroy such a simple 
quantum state classification scheme, and the Pauli principle allows tracing the order in which single-particle 
states are filled with Fermi particles. This is exactly the approach that was used in our discussion of 
atoms in Sec. 3.7. 

Now let us describe the effects of particle interaction more quantitatively, on the simplest 
example of the lowest energy states of a neutral atom of helium, 11 with the nucleus (consisting of two 
protons and two neutrons) of electric charge q = +2e, and two electrons "rotating" about it. Neglecting 
the small relativistic effects that were discussed in Sec. 6.3, the Hamiltonian describing the electron 
motion may be represented as 

$$\hat{H} = \hat{h}_1 + \hat{h}_2 + \hat{U}_{\rm int}, \qquad \hat{h}_k = \frac{\hat{p}_k^2}{2m} - \frac{2e^2}{4\pi\varepsilon_0 r_k}, \qquad U_{\rm int} = \frac{e^2}{4\pi\varepsilon_0|\mathbf{r}_1 - \mathbf{r}_2|}. \tag{8.27}$$

As with most problems of multiparticle quantum mechanics, the eigenvalue/eigenstate problem for 
this Hamiltonian does not have an exact analytical solution, so let us start an approximate analysis by 
considering the electron-electron interaction as a perturbation. As was discussed in Chapter 6, we have 
to start with the "0th-order" approximation in which the perturbation is ignored, so that the Hamiltonian 
is reduced to the sum (22). In this approximation, the ground state g of the atom is the singlet (24), with the 
orbital factor 

$$\psi_g(\mathbf{r}_1, \mathbf{r}_2) = \psi_{100}(\mathbf{r}_1)\psi_{100}(\mathbf{r}_2), \tag{8.28}$$

and energy 2ε_g. Here each operand ψ_100(r) is the single-particle wavefunction of the ground (1s) state of 
the hydrogen-like atom with Z = 2, with quantum numbers n = 1, l = 0, m = 0. According to Eqs. (3.174) 
and (3.198), 

$$\psi_{100}(\mathbf{r}) = Y_0^0(\theta, \varphi)\,\mathcal{R}_{1,0}(r) = \frac{1}{(\pi r_0^3)^{1/2}}\,e^{-r/r_0}, \quad \text{with } r_0 = \frac{r_B}{Z} = \frac{r_B}{2}, \tag{8.29}$$

so that according to Eq. (3.191), in this approximation the total ground state energy is 

$$E_g^{(0)} = 2\varepsilon_g = 2\left(-\frac{Z^2 E_H}{2n^2}\right)_{n=1,\,Z=2} = -4E_H \approx -109\ \text{eV}. \tag{8.30}$$

This is still somewhat far (though not terribly far!) from the experimental value E_g ≈ -78.8 eV - see the 
bottom level in Fig. 1a. 

We can get a much better agreement with experiment by calculating the electron interaction 
energy in the 1st order of the perturbation theory. Indeed, in application to our system, Eq. (6.13) reads 



11 Evidently, the positive ion He+ of such an atom, with just one electron, is very well described by the 
hydrogen-like atom theory with Z = 2, whose ground-state energy, according to Eq. (3.191), is -Z²E_H/2 = -2E_H ≈ -54.4 eV. 
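$$E_g^{(1)} = \langle g|\hat{U}_{\rm int}|g\rangle = \int d^3r_1\int d^3r_2\, \psi_g^*(\mathbf{r}_1, \mathbf{r}_2)\,U_{\rm int}(\mathbf{r}_1, \mathbf{r}_2)\,\psi_g(\mathbf{r}_1, \mathbf{r}_2). \tag{8.31}$$

Plugging in Eqs. (27)-(29), we get 

$$E_g^{(1)} = \left(\frac{1}{\pi r_0^3}\right)^2 \frac{e^2}{4\pi\varepsilon_0}\int d^3r_1\int d^3r_2\, \frac{\exp\{-2(r_1 + r_2)/r_0\}}{|\mathbf{r}_1 - \mathbf{r}_2|}. \tag{8.32}$$

This 6D integral may be worked out analytically, and yields (5/4)E_H, so that the corrected ground state 
energy, 

$$E_g \approx \left(-4 + \frac{5}{4}\right)E_H \approx -74.8\ \text{eV}, \tag{8.33}$$

is much closer to experiment. 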
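The value (5/4)E_H quoted for the 6D integral (32) can be checked numerically. For spherically symmetric charge clouds, the angular integration of 1/|r1 - r2| gives 1/max(r1, r2) (a standard electrostatics result, used here as an assumption of the sketch), which reduces Eq. (32) to a 2D radial integral. The sketch below assumes E_H ≈ 27.211 eV and uses a simple midpoint-rule grid whose size and cutoff are illustrative; it evaluates ⟨1/|r1 - r2|⟩ in units of 1/r0, and since e²/(4πε0 r0) = Z E_H for r0 = r_B/Z, the correction (32) equals Z times that number, in units of E_H.

```python
import numpy as np

E_H_EV = 27.211  # Hartree energy E_H in eV (assumed value for the conversion)
Z = 2            # nuclear charge number of helium

def mean_inverse_distance_1s(n=2000, r_max=25.0):
    """<1/|r1 - r2|>, in units of 1/r0, for two independent electrons in the
    orbital (8.29). For spherically symmetric clouds the angular average of
    1/|r1 - r2| is 1/max(r1, r2), so only a 2D radial integral remains;
    it is evaluated with the midpoint rule on [0, r_max] (r in units of r0)."""
    h = r_max / n
    r = (np.arange(n) + 0.5) * h
    p = 4.0 * r**2 * np.exp(-2.0 * r)        # radial probability density, integrates to 1
    kernel = 1.0 / np.maximum.outer(r, r)    # angular-averaged Coulomb kernel
    return h * h * (p @ kernel @ p)

avg = mean_inverse_distance_1s()   # analytically 5/8
E1 = Z * avg                       # E_g^(1) in units of E_H, since e^2/(4 pi eps0 r0) = Z E_H
print(f"<1/r12> r0 = {avg:.4f}, E_g^(1) = {E1:.4f} E_H, E_g = {(-4 + E1) * E_H_EV:.1f} eV")
```

The radial reduction is what makes the "6D" integral cheap: the quadrature cost is set by the 2D grid only.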
[Fig. 8.1 artwork not reproduced; panel (b) labels: ε_100 + ε_nlm, "parahelium", "orthohelium".]

Fig. 8.1. The lowest energy levels of a helium atom: (a) experimental data and (b) a schematic structure 
of an excited state with fixed n and l in the first order of the perturbation theory. On panel (a), all 
energies are referred to that (-2E_H ≈ -54.4 eV) of the ground state of the ion He+, so that their magnitudes 
are the (readily measurable) energies of the atom's ionization starting from the corresponding bound state. 



There is still room for improvement - that may be made, for example, using the variational 
method, 12 based on the following, very general observation. Let n be the exact, full, and orthonormal set 
of stationary states of a quantum system, and use it as the basis for expansion of a normalized but 
otherwise arbitrary trial state α (defined in the same Hilbert space): 

$$|\alpha\rangle = \sum_n \alpha_n |n\rangle, \tag{8.34}$$

with the energy that may be calculated using the general Eq. (4.125): 



12 Unfortunately, this is my only chance in this course to demonstrate this general and important method, which is 
especially useful in the theory of multi-particle systems. 
$$E_\alpha = \langle\alpha|\hat{H}|\alpha\rangle = \sum_n W_n E_n, \quad \text{where } W_n = |\alpha_n|^2 \geq 0. \tag{8.35}$$



Since, by definition, the exact ground state energy E_g is the lowest one of the set E_n, and the normalization 
of the trial state requires Σ_n W_n = 1, we can use Eq. (35) to compose the following inequality: 

$$E_\alpha \geq \sum_n W_n E_g = E_g\sum_n W_n = E_g. \tag{8.36}$$



Thus, the ground state energy is always lower than (or equal to) the energy of any trial state α. Hence, if 
we make several attempts with reasonably selected trial states, we may expect the lowest of the results 
to approximate the genuine ground state energy reasonably well. 
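The inequality (36) holds for any Hermitian Hamiltonian, which makes it easy to test numerically. In the sketch below, a random Hermitian matrix stands in for Ĥ (an illustrative assumption, not a physical model), and many random normalized trial states α are compared with the lowest eigenvalue E_g; the seed and the matrix size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# An arbitrary Hermitian matrix stands in for the Hamiltonian (illustrative assumption).
n = 50
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2
E_g = np.linalg.eigvalsh(H)[0]   # exact ground-state energy: the lowest eigenvalue

def trial_energy(psi):
    """E_alpha = <alpha|H|alpha>, Eq. (8.35), for the trial vector psi after normalization."""
    psi = psi / np.linalg.norm(psi)
    return np.vdot(psi, H @ psi).real

trials = [trial_energy(rng.normal(size=n) + 1j * rng.normal(size=n)) for _ in range(1000)]
print(f"E_g = {E_g:.4f}, lowest trial energy = {min(trials):.4f}")
```

Random trial states typically land far above E_g; the method becomes useful only when, as in the helium example, physical insight restricts the trial family to states close to the true ground state.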

For our particular case of a helium atom, we may try to use, as the trial state, the wavefunction 
given by Eqs. (28)-(29), but with the atomic number Z considered as an adjustable parameter Z_ef < Z = 2 
rather than a fixed number. The physics behind this idea is that the electric charge density 
ρ(r) = -e|ψ(r)|² of each electron forms a negatively charged "cloud" that reduces the effective charge of the 
helium nucleus, as seen by the other electron, to Z_ef e, with some Z_ef < 2. As a result, the single-particle 
wavefunction spreads further in space (r0 = r_B/Z_ef > r_B/Z), while keeping its functional form (29) nearly 
intact. Since the kinetic energies T in the system's Hamiltonian are proportional to r0^-2, while the potential 
energies scale as r0^-1, we can write 

$$E_g(Z_{\rm ef}) = \left(\frac{Z_{\rm ef}}{2}\right)^2 \langle T\rangle_{Z=2} + \frac{Z_{\rm ef}}{2}\langle U\rangle_{Z=2}. \tag{8.37}$$



Now we can use the fact that according to Eq. (3.202), for any stationary state of a hydrogen-like atom 
(just as for the classical circular motion in the Coulomb potential), ⟨U⟩ = 2E, and hence ⟨T⟩ = E - ⟨U⟩ = -E. 
Using Eq. (30), and adding the correction E_g^(1) = (5/4)E_H calculated above to the potential energy, 
we get 

$$E_g(Z_{\rm ef}) = E_H\left[4\left(\frac{Z_{\rm ef}}{2}\right)^2 + \left(-8 + \frac{5}{4}\right)\frac{Z_{\rm ef}}{2}\right]. \tag{8.38}$$

The minimum of the function E_g(Z_ef) and the corresponding "optimal" value of Z_ef are as follows: 

$$(Z_{\rm ef})_{\rm opt} = 2\left(1 - \frac{5}{32}\right) = 1.6875, \qquad (E_g)_{\min} \approx -2.85E_H \approx -77.5\ \text{eV}. \tag{8.39}$$



Given the crudeness of the trial function, this number is in surprisingly good agreement with the experimental 
value cited above, with a difference of the order of 1%. 13 
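The minimization leading to Eq. (39) can be reproduced with a simple scan. The sketch below evaluates Eq. (38) in units of E_H, with ⟨T⟩_{Z=2} = 4E_H and ⟨U⟩_{Z=2} = -8E_H plus the first-order correction (5/4)E_H, and finds the minimum on a fine grid; E_H ≈ 27.211 eV is assumed for the conversion to electron-volts, and the grid range is an arbitrary choice.

```python
import numpy as np

E_H_EV = 27.211  # Hartree energy in eV (assumed value for the conversion)

def helium_trial_energy(z_ef):
    """E_g(Z_ef) in units of E_H, Eqs. (8.37)-(8.38): the kinetic part scales as (Z_ef/2)^2,
    the potential part (including the 1st-order interaction correction 5/4) as Z_ef/2."""
    T = 4.0              # <T> at Z = 2, in units of E_H
    U = -8.0 + 5.0 / 4   # <U> at Z = 2 plus E_g^(1), in units of E_H
    x = z_ef / 2.0
    return x**2 * T + x * U

z = np.linspace(1.0, 2.0, 100001)
E = helium_trial_energy(z)
i = np.argmin(E)
print(f"Z_ef(opt) = {z[i]:.4f}, E_min = {E[i]:.4f} E_H = {E[i] * E_H_EV:.1f} eV")
```

At Z_ef = 2 the same function returns -2.75E_H, i.e. the first-order result (33), so the variational estimate can only improve on it.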

As we have just seen, the ground level energy of the helium atom is not affected directly by 
particle indistinguishability, but the situation is different for its excited states - even the lowest ones. 
The reasonably good convergence of the perturbation theory that we have seen for the ground state 
tells us that we can base our analysis of the wavefunctions ψ_e of the lowest excited states on 


13 This example explains why the variational method is broadly used for approximate treatment of complex 
quantum systems, despite the fact that it is based on more or less intuitive guesses of trial functions, i.e., in contrast 
with the perturbation theories discussed in Chapter 6, does not guarantee asymptotically correct results in any 
particular limit. 
products like ψ_100(r_k)ψ_nlm(r_k'), with k' ≠ k and n > 1. However, in order to satisfy the fermion 
permutation rule, P_j = -1, we have to take the orbital part of the state in either a symmetric or an 
antisymmetric form: 

$$\psi_e(\mathbf{r}_1, \mathbf{r}_2) = \frac{1}{\sqrt{2}}\left[\psi_{100}(\mathbf{r}_1)\psi_{nlm}(\mathbf{r}_2) \pm \psi_{nlm}(\mathbf{r}_1)\psi_{100}(\mathbf{r}_2)\right], \tag{8.40}$$



with the proper total permutation antisymmetry provided by the corresponding spin part given by, 
respectively, Eq. (19) or Eq. (21), so that the upper/lower signs in Eq. (40) correspond to the 
singlet/triplet spin state. Let us calculate the expectation values of the total energy of the system in the 
first order of the perturbation theory. Plugging Eq. (40) into the 0th-order expression 

$$E_e^{(0)} = \int d^3r_1\int d^3r_2\, \psi_e^*(\mathbf{r}_1, \mathbf{r}_2)\left(\hat{h}_1 + \hat{h}_2\right)\psi_e(\mathbf{r}_1, \mathbf{r}_2), \tag{8.41}$$



we get two groups of similar terms that differ only by the particle index. We can merge the terms of 
each pair by changing the notation as (r1 → r, r2 → r') in one of them, and (r1 → r', r2 → r) in the 
other term. Using Eq. (27), and the mutual orthogonality of the wavefunctions ψ_100(r) and ψ_nlm(r), we get 
the following result, 

$$E_e^{(0)} = \int \psi_{100}^*(\mathbf{r})\left(-\frac{\hbar^2\nabla^2}{2m} - \frac{2e^2}{4\pi\varepsilon_0 r}\right)\psi_{100}(\mathbf{r})\,d^3r + \int \psi_{nlm}^*(\mathbf{r}')\left(-\frac{\hbar^2\nabla'^2}{2m} - \frac{2e^2}{4\pi\varepsilon_0 r'}\right)\psi_{nlm}(\mathbf{r}')\,d^3r' \equiv \varepsilon_{100} + \varepsilon_{nlm}, \tag{8.42}$$

which may be interpreted as the sum of eigenenergies of two separate single particles, one in the ground 
state 100, and another in the excited state nlm - despite the fact that the electron states are actually entangled. 
Thus, in the 0th order of the perturbation theory, the electron entanglement does not affect their energy. 

However, the potential energy of the system also includes the interaction term U_int (27), which does not allow such a separation. As a result, in the first order of the perturbation theory, the total energy of the system may be represented as 



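$$E_e\approx\varepsilon_{100}+\varepsilon_{nlm}+E_{\rm int}, \tag{8.43a}$$

$$E_{\rm int}=\int d^3r_1\int d^3r_2\,\psi_e^*(\mathbf r_1,\mathbf r_2)\,U_{\rm int}(\mathbf r_1,\mathbf r_2)\,\psi_e(\mathbf r_1,\mathbf r_2). \tag{8.43b}$$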



Plugging Eq. (40) into this result, using the symmetry of U_int with respect to the particle number permutation, and the same particle coordinate renumbering as above, we get 



$$E_{\rm int}=E_{\rm dir}\pm E_{\rm ex}, \tag{8.44}$$

with deceptively similar expressions for the two components: 



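$$E_{\rm dir}\equiv\int d^3r\int d^3r'\,\psi_{100}^*(\mathbf r)\psi_{nlm}^*(\mathbf r')\,U_{\rm int}(\mathbf r,\mathbf r')\,\psi_{100}(\mathbf r)\psi_{nlm}(\mathbf r'), \tag{8.45a}$$

$$E_{\rm ex}\equiv\int d^3r\int d^3r'\,\psi_{100}^*(\mathbf r)\psi_{nlm}^*(\mathbf r')\,U_{\rm int}(\mathbf r,\mathbf r')\,\psi_{nlm}(\mathbf r)\psi_{100}(\mathbf r'). \tag{8.45b}$$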






Since the single-particle orbitals can always be made real, both components are positive (or at least non-negative). However, their physics is completely different. Integral (45a), called the direct electron-electron interaction, allows a simple semi-classical interpretation as the Coulomb energy of two interacting electrons, each distributed in space with the electric charge density ρ(r) = -eψ*(r)ψ(r): 14 



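$$E_{\rm dir}=\int d^3r\int d^3r'\,\frac{\rho_{100}(\mathbf r)\,\rho_{nlm}(\mathbf r')}{4\pi\varepsilon_0|\mathbf r-\mathbf r'|}\equiv\int\rho_{100}(\mathbf r)\,\phi(\mathbf r)\,d^3r, \tag{8.46}$$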



where φ(r) is the electrostatic potential created at point r by the counterpart electron's "electric charge cloud": 15 

$$\phi(\mathbf r)=\frac{1}{4\pi\varepsilon_0}\int d^3r'\,\frac{\rho_{nlm}(\mathbf r')}{|\mathbf r-\mathbf r'|}. \tag{8.47}$$

However, integral (45b), called the exchange interaction, evades a classical interpretation, and (as is clear from its derivation) is a direct corollary of the fact that the two electrons of the atom are indistinguishable. The magnitude of E_ex is also very different from E_dir, because the function under integral (45b) vanishes in the regions where the single-particle wavefunctions do not overlap. This is in full agreement with the discussion in Sec. 1: if two particles are identical but well separated, i.e. their wavefunctions do not overlap, the exchange interaction disappears, because all effects of particle indistinguishability vanish. 
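To make the contrast between Eqs. (45a) and (45b) more tangible, the sketch below evaluates discretized one-dimensional analogs of both integrals for two real orbitals separated by a variable distance. It is only a toy model: the Gaussian orbitals and the softened Coulomb kernel 1/((x - x′)² + s²)^{1/2} (used to avoid the 1D singularity) are assumptions chosen for illustration, not the helium orbitals ψ_100 and ψ_nlm, and the function names are hypothetical.

```python
import numpy as np


def gaussian_orbital(x, center, width=1.0):
    """Real, normalized 1D Gaussian orbital (a hypothetical stand-in for psi_100 or psi_nlm)."""
    psi = np.exp(-((x - center) ** 2) / (2.0 * width**2))
    dx = x[1] - x[0]
    return psi / np.sqrt(np.sum(psi**2) * dx)


def direct_and_exchange(psi_a, psi_b, x, soft=1.0):
    """Discretized 1D analogs of Eqs. (45a) and (45b) for real orbitals.

    The kernel u(x, x') = 1/sqrt((x - x')**2 + soft**2) is an assumed softened
    Coulomb interaction in arbitrary units.
    """
    dx = x[1] - x[0]
    u = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + soft**2)
    rho_a = psi_a * psi_a      # |psi_a(x)|^2, "charge cloud" of electron a
    rho_b = psi_b * psi_b      # |psi_b(x')|^2, "charge cloud" of electron b
    overlap = psi_a * psi_b    # psi_a(x) psi_b(x): nonzero only where the orbitals overlap
    e_dir = rho_a @ u @ rho_b * dx**2
    e_ex = overlap @ u @ overlap * dx**2
    return e_dir, e_ex


x = np.linspace(-20.0, 20.0, 801)
for separation in (0.0, 2.0, 4.0, 8.0):
    psi_a = gaussian_orbital(x, -separation / 2)
    psi_b = gaussian_orbital(x, +separation / 2)
    e_dir, e_ex = direct_and_exchange(psi_a, psi_b, x)
    print(f"separation {separation:4.1f}: E_dir = {e_dir:.4f}, E_ex = {e_ex:.2e}")
```

In this model E_dir decays slowly (roughly as the inverse separation), while E_ex falls off roughly as the square of the orbital overlap, so it becomes negligible once the orbitals are well separated. For coinciding orbitals the two integrals are equal.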

Historically, the fact of having two different hydrogen-like spectra, (48) and (49), was taken as evidence for two different species of helium, called, respectively, parahelium and orthohelium. Figure 1b shows the structure of an excited energy level, with certain quantum numbers n > 1, l, and m, given by Eqs. (44)-(45). The upper level, with energy 

$$E_{\rm para}=(\varepsilon_{100}+\varepsilon_{nlm})+E_{\rm dir}+E_{\rm ex}, \tag{8.48}$$

corresponds to parahelium, i.e. the symmetric orbital state and hence the singlet spin state (19), with zero net spin, s = 0. The lower level, with 

$$E_{\rm ortho}=(\varepsilon_{100}+\varepsilon_{nlm})+E_{\rm dir}-E_{\rm ex}<E_{\rm para}, \tag{8.49}$$

corresponds to orthohelium, i.e. the antisymmetric orbital state, and hence to the triplet spin states with s = 1 - see Eq. (21). Its degeneracy may be lifted by a magnetic field, so that the splitting is identical to that of an elementary particle with spin s = 1. Calculations of the direct and exchange interaction integrals (45) for various values of n and l show that the perturbation theory explains the experimental spectra of orthohelium and parahelium (Fig. 1) pretty well. 

Encouraged by this success, and motivated by the very important task of describing atoms, molecules, and metals, we may try to apply the same approach to systems with N > 2 electrons. In this case the mathematical expression of the Pauli principle for fermions is 

$$\hat P_{kk'}|\alpha\rangle=-|\alpha\rangle\quad\text{for all }k,k'=1,2,\ldots,N, \tag{8.50}$$



14 See, e.g., EM Sec. 1.3, in particular Eq. (1.54). 

15 Note that the result for E_dir correctly reflects the basic fact that a charged particle does not interact with itself, even if its wavefunction is quantum-mechanically spread over a finite space volume. Unfortunately, this is not true for some other approximate theories of multiparticle systems - see Sec. 4 below. 

where operator P̂_kk′ permutes the particles with numbers k and k′. In order to understand how common eigenstates of all such operators may be formed, let us return for a minute to two non-interacting electrons, and rewrite Eq. (11) in the following compact form: 



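$$|\alpha\rangle=\frac{1}{\sqrt2}\begin{vmatrix}|\beta\rangle & |\beta'\rangle\\ |\beta\rangle & |\beta'\rangle\end{vmatrix}, \tag{8.51}$$

where the columns correspond to states 1 and 2, the rows to particle numbers 1 and 2, and the products of kets in the expanded determinant are direct products. 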



In this way, the Pauli principle is mapped onto a well-known property of matrix determinants: if any two columns of a matrix coincide, its determinant vanishes. This Slater determinant approach may be readily generalized to N fermions in N (not necessarily the lowest) single-particle states β, β′, etc.: 



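$$|\alpha\rangle=\frac{1}{(N!)^{1/2}}\begin{vmatrix}|\beta\rangle & |\beta'\rangle & |\beta''\rangle & \cdots\\ |\beta\rangle & |\beta'\rangle & |\beta''\rangle & \cdots\\ |\beta\rangle & |\beta'\rangle & |\beta''\rangle & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{vmatrix}, \tag{8.52}$$

where the columns list the N states and the rows list the N particles. 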
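The antisymmetry built into Eq. (52) is easy to check numerically in the coordinate representation. The sketch below is only an illustration under stated assumptions: three unnormalized 1D orbitals of the Hermite-Gaussian type stand in for the states β, β′, β″, spin is ignored, and the helper names are hypothetical. It compares the determinant with the explicit sum of N! signed products, then checks the sign change under particle swapping and the vanishing of the amplitude when two states coincide.

```python
import itertools
import math

import numpy as np


def slater_amplitude(orbitals, coords):
    """Value of the N-particle Slater determinant (8.52) at coordinates x_1..x_N.

    orbitals: N callables phi_j(x) (hypothetical single-particle wavefunctions);
    row k of the matrix corresponds to particle k, column j to state j.
    """
    n = len(orbitals)
    matrix = np.array([[phi(x) for phi in orbitals] for x in coords])
    return np.linalg.det(matrix) / math.sqrt(math.factorial(n))


def permutation_sign(perm):
    """+1 for even and -1 for odd permutations (parity of the number of inversions)."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def explicit_sum(orbitals, coords):
    """The same amplitude, written out as the sum of N! signed products."""
    n = len(orbitals)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        term = permutation_sign(perm)
        for k, j in enumerate(perm):
            term *= orbitals[j](coords[k])  # particle k occupies state perm[k]
        total += term
    return total / math.sqrt(math.factorial(n))


orbs = [
    lambda x: np.exp(-x**2 / 2),
    lambda x: x * np.exp(-x**2 / 2),
    lambda x: (2 * x**2 - 1) * np.exp(-x**2 / 2),
]
r = [0.3, -1.1, 0.7]
print(slater_amplitude(orbs, r), explicit_sum(orbs, r))
print(slater_amplitude(orbs, [r[1], r[0], r[2]]))  # particles 1 and 2 swapped: sign flips
print(slater_amplitude([orbs[0], orbs[0], orbs[2]], r))  # two equal columns: zero
```

The explicit sum has N! terms, which is harmless here (N = 3) but is exactly the practical obstacle discussed next.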



Even though the Slater determinant form is extremely nice and compact (in comparison with the direct writing of a sum of N! products, each of N ket factors), there are two major problems with using it for practical calculations: 

(i) For the calculation of any bra-ket product (say, within the perturbation theory) we need to spell out each bra- and ket-vector as a sum of component terms. Even for a limited number of electrons (say, N ~ 10^2 in a typical atom), the number N! ~ 10^158 of terms in such a sum is impracticably large for any analytical calculation. 

(ii) In the case of interacting fermions, Slater determinants do not describe the eigenvectors of the system; rather, each stationary state is a superposition of such determinants, each for a specific selection of N states from the full set of single-particle states, whose number is generally larger than N. 

These challenges make the development of a more general theory, one that does not use particle numbers (which are superficial for indistinguishable particles to start with), a must for getting any final results for multiparticle systems. 



8.3. Second quantization 

The most useful formalism for this purpose, which avoids particle numbering altogether, is called second quantization. 16 Actually, we have already discussed a particular version of this formalism, for the case of excitations of a 1D harmonic oscillator, in Sec. 5.4. As a reminder, we have used Eqs. (5.98) 



16 It was invented (first for photons and then for arbitrary bosons) by P. Dirac in 1927, and then modified in 1928 for fermions by E. Wigner and P. Jordan. The term "second quantization" is rather misleading for the nonrelativistic applications we are discussing, but finds certain justification in the quantum field theory. 

to define the "creation" and "annihilation" operators via the usual operators of coordinate and momentum, and then proved their key property (5.122), 



$$\hat a^\dagger|n\rangle=(n+1)^{1/2}|n+1\rangle,\qquad \hat a|n\rangle=n^{1/2}|n-1\rangle, \tag{8.53}$$



where |n⟩ are the stationary (Fock) states of the oscillator. This property allows an interpretation of the operators' actions as the creation/annihilation of a single excitation of energy ħω_0, thus justifying the operator names. In the next chapter, we will show that such an excitation of an electromagnetic field mode may be considered as a massless boson with s = 1, called the photon. 

In order to generalize this approach to arbitrary bosons, not appealing to a specific system such 
as the harmonic oscillator, we may use relations similar to Eq. (53) to define the creation and 
annihilation operators. The definition looks simple in the language of the so-called Dirac states, with 
ket-vectors 



$$|N_1,N_2,\ldots,N_j,\ldots\rangle, \tag{8.54}$$



where N_j are the state occupancies, i.e. the numbers of bosons in each single-particle state j. Let me emphasize that here the indices 1, 2, ..., j, ... are the positions of each number in the Dirac ket-vector, i.e. they are the numbers of single-particle states (including their spin parts) rather than of particles. Thus the very notion of individual particle numbers is completely (and for indistinguishable particles, very relevantly) absent from this formalism. Generally, the set of single-particle states participating in the Dirac state may be selected in an arbitrary way (provided that it is full and orthonormal), 



$$\langle N'_1,N'_2,\ldots,N'_j,\ldots|N_1,N_2,\ldots,N_j,\ldots\rangle=\delta_{N_1N'_1}\delta_{N_2N'_2}\cdots\delta_{N_jN'_j}\cdots, \tag{8.55}$$



though for a system of non-interacting (or weakly interacting) bosons, using the stationary states of individual particles in the system under analysis is almost always the best choice. 



Now we can define the particle annihilation operator as follows: 

$$\hat a_j|N_1,N_2,\ldots,N_j,\ldots\rangle=N_j^{1/2}\,|N_1,N_2,\ldots,N_j-1,\ldots\rangle. \tag{8.56}$$



Note that the pre-ket coefficient, similar to that in Eq. (53), guarantees that an attempt to annihilate a 
particle in an unpopulated state gives the non-existing (null) state: 



$$\hat a_j|N_1,N_2,\ldots,0_j,\ldots\rangle=0, \tag{8.57}$$



where the symbol 0_j means zero occupancy of the j-th state. An alternative way to write Eq. (56) is 



$$\langle N'_1,N'_2,\ldots,N'_j,\ldots|\hat a_j|N_1,N_2,\ldots,N_j,\ldots\rangle=N_j^{1/2}\,\delta_{N_1N'_1}\delta_{N_2N'_2}\cdots\delta_{N'_j,N_j-1}\cdots. \tag{8.58}$$

According to Eq. (4.65), the matrix element of the Hermitian conjugate operator â_j† is 

$$\begin{aligned}\langle N_1,N_2,\ldots,N_j,\ldots|\hat a_j^\dagger|N'_1,N'_2,\ldots,N'_j,\ldots\rangle&=\langle N'_1,N'_2,\ldots,N'_j,\ldots|\hat a_j|N_1,N_2,\ldots,N_j,\ldots\rangle^*\\&=N_j^{1/2}\,\delta_{N_1N'_1}\delta_{N_2N'_2}\cdots\delta_{N'_j,N_j-1}\cdots=(N'_j+1)^{1/2}\,\delta_{N_1N'_1}\delta_{N_2N'_2}\cdots\delta_{N_j,N'_j+1}\cdots,\end{aligned}\tag{8.59}$$

meaning that 

$$\hat a_j^\dagger|N_1,N_2,\ldots,N_j,\ldots\rangle=(N_j+1)^{1/2}\,|N_1,N_2,\ldots,N_j+1,\ldots\rangle, \tag{8.60}$$

in total compliance with the first of Eqs. (53). In particular, this particle creation operator â_j† allows the description of the generation of a single particle from the vacuum (not null!) state |0, 0, ...⟩: 

$$\hat a_j^\dagger|0,0,\ldots,0_j,\ldots,0\rangle=|0,0,\ldots,1_j,\ldots,0\rangle, \tag{8.61}$$

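and hence a product of such operators may create, from the vacuum, a multiparticle state with an arbitrary set of occupancies: 17 

$$\underbrace{\hat a_1^\dagger\cdots\hat a_1^\dagger}_{N_1\ \text{times}}\ \underbrace{\hat a_2^\dagger\cdots\hat a_2^\dagger}_{N_2\ \text{times}}\cdots|0,0,\ldots\rangle=(N_1!\,N_2!\ldots)^{1/2}\,|N_1,N_2,\ldots\rangle. \tag{8.62}$$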
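Definitions (56), (57), (60), and (62) can be checked with explicit matrices for a single boson mode. The sketch below assumes a Fock space truncated at a hypothetical cutoff n_max. This cutoff is a numerical artifact (it makes â†|n_max⟩ vanish), so the results are exact only for occupancies below n_max. The helper names are illustrative.

```python
import math

import numpy as np


def annihilation_matrix(n_max):
    """Single-mode boson a_j in the truncated Fock basis |0>, ..., |n_max>, per Eq. (56)."""
    return np.diag(np.sqrt(np.arange(1, n_max + 1, dtype=float)), k=1)


def fock_vector(n, n_max):
    """Column vector representing the Dirac (Fock) state |n> of one mode."""
    v = np.zeros(n_max + 1)
    v[n] = 1.0
    return v


n_max = 6
a = annihilation_matrix(n_max)
a_dag = a.T  # real matrix, so the Hermitian conjugate is just the transpose
vacuum = fock_vector(0, n_max)

print(a @ vacuum)  # Eq. (57): the null vector, not the vacuum state
state = np.linalg.matrix_power(a_dag, 4) @ vacuum  # four creation operators, Eq. (62)
print(state / math.sqrt(math.factorial(4)))  # the normalized Dirac state |4>
```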

Next, combining Eqs. (56) and (60), we get 

$$\hat a_j^\dagger\hat a_j|N_1,N_2,\ldots,N_j,\ldots\rangle=N_j\,|N_1,N_2,\ldots,N_j,\ldots\rangle, \tag{8.63}$$

so that, just as for the particular case of harmonic oscillator excitations, the operator 

$$\hat N_j\equiv\hat a_j^\dagger\hat a_j \tag{8.64}$$

conserves the numbers of particles in all single-particle states, and simultaneously "counts" their number in the j-th state. Acting by the creation-annihilation operators in the reverse order, we get 

$$\hat a_j\hat a_j^\dagger|N_1,N_2,\ldots,N_j,\ldots\rangle=(N_j+1)\,|N_1,N_2,\ldots,N_j,\ldots\rangle. \tag{8.65}$$

This result shows that for any state of a multiparticle system (which may always be represented as a linear superposition of Dirac states with different sets of N_j), we can write 

$$\hat a_j\hat a_j^\dagger-\hat a_j^\dagger\hat a_j\equiv[\hat a_j,\hat a_j^\dagger]=\hat I, \tag{8.66}$$

again in agreement with what we had for the 1D oscillator - cf. Eq. (5.101). According to Eq. (55), the creation and annihilation operators corresponding to different single-particle states do commute, so that Eq. (66) may be generalized as 

$$[\hat a_j,\hat a_{j'}^\dagger]=\hat I\,\delta_{jj'}, \tag{8.67}$$

while bosonic operators of the same kind (two creation or two annihilation operators) commute, regardless of which states they act upon: 




17 The resulting Dirac state is not an eigenstate of every multiparticle Hamiltonian. However, we will see below that for a set of non-interacting particles it is an eigenstate, and thus may be used as a basis for perturbation theories of systems of weakly interacting particles. 

$$[\hat a_j^\dagger,\hat a_{j'}^\dagger]=[\hat a_j,\hat a_{j'}]=0. \tag{8.68}$$



Relations (66)-(68) are the mathematical expression of the independence of occupancies of different 
boson states. 
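The commutation relations (66)-(68) can be checked for two boson modes built as Kronecker products of single-mode matrices. This is a sketch under an explicit assumption: each mode's Fock space is truncated at n_max. The relations between different states then hold exactly, while Eq. (66) holds only on basis states whose occupancy stays below the cutoff. The last diagonal element, -n_max, is a truncation artifact.

```python
import numpy as np


def two_mode_annihilators(n_max):
    """a_1 and a_2 for two boson modes, as Kronecker products on a truncated Fock space."""
    a = np.diag(np.sqrt(np.arange(1, n_max + 1, dtype=float)), k=1)
    eye = np.eye(n_max + 1)
    return np.kron(a, eye), np.kron(eye, a)


def commutator(x, y):
    return x @ y - y @ x


n_max = 5
a1, a2 = two_mode_annihilators(n_max)

# Eqs. (67)-(68) for different states hold exactly, even after truncation
print(np.abs(commutator(a1, a2.T)).max(), np.abs(commutator(a1, a2)).max())

# Eq. (66) holds only on basis states whose mode-1 occupancy is below the cutoff
c11 = commutator(a1, a1.T)
keep = np.repeat(np.arange(n_max + 1) < n_max, n_max + 1)  # mode 1 is the slow kron index
print(np.allclose(c11[np.ix_(keep, keep)], np.eye(keep.sum())))
```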

As was mentioned earlier, a major challenge in the Dirac approach is to rewrite the Hamiltonian of a multiparticle system, which naturally carries particle numbers k (see, e.g., Eq. (22) for k = 1, 2), in the second quantization language, in which these numbers are absent. Let us start with the single-particle components of such Hamiltonians, i.e. operators of the type 

$$\hat F=\sum_{k=1}^{N}\hat f_k, \tag{8.69}$$



where all N operators f̂_k are similar, except that each of them acts on one specific (k-th) particle, and N is the total number of particles in the system, which is naturally equal to the sum of single-particle state occupancies: 

$$N=\sum_j N_j. \tag{8.70}$$



The most important examples of such operators are the kinetic energy of N similar single particles, and 
their potential energy in an external field: 



$$\hat T=\sum_{k=1}^{N}\frac{\hat p_k^2}{2m},\qquad \hat U=\sum_{k=1}^{N}U(\hat{\mathbf r}_k). \tag{8.71}$$



In order to express a particle-separable operator of the type (69) in terms of the Dirac formalism, we need to return for a minute to the particle-number representations used at the beginning of this chapter. Instead of the Slater determinant (52), for bosons we have to write a similar expression, but without the sign changes (sometimes called the permanent): 

$$|N_1,\ldots,N_j,\ldots\rangle=\left(\frac{N_1!\ldots N_j!\ldots}{N!}\right)^{1/2}\sum_{\text{permutations}}\underbrace{|\beta\rangle\otimes\ldots\otimes|\beta'\rangle\otimes\ldots}_{N\ \text{operands}}. \tag{8.72}$$



Note again that the left-hand side of this relation is written in the Dirac notation (which does not use particle numbering), while in its right-hand side, just as in the relations of Secs. 1-2, particle numbers are coded by the positions of the single-particle states inside the ket-vectors, and the sum is over all different permutations of the states in the ket - cf. Eq. (10). (According to elementary combinatorics, 18 there are N!/(N_1!...N_j!...) such permutations, so that the coefficient before the sum ensures the proper normalization of the state.) Let us use Eq. (72) to spell out the following bra-ket of a system with (N - 1) particles: 



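$$\langle\ldots,N_j,\ldots,N_{j'}-1,\ldots|\hat F|\ldots,N_j-1,\ldots,N_{j'},\ldots\rangle=\frac{N_1!\ldots(N_j-1)!\ldots(N_{j'}-1)!\ldots}{(N-1)!}\,(N_jN_{j'})^{1/2}\sum_{P(N-1)}\sum_{P'(N-1)}\langle\ldots\beta_j\ldots|\sum_{k=1}^{N-1}\hat f_k|\ldots\beta_{j'}\ldots\rangle, \tag{8.73}$$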



18 See, e.g., MA Eq. (2.3). 

where all non-specified occupation numbers in the corresponding positions of the bra- and ket-vectors are equal to each other. Each single-particle operator f̂_k participating in the operator sum acts on the bra- and ket-vectors of states j and j′, respectively, in a certain (say, k-th) position, giving a result that does not depend on the position number: 

$$\langle\beta_j|_{\text{in }k\text{-th position}}\,\hat f_k\,|\beta_{j'}\rangle_{\text{in }k\text{-th position}}=\langle\beta_j|\hat f|\beta_{j'}\rangle\equiv f_{jj'}. \tag{8.74}$$

Since in both permutation sets participating in Eq. (73), with (N - 1) vectors each, all positions are equivalent, we can fix the position (say, take the first one) and replace the sum over k by the multiplication by the factor (N - 1). The fraction of bra permutations with the necessary vector (with number j) in that position is N_j/(N - 1), while the fraction of ket permutations with the necessary vector (with number j′) in the same position is N_j′/(N - 1); these restrictions leave, in each set, only the permutations of the remaining (N - 2) vectors. As a result, the permutation sum in Eq. (73) reduces to 

$$(N-1)\,f_{jj'}\sum_{P(N-2)}\sum_{P'(N-2)}\langle\ldots|\ldots\rangle, \tag{8.75}$$

where our specific position k is now excluded from both the bra- and ket-vector permutations. Each of these permutations now includes only (N_j - 1) states j and (N_j′ - 1) states j′, so that, due to the state orthonormality, only the (N - 2)!/[N_1!...(N_j - 1)!...(N_j′ - 1)!...] pairs of identical permutations contribute, and we finally arrive at a very simple result: 

$$\begin{aligned}\langle\ldots,N_j,\ldots,N_{j'}-1,\ldots|\hat F|\ldots,N_j-1,\ldots,N_{j'},\ldots\rangle&=\frac{N_1!\ldots(N_j-1)!\ldots(N_{j'}-1)!\ldots}{(N-1)!}\,(N_jN_{j'})^{1/2}\,(N-1)\,f_{jj'}\,\frac{(N-2)!}{N_1!\ldots(N_j-1)!\ldots(N_{j'}-1)!\ldots}\\&=(N_jN_{j'})^{1/2}f_{jj'}.\end{aligned}\tag{8.76}$$

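Now let us calculate the matrix elements of the following operator: 

$$\sum_{j,j'}f_{jj'}\,\hat a_j^\dagger\hat a_{j'}. \tag{8.77}$$

A direct application of Eqs. (56) and (60) shows that its only nonvanishing matrix elements are 

$$\langle\ldots,N_j,\ldots,N_{j'}-1,\ldots|\sum_{j,j'}f_{jj'}\,\hat a_j^\dagger\hat a_{j'}|\ldots,N_j-1,\ldots,N_{j'},\ldots\rangle=(N_jN_{j'})^{1/2}f_{jj'}. \tag{8.78}$$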



But this is exactly the last form of Eq. (76), so that in the basis of Dirac states, operator (69) may be represented as 

$$\hat F=\sum_{j,j'}f_{jj'}\,\hat a_j^\dagger\hat a_{j'}. \tag{8.79}$$

This beautifully simple equation is the most important formula of the second quantization theory, and is essentially the Dirac-language analog of Eq. (4.59) of the single-particle quantum mechanics. Each term of the sum may be described by a very simple mnemonic rule: if an operator "connects" two single-particle states j and j′, move the particle from state j′ into state j, and weight the result with the corresponding single-particle matrix element. (One of the corollaries of Eq. (79) is that the expectation value of an operator whose eigenstates coincide with the Dirac states is 



$$\langle F\rangle\equiv\langle\ldots,N_j,\ldots|\hat F|\ldots,N_j,\ldots\rangle=\sum_j f_{jj}N_j, \tag{8.80}$$



with an evident physical interpretation as the sum of single-particle expectation values over all states, weighted by the state occupancies.) 
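A numerical sanity check of Eq. (79) is sketched below, under stated assumptions. For a Hermitian single-particle matrix f_jj′, the operator Σ f_jj′ â_j†â_j′ restricted to two-particle Dirac states should have the two-particle energies of non-interacting particles as eigenvalues. For bosons these are all pair sums ε_a + ε_b with a ≤ b, and for fermions the pair sums with a < b. The matrices f, the mode numbers, and the truncation n_max are arbitrary choices for illustration. The fermion operators use the standard Jordan-Wigner construction, which reproduces the sign rule of Eqs. (81)-(82) given below; this construction is a technique swapped in here, not one taken from the text.

```python
import itertools
from functools import reduce

import numpy as np


def boson_annihilators(n_modes, n_max):
    """Truncated-Fock-space matrices a_j for n_modes boson states (Eq. 56)."""
    a = np.diag(np.sqrt(np.arange(1, n_max + 1, dtype=float)), k=1)
    eye = np.eye(n_max + 1)
    return [reduce(np.kron, [a if i == j else eye for i in range(n_modes)]) for j in range(n_modes)]


def fermion_annihilators(n_modes):
    """Jordan-Wigner matrices: a_j carries the sign (-1)^(occupancy of states i < j)."""
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps |1> to |0> for one state
    parity = np.diag([1.0, -1.0])  # (-1)^N for one state
    eye = np.eye(2)
    return [reduce(np.kron, [parity] * j + [lower] + [eye] * (n_modes - j - 1)) for j in range(n_modes)]


def second_quantized(f, ops):
    """Eq. (79): F = sum_{j,j'} f_{jj'} a_j^dagger a_j'."""
    n = len(ops)
    return sum(f[j, jp] * ops[j].conj().T @ ops[jp] for j in range(n) for jp in range(n))


def sector_spectrum(f, ops, n_particles):
    """Eigenvalues of F restricted to Dirac states with the given total particle number."""
    big_f = second_quantized(f, ops)
    total_n = np.real(np.diag(sum(op.conj().T @ op for op in ops)))
    idx = np.flatnonzero(np.isclose(total_n, n_particles))
    return np.linalg.eigvalsh(big_f[np.ix_(idx, idx)])


f2 = np.array([[1.0, 0.3 - 0.2j], [0.3 + 0.2j, -0.5]])
eps2 = np.linalg.eigvalsh(f2)
boson_levels = sector_spectrum(f2, boson_annihilators(2, 3), 2)
print(boson_levels, eps2)  # expect 2*eps_a, eps_a + eps_b, 2*eps_b

f3 = np.array([[0.2, 0.5, 0.0], [0.5, -0.1, 0.3j], [0.0, -0.3j, 0.7]])
eps3 = np.linalg.eigvalsh(f3)
fermion_levels = sector_spectrum(f3, fermion_annihilators(3), 2)
print(fermion_levels, eps3)  # expect eps_a + eps_b with a < b only
```

The truncation n_max = 3 does not affect the bosonic result, because every two-particle Dirac state fits below the cutoff and F conserves the particle number.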

Proceeding to fermions, which have to obey the Pauli principle, we immediately notice that any occupation number N_j may take only two values, 0 or 1. In order to account for that, and also to make the key equation (76) valid for fermions as well, the creation-annihilation operators are now defined by the relations 



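$$\hat a_j|N_1,N_2,\ldots,0_j,\ldots\rangle=0,\qquad \hat a_j|N_1,N_2,\ldots,1_j,\ldots\rangle=(-1)^{\Sigma(1,j-1)}|N_1,N_2,\ldots,0_j,\ldots\rangle, \tag{8.81}$$

$$\hat a_j^\dagger|N_1,N_2,\ldots,0_j,\ldots\rangle=(-1)^{\Sigma(1,j-1)}|N_1,N_2,\ldots,1_j,\ldots\rangle,\qquad \hat a_j^\dagger|N_1,N_2,\ldots,1_j,\ldots\rangle=0. \tag{8.82}$$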



In these relations, the symbol Σ(J, J′) means the sum of all occupancy numbers in the state positions from J to J′, including the border points: 

$$\Sigma(J,J')\equiv\sum_{j=J}^{J'}N_j, \tag{8.83}$$



so that the sum participating in Eqs. (81) and (82) is the total occupancy of all states with numbers below j. (The states have to be numbered in a fixed, albeit arbitrary, order.) As a result, Eqs. (81)-(82) may be readily summarized in verbal form: if an operator replaces the occupancy of state j with the opposite one (1 with 0, or vice versa), it also changes the sign of the result if (and only if) the total number of particles in the states with j″ < j is odd. 
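The verbal sign rule of Eqs. (81)-(82) can be sketched directly on occupation tuples. This is a minimal illustration, not a general library: the function names are hypothetical, and the states are numbered from 0 in the code, while the text numbers them from 1. The two products reproduce the sign difference worked out in Eqs. (84)-(85).

```python
def apply_fermion_operator(kind, j, occ):
    """Apply a_j ("annihilate") or a_j^dagger ("create") to the Dirac state |occ>, Eqs. (81)-(82).

    occ is a tuple of 0/1 occupancies, with states numbered from 0 here.
    Returns (sign, new_occ), or None for the null state.
    """
    target = 1 if kind == "create" else 0
    if occ[j] == target:
        return None  # creating in a filled state, or annihilating in an empty one
    sign = (-1) ** sum(occ[:j])  # (-1)^Sigma(1, j-1): parity of the occupancy below j
    return sign, occ[:j] + (target,) + occ[j + 1:]


def apply_product(operators, occ):
    """Apply a product written left to right as in the text; the rightmost operator acts first."""
    sign = 1
    for kind, j in reversed(operators):
        result = apply_fermion_operator(kind, j, occ)
        if result is None:
            return None
        step_sign, occ = result
        sign *= step_sign
    return sign, occ


# a_2^dagger a_1^dagger |0,0>  (text states 1, 2 -> code states 0, 1)
print(apply_product([("create", 1), ("create", 0)], (0, 0)))  # (-1, (1, 1))
# a_1^dagger a_2^dagger |0,0>
print(apply_product([("create", 0), ("create", 1)], (0, 0)))  # (1, (1, 1))
```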

One of the corollaries of this (somewhat counter-intuitive) sign alternation rule is that the sign of the ket-vector of a completely filled two-state system depends on how exactly it has been formed from the vacuum state. Indeed, if we start from creating the fermion in state 1, we get 



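$$\hat a_1^\dagger|0,0\rangle=(-1)^0|1,0\rangle=|1,0\rangle,\qquad \hat a_2^\dagger\hat a_1^\dagger|0,0\rangle=\hat a_2^\dagger|1,0\rangle=(-1)^1|1,1\rangle=-|1,1\rangle, \tag{8.84}$$

while if the operator order is different, the result's sign is opposite: 

$$\hat a_2^\dagger|0,0\rangle=(-1)^0|0,1\rangle=|0,1\rangle,\qquad \hat a_1^\dagger\hat a_2^\dagger|0,0\rangle=\hat a_1^\dagger|0,1\rangle=(-1)^0|1,1\rangle=+|1,1\rangle. \tag{8.85}$$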



Since the action of either of these operator products on any initial state other than the vacuum gives the null ket, we can write the following operator equality: 

$$\hat a_1^\dagger\hat a_2^\dagger+\hat a_2^\dagger\hat a_1^\dagger\equiv\{\hat a_1^\dagger,\hat a_2^\dagger\}=0. \tag{8.86}$$



It is straightforward to check that this result is valid for a Dirac vector of arbitrary length, and does not depend on the occupancy of other states, so that we can always write 

$$\{\hat a_j^\dagger,\hat a_{j'}^\dagger\}=\{\hat a_j,\hat a_{j'}\}=0; \tag{8.87}$$

these equalities hold for j = j′ as well. On the other hand, an absolutely similar calculation shows that the mixed creation-annihilation operator products do depend on whether the states are different or not: 19 

$$\{\hat a_j,\hat a_{j'}^\dagger\}=\hat I\,\delta_{jj'}. \tag{8.88}$$



These equations look very much like Eqs. (67)-(68) for bosons, "only" with the replacement of commutators with anticommutators. Since the core laws of quantum mechanics, including the operator compatibility (Sec. 4.5) and the Heisenberg equation (4.199) of operator evolution in time, involve commutators rather than anticommutators, one might think that the behavior of bosonic and fermionic multiparticle systems should be dramatically different. However, the difference is not as large as one could expect. For one thing, a straightforward check shows that the sign factors in Eqs. (81)-(82) compensate those in the Slater determinant, and make the key relation (79) valid for fermions as well. (Indeed, this is the very goal of introducing these factors.) 
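The anticommutation relations (87)-(88) can be verified with explicit matrices. The sketch below uses the Jordan-Wigner construction, a standard technique that is not described in the text, chosen because its string of parity factors reproduces exactly the sign (-1)^Σ(1, j-1) of Eqs. (81)-(82). The function names and the number of states checked are illustrative assumptions.

```python
import itertools
from functools import reduce

import numpy as np


def fermion_annihilators(n_modes):
    """Jordan-Wigner matrices reproducing the sign rule of Eqs. (81)-(82)."""
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])  # |1> -> |0> for one state
    parity = np.diag([1.0, -1.0])  # (-1)^N_i for each lower-numbered state i
    eye = np.eye(2)
    return [reduce(np.kron, [parity] * j + [lower] + [eye] * (n_modes - j - 1)) for j in range(n_modes)]


def anticommutator(x, y):
    return x @ y + y @ x


def check_fermion_algebra(n_modes):
    """Verify Eqs. (87)-(88) for all pairs of states, including j = j'."""
    ops = fermion_annihilators(n_modes)
    identity = np.eye(2**n_modes)
    for j, jp in itertools.product(range(n_modes), repeat=2):
        if not np.allclose(anticommutator(ops[j], ops[jp].T), identity * (j == jp)):
            return False
        if not np.allclose(anticommutator(ops[j], ops[jp]), 0.0):
            return False
        if not np.allclose(anticommutator(ops[j].T, ops[jp].T), 0.0):
            return False
    return True


print(check_fermion_algebra(4))  # True
```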

As the simplest example, let us examine what the second quantization formalism says about the dynamics of non-interacting particles in a system whose single-particle properties we know well, namely two nearly similar, coupled quantum wells - see Fig. 2.23. If the coupling (tunneling) between the wells is so small that the states localized in the wells are only weakly perturbed, then in the basis of these states, the single-particle Hamiltonian of the system may be represented by the 2×2 matrix (6.27). Selecting the origin of energy in the middle between the energies of the unperturbed states, so that coefficient a_0 in Eq. (6.27) vanishes, we can reduce the matrix to 






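$$\mathrm h=\mathbf a\cdot\boldsymbol\sigma=\begin{pmatrix}a_z & a_-\\ a_+ & -a_z\end{pmatrix},\qquad a_\pm=a_x\pm ia_y, \tag{8.89}$$

with eigenvalues 

$$\varepsilon_\pm=\pm a,\qquad a\equiv|\mathbf a|=\left(a_x^2+a_y^2+a_z^2\right)^{1/2}. \tag{8.90}$$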



Now following recipe (79), we can represent the Hamiltonian of the whole system of particles in terms 
of the creation-annihilation operators: 



$$\hat H=a_z\hat a_1^\dagger\hat a_1+a_-\hat a_1^\dagger\hat a_2+a_+\hat a_2^\dagger\hat a_1-a_z\hat a_2^\dagger\hat a_2, \tag{8.91}$$



where â_{1,2}† and â_{1,2} are the operators of creation and annihilation of a particle localized in the corresponding quantum well. According to Eq. (64), the first and the last terms on the right-hand side of Eq. (91) describe the particle energies in the uncoupled wells, 

$$a_z\hat a_1^\dagger\hat a_1=\varepsilon_1\hat N_1,\qquad -a_z\hat a_2^\dagger\hat a_2=\varepsilon_2\hat N_2, \tag{8.92}$$

with ε_1 = a_z and ε_2 = -a_z, 



while the sum of the two middle terms is the second-quantization description of tunneling between the wells. 

Now we can use the general Eq. (4.199) of the Heisenberg picture to find the equations of 
motion for the creation-annihilation operators. For example, 



19 A by-product of this calculation is a proof that operator (64) counts the number of particles N_j (now equal to either 1 or 0), just as it does for bosons. 

$$i\hbar\dot{\hat a}_1=[\hat a_1,\hat H]=a_z[\hat a_1,\hat a_1^\dagger\hat a_1]+a_-[\hat a_1,\hat a_1^\dagger\hat a_2]+a_+[\hat a_1,\hat a_2^\dagger\hat a_1]-a_z[\hat a_1,\hat a_2^\dagger\hat a_2]. \tag{8.93}$$



Since the Bose and Fermi operators satisfy different commutation relations, one could expect the right-hand side of this equation to be different for bosons and fermions. However, this is not so. Indeed, all commutators on the right-hand side of Eq. (93) have the following form: 

$$[\hat a_j,\hat a_{j'}^\dagger\hat a_{j''}]=\hat a_j\hat a_{j'}^\dagger\hat a_{j''}-\hat a_{j'}^\dagger\hat a_{j''}\hat a_j. \tag{8.94}$$

According to Eqs. (67) and (88), the first pair product of the operators may be recast as 

$$\hat a_j\hat a_{j'}^\dagger=\hat I\,\delta_{jj'}\pm\hat a_{j'}^\dagger\hat a_j, \tag{8.95}$$



where the upper sign pertains to bosons and the lower one to fermions, while according to Eqs. (68) and (87), the very last pair product is 

$$\hat a_{j''}\hat a_j=\pm\hat a_j\hat a_{j''}, \tag{8.96}$$



with the same sign convention. Plugging these expressions into Eq. (94), we see that regardless of the particle statistics, the last two terms cancel, and we arrive at a universal (and generally very useful) commutation rule 

$$[\hat a_j,\hat a_{j'}^\dagger\hat a_{j''}]=\hat a_{j''}\,\delta_{jj'}, \tag{8.97}$$



valid for particles of both kinds. As a result, the Heisenberg equation of motion for operator â_1, and the equation for â_2 (which may be obtained absolutely similarly), are also statistics-independent: 20 

$$i\hbar\dot{\hat a}_1=a_z\hat a_1+a_-\hat a_2,\qquad i\hbar\dot{\hat a}_2=a_+\hat a_1-a_z\hat a_2. \tag{8.98}$$
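The statistics-independent rule (97) can be confirmed numerically for both kinds of particles. In the sketch below, the fermion operators are Jordan-Wigner matrices (a technique swapped in here, consistent with the sign rule of Eqs. (81)-(82)), and the boson operators live on a Fock space truncated at an assumed cutoff n_max. For bosons the check is therefore restricted to kets with every occupancy below the cutoff, where truncation cannot matter. Helper names are hypothetical.

```python
import itertools
from functools import reduce

import numpy as np


def boson_annihilators(n_modes, n_max):
    a = np.diag(np.sqrt(np.arange(1, n_max + 1, dtype=float)), k=1)
    eye = np.eye(n_max + 1)
    return [reduce(np.kron, [a if i == j else eye for i in range(n_modes)]) for j in range(n_modes)]


def fermion_annihilators(n_modes):
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])
    parity = np.diag([1.0, -1.0])
    eye = np.eye(2)
    return [reduce(np.kron, [parity] * j + [lower] + [eye] * (n_modes - j - 1)) for j in range(n_modes)]


def rule_97_residual(ops, columns):
    """Largest |[a_j, a_j'^dag a_j''] - delta_jj' a_j''| over all index triples,
    evaluated on the selected basis kets (matrix columns)."""
    worst = 0.0
    for j, jp, jpp in itertools.product(range(len(ops)), repeat=3):
        hop = ops[jp].T @ ops[jpp]  # a_j'^dagger a_j'' (real matrices)
        lhs = ops[j] @ hop - hop @ ops[j]
        rhs = ops[jpp] if j == jp else 0.0 * ops[jpp]
        worst = max(worst, np.abs((lhs - rhs)[:, columns]).max())
    return worst


fermion_ops = fermion_annihilators(3)
print(rule_97_residual(fermion_ops, np.arange(2**3)))

n_max = 4
boson_ops = boson_annihilators(2, n_max)
occupancies = list(itertools.product(range(n_max + 1), repeat=2))  # kron order: mode 1 slowest
safe = [i for i, occ in enumerate(occupancies) if max(occ) < n_max]
print(rule_97_residual(boson_ops, safe))
```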



Thus we have got a system of coupled, linear differential equations that are identical to the equations for the c-number probability amplitudes of the single-particle wavefunctions of a two-level system - see Eq. (2.201) and Problem 4.10. Their general solution is a linear superposition of exponentials: 

$$\hat a_{1,2}(t)=\hat c_{1,2}^{(+)}e^{\lambda_+t}+\hat c_{1,2}^{(-)}e^{\lambda_-t}. \tag{8.99}$$

As usual, in order to find the exponents λ_±, it is sufficient to plug a particular solution â_{1,2}(t) = ĉ_{1,2} exp{λt} into Eq. (98) and require the determinant of the resulting homogeneous, linear system for the "coefficients" (actually, time-independent operators) ĉ_{1,2} to equal zero. This gives us the following characteristic equation 



20 The equations of motion for the creation operators â_{1,2}† are just the Hermitian conjugates of Eqs. (98), and do not add any new information about the system's dynamics. 

$$\begin{vmatrix}a_z-i\hbar\lambda & a_-\\ a_+ & -a_z-i\hbar\lambda\end{vmatrix}=0, \tag{8.100}$$

with two roots λ_± = ±iΩ/2, where Ω ≡ 2a/ħ. Now plugging each of the roots, one by one, into the system of equations for ĉ_{1,2}, we can find these operators, and hence the general solution of system (98) for arbitrary initial conditions. 

Let us consider the simple case a_y = a_z = 0 (meaning, in particular, that the well eigenenergies are exactly aligned), so that ħΩ/2 = a = a_x; then the solution of Eq. (98) is 

$$\hat a_1(t)=\hat a_1(0)\cos\frac{\Omega t}{2}-i\hat a_2(0)\sin\frac{\Omega t}{2},\qquad \hat a_2(t)=-i\hat a_1(0)\sin\frac{\Omega t}{2}+\hat a_2(0)\cos\frac{\Omega t}{2}. \tag{8.101}$$

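Multiplying the first of Eqs. (101) by its Hermitian conjugate, and ensemble-averaging the result, we get 

$$\langle N_1\rangle\equiv\langle\hat a_1^\dagger(t)\hat a_1(t)\rangle=\langle\hat a_1^\dagger(0)\hat a_1(0)\rangle\cos^2\frac{\Omega t}{2}+\langle\hat a_2^\dagger(0)\hat a_2(0)\rangle\sin^2\frac{\Omega t}{2}-i\langle\hat a_1^\dagger(0)\hat a_2(0)-\hat a_2^\dagger(0)\hat a_1(0)\rangle\sin\frac{\Omega t}{2}\cos\frac{\Omega t}{2}. \tag{8.102}$$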



Let us consider the particular case when the initial state of the system is a Dirac state, i.e. has a definite number of particles in each well; in this case only the first two terms on the right-hand side are different from zero: 21 

$$\langle N_1\rangle=N_1(0)\cos^2\frac{\Omega t}{2}+N_2(0)\sin^2\frac{\Omega t}{2}. \tag{8.103}$$



For one particle, initially placed in either well, this gives us our old result (2.185) describing quantum oscillations of the particle between the two wells with frequency Ω. However, Eq. (103) is valid for any set of initial occupancies; let us use it. For example, starting from two particles, with initially one particle in each well, we get ⟨N_1⟩ = 1, regardless of time. So, the occupancies do not oscillate, and no experiment may detect the quantum oscillations, though their frequency Ω is still formally present in the time evolution equations. This fact may be interpreted as simultaneous quantum oscillations of the two particles exactly in anti-phase. For bosons, we can go to even larger occupancies by preparing the system, for example, in the state with N_1(0) = N, N_2(0) = 0. Equation (103) shows that in this case the quantum oscillation amplitude increases N-fold; this is a particular manifestation of the general fact that bosons can be (and evolve in time) in the same quantum state. On the other hand, for fermions we cannot increase the initial occupancies beyond 1, so that the largest oscillation amplitude we can get is when we initially fill just one well. 
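Because Eqs. (98) are linear, their solution may be written as â(t) = U(t)â(0) with U(t) = exp{-iht/ħ}, where h is the 2×2 matrix (89). For an initial Dirac state this gives ⟨N_j(t)⟩ = Σ_j′ |U_jj′(t)|² N_j′(0), for either statistics. The sketch below evaluates this formula. The value of a_x and the units with ħ = 1 are assumptions. The eigenvalues ±a of h correspond to the roots λ_± = ∓ia/ħ = ∓iΩ/2 of Eq. (100).

```python
import numpy as np


def heisenberg_propagator(h, t, hbar=1.0):
    """U(t) = exp(-i h t / hbar), so that the operator vector a(t) = U(t) a(0) solves Eq. (98)."""
    evals, evecs = np.linalg.eigh(h)
    return evecs @ np.diag(np.exp(-1j * evals * t / hbar)) @ evecs.conj().T


def mean_occupancies(h, n0, t, hbar=1.0):
    """<N_j(t)> = sum_j' |U_jj'(t)|^2 N_j'(0) for an initial Dirac state with occupancies n0."""
    u = heisenberg_propagator(h, t, hbar)
    return (np.abs(u) ** 2) @ np.asarray(n0, dtype=float)


a_x = 0.7  # assumed tunneling coefficient (a_y = a_z = 0), in units with hbar = 1
h = np.array([[0.0, a_x], [a_x, 0.0]])  # matrix (8.89) for this case
omega = 2 * a_x  # Omega = 2a/hbar

for n0 in [(1, 0), (1, 1), (5, 0)]:
    print(n0, [round(float(mean_occupancies(h, n0, t)[0]), 4) for t in (0.0, 1.0, 2.0)])
```

The initial state (1, 1) gives a time-independent ⟨N_1⟩ = 1, while (5, 0) gives a five-fold amplitude. The latter is allowed only for bosons.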

The Dirac approach may be readily generalized to more complex systems. For example, an arbitrary system of quantum wells with weak tunneling coupling between adjacent wells may be described by the Hamiltonian 

$$\hat H=\sum_j\varepsilon_j\hat a_j^\dagger\hat a_j+\sum_{\{j,j'\}}\left(\delta_{jj'}\hat a_j^\dagger\hat a_{j'}+\text{h.c.}\right), \tag{8.104}$$



21 For the second well's occupancy, the result is complementary, ⟨N_2(t)⟩ = N_1(0) sin²(Ωt/2) + N_2(0) cos²(Ωt/2), giving, in particular, a good sanity check: ⟨N_1(t)⟩ + ⟨N_2(t)⟩ = N_1(0) + N_2(0) = const. 

where the symbol {j, j′} means that the second sum is restricted to pairs of next-neighbor wells - see, e.g., Eq. (2.203) and its discussion. Note that this Hamiltonian is still a quadratic form of the creation-annihilation operators, so the Heisenberg-picture equations of motion of these operators are linear, and their exact solutions, though possibly cumbersome, may be studied in detail. Due to this fact, Hamiltonian (104) is widely used for the study of some phenomena, for example the very interesting Anderson localization effect, in which a random distribution of the eigenenergies ε_j prevents particles within a certain energy range from spreading to unlimited distances. 22 
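Since Hamiltonian (104) is quadratic, Eq. (79) reduces its dynamics to the single-particle matrix with ε_j on the diagonal and the couplings between neighboring wells off the diagonal. The sketch below uses this reduction to illustrate Anderson localization on a 1D chain. Several choices are assumptions made for illustration only: equal couplings, a uniform ("box") distribution of ε_j of width W, the chain length, and the inverse participation ratio Σ_j|ψ_j|⁴ as the localization diagnostic. This diagnostic is about 1/n for extended states and of the order of the inverse localization length for localized ones.

```python
import numpy as np


def chain_hamiltonian(eps, coupling=1.0):
    """Single-particle matrix of Hamiltonian (104) for a 1D chain of wells:
    on-site energies eps_j on the diagonal, equal next-neighbor couplings off it."""
    n = len(eps)
    h = np.diag(np.asarray(eps, dtype=float))
    idx = np.arange(n - 1)
    h[idx, idx + 1] = coupling
    h[idx + 1, idx] = coupling
    return h


def mean_inverse_participation_ratio(h):
    """Average of sum_j |psi_j|^4 over all eigenvectors of h."""
    _, vecs = np.linalg.eigh(h)
    return float(np.mean(np.sum(np.abs(vecs) ** 4, axis=0)))


rng = np.random.default_rng(seed=0)
n_wells = 200
clean = mean_inverse_participation_ratio(chain_hamiltonian(np.zeros(n_wells)))
for disorder in (1.0, 3.0, 6.0):
    eps = rng.uniform(-disorder / 2, disorder / 2, size=n_wells)  # assumed box distribution
    print(disorder, mean_inverse_participation_ratio(chain_hamiltonian(eps)), clean)
```

For the clean chain every eigenstate is a standing wave with the inverse participation ratio 3/[2(n + 1)]. With growing disorder this ratio increases by orders of magnitude, signaling that the eigenstates become confined to a few wells.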



8.4. Perturbative approaches 

The situation becomes much more difficult if the problem requires an account of direct interactions between the particles. Let us assume that the interaction may be reduced to that between pairs, as is the case for the Coulomb interaction 23 and most other interactions, so that it may be described with the following "pair-interaction" Hamiltonian 

$$\hat U_{\rm int}=\frac12\sum_{k\neq k'}u_{\rm int}(\mathbf r_k,\mathbf r_{k'}), \tag{8.105}$$

with the front factor of 1/2 compensating the double-counting of each particle pair. The translation of this operator to the second-quantization form may be done absolutely similarly to the derivation of Eq. (79), and gives a similar (though naturally more bulky) result: 24 

$$\hat U_{\rm int}=\frac12\sum_{j,j',l,l'}u_{jj'll'}\,\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{l'}\hat a_l, \tag{8.106}$$

where the two-particle matrix elements are defined similarly to Eq. (74): 

$$u_{jj'll'}\equiv\langle\beta_j|\otimes\langle\beta_{j'}|\,\hat u_{\rm int}\,|\beta_l\rangle\otimes|\beta_{l'}\rangle. \tag{8.107}$$



In this general case, the resulting Heisenberg equations of motion are nonlinear, so that solving them and calculating observables from the results is usually impossible, at least analytically. The only case when some general results may be obtained is the weak interaction limit. In this case the unperturbed Hamiltonian contains only single-particle terms such as those in Eqs. (71), so we can always (at least conceptually :-) find a basis of orthonormal single-particle states in which this Hamiltonian is diagonal in the Dirac representation: 

$$\hat H^{(0)}=\sum_j\varepsilon_j^{(0)}\hat a_j^\dagger\hat a_j. \tag{8.108}$$



Now we can use Eq. (6.13) in this basis to calculate the interaction energy as a first-order perturbation: 



22 For a review of the 1D version of this problem, see, e.g., J. Pendry, Adv. Phys. 43, 461 (1994). 

23 Another important example is the so-called Hubbard model, in which there may be at most two particles on each of the localized sites, with a negligible interaction of particles on different sites, which are only connected by the next-neighbor tunneling - see Eq. (104). 

24 The only new feature is a specific order of the indices of the creation operators. Note the mnemonic rule of writing this expression, similar to that for Eq. (79): each term corresponds to moving a pair of particles from states l and l′ to states j′ and j, factored with the corresponding two-particle matrix element (107). 

$$E_{\rm int}^{(1)}=\langle N_1,N_2,\ldots|\hat U_{\rm int}|N_1,N_2,\ldots\rangle=\frac12\sum_{j,j',l,l'}u_{jj'll'}\,\langle N_1,N_2,\ldots|\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{l'}\hat a_l|N_1,N_2,\ldots\rangle. \tag{8.109}$$



Since, according to Eqs. (81)-(82), the Dirac states with different occupancies are orthogonal, the last average yields nonvanishing results only for three particular subsets of the indices: 

(i) j ≠ j′, l = j, and l′ = j′. In this case the 4-operator product in Eq. (109) equals â_j†â_j′†â_j′â_j, and applying the commutation rules twice, we can bring it to a form in which each creation operator stands immediately to the left of the corresponding annihilation operator, thus forming the particle number operators (64): 

$$\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{j'}\hat a_j=\pm\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_j\hat a_{j'}=(\pm1)^2\,\hat a_j^\dagger\hat a_j\hat a_{j'}^\dagger\hat a_{j'}=\hat N_j\hat N_{j'}, \tag{8.110}$$

with the same sign of the final result for bosons and fermions. 

(ii) j ≠ j′, l = j′, and l′ = j. In this case the 4-operator product equals â_j†â_j′†â_jâ_j′, and bringing it to the form N̂_jN̂_j′ requires only one commutation: 

$$\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_j\hat a_{j'}=\pm\hat a_j^\dagger\hat a_j\hat a_{j'}^\dagger\hat a_{j'}=\pm\hat N_j\hat N_{j'}, \tag{8.111}$$

with the upper sign for bosons and the lower sign for fermions. 

(iii) All indices equal to each other, giving â_j†â_j†â_jâ_j. For fermions, such an operator (which "tries" to create or kill two particles in a row, in the same state) immediately gives the null vector. In the case of bosons, we may use Eq. (66) to commute the internal pair of operators, getting 

$$\hat a_j^\dagger\hat a_j^\dagger\hat a_j\hat a_j=\hat a_j^\dagger\left(\hat a_j\hat a_j^\dagger-\hat I\right)\hat a_j=\hat N_j\left(\hat N_j-\hat I\right). \tag{8.112}$$

Note, however, that this formula formally covers the fermion case as well (always giving zero). As a result, Eq. (109) may be rewritten in the following universal form: 

$$E_{\rm int}^{(1)}=\frac12\sum_{j\neq j'}\left(u_{jj'jj'}\pm u_{jj'j'j}\right)N_jN_{j'}+\frac12\sum_j u_{jjjj}N_j(N_j-1). \tag{8.113}$$



The consequences of this result are very different for bosons and fermions. In the former case, 
the last term usually dominates, because the matrix elements (107) are typically the largest when all 
basis functions coincide. Note that this term allows a very simple interpretation: the number of the 
diagonal matrix elements it sums up for each state (j) is just the number of interacting particle pairs 
residing in that state. 
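Equation (113) can be cross-checked against a brute-force evaluation of Eq. (109) with explicit operator matrices. The sketch below draws random complex matrix elements u_jj′ll′ (arbitrary test data, not any physical interaction) and compares the two results for a fermionic Dirac state and a bosonic one. The fermion operators use the Jordan-Wigner construction (an assumed technique consistent with Eqs. (81)-(82)). The boson space is truncated at n_max = 3, which is harmless here because the interaction conserves the particle number. All helper names are hypothetical.

```python
import itertools
from functools import reduce

import numpy as np


def boson_annihilators(n_modes, n_max):
    a = np.diag(np.sqrt(np.arange(1, n_max + 1, dtype=float)), k=1)
    eye = np.eye(n_max + 1)
    return [reduce(np.kron, [a if i == j else eye for i in range(n_modes)]) for j in range(n_modes)]


def fermion_annihilators(n_modes):
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])
    parity = np.diag([1.0, -1.0])
    eye = np.eye(2)
    return [reduce(np.kron, [parity] * j + [lower] + [eye] * (n_modes - j - 1)) for j in range(n_modes)]


def dirac_state(occ, levels):
    """Basis vector |N_1, N_2, ...> in the Kronecker ordering (first state = slowest index)."""
    index = 0
    for n in occ:
        index = index * levels + n
    vec = np.zeros(levels ** len(occ))
    vec[index] = 1.0
    return vec


def direct_expectation(u, ops, psi):
    """<psi| (1/2) sum u_{jj'll'} a_j^dag a_j'^dag a_l' a_l |psi>, i.e. Eqs. (106) and (109)."""
    total = 0.0j
    for j, jp, l, lp in itertools.product(range(len(ops)), repeat=4):
        op = ops[j].T @ ops[jp].T @ ops[lp] @ ops[l]  # real matrices: dagger = transpose
        total += u[j, jp, l, lp] * (psi @ op @ psi)
    return 0.5 * total


def eq_113(u, occ, sign):
    """First-order interaction energy (113); sign = +1 for bosons, -1 for fermions."""
    e = 0.0j
    for j, jp in itertools.product(range(len(occ)), repeat=2):
        if j != jp:
            e += 0.5 * (u[j, jp, j, jp] + sign * u[j, jp, jp, j]) * occ[j] * occ[jp]
    for j, n in enumerate(occ):
        e += 0.5 * u[j, j, j, j] * n * (n - 1)
    return e


rng = np.random.default_rng(seed=1)
u3 = rng.normal(size=(3, 3, 3, 3)) + 1j * rng.normal(size=(3, 3, 3, 3))  # arbitrary test data
fermi_state = dirac_state((1, 1, 0), levels=2)
print(direct_expectation(u3, fermion_annihilators(3), fermi_state), eq_113(u3, (1, 1, 0), -1))

u2 = rng.normal(size=(2, 2, 2, 2)) + 1j * rng.normal(size=(2, 2, 2, 2))
bose_state = dirac_state((2, 1), levels=4)  # n_max = 3: four levels per boson state
print(direct_expectation(u2, boson_annihilators(2, 3), bose_state), eq_113(u2, (2, 1), +1))
```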

In contrast, for fermions the last term is zero, and the interaction energy is the difference of two 
terms inside the first parentheses. In order to spell them out, let us consider the case when there is no 
direct spin-orbit interaction. Then vectors $|\beta\rangle_j$ of the single-particle state basis may be represented as products $|o\rangle_j \otimes |m_s\rangle_j$ of their orbital and spin orientation parts. For spin-½ particles (say, electrons), these orientations $m_s$ may equal only $+1/2$ and $-1/2$; in this case the spin part of the bra-ket $u_{jj'jj'}$ equals

$$\left(\langle m_s|_j \otimes \langle m_s|_{j'}\right)\left(|m_s\rangle_j \otimes |m_s\rangle_{j'}\right) \equiv \left(\langle m_s| \otimes \langle m'_s|\right)\left(|m_s\rangle \otimes |m'_s\rangle\right), \qquad (8.114)$$

where, as in the general Eq. (107), the position of a particular vector in a product codes the particle 
number. Now using the fact that electron spins are defined in different Hilbert spaces, we may move 
their vectors around to get 

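$$\left(\langle m_s| \otimes \langle m'_s|\right)\left(|m_s\rangle \otimes |m'_s\rangle\right) = \left(\langle m_s|m_s\rangle\right) \times \left(\langle m'_s|m'_s\rangle\right) = 1, \qquad (8.115)$$

for any pair of $j$ and $j'$. On the other hand, $u_{jj'j'j}$ is proportional to

$$\left(\langle m_s| \otimes \langle m'_s|\right)\left(|m'_s\rangle \otimes |m_s\rangle\right) = \left(\langle m_s|m'_s\rangle\right) \times \left(\langle m'_s|m_s\rangle\right) = \delta_{m_s m'_s}. \qquad (8.116)$$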
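The spin factors (115) and (116) can be illustrated with 2-component spin vectors, where, as in Eq. (107), the position of a factor in a tensor product codes the particle number. This is a minimal sketch; the names `UP`, `DOWN` and the helper functions are illustrative, not taken from the text.

```python
UP, DOWN = (1, 0), (0, 1)     # |m_s = +1/2>, |m_s = -1/2> as real 2-component vectors


def tensor(u, v):
    """|u> (x) |v>: the first factor belongs to particle 1, the second to particle 2."""
    return [x * y for x in u for y in v]


def inner(u, v):
    """<u|v> for real vectors (no complex conjugation needed here)."""
    return sum(x * y for x, y in zip(u, v))


def direct_spin_factor(m, mp):
    """Spin part of u_{jj'jj'}, Eq. (115): (<m| (x) <m'|)(|m> (x) |m'>)."""
    return inner(tensor(m, mp), tensor(m, mp))


def exchange_spin_factor(m, mp):
    """Spin part of u_{jj'j'j}, Eq. (116): (<m| (x) <m'|)(|m'> (x) |m>)."""
    return inner(tensor(m, mp), tensor(mp, m))


for m in (UP, DOWN):
    for mp in (UP, DOWN):
        print(m, mp, direct_spin_factor(m, mp), exchange_spin_factor(m, mp))
```

The exchange factor survives only for parallel spins, which is the origin of the spin selection rule discussed in footnote 26.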

In this case, it is convenient to rewrite Eq. (113) in the coordinate representation, using single-particle wavefunctions called spin-orbitals:

$$\psi_j(\mathbf r) \equiv \langle \mathbf r|\left(|o\rangle_j \otimes |m_s\rangle_j\right). \qquad (8.117)$$

They differ from the "usual" orbital wavefunctions of the type (5.19) only in that their index $j$ should be understood as the set of the orbital state index and the spin orientation index $m_s$.²⁵ Also, due to the Pauli-principle restriction of numbers $N_j$ to either 0 or 1, Eq. (113) may also be rewritten without the occupancy numbers, with the understanding that the summation is extended only over the pairs of occupied states. As a result, Eq. (113) becomes



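$$E^{(1)}_{\rm int} = \frac{1}{2}\sum_{j \neq j'} \int d^3r \int d^3r' \left[\psi_j^*(\mathbf r)\psi_{j'}^*(\mathbf r')\, u_{\rm int}(\mathbf r, \mathbf r')\, \psi_j(\mathbf r)\psi_{j'}(\mathbf r') - \psi_j^*(\mathbf r)\psi_{j'}^*(\mathbf r')\, u_{\rm int}(\mathbf r, \mathbf r')\, \psi_{j'}(\mathbf r)\psi_j(\mathbf r')\right]. \qquad (8.118)$$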



If, for a system of 2 electrons, we limit the summation to 2 states ($j, j' = 1, 2$), we get a result absolutely similar to Eqs. (44)-(45), with the minus sign in Eq. (44). Hence, Eq. (118) may be considered as the generalization of the direct and exchange interaction balance picture to an arbitrary number of orbitals and an arbitrary total number $N$ of electrons. Note, however, that this equation cannot correctly describe the energy of the excited singlet state, corresponding to the plus sign in Eq. (44).²⁶ The reason is that the description of entangled spin states, given by Eq. (19) and the last term of Eq. (21), requires linear superpositions of different Dirac states, and hence is not covered by our assumption (108).



25 Constructs (117) are also close to spinors (14), except that the spin $s$ of a single particle is fixed, so that the spin-orbital should be indexed by the spin's orientation $m_s$ rather than the full spin $s$. Also, the orbital index should be clearly distinguished from $j$ (which, again, is the set of that index and $m_s$). This is why I believe that the frequently met notation of spinors as $\psi_{j,s}(\mathbf r)$ may lead to confusion.

26 Note that due to the condition $j' \neq j$ and Eq. (116), the exchange interaction is limited to electron state pairs with the same spin direction - again in a good correspondence with the triplet states (like ↑↑ or ↓↓) of a two-electron system, in which the contribution of $E_{\rm ex}$ (8.45b) to the total energy is also negative.






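Now comes a very important fact: the approximate result (118), added to the sum of unperturbed energies $\varepsilon_j^{(0)}$, equals the sum of exact eigenenergies of the following Hartree-Fock equation:²⁷

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + u(\mathbf r)\right]\psi_j(\mathbf r) + \sum_{j' \neq j} \int \left[\psi_{j'}^*(\mathbf r')\, u_{\rm int}(\mathbf r, \mathbf r')\, \psi_j(\mathbf r)\psi_{j'}(\mathbf r') - \psi_{j'}^*(\mathbf r')\, u_{\rm int}(\mathbf r, \mathbf r')\, \psi_{j'}(\mathbf r)\psi_j(\mathbf r')\right] d^3r' = \varepsilon_j \psi_j(\mathbf r), \qquad (8.119)$$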


where $u(\mathbf r)$ is the external-field potential acting on each particle separately - see Eq. (71). An advantage of this equation in comparison with Eq. (118) is that it allows the (approximate) calculation of not only the energy of the system, but also the corresponding spin-orbitals, taking into account the electron-electron interaction.

In the limit when the single-particle wavefunction overlaps are small, and hence the exchange interaction is negligible, the last term in the square brackets may be ignored, the factor $\psi_j(\mathbf r)$ may be taken out of the integral, and the equation becomes similar to the single-particle Schrödinger equation with the following effective potential:

$$u_{\rm ef}(\mathbf r) = u(\mathbf r) + \sum_{j' \neq j} \int \psi_{j'}^*(\mathbf r')\, u_{\rm int}(\mathbf r, \mathbf r')\, \psi_{j'}(\mathbf r')\, d^3r'. \qquad (8.120)$$



This is the so-called Hartree approximation, which gives reasonable results for some systems. However, in dense electron systems (such as typical atoms, molecules, and condensed matter) the exchange interaction, described by the second term in the square brackets of Eq. (119), is of the order of 30% of the direct interaction, and frequently this effect cannot be ignored. In this case, Eq. (119) is an integro-differential rather than just a differential equation.

There are efficient methods of numerical solution of such equations, typically based on iterations, though they require large memory and CPU-cycle resources even for systems of ~10 electrons.²⁸ This is why the Hartree-Fock approximation is the de-facto baseline of all so-called ab-initio ("first-principle") calculations in condensed matter physics and quantum chemistry.²⁹ In departures from this baseline, there are two opposite trends. For larger accuracy (and typically smaller systems), several more complex "post-Hartree-Fock methods", notably including the configuration interaction method,³⁰ have been developed.
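To show what an iterative (self-consistent) solution looks like, here is a deliberately tiny sketch, not a production Hartree-Fock code. It assumes a hypothetical 1D model in units ħ = m = 1: two electrons with opposite spins occupying the same orbital in a harmonic well, interacting via a softened Coulomb law. For such a pair the exchange term of Eq. (119) vanishes by Eq. (116), so each iteration only has to rebuild the Hartree potential (120) from the current orbital, re-solve the single-particle equation on a grid, and mix the new potential with the old one.

```python
import math

# Hypothetical toy model (units hbar = m = 1): two opposite-spin electrons share one
# orbital psi(x) in the well u(x) = x**2/2 and interact via a softened Coulomb law.
L, n = 10.0, 121                      # box length and number of grid points
dx = L / (n - 1)
xs = [-L / 2 + i * dx for i in range(n)]
u_ext = [0.5 * x * x for x in xs]
U_INT = [[1.0 / math.sqrt((x - xp) ** 2 + 1.0) for xp in xs] for x in xs]


def thomas_solve(diag, off, rhs):
    """Solve a tridiagonal linear system whose off-diagonal elements all equal `off`."""
    m = len(diag)
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - off * c[i - 1]
        c[i], d[i] = off / denom, (rhs[i] - off * d[i - 1]) / denom
    out = [0.0] * m
    out[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        out[i] = d[i] - c[i] * out[i + 1]
    return out


def ground_state(v, sweeps=60):
    """Lowest eigenpair of -(1/2) d2/dx2 + v(x) on the grid, by shifted inverse iteration."""
    sigma = min(v) - 1.0                  # shift below the lowest eigenvalue
    diag = [1.0 / dx**2 + vi - sigma for vi in v]
    off = -0.5 / dx**2
    psi = [math.exp(-x * x) for x in xs]  # even trial function
    for _ in range(sweeps):
        psi = thomas_solve(diag, off, psi)
        norm = math.sqrt(sum(p * p for p in psi) * dx)
        psi = [p / norm for p in psi]
    h_psi = []
    for i in range(n):
        left = psi[i - 1] if i > 0 else 0.0
        right = psi[i + 1] if i < n - 1 else 0.0
        h_psi.append((psi[i] - 0.5 * (left + right)) / dx**2 + v[i] * psi[i])
    eps = sum(p * hp for p, hp in zip(psi, h_psi)) * dx   # Rayleigh quotient
    return psi, eps


def hartree(iterations=40, mix=0.5):
    """Self-consistent loop: orbital -> potential (120) -> new orbital, with linear mixing."""
    v = u_ext[:]
    for _ in range(iterations):
        psi = ground_state(v)[0]
        rho = [p * p for p in psi]        # density of the other electron (j' != j)
        u_h = [sum(U_INT[i][k] * rho[k] for k in range(n)) * dx for i in range(n)]
        v = [mix * (ue + uh) + (1 - mix) * vo for ue, uh, vo in zip(u_ext, u_h, v)]
    return ground_state(v)


psi0, eps0 = ground_state(u_ext)          # non-interacting check: close to 1/2
psi, eps = hartree()
print(f"eps without interaction: {eps0:.4f}, Hartree eps: {eps:.4f}")
```

The linear mixing parameter `mix` is a common stabilization trick; the grid, box length, and softening length are arbitrary choices made for the illustration.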

There is also a strong opposite trend of extending ab-initio methods to larger systems, at the expense of accuracy. This trend is currently dominated by the Density Functional Theory,³¹



27 This equation was suggested in 1928 by D. Hartree for the direct interaction, and extended to the exchange interaction by V. Fock in 1930. In order to verify its equivalence to Eq. (118), it is sufficient to multiply all terms of Eq. (119) by $\psi_j^*(\mathbf r)$, integrate them over all $\mathbf r$ space (so that the right-hand side gives $\varepsilon_j$), and then sum these single-particle energies over all occupied states $j$.

28 Surprisingly, this is sufficient to describe, with reasonable accuracy, many properties of condensed matter, by breaking it into similar elementary spatial cells (say, Bravais cells of crystals), with cyclic boundary conditions and a limited number of electrons in each cell.

29 See, e.g., A. Szabo and N. Ostlund, Modern Quantum Chemistry, McGraw-Hill, 1989. 

30 That method, in particular, allows the calculation of proper linear superpositions of the Dirac states (such as the 
excited singlet state for N=2, discussed above) which are missing in the generic Hartree-Fock approach. 

31 It was developed by W. Kohn and coauthors in the mid-1960s, and eventually (in 1998) recognized by a Nobel Prize in Chemistry.
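universally known by its acronym DFT. In this approach, the equation solved for each eigenfunction $\psi_j(\mathbf r)$ is a differential, Schrödinger-like Kohn-Sham equation

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + u(\mathbf r) + u_{\rm H}(\mathbf r) - u_{\rm xc}(\mathbf r)\right]\psi_j(\mathbf r) = \varepsilon_j \psi_j(\mathbf r), \qquad (8.121)$$

where

$$u_{\rm H}(\mathbf r) = -e\phi(\mathbf r), \qquad \phi(\mathbf r) = \int d^3r' \frac{\rho(\mathbf r')}{4\pi\varepsilon_0|\mathbf r - \mathbf r'|}, \qquad \rho(\mathbf r) = -en(\mathbf r), \qquad (8.122)$$

and $n(\mathbf r)$ is the total electron density in a particular point, calculated as

$$n(\mathbf r) = \sum_j \psi_j^*(\mathbf r)\psi_j(\mathbf r). \qquad (8.123)$$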



The effective exchange-correlation potential $u_{\rm xc}(\mathbf r)$ (which differs from the genuine exchange potential, participating in Eq. (119), by the inclusion of the term $j = j'$) is calculated in various approximations, most of them valid only asymptotically, in the limit when the electron number is high. The simplest of them is the Local Density Approximation (LDA), in which the effective exchange potential at each point is a function only of the electron density (123) at the same point, taken from the theory of a uniform gas of free electrons.³² Another simplification, which dramatically cuts the computing resources necessary for systems of relatively heavy atoms, is the exclusion of the filled internal electron shells (see Sec. 3.7) from the explicit calculations, due to the fact that the shell states are virtually unperturbed by the valence electron effects involved in typical atomic phenomena and chemical reactions. In this approach, the Coulomb field of the shells, described by fixed, pre-calculated and tabulated pseudo-potentials, is added to that of the nuclei. Unfortunately, because of lack of time, for details I have to refer the reader to specialized literature.³³
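The LDA idea can be sketched in a few lines: the uniform-gas result of footnote 32 is simply evaluated at the local density $n(\mathbf r)$. This sketch assumes atomic units ($\hbar = m = e^2/4\pi\varepsilon_0 = 1$) and takes the footnote's formula at face value; the density profile is hypothetical, and the correlation part of $u_{\rm xc}$ is not included.

```python
import math

# Atomic units (hbar = m = e**2 / (4 pi eps0) = 1) are an assumption of this sketch.


def fermi_wavenumber(n):
    """k_F of a uniform electron gas with density n, inverting n = k_F**3 / (3 pi**2)."""
    return (3 * math.pi**2 * n) ** (1 / 3)


def density_from_kf(kf):
    """Two spin states per k-space cell of volume (2 pi)**3, inside a sphere of radius k_F."""
    return 2 * (4 * math.pi / 3) * kf**3 / (2 * math.pi) ** 3


def u_ex_lda(n):
    """Footnote 32's uniform-gas term (3/4pi) e**2 k_F / 4pi eps0, evaluated at the local density."""
    return 3 / (4 * math.pi) * fermi_wavenumber(n)


# LDA step: apply the uniform-gas formula point by point to a (hypothetical) density profile.
profile = [0.001, 0.01, 0.1, 1.0]
print([round(u_ex_lda(n_r), 4) for n_r in profile])
```

Because $k_F \propto n^{1/3}$, the local exchange term grows only as the cube root of the density.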

Let me, however, emphasize that despite the wide use of the DFT,³⁴ and its indisputable successes in describing some experimental data, it has its problems. For me personally, its largest conceptual deficiency is the incorporation of the absolutely unphysical Coulomb interaction of an electron with itself (by dropping the condition $j' \neq j$). As a result, existing DFT packages require substantial artificial tinkering to use them for the description of such processes as single-electron transfer.³⁵ A little bit light-heartedly (but still correctly), one may say that an advanced DFT software package, run on a huge supercomputer, cannot be used to calculate the correct energy spectrum of a hydrogen atom - a century after this had been done by Niels Bohr on a slip of paper!



32 For a uniform, degenerate Fermi gas of electrons (with the Fermi energy $\varepsilon_F \gg k_BT$), the exchange potential may be readily calculated analytically, giving $u_{\rm ex} = (3/4\pi)e^2k_F/4\pi\varepsilon_0$, where $k_F$ is the Fermi-surface wave number that defines both the Fermi energy $\varepsilon_F = (\hbar k_F)^2/2m$ and the electron density (per unit volume) $n = 2(4\pi/3)k_F^3/(2\pi)^3 = k_F^3/3\pi^2$.

33 See, e.g., G. te Velde et al., J. Comp. Chem. 22, 931 (2001), and/or M. D. Segall et al., J. Phys.: Cond. Matt. 14, 2717 (2002), and references therein.

34 This popularity is enhanced by the availability of several advanced DFT software packages, some of them (such as SIESTA, http://icmab.cat/leem/siesta/ ) in the public domain.

35 See, e.g., N. Simonian et al., J. Appl. Phys. 113, 044504 (2013).
8.5. Quantum computation and cryptography

Now I have to review the emerging fields of quantum computation and encryption.³⁶ These fields
are currently the subject of intensive research effort, which has brought (besides much hype :-) a few 
results of genuine importance for quantum mechanics. My review, by necessity short, will emphasize 
these fundamental results, referring the reader interested in details to special literature. 37 Because of the 
active stage of these fields, I will also provide quite a few references to recent publications, making the 
style of this section closer to a brief research review than to a part of a textbook. 

Presently, the work on quantum computation and encryption is focused on systems of spatially-separated (and hence distinguishable) two-level systems - in this context, commonly called qubits.³⁸ Due to this distinguishability, the issues that were the focus of the past few sections (including the benefits of the second quantization) are irrelevant here. On the other hand, systems of distinguishable qubits have some interesting properties that have not yet been discussed in this course.

First of all, a system of $N \gg 1$ qubits may contain much more information than $N$ classical bits, which is the maximum information capacity of $N$ classical bistable systems. Indeed, according to the discussions in Chapter 4, an arbitrary pure state of a single qubit may be represented by its ket vector (4.37) - see also Eq. (5.1):

$$|\alpha\rangle_{N=1} = a_1|u_1\rangle + a_2|u_2\rangle, \qquad (8.124)$$

where $\{u\}$ is any orthonormal two-state basis. In quantum information theory, it is natural and common to employ, as $u_j$, the eigenstates $a_j$ of the observable $A$ that is eventually measured in the particular physical implementation of the qubit - say, a certain spatial component of the spin of a spin-½ particle, etc. It is also common to write the kets of these base states as $|0\rangle$ and $|1\rangle$, so that Eq. (124) takes the form³⁹

$$|\alpha\rangle_{N=1} = a_0|0\rangle + a_1|1\rangle = \sum_{j=0,1} a_j|j\rangle, \qquad (8.125)$$

where in the rest of this chapter, the letter $j$ will be used to denote an integer equal to either 0 or 1. Hence any pure state $\alpha$ of a qubit is completely defined by two complex c-numbers $a_j$, i.e. by 4 real numbers. Moreover, due to the normalization condition $|a_0|^2 + |a_1|^2 = 1$, we need just 3 independent real numbers - say, the Bloch sphere coordinates $\theta$ and $\varphi$ (see Fig. 5.1), plus the common phase $\gamma$, which becomes important when we consider coherent states of several qubits - see Eq. (5.3).
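The 3-parameter description of a single qubit can be made concrete with a short sketch that builds $a_0$ and $a_1$ from the Bloch angles, using the convention of footnote 39 multiplied by a common phase factor $e^{i\gamma}$, and then recovers the three numbers from the amplitudes. The function names are illustrative, and the inversion is ill-defined exactly at the poles, where $\varphi$ loses its meaning.

```python
import cmath
import math


def qubit_amplitudes(theta, phi, gamma=0.0):
    """(a0, a1) of Eq. (125) from Bloch angles (footnote 39), times a common phase exp(i gamma)."""
    common = cmath.exp(1j * gamma)
    return common * math.cos(theta / 2), common * math.sin(theta / 2) * cmath.exp(1j * phi)


def bloch_parameters(a0, a1):
    """Recover (theta, phi, gamma) from normalized amplitudes (not valid exactly at the poles)."""
    theta = 2 * math.atan2(abs(a1), abs(a0))
    gamma = cmath.phase(a0)
    phi = (cmath.phase(a1) - gamma) % (2 * math.pi)
    return theta, phi, gamma


a0, a1 = qubit_amplitudes(1.0, 2.0, 0.5)
print(abs(a0) ** 2 + abs(a1) ** 2)   # normalization, equal to 1 up to rounding
print(bloch_parameters(a0, a1))      # approximately (1.0, 2.0, 0.5)
```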



36 Since these fields are closely related, they are often referred to together, under the (somewhat misleading) title of "quantum information".

37 Despite many recent book titles in the field, one of its first surveys, by M. Nielsen and I. Chuang, Quantum 
Computation and Quantum Information, Cambridge U. Press, 2000, is perhaps still the best one. 

38 In some texts, the term qubit (or "Qbit", or "Q-bit") is used instead for the information contents of a two-level 
system - very much like the classical bit of information (in this context, frequently called "Cbit" or "C-bit") 
describes the information contents of a classical bistable system - see, e.g., SM Sec. 2.2. 

39 The slightly odd aspect of this notation is that in the Bloch sphere representation (Fig. 5.1), the North Pole state (traditionally denoted as ↑ in other fields of quantum mechanics) is taken for 0, while the South Pole state ↓ is taken for 1, so that Eqs. (5.4) take the form $a_0 = \cos(\theta/2)$, $a_1 = \sin(\theta/2)\exp\{i\varphi\}$.
Now, if we have a system of 2 qubits, its arbitrary pure state (4.37) may be represented as a sum of $2^2 = 4$ terms,⁴⁰

$$|\alpha\rangle_{N=2} = a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle = \sum_{j_1, j_2 = 0,1} a_{j_1j_2}|j_1j_2\rangle, \qquad (8.126)$$

with 4 complex coefficients, i.e. 4×2 = 8 real numbers, subject to just one normalization condition⁴¹

$$\sum_{j_1, j_2 = 0,1} \left|a_{j_1j_2}\right|^2 = 1. \qquad (8.127)$$



An evident generalization of Eqs. (125)-(126) to an arbitrary pure state of an $N$-qubit system is given by a sum of $2^N$ terms

$$|\alpha\rangle_N = \sum_{j_1, j_2, \dots, j_N = 0,1} a_{j_1j_2\dots j_N}|j_1j_2\dots j_N\rangle, \qquad (8.128)$$

including all possible combinations of 0s and 1s inside the ket, so that the state is fully described by $2^N$ complex numbers, i.e. $2\cdot2^N = 2^{N+1}$ real numbers, with only one constraint, similar to Eq. (127), imposed by the normalization condition. Let me emphasize that this exponential growth of the information contents would not be possible without the qubit state entanglement. Indeed, in the particular case when qubit states are unentangled (separable),



$$|\alpha\rangle = |\alpha_1\rangle|\alpha_2\rangle\dots|\alpha_N\rangle, \qquad (8.129)$$

where each $|\alpha_n\rangle$ is described by an equality similar to Eq. (125), with its individual expansion coefficients, the system state description requires only $3N$ real numbers - e.g., $N$ sets $\{\theta, \varphi, \gamma\}$.
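The contrast between the $2^{N+1}$ real numbers of Eq. (128) and the $3N$ numbers of the separable state (129) is easy to see in a short sketch. It builds a product state by successive tensor (Kronecker) products, with the amplitude of $|j_1j_2\dots j_N\rangle$ stored at the integer index whose binary digits are $j_1j_2\dots j_N$; this indexing, and the particular angles used, are assumptions of the sketch, with the ordering chosen to match the shorthand of footnote 40.

```python
import cmath
import math
from functools import reduce


def qubit(theta, phi, gamma=0.0):
    """Single-qubit amplitudes [a0, a1] in the convention of footnote 39, with a common phase gamma."""
    g = cmath.exp(1j * gamma)
    return [g * math.cos(theta / 2), g * math.sin(theta / 2) * cmath.exp(1j * phi)]


def kron(u, v):
    """Amplitudes of |u>|v>; index 2*i + j corresponds to the ket |i j> (qubit order = digit order)."""
    return [x * y for x in u for y in v]


def product_state(triples):
    """Separable state (129) built from N triples {theta, phi, gamma}."""
    return reduce(kron, (qubit(*t) for t in triples))


def real_parameters(n_qubits, separable):
    """Real numbers needed before normalization: 3N for (129), 2 * 2**N for the general state (128)."""
    return 3 * n_qubits if separable else 2 ** (n_qubits + 1)


N = 10
state = product_state([(0.3 * k, 0.1 * k, 0.0) for k in range(N)])
print(len(state), sum(abs(a) ** 2 for a in state))
print(real_parameters(N, separable=True), real_parameters(N, separable=False))   # 30 2048
```

Even though the separable state occupies a 1024-component vector, only 30 real numbers went into it; a generic entangled state would need all 2048.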

However, it is wrong (as is sometimes done in popular reviews) to project this exponential growth of information contents directly onto the capabilities of quantum computation, because this process has to include the output information readout, i.e. qubit state measurements. Due to the fundamental intrinsic uncertainty of quantum systems, the measurement of a single qubit, even in a pure state (125), generally gives uncertain results, with probabilities $W_0 = |a_0|^2$ and $W_1 = |a_1|^2$. In order to comply with the general notion of digital computation, a quantum computer has to provide certain (or virtually certain) results, and hence probabilities $W_j$ have to be very close to either 0 or 1, so that before the measurement, each qubit has to be in a basis state - either 0 or 1. This means that the computational system of $N$ qubits, just before the final readout, has to be in one of the basis states



$$|\alpha\rangle_N = |j_1j_2\dots j_N\rangle, \qquad (8.130)$$

which is a very small subset even of the set (129) of all unentangled states, and whose maximum information content is just $N$ classical bits.



40 Here and in most instances below I use the same shorthand notation as was used in the beginning of this chapter - cf. Eq. (8.1). In this short form, the qubit's number is coded by the order of its state index inside the single ket-vector, while in the long form, such as in Eq. (129), it is coded by the order of the ket-vector.

41 It follows from the requirement that the sum of two probabilities $W_j = \langle\alpha|\hat P_j|\alpha\rangle$ (where $\hat P_j = |j\rangle\langle j|$ is the corresponding projection operator, see Sec. 4.5) to find one of the qubits in one of its two possible states $j$ equals 1. It is remarkable that the application of this condition to any of the qubits results in the same Eq. (127).
Now the reader may start thinking that this constraint strips quantum computations of any advantages over their classical counterparts, but this view is also superficial. In order to show that, let us consider the scheme of the most frequently explored type of quantum computation, shown in Fig. 2.⁴² Each horizontal line (sometimes called a "wire"⁴³) corresponds to a single qubit, tracing its time evolution in the same direction as in the usual time function plots: from left to right. This means that the left column $|\alpha\rangle_{\rm in}$ of ket-vectors describes the initial state of the qubits,⁴⁴ while the right column $|\alpha\rangle_{\rm out}$ describes their final (pre-detector) state. The box labeled $U$ represents the qubit evolution in time due to their specially arranged interactions with each other and/or with external drive "forces". Besides these forces, during this evolution the system is supposed to be isolated from the dephasing and energy-dissipating environment, so that it may be described by a unitary operator defined in the $2^N$-dimensional Hilbert space of $N$ qubits:

$$|\alpha\rangle_{\rm out} = \hat U|\alpha\rangle_{\rm in}. \qquad (8.131)$$



With the condition that the input and output states have the simple form (130), this equality reads

$$|j'_1j'_2\dots j'_N\rangle_{\rm out} = \hat U|j_1j_2\dots j_N\rangle_{\rm in}. \qquad (8.132)$$
Fig. 8.2. The baseline scheme of quantum computation. (Figure labels: qubit number, input states, unitary transform, output states, qubit state measurement.)



42 Numerous modifications of this baseline scheme have been suggested, for example with the number of output qubits different from that of input qubits, etc. Some other options are discussed at the end of this section.

43 The notion of "wires" stems from the fact that similar diagrams are used to describe classical computation circuits as well (see, e.g., Fig. 3a below), and in that case the lines may indeed be understood as physical wires connecting physical devices: logic gates and/or memory cells. Note that classical computer components also have nonvanishing time delays, so that even in this case the left-to-right device ordering is useful to indicate the timing of (and frequently the causal relation between) the signals.

44 As we know from Chapter 7, the preparation of the pure state (125) is (conceptually :-) straightforward. Placing a qubit into a weak contact with an environment of temperature $T \ll \Delta/k_B$, where $\Delta$ is the difference between the energies of eigenstates $|0\rangle$ and $|1\rangle$, we may achieve its relaxation into the lowest-energy state. (Otherwise, the relaxation to one of the states with equal, or nearly-equal, energies may be combined with its measurement - see Fig. 7.8 and its discussion.) Then, if the qubit must be set into the opposite state, it may be driven there by the application of a pulse of a proper external classical "force". For example, if actual spin-½ particles are used as qubits, a constant magnetic field may be applied in the [x, y] plane for a half-period of the torque-induced spin precession - see Fig. 5.1c. However, for most qubit implementations, the basis state reversal using a half-period of rf-induced Rabi oscillations (Sec. 6.5) is more convenient.
The art of quantum computer design is selecting such unitary operators U that would:

- satisfy Eq. (132), 

- be physically implementable, 

- enable substantial performance advantages of the quantum computation over its classical 
counterpart of similar functionality, at least for some digital functions (algorithms). 

I will have time to demonstrate the possibility of such advantages on just one, perhaps the simplest, example - the so-called Deutsch problem.⁴⁵ Let us consider the family of single-bit classical Boolean functions $j_{\rm out} = f(j_{\rm in})$. Since both $j$ are Boolean variables, i.e. may take only the values 0 and 1, there are evidently only 4 such functions:

$$\begin{array}{c|c|c|c} f & f(0) & f(1) & \text{class} \\ \hline f_1 & 0 & 0 & \text{constant} \\ f_2 & 0 & 1 & \text{balanced} \\ f_3 & 1 & 0 & \text{balanced} \\ f_4 & 1 & 1 & \text{constant} \end{array} \qquad (8.133)$$



Of them, functions $f_1$ and $f_4$, whose values are independent of their arguments, are called constants, while functions $f_2$ (called "YES" or "IDENTITY") and $f_3$ ("NOT" or "INVERSION") are called balanced. The Deutsch problem is to determine the class of a single-bit function, implemented as a "black box", as being either constant or balanced, using just one experiment.

Classically, this is clearly impossible, and the simplest way to perform the function classification involves two similar black boxes $f$ - see Fig. 3a.



Fig. 8.3. The simplest (a) classical and (b) quantum ways to classify a single-bit Boolean function $f$.



This solution uses the so-called exclusive-OR (for short, XOR) gate, whose output is described by the following function $F$ of its two Boolean arguments $j_1$ and $j_2$:

$$F(j_1, j_2) = j_1 \oplus j_2 = \begin{cases} 0, & \text{if } j_1 = j_2, \\ 1, & \text{if } j_1 \neq j_2. \end{cases} \qquad (8.134)$$

In the circuit shown in Fig. 3a, the gate produces output



45 Named after D. Deutsch, whose 1985 paper (motivated by an inspirational but not very specific publication by 
R. Feynman in 1982) launched the whole field of quantum computation. 
$$F = f(0) \oplus f(1), \qquad (8.135)$$

equal to 1 if $f(0) \neq f(1)$, i.e. if function $f$ is balanced, and 0 in the opposite case - see the 4th column in Eq. (133).⁴⁶
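The classical procedure of Fig. 3a can be sketched directly; the wrapper counts how many times the black box is used, which is exactly the resource the Deutsch problem is about. The function and dictionary names here are illustrative.

```python
# The four single-bit Boolean functions of Eq. (133).
FUNCTIONS = {
    "f1": lambda j: 0,       # constant
    "f2": lambda j: j,       # YES / IDENTITY (balanced)
    "f3": lambda j: 1 - j,   # NOT / INVERSION (balanced)
    "f4": lambda j: 1,       # constant
}


def classify_classically(f):
    """Fig. 3a logic: evaluate f on both inputs and feed the results to an XOR gate, Eq. (135)."""
    calls = 0

    def black_box(j):
        nonlocal calls
        calls += 1
        return f(j)

    F = black_box(0) ^ black_box(1)   # two uses of the black box are unavoidable here
    return ("balanced" if F else "constant"), calls


for name, f in FUNCTIONS.items():
    print(name, *classify_classically(f))
```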

On the other hand, let us assume that all four functions $f$ may be implemented quantum-mechanically, for example as a unitary transform $\hat f$ acting on two qubits (Fig. 4a) as follows on each of the basis components $|j_1j_2\rangle \equiv |j_1\rangle|j_2\rangle$ of the general input state (126):

$$\hat f\,|j_1\rangle|j_2\rangle = |j_1\rangle|j_2 \oplus f(j_1)\rangle, \qquad (8.136)$$

where $f$ is any of the classical Boolean functions defined by Eq. (133).



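Fig. 8.4. Two-qubit quantum gates: (a) two-qubit function $\hat f$ and (b) its particular case $\hat C$ (CNOT), and their actions on the basis states: (a) $|j_1\rangle|j_2\rangle \to |j_1\rangle|j_2 \oplus f(j_1)\rangle$; (b) $|j_1\rangle|j_2\rangle \to |j_1\rangle|j_2 \oplus j_1\rangle$.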



In the particular case when $f$ is the YES function, $f(j) = f_2(j) = j$, gate $\hat f$ is reduced to the so-called CNOT gate - a key ingredient of other quantum computation schemes, performing the transform

$$\hat C|j_1j_2\rangle = |j_1\rangle|j_2 \oplus j_1\rangle. \qquad (8.137a)$$

Let us spell out this rule for all four possible input qubit combinations:

$$\hat C|00\rangle = |00\rangle, \quad \hat C|01\rangle = |01\rangle, \quad \hat C|10\rangle = |11\rangle, \quad \hat C|11\rangle = |10\rangle. \qquad (8.137b)$$
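Since $\hat f$ merely permutes the four basis kets, it can be written as a 4×4 permutation matrix. The sketch below assumes the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ (index $2j_1 + j_2$, source qubit first) and checks that the choice $f = f_2$ reproduces the four rules (137b). Each such gate is a permutation matrix (hence unitary) and is its own inverse.

```python
def oracle(f):
    """Matrix of f-hat, Eq. (136): |j1 j2> -> |j1, j2 XOR f(j1)>.

    Basis index 2*j1 + j2 (source qubit first); column = input ket, row = output ket.
    """
    m = [[0] * 4 for _ in range(4)]
    for j1 in (0, 1):
        for j2 in (0, 1):
            m[2 * j1 + (j2 ^ f(j1))][2 * j1 + j2] = 1
    return m


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


CNOT = oracle(lambda j: j)        # f = f2 (YES) turns f-hat into the CNOT gate, Eq. (137a)
LABELS = ["00", "01", "10", "11"]
for col, label in enumerate(LABELS):
    row = next(r for r in range(4) if CNOT[r][col])
    print(f"C|{label}> = |{LABELS[row]}>")   # prints the four rules (137b)
```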



In plain English, Eqs. (137) mean that, acting on basis states $|j_1j_2\rangle$, the CNOT gate leaves the state of the first, source qubit (shown by the upper lines in Fig. 4) intact, but flips the state of the second, target qubit if the first one is in the basis state $|1\rangle$. In even simpler words, the state $j_1$ of the source qubit controls the NOT function acting on the target qubit - hence the gate's name CNOT (the semi-acronym of "Controlled NOT").

For the quantum function (136), the Deutsch problem may be solved within the general scheme shown in Fig. 2, with the particular structure of the unitary-transform box $U$ spelled out in Fig. 3b, which involves just one implementation of the function. Here the single-qubit quantum gate $\mathcal H$ symbolizes the so-called Hadamard (or "Walsh-Hadamard") transform,⁴⁷ whose linear operator is defined by the following actions on the qubit's basis states:



46 Alternatively, we may perform two sequential experiments on the same black box f, first recording and then 
recalling their results. 

47 In order to exclude any chance of confusion between the Hadamard transform's operator $\hat{\mathcal H}$ and the Hamiltonian operator $\hat H$, I have typeset them using different fonts.
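$$\hat{\mathcal H}|0\rangle = \frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right), \qquad \hat{\mathcal H}|1\rangle = \frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right) \qquad (8.138)$$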



- see also the 4 left state labels in Fig. 3b.⁴⁸ On the Bloch sphere (Fig. 5.1), and in the usual spin-½ notation, Eqs. (138) correspond to the transfer of the representing point from the North Pole state ↑, i.e. one of the eigenstates of matrix $\sigma_z$, to one of the equatorial states, →, i.e. one of the eigenstates of matrix $\sigma_x$, and from the South Pole state ↓ to the other equatorial state, ←, see Eq. (4.122). However, a $\pi/2$-rotation in the [x, z] plane would be a poor interpretation of this function. Indeed, since its operator has to be linear (to be physically realistic), it needs to perform action (138) on the basis states even when they are parts of an arbitrary linear superposition - as they are, e.g., for the two right Hadamard gates in Fig. 3b. For example, as immediately follows from Eq. (138) and the operator's linearity,



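$$\hat{\mathcal H}\left(\hat{\mathcal H}|0\rangle\right) = \hat{\mathcal H}\left[\frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right)\right] = \frac{1}{\sqrt2}\left(\hat{\mathcal H}|0\rangle + \hat{\mathcal H}|1\rangle\right) = \frac{1}{2}\left(|0\rangle + |1\rangle\right) + \frac{1}{2}\left(|0\rangle - |1\rangle\right) = |0\rangle. \qquad (8.139a)$$

Absolutely similarly, we may get⁴⁹

$$\hat{\mathcal H}\left(\hat{\mathcal H}|1\rangle\right) = |1\rangle. \qquad (8.139b)$$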



For this reason, a better interpretation of the Hadamard transform is a π-rotation about the axis that bisects the angle between axes x and z.
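Eqs. (138)-(139) and the rotation interpretation can be checked with 2×2 matrices. In this sketch the columns of `HAD` are the images of $|0\rangle$ and $|1\rangle$; the rotation matrix is built from the standard spin-½ formula $\hat R = \cos(\beta/2)\hat I - i\sin(\beta/2)\,\mathbf n\cdot\hat{\boldsymbol\sigma}$ with $\beta = \pi$ and $\mathbf n = (\mathbf n_x + \mathbf n_z)/\sqrt2$, which coincides with the Hadamard matrix up to the inconsequential common phase factor $i$.

```python
import math

r = 1 / math.sqrt(2)
HAD = [[r, r], [r, -r]]            # columns: images of |0> and |1>, Eq. (138)
SX = [[0, 1], [1, 0]]              # Pauli sigma_x
SZ = [[1, 0], [0, -1]]             # Pauli sigma_z


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


# Eqs. (139): two consecutive Hadamard gates restore each basis state.
print(apply(HAD, apply(HAD, [1, 0])), apply(HAD, apply(HAD, [0, 1])))

# Rotation by beta = pi about n = (x + z)/sqrt(2):  R = cos(beta/2) I - i sin(beta/2) (n . sigma).
beta = math.pi
n_sigma = [[r * (SX[i][j] + SZ[i][j]) for j in range(2)] for i in range(2)]
R = [[math.cos(beta / 2) * (i == j) - 1j * math.sin(beta / 2) * n_sigma[i][j] for j in range(2)]
     for i in range(2)]
# HAD equals R up to the common phase factor i, which has no physical consequences.
print(max(abs(1j * R[i][j] - HAD[i][j]) for i in range(2) for j in range(2)))
```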

Now let us carry out an analysis of the "circuit" shown in Fig. 3b, minding all the time the 
operator linearity, and the fact that the transformation rules (136)-(138) are only applicable to basis kets 
of the initial ("input") state vector. In particular, taking into account that according to Fig. 3b, the input 
states of gate $\hat f$ in this particular circuit are described by Eqs. (138), its output state's ket is



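$$\hat f\left(\hat{\mathcal H}|0\rangle\,\hat{\mathcal H}|1\rangle\right) = \hat f\left[\frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right)\frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right)\right] = \frac{1}{2}\left(\hat f|00\rangle - \hat f|01\rangle + \hat f|10\rangle - \hat f|11\rangle\right). \qquad (8.140)$$

Now we may apply Eq. (136) to each of the basis kets to get:

$$\begin{aligned} \hat f|00\rangle - \hat f|01\rangle + \hat f|10\rangle - \hat f|11\rangle &\equiv \hat f|0\rangle|0\rangle - \hat f|0\rangle|1\rangle + \hat f|1\rangle|0\rangle - \hat f|1\rangle|1\rangle \\ &= |0\rangle|0 \oplus f(0)\rangle - |0\rangle|1 \oplus f(0)\rangle + |1\rangle|0 \oplus f(1)\rangle - |1\rangle|1 \oplus f(1)\rangle \\ &= |0\rangle\left(|0 \oplus f(0)\rangle - |1 \oplus f(0)\rangle\right) + |1\rangle\left(|0 \oplus f(1)\rangle - |1 \oplus f(1)\rangle\right). \end{aligned} \qquad (8.141)$$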



Note that the expression in the first parentheses, characterizing the state of the target qubit, is equal to $(|0\rangle - |1\rangle) \equiv (-1)^0(|0\rangle - |1\rangle)$ if $f(0) = 0$ (and hence $0 \oplus f(0) = 0$ and $1 \oplus f(0) = 1$), and to $(|1\rangle - |0\rangle) \equiv (-1)^1(|0\rangle - |1\rangle)$ in the opposite case $f(0) = 1$, so that both cases may be described in one shot by rewriting the parentheses as $(-1)^{f(0)}(|0\rangle - |1\rangle)$. The second parentheses are absolutely similarly controlled by the value of $f(1)$, so that the state of the system at the output of gate $\hat f$ is unentangled again:



48 Note that according to Eq. (138), operator $\hat{\mathcal H}$ does not belong to the limited class $U$ described by Eq. (132).

49 Since states 0 and 1 form a full basis of the single qubit, Eqs. (139) may be summarized as an operator equality: $\hat{\mathcal H}^2 = \hat I$.
$$\hat f\left(\hat{\mathcal H}|0\rangle\,\hat{\mathcal H}|1\rangle\right) = \frac{1}{2}\left[(-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right]\left(|0\rangle - |1\rangle\right) = \pm\frac{1}{2}\left(|0\rangle + (-1)^F|1\rangle\right)\left(|0\rangle - |1\rangle\right), \qquad (8.142)$$

where the last transition has used the fact that the Boolean function $F$, defined by Eq. (135), equals $|f(0) - f(1)|$ - see Eq. (133). Since the common sign (i.e. the common phase shift by π) is inconsequential, it may be prescribed to any of the component ket-vectors - for example to that of the target qubit, as shown by the third pair of state labels in Fig. 3b.

This intermediate result is already rather remarkable. Indeed, it shows that, despite the impression one could get from Fig. 4, gates $\hat f$ and even $\hat C$, being "controlled" by the source qubit, may change that qubit's state as well! This fact (partly reflected by the vertical direction of the control lines in Figs. 3 and 4, symbolizing the same stage of the system's evolution in time) shows how careful one should be when interpreting quantum-computational "circuits".

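At the second stage of the circuit shown in Fig. 3b, the qubit components of state (142) are fed into one more pair of Hadamard gates, whose outputs therefore are

$$\hat{\mathcal H}\frac{1}{\sqrt2}\left(|0\rangle + (-1)^F|1\rangle\right) = \frac{1}{\sqrt2}\left(\hat{\mathcal H}|0\rangle + (-1)^F\hat{\mathcal H}|1\rangle\right) \quad \text{and} \quad \hat{\mathcal H}\left[\pm\frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right)\right] = \pm\frac{1}{\sqrt2}\left(\hat{\mathcal H}|0\rangle - \hat{\mathcal H}|1\rangle\right). \qquad (8.143)$$

Now using Eqs. (138) again, we see that the output state ket-vectors of the source and target qubits are, respectively,

$$\frac{1 + (-1)^F}{2}|0\rangle + \frac{1 - (-1)^F}{2}|1\rangle \quad \text{and} \quad \pm|1\rangle. \qquad (8.144)$$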



Since, according to Eq. (135), the Boolean function $F$ may take only the values 0 or 1, the final state of the source qubit is always one of its basis states $j$, namely the one with $j = F$. Its measurement (see Fig. 2) immediately tells us whether function $f$, participating in Eq. (136), is constant or balanced.⁵⁰
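The whole circuit of Fig. 3b can be simulated with a 4-component state vector, which makes the one-query property explicit: the matrix of $\hat f$ is applied only once. The sketch assumes the basis index $2j_1 + j_2$ (source qubit first) and reuses the permutation-matrix form of Eq. (136); the computed quantity is the probability $W$ of finding the source qubit in state $|1\rangle$, which should equal $F$.

```python
import math

r = 1 / math.sqrt(2)
HAD = [[r, r], [r, -r]]                      # Hadamard gate, Eq. (138)


def kron_mat(a, b):
    """Two-qubit gate a (x) b; basis index 2*j1 + j2, source qubit j1 first."""
    return [[a[i][k] * b[j][l] for k in range(2) for l in range(2)]
            for i in range(2) for j in range(2)]


def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def oracle(f):
    """f-hat of Eq. (136) as a 4x4 permutation matrix."""
    m = [[0] * 4 for _ in range(4)]
    for j1 in (0, 1):
        for j2 in (0, 1):
            m[2 * j1 + (j2 ^ f(j1))][2 * j1 + j2] = 1
    return m


def deutsch(f):
    """Simulate the circuit of Fig. 3b; return W(source = 1) and the final state vector."""
    hh = kron_mat(HAD, HAD)
    state = [0, 1, 0, 0]                     # input |0>|1>
    state = apply(hh, state)                 # first pair of Hadamard gates
    state = apply(oracle(f), state)          # the only use of f-hat
    state = apply(hh, state)                 # second pair of Hadamard gates
    w_source_1 = sum(abs(state[k]) ** 2 for k in (2, 3))
    return w_source_1, state


for name, f in [("f1", lambda j: 0), ("f2", lambda j: j), ("f3", lambda j: 1 - j), ("f4", lambda j: 1)]:
    w, _ = deutsch(f)
    print(name, "balanced" if w > 0.5 else "constant")
```

In every case the target qubit ends in $\pm|1\rangle$, in agreement with Eq. (144), so only the source qubit carries the answer.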

Thus, the quantum circuit shown in Fig. 3b indeed solves the Deutsch problem in one shot. Reviewing our analysis, we may see that this is possible because the unitary transform performed by gate $\hat f$ is applied to the quantum superpositions (138) rather than to the basis states. Due to this trick, the quantum state components depending on $f(0)$ and $f(1)$ are processed simultaneously, in parallel. This quantum parallelism may be extended to circuits with many ($N \gg 1$) qubits and, for some tasks, provide a dramatic performance increase - for example, reducing the necessary circuit component number from $O(\exp\{N\})$ to $O(N^p)$, where $p$ is a finite (and not very big) number.

However, this efficiency comes at a high price. Indeed, let us discuss the physical implementation of quantum gates, starting from the Hadamard gate, which performs a single-qubit transform - see Eq. (138). With the linearity requirement, its action on the arbitrary state (125) should be

$$\hat{\mathcal H}|\alpha\rangle = a_0\hat{\mathcal H}|0\rangle + a_1\hat{\mathcal H}|1\rangle = a_0\frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right) + a_1\frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right) = \frac{1}{\sqrt2}\left(a_0 + a_1\right)|0\rangle + \frac{1}{\sqrt2}\left(a_0 - a_1\right)|1\rangle, \qquad (8.145)$$

meaning that the state expansion coefficients at the end ($t = T$) and the beginning ($t = 0$) of the qubit evolution in time have to be related as



50 This means that the last Hadamard transform of the target qubit (i.e. the Hadamard gate shown in the lower 
right corner of Fig. 3b) is not necessary for the Deutsch problem solution - though it should be included if we 
want the whole circuit to satisfy the general condition (132). 
$$a_0(T) = \frac{a_0(0) + a_1(0)}{\sqrt2}, \qquad a_1(T) = \frac{a_0(0) - a_1(0)}{\sqrt2}. \qquad (8.146)$$



This task may again be performed using the Rabi oscillations, which were discussed in Sec. 6.5, i.e. by applying to the qubit (a two-level system), for a limited time period $T$, a weak sinusoidal external signal of frequency $\omega$ equal to the intrinsic quantum oscillation frequency $\omega_{nn'}$ defined by Eq. (6.85). A perturbative analysis of the Rabi oscillations was carried out in Sec. 6.5, even for a nonvanishing (though small) detuning $\Delta = \omega - \omega_{nn'}$, but only for the particular initial conditions when at $t = 0$ the system was in one of the basis states (there labeled as $n'$), i.e. the other state (there labeled $n$) was empty. For our current purposes we need to find the coefficients $a_{0,1}(t)$ of expansion (125) for arbitrary initial conditions $a_{0,1}(0)$, subject only to the time-independent normalization condition $|a_0|^2 + |a_1|^2 = 1$. For the case of exact tuning, $\Delta = 0$, the solution of Eqs. (6.94) is elementary, and gives, instead of Eq. (6.102),⁵¹ the following solutions:



$$a_0(t) = a_0(0)\cos\Omega t - ia_1(0)e^{i\varphi}\sin\Omega t, \qquad a_1(t) = a_1(0)\cos\Omega t - ia_0(0)e^{-i\varphi}\sin\Omega t, \qquad (8.147)$$

where $\Omega$ is the Rabi oscillation frequency (6.101), in the exact-tuning case proportional to the amplitude $|A|$ of the external rf drive $A = |A|\exp\{i\varphi\}$, while $\varphi$ is the phase of the driving signal - see Eqs. (6.86)-(6.87). Comparing these expressions with Eqs. (146), we see that for $t = T = \pi/4\Omega$ and $\varphi = \pi/2$ they "almost" coincide, except for the opposite sign of $a_1(T)$.

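A quick numerical check of this "almost" coincidence may help. The sketch below, in plain Python, evaluates Eqs. (147) at $\Omega T = \pi/4$ and $\varphi = \pi/2$ for one arbitrarily chosen normalized state ($a_0 = 0.6$, $a_1 = 0.8i$, an assumption made only for illustration) and compares the result with Eqs. (146); the function names are hypothetical.

```python
import cmath
import math


def rabi_exact_tuning(a0, a1, omega_t, phi):
    """Amplitudes of Eq. (8.147) after exact-resonance driving.

    omega_t stands for the product Omega*t; phi is the phase of the rf drive.
    """
    c, s = math.cos(omega_t), math.sin(omega_t)
    new_a0 = a0 * c - 1j * a1 * cmath.exp(1j * phi) * s
    new_a1 = a1 * c - 1j * a0 * cmath.exp(-1j * phi) * s
    return new_a0, new_a1


def hadamard_target(a0, a1):
    """The coefficients required by Eq. (8.146)."""
    r = 1 / math.sqrt(2)
    return r * (a0 + a1), r * (a0 - a1)


# An arbitrary normalized initial state, chosen only for illustration.
a0, a1 = 0.6, 0.8j
b0, b1 = rabi_exact_tuning(a0, a1, math.pi / 4, math.pi / 2)
h0, h1 = hadamard_target(a0, a1)
print(abs(b0 - h0) < 1e-12)  # True: a_0(T) agrees with Eq. (8.146)
print(abs(b1 + h1) < 1e-12)  # True: a_1(T) has the opposite sign
```

Any other normalized pair of initial amplitudes should give the same two results, since Eqs. (147) are linear in $a_{0,1}(0)$.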
Conceptually, the simplest way to correct this deficiency is to follow the rf "$\pi/4$-pulse" just discussed by a short dc "$\pi$-pulse" of duration $T' = \pi\hbar/\delta$, which temporarily creates a small additional energy difference $\delta$ between basis states 0 and 1. According to the basic Eq. (1.61), such a difference creates an additional phase difference $T'\delta/\hbar$ between the states, equal to $\pi$ for the "$\pi$-pulse".
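The effect of the dc "$\pi$-pulse" can be sketched the same way. Here, as a modeling assumption, the extra energy $\delta$ is added to state 1 only, so that in the frame where the unperturbed evolution is excluded, $a_1$ simply acquires the factor $\exp\{-i\delta T'/\hbar\}$; the units with $\hbar = 1$ and the value $\delta = 0.3$ are arbitrary choices.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (an assumption made only for this sketch)


def quarter_pulse(a0, a1):
    """Eq. (8.147) evaluated at Omega*T = pi/4 with drive phase phi = pi/2."""
    c = s = math.sqrt(0.5)
    phi = math.pi / 2
    return (a0 * c - 1j * a1 * cmath.exp(1j * phi) * s,
            a1 * c - 1j * a0 * cmath.exp(-1j * phi) * s)


def dc_pulse(a0, a1, delta, duration):
    """Raise the energy of state 1 by delta for the given time (hypothetical model).

    With the unperturbed evolution excluded from the coefficients, only the
    extra energy matters: a_1 acquires the factor exp(-i*delta*duration/hbar).
    """
    return a0, a1 * cmath.exp(-1j * delta * duration / HBAR)


delta = 0.3                        # arbitrary small energy offset
t_prime = math.pi * HBAR / delta   # T' = pi*hbar/delta: phase difference of pi
a0, a1 = 0.6, 0.8j
b0, b1 = dc_pulse(*quarter_pulse(a0, a1), delta, t_prime)
r = 1 / math.sqrt(2)
print(abs(b0 - r * (a0 + a1)) < 1e-12, abs(b1 - r * (a0 - a1)) < 1e-12)  # True True
```

Adding $\delta$ to state 0 instead would change the result only by a common, inconsequential phase factor.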

Another way (which may also be useful for two-qubit operations) is to use an auxiliary energy level $E_2$ whose distances from the basis levels $E_1$ and $E_0$ are significantly different from the difference $(E_1 - E_0)$ - see Fig. 5a.



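[Figure: (a) single-qubit scheme with the basis levels $E_0$, $E_1$ and the auxiliary level $E_2$, with transition energies $\hbar\omega_{20}$ and $\hbar\omega_{21}$; (b, c) two-qubit levels $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$, with the splittings $\Delta_1$, $\Delta_2$, and $\Delta_1 + \Delta_2$; in panel (b), levels $|01\rangle$ and $|10\rangle$ coincide.]

Fig. 8.5. Energy-level schemes used for unitary transformations of (a) single qubits and (b, c) two-qubit systems.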



51 To comply with our current notation, coefficients $a_{n'}$ and $a_n$ of Sec. 6.5 should be replaced with $a_0$ and $a_1$. Also note that their definition (6.82) implies that the trivial time evolution (6.81) of unperturbed qubits has already been excluded from these expansion coefficients.
In this case, a weak external rf field tuned to any of the 3 possible quantum transition frequencies $\omega_{nn'} = (E_n - E_{n'})/\hbar$ initiates transitions between the corresponding pair of states only, with a negligible perturbation of the state not involved in the transition. Such transitions may again be described by Eqs. (147), with the appropriate index changes. For the Hadamard transform implementation, it is sufficient to apply (after the already discussed $\pi/4$-pulse of frequency $\omega_{10}$, and with the initially empty level $E_2$) an additional $\pi$-pulse of frequency $\omega_{20}$, with any phase $\varphi$. Indeed, according to the first of Eqs. (147), with the due replacement $a_1(0) \to a_2(0) = 0$, such a pulse flips the sign of coefficient $a_0(t)$, while coefficient $a_1(t)$, not involved in this additional transition, remains unchanged.
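The three-level variant can be checked in the same spirit. The hypothetical helper below applies Eqs. (147) to any chosen pair of levels while leaving the third one untouched, which is exactly the idealization assumed in the text; the state amplitudes and the phase 0.37 of the second pulse are arbitrary.

```python
import cmath
import math


def rabi_pair(amps, m, n, omega_t, phi):
    """Apply Eq. (8.147) to the level pair (m, n) of a multilevel state.

    Level m plays the role of state 0 and level n that of state 1; all other
    levels are left untouched, as assumed for a drive tuned to the m-n
    transition only.
    """
    out = list(amps)
    c, s = math.cos(omega_t), math.sin(omega_t)
    out[m] = amps[m] * c - 1j * amps[n] * cmath.exp(1j * phi) * s
    out[n] = amps[n] * c - 1j * amps[m] * cmath.exp(-1j * phi) * s
    return out


a0, a1 = 0.6, 0.8j
state = [a0, a1, 0.0]                                      # level E_2 initially empty
state = rabi_pair(state, 0, 1, math.pi / 4, math.pi / 2)   # pi/4-pulse at omega_10
state = rabi_pair(state, 0, 2, math.pi, 0.37)              # pi-pulse at omega_20, arbitrary phase
r = 1 / math.sqrt(2)
target = [-r * (a0 + a1), -r * (a0 - a1), 0.0]            # Hadamard times a global factor -1
print(all(abs(x - y) < 1e-12 for x, y in zip(state, target)))  # True
```

The result equals the Hadamard transform (146) multiplied by the common factor $-1$, which, like any common phase, is inconsequential.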

Now let me describe the conceptually simplest (though, for some qubit types, not the most practically convenient) scheme for the implementation of the CNOT gate, whose action is described by a linear unitary operator satisfying Eq. (137). For that, evidently, the qubits have to be allowed to interact for some time $T$. As was repeatedly discussed in the two previous chapters, in most cases such an interaction of two subsystems is bilinear - see, e.g., Eq. (6.148). For qubits, i.e. two-level systems, each of the component operators may be represented by a 2×2 matrix in the basis of states 0 and 1. According to Eq. (4.105), such a matrix may be expressed as a linear combination $(c_0 I + \mathbf{c}\cdot\boldsymbol{\sigma})$, where $c_0$ and the three Cartesian components of vector $\mathbf{c}$ are c-numbers. Let us take such a bilinear interaction Hamiltonian in the simplest form

$$\hat H_{\rm int} = \begin{cases} \kappa\,\hat\sigma_z^{(1)}\hat\sigma_z^{(2)}, & \text{for } 0 < t < T,\\ 0, & \text{otherwise,} \end{cases} \qquad (8.148)$$

where the upper index is the qubit number, and $\kappa$ is a c-number constant. 52 According to Eq. (4.175), by the end of the interaction period this Hamiltonian produces the following unitary transform:

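$$\hat U_{\rm int} = \exp\left\{-\frac{i}{\hbar}\hat H_{\rm int}T\right\} = \exp\left\{-\frac{i}{\hbar}\kappa T\,\hat\sigma_z^{(1)}\hat\sigma_z^{(2)}\right\}. \qquad (8.149)$$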

Since in the basis of unperturbed two-bit states $|j_1 j_2\rangle$ the product operator $\hat\sigma_z^{(1)}\hat\sigma_z^{(2)}$ is diagonal, so is the unitary operator (149), with the following action on the basis states:

$$\hat U_{\rm int}|j_1 j_2\rangle = \exp\{i\theta\,\sigma_z^{(1)}\sigma_z^{(2)}\}|j_1 j_2\rangle, \qquad (8.150)$$

where $\theta = -\kappa T/\hbar$, and $\sigma_z$ are the eigenvalues of the Pauli matrix $\hat\sigma_z$ for the basis states of the corresponding qubit: $\sigma_z = +1$ for $|j\rangle = |0\rangle$, and $\sigma_z = -1$ for $|j\rangle = |1\rangle$. Let me, for clarity, spell out Eq. (150) for the particular case $\theta = -\pi/4$ (corresponding to the qubit coupling time $T = \pi\hbar/4\kappa$):

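$$\hat U_{\rm int}|00\rangle = e^{-i\pi/4}|00\rangle, \quad \hat U_{\rm int}|01\rangle = e^{i\pi/4}|01\rangle, \quad \hat U_{\rm int}|10\rangle = e^{i\pi/4}|10\rangle, \quad \hat U_{\rm int}|11\rangle = e^{-i\pi/4}|11\rangle. \qquad (8.151)$$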



52 The assumption of simultaneous time independence of the basis state vectors and the interaction operator (within the time interval $0 < t < T$) is possible only if the basis state energy difference $\Delta$ of both qubits is exactly the same. For this case, the simple physical explanation of the time evolution (149) follows from Fig. 5b, which shows the spectrum of the total energy $E = E_1 + E_2$ of the two-bit system. In the absence of interaction, the energies of two basis states, $|01\rangle$ and $|10\rangle$, are equal, enabling even a weak qubit interaction to cause their substantial evolution in time - see Sec. 6.7. If the qubit energies are different (Fig. 5c), the interaction may still be reduced, in the rotating-wave approximation, to Eq. (149), by compensating the energy difference $(\Delta_1 - \Delta_2)$ with an external rf signal of frequency $\omega = (\Delta_1 - \Delta_2)/\hbar$ - see Sec. 6.5.
In order to compensate the undesirable parts of this joint phase shift of the basis states, let us apply (either before or after it) similar individual "rotations" of each qubit by angle $\theta' = +\pi/4$, using the following product of two independent operators, plus (just for the clarity of the result) a common, and hence inconsequential, phase shift $\theta'' = -\pi/4$: 53

$$\hat U_{\rm com} = e^{i\theta''}\exp\{i\theta'\hat\sigma_z^{(1)}\}\exp\{i\theta'\hat\sigma_z^{(2)}\}. \qquad (8.152)$$



Since this operator is also diagonal in the $|j_1 j_2\rangle$ basis, it is equally easy to calculate the change of the basis states by the total unitary operator $\hat U_t = \hat U_{\rm com}\hat U_{\rm int}$:

$$\hat U_t|00\rangle = |00\rangle, \quad \hat U_t|01\rangle = |01\rangle, \quad \hat U_t|10\rangle = |10\rangle, \quad \hat U_t|11\rangle = -|11\rangle. \qquad (8.153)$$
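The diagonal elements of Eq. (153) can be reproduced by multiplying the phase factors of Eq. (150) by those of the compensating operator. This plain-Python sketch assumes that operator has the form $e^{i\theta''}\exp\{i\theta'\hat\sigma_z^{(1)}\}\exp\{i\theta'\hat\sigma_z^{(2)}\}$ with $\theta' = \pi/4$ and $\theta'' = -\pi/4$, as described in the text; the function names are illustrative only.

```python
import cmath
import math

THETA = -math.pi / 4     # theta = -kappa*T/hbar, the case spelled out in Eq. (8.151)
THETA_P = math.pi / 4    # theta', the single-qubit rotation angle
THETA_PP = -math.pi / 4  # theta'', the common (inconsequential) phase


def sz(bit):
    """Eigenvalue of sigma_z: +1 for |0>, -1 for |1>."""
    return 1 - 2 * bit


def u_int(j1, j2):
    """Diagonal element <j1 j2|U_int|j1 j2> from Eq. (8.150)."""
    return cmath.exp(1j * THETA * sz(j1) * sz(j2))


def u_com(j1, j2):
    """Diagonal element of the assumed compensating operator
    exp(i theta'') exp(i theta' sz1) exp(i theta' sz2)."""
    return cmath.exp(1j * (THETA_PP + THETA_P * (sz(j1) + sz(j2))))


u_t = {(j1, j2): u_com(j1, j2) * u_int(j1, j2) for j1 in (0, 1) for j2 in (0, 1)}
for (j1, j2), value in u_t.items():
    print(f"U_t|{j1}{j2}> = ({value.real:+.6f} {value.imag:+.6f}i)|{j1}{j2}>")
```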

This result already shows the main "miracle action" of two-qubit gates, such as those shown in Fig. 4: the source qubit is left intact (only if it is in a basis state!), while the state of the target qubit is altered. True, this is still different from the CNOT operator's action (137), but it may be readily reduced to it by sandwiching the transform $\hat U_t$ between two Hadamard transforms applied to the target qubit:

$$\hat C = \hat{\mathcal H}^{(2)}\hat U_t\hat{\mathcal H}^{(2)}. \qquad (8.154)$$
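Equation (154) is easy to verify with explicit 4×4 matrices. The sketch below uses plain nested Python lists (no external libraries), orders the basis as $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the source qubit written first, and takes $\hat U_t$ directly from Eq. (153); the helper names are hypothetical.

```python
import math

R = 1 / math.sqrt(2)
HADAMARD = [[R, R], [R, -R]]   # single-qubit Hadamard matrix, Eq. (8.138)
IDENTITY = [[1, 0], [0, 1]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices stored as nested lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Basis order |00>, |01>, |10>, |11>: first digit = source qubit, second = target qubit.
U_T = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]   # Eq. (8.153)
H2 = kron(IDENTITY, HADAMARD)                                     # Hadamard on the target only
C = matmul(H2, matmul(U_T, H2))                                   # Eq. (8.154)
for row in C:
    print([round(x, 12) + 0.0 for x in row])   # adding 0.0 turns -0.0 into 0.0
```

Up to rounding, the printed matrix should coincide with the CNOT truth table (137): it swaps $|10\rangle$ and $|11\rangle$ and leaves $|00\rangle$ and $|01\rangle$ intact.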

We have spent quite a bit of time on the discussion of the CNOT gate, 54 and now I can reward the reader for his/her effort with a bit of good news: it has been proved that an arbitrary unitary transform that satisfies Eq. (132), i.e. may be used within the general scheme outlined in Fig. 2, may be decomposed into a set of CNOT gates mixed with simpler single-qubit gates - for example, the Hadamard gate plus the phase rotations discussed above. 55 Unfortunately, I have no time for a detailed discussion of more complex circuits. 56 Perhaps the most famous of them is the scheme for integer number factoring, suggested in 1994 by P. Shor. 57 Due to its potential practical importance for breaking broadly used communication encryption schemes such as the RSA code, 58 this opportunity has incited a huge wave of enthusiasm, and triggered experimental efforts to implement quantum gates and circuits



53 As Eq. (4.175) shows, each of the component unitary transforms $\exp\{i\theta'\hat\sigma_z\}$ may be created by applying to each qubit, for a time period $T' = \hbar\theta'/\kappa'$, a constant external field described by the Hamiltonian $\hat H = -\kappa'\hat\sigma_z$. We already know that for a charged spin-1/2 particle, such a Hamiltonian may be created by applying a z-oriented external constant magnetic field - see Eq. (4.163). For most other physical implementations of qubits, the creation of such a Hamiltonian is also straightforward - see, e.g., Fig. 7.4 and its discussion.

54 As was discussed above, this gate is identical to the quantum gate $f$ for the function $f(j) = j$. The implementation of $f$ for the 3 other functions requires straightforward modifications whose analysis is left for the reader's exercise.

55 This fundamental importance of the CNOT gate was perhaps a major reason why D. Wineland, the leader of the 
NIST group that had demonstrated the first experimental implementation in 1995 (following the theoretical 
suggestion by J. Cirac and P. Zoller), was awarded the 2012 Nobel Prize (shared with S. Haroche, the leader of 
another leading group working towards quantum computation). 

56 For that, the reader may be referred to either the monograph by Nielsen and Chuang, cited above, or to a shorter 
(but more formal) textbook by N. D. Mermin, Quantum Computer Science, Cambridge U. Press, 2007. 

57 His original paper was published only in proceedings of a meeting, but a clear description of the algorithm may 
be found in several accessible sources including Wikipedia (http://en.wikipedia.org/wiki/Shor's_algorithm).

58 Named after R. Rivest, A. Shamir, and L. Adleman, the authors of the first open publication of the code in 1977; it was actually invented earlier (in 1973) by C. Cocks.
using a broad variety of two-level quantum systems. Presently, the following options are most eagerly pursued: 59

(i) Trapped ions. The first experimental demonstrations of quantum state manipulation (including the already mentioned first CNOT gate) were carried out using deeply cooled atoms in optical traps, similar to those used in frequency and time standards. Their electron spins are natural qubits, whose states may be manipulated using the Rabi transfers excited by suitably tuned lasers. The spin interactions with the environment may be very weak, resulting in long dephasing times ($T_2$, see Sec. 7.3), up to a few seconds. Since the distances between atoms in the traps are relatively large (of the order of a micron), their direct spin-spin interaction is even weaker, but the atoms may be made to interact effectively either via their mechanical oscillations about the potential minima of the trapping field, or via photons in electromagnetic resonators ("cavities"). 60 Perhaps the main challenge of using this approach for quantum computation is poor "scalability", i.e. the enormous difficulty of creating large, ordered systems of individually addressable qubits.

(ii) Nuclear spins are also typically very weakly coupled to their environment, with $T_2$ exceeding 10 seconds in some cases. Their eigenenergies $E_0$ and $E_1$ may be split by external dc magnetic fields (typically, of the order of 10 T), while the interstate Rabi transfers may be readily achieved by the application of external rf fields with frequencies $\omega = (E_1 - E_0)/\hbar$ of a few hundred MHz. 61 The challenges of this option include the weakness of spin-spin interactions (typically mediated through molecular electrons), resulting in a very slow spin evolution, whose time scale $\hbar/\kappa$ may become comparable with $T_2$, and small level separations $E_1 - E_0$, corresponding to a few mK, 62 i.e. much smaller than the room temperature, creating a problem with qubit state preparation. 63

Despite these challenges, the nuclear spin option was used for the first implementation of the Shor algorithm for factoring a small number (15 = 5×3) as early as 2001. 64 However, the extension of this success to larger systems, beyond the set of spins inside one molecule, is problematic.

(iii) Josephson-junction devices. Much better scalability may be achieved with solid-state devices, especially superconductor integrated circuits including weak contacts - Josephson junctions. As was already discussed in Sec. 2.8, if the coupling of a Josephson junction to its dissipative environment is sufficiently weak (in particular, if its effective parallel resistance is much higher than the quantum resistance unit $R_Q \sim 10^4\ \Omega$), the Josephson phase variable $\varphi$ behaves as a coordinate of a 1D quantum particle with effective mass (2.252), moving in a $2\pi$-periodic potential - see Eq. (2.250). This fact creates several opportunities for qubit implementation using the quantum behavior of this macroscopic degree of freedom.



59 For more details, and a discussion of other possible implementations (such as quantum dots and dopants in 
crystals) see, e.g., T. Ladd et al., Nature 464, 45 (2010), and references therein. 

60 A brief discussion of such interactions (so-called Cavity QED) will be given in Sec. 9.4 below. 

61 In this field, the condition $\omega = \omega_{10}$, discussed above, is called the nuclear magnetic resonance, or NMR - the term well known due to the broad application of this effect in chemistry and medicine.

62 See Eq. (4.5) and its discussion. 

63 This challenge may be partly mitigated using ingenious spin manipulation techniques such as refocusing - see, e.g., either Sec. 7.7 in Nielsen and Chuang, or J. Keeler's monograph cited at the end of Sec. 6.5.

64 B. Lanyon et al., Phys. Rev. Lett. 99, 250505 (2001).
In an insulated junction, 65 the phase motion in the periodic potential $U(\varphi) = -E_J\cos\varphi$ creates the energy band structure $E(q)$ that was discussed in detail in Sec. 2.7. In particular, in the weak potential limit (which, for the Josephson junction case, is valid at $E_J \ll e^2/2C$ - see the discussion in Sec. 2.8), the lowest bandgaps are very narrow, and the function $E(q)$ in their vicinity is well described by the usual level anticrossing - see Figs. 2.28 and 2.29 and their discussion. The translation of this fact into the Josephson junction language (see, in particular, Eq. (2.256) and its discussion) shows that the values of the effective electric charge $Q$ of the junction, on the two anticrossing energy branches, differ by the charge $2e$ of one Cooper pair. Since, according to Eq. (2.222) and its discussion, the system dynamics in this case is reduced to the interaction of these two states with different $Q$, in application to quantum computation this system is called the charge qubit. Unfortunately, the states of such a qubit are rather sensitive to random charged impurities in the junction's vicinity, which cause strong fluctuations and hinder its control, so this option is not actively pursued nowadays.

Other options are based on the modification of the potential $U(\varphi)$ when the Josephson junction is incorporated into a superconducting loop, i.e. in SQUIDs. 66 In the simplest case of a single loop of inductance $L$ closed by one junction with critical current $I_c$, the total potential energy of the system in an external magnetic field is 67



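$$U(\varphi) = E_J\left[\frac{(\varphi - \varphi_{\rm ext})^2}{2\beta_L} - \cos\varphi\right], \qquad \text{with } E_J = \frac{\hbar}{2e}I_c, \quad \beta_L = \frac{2e}{\hbar}I_c L, \qquad (8.155)$$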



where $\varphi_{\rm ext}$ is proportional to the external magnetic flux $\Phi_{\rm ext}$ through the loop. According to this relation, at $E_J \gg e^2/2C$ (corresponding to the tight-binding limit of the energy band theory), one convenient way to implement a two-level system is to take the dimensionless inductance parameter $\beta_L$ above but very close to 1 ($0 < \beta_L - 1 \ll 1$), the "symmetrizing" magnetic field ($\varphi_{\rm ext} \approx \pi$), and $E_J \gg (e^2/C)/(\beta_L - 1)^3$. In this case, the potential profile has the shape of a nearly symmetric double well, with the ground states in each well coupled by tunneling through a relatively low tunnel barrier, creating a pair of eigenstates with a relatively small eigenenergy splitting $\Delta = E_1 - E_0 \ll E_J$ (Fig. 6a).
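The double-well shape is easy to see numerically. This sketch tabulates $U(\varphi)/E_J$ from Eq. (155) on a fine grid and reports its local minima for the illustrative values $\beta_L = 1.1$ and $\varphi_{\rm ext} = \pi$ (assumptions, not the parameters of any particular device); the grid search is deliberately crude.

```python
import math


def squid_potential(phi, phi_ext, beta_l):
    """U(phi)/E_J from Eq. (8.155)."""
    return (phi - phi_ext) ** 2 / (2 * beta_l) - math.cos(phi)


def local_minima(phi_ext, beta_l, lo=-math.pi, hi=3 * math.pi, steps=40001):
    """Crude grid search for the local minima of U(phi) on [lo, hi]."""
    xs = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    us = [squid_potential(x, phi_ext, beta_l) for x in xs]
    return [(xs[k], us[k]) for k in range(1, steps - 1)
            if us[k] < us[k - 1] and us[k] < us[k + 1]]


# beta_L slightly above 1 and the symmetrizing bias phi_ext = pi
minima = local_minima(phi_ext=math.pi, beta_l=1.1)
for phi_min, u_min in minima:
    print(f"minimum at phi = {phi_min:.4f}, U/E_J = {u_min:.6f}")
```

With these values, the two minima should lie symmetrically about $\varphi = \pi$, at equal depth. Rerunning with $\varphi_{\rm ext}$ slightly away from $\pi$ (say, $\pi + 0.01$) should make the two depths unequal, a crude analogue of the asymmetric profile of Fig. 6b.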



Fig. 8.6. Typical potential profiles and energy levels of SQUID-based qubits: (a) "flux qubit" and (b) "phase qubit". Red dashed lines show eigenenergies of the used states 0 and 1.



Such flux qubits have a relatively large magnitude $|\Phi_{10}| = |\Phi_{01}|$ of the matrix elements of the operator of the magnetic flux $\hat\Phi = (\hbar/2e)\hat\varphi$ piercing the SQUID loop. This certainly makes the arrangement



65 For $E_J$ control purposes, it is more convenient to use two-junction configurations called Bloch transistors. Unfortunately, I do not have time to go into these details.

66 See, e.g., EM Sec. 6.4 and references therein. 

67 This expression directly follows from combining EM Eqs. (6.57), (6.59), and (6.70). 
of the necessary coupling between flux qubits (see, e.g., Eq. (149) and its discussion) very easy, despite the macroscopic (~10 μm) sizes of SQUIDs and hence of the distances between them, decreasing the time $T \sim \hbar/\kappa$ necessary for the most critical two-bit (e.g., CNOT) operations to just a few nanoseconds. However, the large flux matrix elements also increase the undesirable coupling of such qubits to the dephasing environment, and hence decrease the dephasing time $T_2$ - typically, to just a few tens or hundreds of nanoseconds, uncomfortably close to $T$.

This coupling may be decreased, leading to a substantial increase of $T_2$ (up to a few microseconds), by moving the bias phase $\varphi_{\rm ext}$ away from the symmetrizing value $\pi$, i.e. using the asymmetric potential profile sketched in Fig. 6b. The working states 0 and 1 of such a phase qubit, localized in a higher potential well (shown on the left in Fig. 6b), are actually metastable, but have a very long lifetime because of the relatively high barrier separating the wells. An additional benefit of this arrangement is that a fast lowering of the tunnel barrier causes the system in state 1 to tunnel into the lower well, with subsequent energy relaxation (see the arrows in Fig. 6b); this process may be used for qubit state readout. A major problem of phase qubits is that the part of potential $U(\varphi)$ in which the qubit states are localized is almost quadratic, so that the energy levels are nearly equidistant - cf. Eqs. (2.114), (6.15), and (6.22). 68 As a result, the external rf drive of frequency $\omega = (E_1 - E_0)/\hbar$, used to arrange the state transforms described by Eq. (146), may induce simultaneous undesirable transitions to (and between) higher energy levels. This effect may be mitigated by reducing the rf drive amplitude (see Problem 6.6), but at the price of a proportional increase of the transfer time $T$, which may again become comparable to $T_2$. Despite this problem, phase qubits have been used for a successful experimental demonstration of the core single-operand and two-operand gates, and recently, for the reproduction of number 15 factoring "48% of the time". 69

(iv) Optical systems pose a special challenge for quantum computation: due to the virtual linearity of most electromagnetic media at reasonable light power, the implementation of interaction Hamiltonians, such as (149), is problematic. However, in 2001 a very smart way around this hurdle was invented. 70 In this KLM scheme, nonlinear elements are not needed, and quantum gates may be composed just of linear devices (such as optical waveguides, mirrors, and beam splitters), plus single-photon sources and detectors. Unfortunately, a quantitative discussion of this scheme would require the basics of quantum electrodynamics, which will be discussed only in the next chapter. The work in this direction has already led to an experimental demonstration of factoring the number 21 = 3×7 (which in some aspects is easier than that of 15). 71

Let me, however, note that due to the statistical nature of Shor's algorithm, and the so-far imperfect fidelity of qubit manipulations, all number factoring experiments carried out so far may be more fairly described merely as demonstrations of their results' consistency with the (evident) mathematical facts. So, despite a very substantial research effort, the progress is rather slow, with the main culprit being the unintentional coupling of qubits to the environment, leading most importantly to their state dephasing, and eventually to errors. (Another major problem of this research field is the lack of algorithms (besides Shor's number factoring) that would give quantum computation a substantial



68 This is even more true for the so-called "transmons" (or "Xmons") - the phase-qubit versions in which a Josephson junction is just a part of an external resonator, providing it with a small nonlinearity (anharmonicity) - see, e.g., R. Barends et al., Nature 508, 500 (2014) and references therein.

69 E. Lucero et al., Nature Physics 8, 719 (2012).

70 E. Knill et al., Nature 409, 46 (2001).

71 E. Martin-Lopez et al., Nature Photonics 6, 773 (2012). 
advantage over classical counterparts, and hence a potential customer base broader than the communication encryption community, that could provide the necessary significant support.)

Of course, some error probability exists in classical digital logic gates and memory cells as well. However, in this case, there is no conceptual problem with the device state measurement, so that the error may be detected and corrected in many ways; perhaps the simplest one is the so-called majority voting. For that, the input bit is reproduced in several (say, three) copies and sent to three similar devices whose outputs are measured and compared. If the output bits differ, at least one of the devices has made an error. The error may be not only detected, but also corrected by taking the two coinciding output bits as the correct one. If the probability of a single device error is $W \ll 1$, the probability of an error of any particular device pair is close to $W^2$, and that of some pair (and hence of the whole majority voting scheme) is close to $3W^2$. Since for the currently dominating CMOS integrated circuits $W$ is very small, such an error correction circuit creates a dramatic fidelity improvement - at the cost of higher circuit complexity (which may be traded for a larger time delay) and consumed power.
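The majority-voting estimate can be checked by enumerating all $2^3$ failure patterns of three devices. The sketch assumes, as the estimate implicitly does, that the devices fail independently, each with probability $W$; the exact result, $3W^2(1 - W) + W^3 = 3W^2 - 2W^3$, approaches $3W^2$ at $W \ll 1$.

```python
from itertools import product


def majority_error_exact(w):
    """Probability that at least two of three independent devices fail."""
    p = 0.0
    for outcome in product((False, True), repeat=3):   # True = this device erred
        prob = 1.0
        for failed in outcome:
            prob *= w if failed else 1 - w
        if sum(outcome) >= 2:                          # the vote is then wrong
            p += prob
    return p


for w in (1e-2, 1e-4):
    print(w, majority_error_exact(w), 3 * w ** 2)
```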

For quantum computation, the general idea of using several devices (say, qubits) for coding the same information remains the same; however, there are two major complications, both due to the analog nature of qubit states. First, as we know from Chapter 7, the dephasing effect of the environment may be described as a slow random drift of the expansion coefficients in Eq. (128), leading to a deviation of the output state from the basis form (132), and hence to a nonvanishing probability of a wrong qubit state readout (Fig. 2). Hence quantum error correction has to protect the result not only against possible random state flips $0 \leftrightarrow 1$, as in a classical digital computer, but also against these "creeping" analog errors.

Second, the qubit state is impossible to copy exactly (clone) without disturbing it, as the following simple calculation shows. 72 Cloning the state $\alpha$ of one qubit to another qubit, initially in an independent state (say, the basis state 0), means the following transformation of the two-qubit ket: $|\alpha 0\rangle \to |\alpha\alpha\rangle$. If we want such a transform to be performed by a real quantum system whose evolution is described by a unitary operator $\hat u$, and to be correct for an arbitrary state $\alpha$, it has to work not only for both basis states of the qubit:

$$\hat u|00\rangle = |00\rangle, \qquad \hat u|10\rangle = |11\rangle, \qquad (8.156)$$

but also for their arbitrary linear combination (125). Since operator $\hat u$ has to be linear, we may use Eq. (156) to calculate

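$$\hat u|\alpha 0\rangle = \hat u\big(a_0|0\rangle + a_1|1\rangle\big)|0\rangle = a_0\hat u|00\rangle + a_1\hat u|10\rangle = a_0|00\rangle + a_1|11\rangle. \qquad (8.157)$$

On the other hand, the desired result of cloning is

$$|\alpha\alpha\rangle = \big(a_0|0\rangle + a_1|1\rangle\big)\big(a_0|0\rangle + a_1|1\rangle\big) = a_0^2|00\rangle + a_0a_1\big(|10\rangle + |01\rangle\big) + a_1^2|11\rangle, \qquad (8.158)$$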

i.e. evidently different, so that, for an arbitrary $\alpha$,

$$\hat u|\alpha 0\rangle \neq |\alpha\alpha\rangle, \qquad (8.159)$$



72 Amazingly, this no-cloning theorem was discovered as late as 1982 (independently by W. Wootters and W. Zurek, and by D. Dieks) - in the context of work toward quantum cryptography.
showing that the qubit state cloning is indeed impossible. 73
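The contradiction between Eqs. (157) and (158) may be made concrete numerically. In the sketch below, two-qubit states are stored as dictionaries of amplitudes (a representation chosen only for illustration), the linear map is defined solely by the two rules (156), and the squared distance between its output and the ideal copy $|\alpha\alpha\rangle$ is printed for both basis states and for an equal superposition; all function names are hypothetical.

```python
import math


def apply_linear_u(state):
    """Linear extension of Eq. (8.156): u|00> = |00>, u|10> = |11>.

    Two-qubit states are dicts mapping basis labels ('00', '01', ...) to
    amplitudes; only inputs with the second qubit in |0> are handled here.
    """
    rule = {"00": "00", "10": "11"}
    out = {}
    for label, amp in state.items():
        if amp == 0:
            continue
        new_label = rule[label]
        out[new_label] = out.get(new_label, 0) + amp
    return out


def product_state(a, b):
    """Ket (a0|0> + a1|1>)(b0|0> + b1|1>) as a dict of amplitudes."""
    return {f"{i}{j}": a[i] * b[j] for i in (0, 1) for j in (0, 1)}


def distance_sq(s, t):
    """Squared norm of the difference between two states."""
    labels = set(s) | set(t)
    return sum(abs(s.get(k, 0) - t.get(k, 0)) ** 2 for k in labels)


for alpha in [(1, 0), (0, 1), (1 / math.sqrt(2), 1 / math.sqrt(2))]:
    copied = apply_linear_u(product_state(alpha, (1, 0)))   # Eq. (8.157)
    ideal = product_state(alpha, alpha)                     # Eq. (8.158)
    print(alpha, distance_sq(copied, ideal))
```

The distance vanishes for the two basis states but equals $2 - \sqrt 2 \approx 0.59$ for $a_0 = a_1 = 1/\sqrt 2$, illustrating inequality (159).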

This problem may be circumvented in the way shown in Fig. 7a. Here the CNOT gate, whose action is described by Eq. (137), entangles an arbitrary input state (125) of the source qubit with a basis initial state of an ancillary qubit - frequently called the ancilla. Using Eq. (137), we may readily calculate the output two-qubit state's vector:



$$|\alpha\rangle_{N=2} = \hat C\big(a_0|0\rangle + a_1|1\rangle\big)|0\rangle = a_0\hat C|00\rangle + a_1\hat C|10\rangle = a_0|00\rangle + a_1|11\rangle. \qquad (8.160)$$



We see that this circuit does perform operation (157), i.e. reassigns the initial source qubit's expansion coefficients $a_0$ and $a_1$ equally to two qubits, i.e. duplicates the input information, though, in contrast with "genuine" cloning, it changes the state of the source qubit. Such "quasi-cloning" is the key to virtually all quantum error correction techniques.
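A dictionary-based sketch reproduces Eq. (160): the hypothetical cnot helper below flips the target bit only when the source bit is 1, as prescribed by Eq. (137), and the illustrative amplitudes $a_0 = 0.6$, $a_1 = 0.8i$ are arbitrary.

```python
def cnot(state):
    """CNOT action of Eq. (8.137) on a two-qubit state stored as {label: amplitude}.

    The first character of a label is the source qubit, the second the target;
    the target bit is flipped only when the source bit is 1.
    """
    out = {}
    for label, amp in state.items():
        source, target = label
        if source == "1":
            target = "0" if target == "1" else "1"
        out[source + target] = out.get(source + target, 0) + amp
    return out


a0, a1 = 0.6, 0.8j                    # arbitrary normalized source amplitudes
result = cnot({"00": a0, "10": a1})   # source in a0|0> + a1|1>, ancilla in |0>
print(result)                         # {'00': 0.6, '11': 0.8j}

# For an unentangled pair c00*c11 == c01*c10; here the difference is nonzero,
# so the output is an entangled state rather than the copy |alpha alpha>.
c = {k: result.get(k, 0) for k in ("00", "01", "10", "11")}
print(abs(c["00"] * c["11"] - c["01"] * c["10"]) > 0)   # True
```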



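[Figure: (a) a CNOT gate acting on the input $a_0|0\rangle + a_1|1\rangle$ and an ancilla in state $|0\rangle$, producing $a_0|00\rangle + a_1|11\rangle$; (b) a three-qubit circuit with intermediate states $|A\rangle$ through $|F\rangle$ and a dephasing "gate" $\varphi$.]

Fig. 8.7. (a) Quasi-cloning, and (b) detection and correction of dephasing errors in a single qubit.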



Consider, for example, the three-qubit circuit shown in Fig. 7b. At its input, the double 
application of the quasi-cloning produces an intermediate state A with the ket-vector 

$$|A\rangle = a_0|000\rangle + a_1|111\rangle, \qquad (8.161)$$

which is an evident generalization of Eq. (160). Subjecting the source qubit to the Hadamard transform 
(138), we get three-qubit state B represented by vector 

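$$|B\rangle = a_0\frac{1}{\sqrt 2}\big(|0\rangle + |1\rangle\big)|00\rangle + a_1\frac{1}{\sqrt 2}\big(|0\rangle - |1\rangle\big)|11\rangle. \qquad (8.162)$$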

Now let us assume that at this stage, the source qubit comes into contact with a dephasing environment (in Fig. 7, symbolized by the single-qubit "gate" $\varphi$). As we know from Sec. 7.3, its effect (besides some inconsequential shift of the common phase) may be described by a random mutual phase shift of the basis states: 74



73 This does not mean that several qubits cannot be put into the same, arbitrary quantum state - theoretically, with arbitrary precision. Indeed, they may be first set into their lowest-energy stationary states as was discussed above, and then driven into an arbitrary state (125) by exerting on them similar classical external "forces". So, the no-cloning theorem pertains only to an unknown state $\alpha$ of a qubit.

74 For example, in the Hilbert space of the qubit, the model Hamiltonian (7.70), which was explored in Sec. 7.3, is diagonal in the z-basis of states 0 and 1, so that the unitary transform it provides during the interval $T$ is also diagonal,
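$$|0\rangle \to e^{i\varphi}|0\rangle, \qquad |1\rangle \to e^{-i\varphi}|1\rangle. \qquad (8.163)$$

As a result, for the intermediate state C (see Fig. 7b) we may write

$$|C\rangle = a_0\frac{1}{\sqrt 2}\big(e^{i\varphi}|0\rangle + e^{-i\varphi}|1\rangle\big)|00\rangle + a_1\frac{1}{\sqrt 2}\big(e^{i\varphi}|0\rangle - e^{-i\varphi}|1\rangle\big)|11\rangle. \qquad (8.164)$$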



At this stage, in this simple theoretical model, the coupling with the environment is completely quenched (ahh, if only this were possible in reality! We would have quantum computers by now :-), and the source qubit is fed into one more Hadamard gate. Using Eqs. (138) again, for the state D after this gate we get

$$|D\rangle = a_0\big(\cos\varphi|0\rangle + i\sin\varphi|1\rangle\big)|00\rangle + a_1\big(i\sin\varphi|0\rangle + \cos\varphi|1\rangle\big)|11\rangle. \qquad (8.165)$$

Now the qubits are passed through the second, similar pair of CNOT gates - see Fig. 7b. Using Eq. (137), for the ket-vector of the resulting state E we readily get the expression

$$|E\rangle = a_0\cos\varphi|000\rangle + a_0 i\sin\varphi|111\rangle + a_1 i\sin\varphi|011\rangle + a_1\cos\varphi|100\rangle, \qquad (8.166a)$$

which evidently may be grouped as 

$$|E\rangle = \big(a_0|0\rangle + a_1|1\rangle\big)\cos\varphi|00\rangle + \big(a_1|0\rangle + a_0|1\rangle\big)i\sin\varphi|11\rangle. \qquad (8.166b)$$

This is already a rather remarkable result. It shows that if we measure the ancilla qubits at stage E, and both results correspond to states 0, we may be 100% sure that the source qubit (which is not affected by the measurement!) is in its initial state even after the interaction with the environment. The only result of an increase of this interaction (as quantified by the magnitude of the phase $\varphi$) is the growth of the probability,

$$W = \sin^2\varphi, \qquad (8.167)$$

of getting the opposite result, which signals a dephasing-induced error in the source qubit. This implicit measurement, without disturbing the source qubit, is called quantum error detection.
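The error-detection argument can be traced step by step on an explicit 8-component state vector. In this plain-Python sketch, qubit 0 is the source and qubits 1 and 2 are the ancillas; the dephasing is modeled exactly by Eq. (163) with an arbitrarily chosen $\varphi = 0.3$, and all helper names are hypothetical.

```python
import cmath
import math

N = 3  # qubit 0 = source, qubits 1 and 2 = ancillas; index bits read left to right


def bit(index, q):
    """Value (0 or 1) of qubit q in the basis state with the given index."""
    return (index >> (N - 1 - q)) & 1


def flip(index, q):
    return index ^ (1 << (N - 1 - q))


def cnot(psi, control, target):
    out = [0j] * len(psi)
    for i, amp in enumerate(psi):
        out[flip(i, target) if bit(i, control) else i] += amp
    return out


def hadamard(psi, q):
    """|0> -> (|0> + |1>)/sqrt 2, |1> -> (|0> - |1>)/sqrt 2 on qubit q."""
    out = [0j] * len(psi)
    r = 1 / math.sqrt(2)
    for i, amp in enumerate(psi):
        i0 = i & ~(1 << (N - 1 - q))   # same basis state with qubit q set to 0
        i1 = i0 | (1 << (N - 1 - q))   # ... and with qubit q set to 1
        out[i0] += r * amp
        out[i1] += (-r if bit(i, q) else r) * amp
    return out


def dephase(psi, q, phi):
    """Eq. (8.163): |0> -> exp(i phi)|0>, |1> -> exp(-i phi)|1> on qubit q."""
    return [amp * cmath.exp(-1j * phi if bit(i, q) else 1j * phi)
            for i, amp in enumerate(psi)]


def stage_e(a0, a1, phi):
    psi = [0j] * 2 ** N
    psi[0b000], psi[0b100] = a0, a1          # input a0|0> + a1|1>, ancillas in |00>
    psi = cnot(cnot(psi, 0, 1), 0, 2)        # state A, Eq. (8.161)
    psi = hadamard(psi, 0)                   # state B, Eq. (8.162)
    psi = dephase(psi, 0, phi)               # state C
    psi = hadamard(psi, 0)                   # state D, Eq. (8.165)
    return cnot(cnot(psi, 0, 1), 0, 2)       # state E, Eq. (8.166)


a0, a1, phi = 0.6, 0.8j, 0.3
psi_e = stage_e(a0, a1, phi)
# Probability that both ancillas are found in state 1, i.e. the error signal:
w = sum(abs(psi_e[i]) ** 2 for i in range(2 ** N) if bit(i, 1) and bit(i, 2))
print(w, math.sin(phi) ** 2)
```

The two printed numbers should agree, reproducing Eq. (167), and the amplitudes of state E can be compared term by term with Eq. (166a).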

An even more impressive result may be achieved by adding to the circuit one more component, the so-called Toffoli (or "CCNOT") gate, denoted by the rightmost symbol in Fig. 7b. This 3-qubit gate is conceptually similar to the CNOT gate discussed above, except that it flips the basis state of its target qubit only if the basis states of both its source qubits are 1. (In the circuit shown in Fig. 7b, the former role is played by our source qubit, while the latter role is played by the two ancilla qubits.) According to its definition, the Toffoli gate has no effect on the first parentheses in Eq. (166b), but flips the source qubit's state in the second parentheses. The result may be factorized as follows,





$$|F\rangle = \big(a_0|0\rangle + a_1|1\rangle\big)\big(\cos\varphi|00\rangle + i\sin\varphi|11\rangle\big), \qquad (8.168)$$

giving the phase shifts described by Eq. (163), with the phase $\varphi$ proportional to the time integral of the function $f(t)$ of that model Hamiltonian over the interaction interval. Let me emphasize again that Eq. (163) is valid only if the interaction with the environment is a pure dephasing, i.e. does not include the energy relaxation of the qubit or its thermal activation to the higher eigenstate - see Chapter 7.
showing that now the source qubit is again fully unentangled from the ancilla qubits. Moreover, calculating the norm squared of the second operand, we get

$$\big(\cos\varphi\langle 00| - i\sin\varphi\langle 11|\big)\big(\cos\varphi|00\rangle + i\sin\varphi|11\rangle\big) = \cos^2\varphi + \sin^2\varphi = 1, \qquad (8.169)$$

so that the final state of the source qubit always exactly coincides with its initial state. This is the famous miracle of quantum state correction, taking place "automatically" - without any qubit measurements, and for any random phase shift $\varphi$.
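Adding the Toffoli gate to the same kind of simulation shows the automatic correction. The sketch below (again with hypothetical helper names, and with the two ancillas serving as the control qubits of the Toffoli gate, as in Fig. 7b) traces the ancillas out and compares the resulting density matrix of the source qubit with that of the input state, for several phase shifts $\varphi$.

```python
import cmath
import math

N = 3  # qubit 0 = source, qubits 1 and 2 = ancillas


def bit(index, q):
    return (index >> (N - 1 - q)) & 1


def flip(index, q):
    return index ^ (1 << (N - 1 - q))


def controlled_flip(psi, controls, target):
    """CNOT (one control) or Toffoli/CCNOT (two controls): flip target if all controls are 1."""
    out = [0j] * len(psi)
    for i, amp in enumerate(psi):
        out[flip(i, target) if all(bit(i, c) for c in controls) else i] += amp
    return out


def hadamard(psi, q):
    out = [0j] * len(psi)
    r = 1 / math.sqrt(2)
    for i, amp in enumerate(psi):
        i0 = i & ~(1 << (N - 1 - q))
        i1 = i0 | (1 << (N - 1 - q))
        out[i0] += r * amp
        out[i1] += (-r if bit(i, q) else r) * amp
    return out


def dephase(psi, q, phi):
    """Eq. (8.163) applied to qubit q."""
    return [amp * cmath.exp(-1j * phi if bit(i, q) else 1j * phi)
            for i, amp in enumerate(psi)]


def circuit_fig_7b(a0, a1, phi):
    """Stages A..F of Fig. 7b as described in the text."""
    psi = [0j] * 2 ** N
    psi[0b000], psi[0b100] = a0, a1
    for t in (1, 2):
        psi = controlled_flip(psi, [0], t)   # quasi-cloning, giving state A
    psi = hadamard(psi, 0)                   # B
    psi = dephase(psi, 0, phi)               # C: unknown phase shift
    psi = hadamard(psi, 0)                   # D
    for t in (1, 2):
        psi = controlled_flip(psi, [0], t)   # E
    return controlled_flip(psi, [1, 2], 0)   # F: Toffoli, ancillas control the source


def source_density_matrix(psi):
    """Reduced density matrix of qubit 0, with both ancillas traced out."""
    rho = [[0j, 0j], [0j, 0j]]
    for i in range(2 ** N):
        for j in range(2 ** N):
            if (i & 0b011) == (j & 0b011):   # same ancilla configuration
                rho[bit(i, 0)][bit(j, 0)] += psi[i] * psi[j].conjugate()
    return rho


a0, a1 = 0.6, 0.8j
ideal = [[a0 * a0.conjugate(), a0 * a1.conjugate()],
         [a1 * a0.conjugate(), a1 * a1.conjugate()]]
for phi in (0.0, 0.3, 1.0, 2.5):
    rho = source_density_matrix(circuit_fig_7b(a0, a1, phi))
    err = max(abs(rho[r][c] - ideal[r][c]) for r in (0, 1) for c in (0, 1))
    print(f"phi = {phi}: max deviation from the input state = {err:.1e}")
```

The deviations should stay at the level of rounding errors for any $\varphi$, in agreement with the factorized form of state F. The model keeps the idealizations of the text: only the source qubit is dephased, and all gates are perfect.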

The circuit shown in Fig. 7b may be improved by adding Hadamard gate pairs, similar to the one used for the source qubit, to the ancilla qubits as well. If the dephasing is small, in the sense that the $W$ given by Eq. (167) is much less than 1, this modified circuit may provide a substantial error probability reduction (to ~$W^2$) even if the ancilla qubits are also subjected to a dephasing similar to that of the source qubit, at the same stage - i.e. between two Hadamard gates. The perfect automatic correction of any error (not only the inner dephasing of a qubit and its relaxation/excitation, but also the mutual dephasing between qubits) of any used qubit needs even more parallelism. The first circuit of that kind, based on 9 parallel qubits, which is a natural generalization of the circuit discussed above, was invented in 1995 by the same P. Shor. Later, 5-qubit circuits enabling similar error correction were suggested. (A further reduction of the parallelism has been proved impossible.)

However, all these results assume that the error correction circuits as such are perfect, i.e. completely isolated from the environment. In the real world this cannot be done. Now the key question is what maximum level $W_{\max}$ of the error probability in each gate (including those in the used error correction scheme) can be automatically corrected, thus opening a way toward large quantum computers producing some useful results - first of all, the factoring of large numbers, with at least $10^3$ bits, to be of interest for practice. To the best of my knowledge, this critical level has not yet been strictly calculated, partly because error correction greatly inflates the total number of gates in the system - by a factor crudely proportional to the number $N$ of used qubits. Various authors give broadly different
estimates: from W max -10" to W max ~ 10" . Whatever the critical level is, it has not been reached yet. 

This situation has motivated the search for quantum computation schemes different from that shown in Fig. 2; the most prominent alternative is called adiabatic quantum computation. 75 In its most actively pursued option (for which "quantum system modeling" would be a more appropriate name), the interaction between the qubits of a system is organized so that the system's Hamiltonian is similar to that of some quantum system of interest. Then the qubit system, first prepared in a certain initial state with a relatively high energy, e.g., in an unentangled state described by Eq. (130), is allowed to evolve on its own. Because of the unavoidable dissipation caused by the interaction with the environment, the system eventually relaxes to a final unentangled state of its qubits, which is then measured. From numerous runs of such an experiment, the outcome statistics may be revealed for various temperatures of the environment. Thus, in this approach (which is very close to the numerical modeling technique called quantum annealing), the interaction with the environment is allowed to play a certain role in the system evolution, though every effort is made to reduce it, to allow qubit "quantumness" to make a substantial difference at least at the beginning of the relaxation process.






75 Note that the qualifier "quantum" is important here, to distinguish this research direction from the option of classical adiabatic (or "reversible") computation - see, e.g., SM Sec. 3.3 and references therein.
Generally speaking, adiabatic quantum computation may be used for performing any quantum algorithm, including number factoring. 76 Unfortunately, due to technical difficulties of the organization and precise control of long-range interaction in multi-qubit systems, 77 the list of modeled systems is presently limited to a few simple 1D or 2D arrays described by the so-called extended quantum Ising ("spin-glass") model: 78



where the curly bracket denotes the summation over pairs of close (though not necessarily closest) neighbors. Though Hamiltonian (170) is the traditional playground of the phase transition theory (see, e.g., SM Chapter 4), to the best of my knowledge there are not many practically valuable tasks that could be achieved by studying the statistics of its solutions. Moreover, even for this limited task, the speed of the best experimental adiabatic quantum "computer", with N = 108 qubits, is still lower than that of a classical, off-the-shelf semiconductor processor (with a dollar cost lower by some 6 orders of magnitude), and no dramatic change of this comparison is predicted for realistic larger values of N. 79

There may be better prospects for another application of entangled qubit systems, namely telecommunication cryptography. 80 The goal here is to replace the currently dominating classical encryption, based on the public-key RSA code mentioned above, which may be broken by factoring very large numbers, with a quantum encryption that would be fundamentally unbreakable. The basis of this opportunity is formed by the measurement postulate and the no-cloning theorem: if a message is carried by a qubit, such as a single photon, it is impossible for an eavesdropper (in cryptography, traditionally called Eve) to either measure or copy it faithfully without also disturbing its state. However, as we have seen from the discussion of Fig. 7a, state quasi-cloning using entangled qubits is possible, so that the issue is far from simple, especially if we want to use a publicly distributed quantum key, in some sense similar to the classical public key used in the RSA encryption.

Unfortunately, I do not have time/space to discuss various options for quantum encryption, but cannot help demonstrating how counter-intuitive they may be, on the famous example of the so-called quantum teleportation (Fig. 8). 81 Suppose that party A (in cryptography, traditionally called Alice) wants to send party B (Bob) the full information about the quantum state of a qubit α, unknown to either party. Instead of sending her qubit directly to Bob, Alice asks him to send her one qubit (β) of a pair of other qubits, prepared in a certain entangled state, for example in the singlet state (11):

$$ |\beta\beta'\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right). \qquad (8.171) $$

Using Eq. (125), the initial state of the whole 3-qubit system may be represented by the ket-vector 



76 See, e.g., the experiments on factoring of number 143 = 13×11, using nuclear spin relaxation, by N. Xu et al., Phys. Rev. Lett. 108, 130501 (2012), though at the moment of this writing, their results remained controversial.

77 For the same reason, the implementation is so far limited to the most scalable, Josephson-junction (flux) qubits - see, e.g., M. Johnson et al., Nature 473, 194 (2011).

78 For its classical version, see, e.g., SM Eq. (4.23) and its discussion.

79 See S. Boixo et al., Nature Physics 10, 218 (2014) and T. Rønnow et al., arXiv:1401.2910 [quant-ph].

80 This field was pioneered in the 1970s by S. Wiesner.

81 This procedure was first suggested in 1993 by the same C. Bennett, and then repeatedly demonstrated experimentally - see, e.g., the recent paper by L. Steffen et al., Nature 500, 319 (2013), and literature therein.




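$$ |\alpha\beta\beta'\rangle = \left(\alpha_0|0\rangle + \alpha_1|1\rangle\right)\frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right) = \frac{\alpha_0}{\sqrt{2}}|001\rangle - \frac{\alpha_0}{\sqrt{2}}|010\rangle + \frac{\alpha_1}{\sqrt{2}}|101\rangle - \frac{\alpha_1}{\sqrt{2}}|110\rangle, \qquad (8.172a) $$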

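which may be rewritten as a linear superposition,

$$ |\alpha\beta\beta'\rangle = \frac{1}{2}|\alpha\beta\rangle_+^+\left(-\alpha_1|0\rangle + \alpha_0|1\rangle\right) + \frac{1}{2}|\alpha\beta\rangle_+^-\left(\alpha_1|0\rangle + \alpha_0|1\rangle\right) + \frac{1}{2}|\alpha\beta\rangle_-^+\left(-\alpha_0|0\rangle + \alpha_1|1\rangle\right) + \frac{1}{2}|\alpha\beta\rangle_-^-\left(-\alpha_0|0\rangle - \alpha_1|1\rangle\right), \qquad (8.172b) $$

of the following 4 states of qubit pair αβ:

$$ |\alpha\beta\rangle_+^\pm \equiv \frac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right), \qquad |\alpha\beta\rangle_-^\pm \equiv \frac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right). \qquad (8.173) $$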



Fig. 8.8. Sequential stages of a quantum teleportation procedure: (a) the initial state with entangled qubits β and β', (b) back transfer of qubit β, (c) measurement of pair αβ, (d) forward transfer of 2 classical bits with the measurement result, and (e) the final state, with the state of qubit β' mirroring the initial state of qubit α.



After having received qubit β from Bob, Alice measures which of these 4 states pair αβ is in. This may be achieved, for example, by measuring one observable represented by the operator σ_z^(α)σ_z^(β), and another one corresponding to σ_x^(α)σ_x^(β) - cf. Eq. (148). 82 The measured eigenvalue of the former operator enables distinguishing the couples of states (173) with different values of the lower index, while the latter measurement distinguishes the states with different upper indices.

Then Alice reports the result (which may be coded by just 2 classical bits) to Bob over a classical channel. Since the measurement places pair αβ definitely in the corresponding state, Bob's remaining qubit β' is now definitely in the unentangled single-qubit state represented by the corresponding parentheses in Eq. (172b). Note that each of these parentheses contains both coefficients α_0,1, i.e. the whole information about the state that qubit α had initially. If Bob likes, he may now use appropriate single-qubit operations, similar to those discussed above, to move qubit β' into a state exactly similar to the initial state of qubit α. (This fact does not violate the no-cloning theorem (159), because the measurement has already changed the state of α.) This is of course a "teleportation" only in a very special sense of this rather ambiguous term, but it is a good example of the importance of preserving the qubits' entanglement during their spatial transfer. For us, this is also a good primer for the forthcoming discussion of the EPR paradox and Bell's inequalities in Sec. 10.1.
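The algebra of Eqs. (8.171)-(8.173) is easy to check numerically. The sketch below is only an illustration, assuming NumPy, the qubit ordering α, β, β' (α being the leftmost, most significant factor of each ket), and the labeling in which the lower index of a state (8.173) is the sign of the σ_z^(α)σ_z^(β) eigenvalue and the upper one the sign of the σ_x^(α)σ_x^(β) eigenvalue. The matrices in `CORRECTIONS` are one hypothetical choice of Bob's single-qubit operations, not a prescription taken from the text.

```python
import numpy as np

KET0 = np.array([1, 0], dtype=complex)
KET1 = np.array([0, 1], dtype=complex)
S2 = np.sqrt(2.0)

def kron(*kets):
    """Tensor product of kets; the leftmost factor is the most significant qubit."""
    out = np.array([1], dtype=complex)
    for k in kets:
        out = np.kron(out, k)
    return out

# States (8.173) of pair alpha-beta, keyed by (upper sign, lower sign):
# lower = sign of the sz(alpha)sz(beta) eigenvalue, upper = sign of the sx(alpha)sx(beta) one.
BELL = {
    ("+", "+"): (kron(KET0, KET0) + kron(KET1, KET1)) / S2,
    ("-", "+"): (kron(KET0, KET0) - kron(KET1, KET1)) / S2,
    ("+", "-"): (kron(KET0, KET1) + kron(KET1, KET0)) / S2,
    ("-", "-"): (kron(KET0, KET1) - kron(KET1, KET0)) / S2,
}

# Hypothetical corrections mapping each parenthesis of (8.172b) back onto (a0, a1).
CORRECTIONS = {
    ("+", "+"): np.array([[0, 1], [-1, 0]], dtype=complex),
    ("-", "+"): np.array([[0, 1], [1, 0]], dtype=complex),
    ("+", "-"): np.array([[-1, 0], [0, 1]], dtype=complex),
    ("-", "-"): np.array([[-1, 0], [0, -1]], dtype=complex),
}

def bob_states(a0, a1):
    """Unnormalized states of qubit beta' for each outcome of Alice's measurement."""
    alpha = a0 * KET0 + a1 * KET1
    singlet = (kron(KET0, KET1) - kron(KET1, KET0)) / S2   # Eq. (8.171), pair beta-beta'
    psi = np.kron(alpha, singlet).reshape(4, 2)              # Eq. (8.172a); rows: alpha-beta, cols: beta'
    # Project the alpha-beta pair onto each state (8.173); what remains is beta'.
    return {key: bell.conj() @ psi for key, bell in BELL.items()}

def teleport_check(a0, a1):
    """For each outcome: (probability, whether the corrected beta' equals the initial alpha)."""
    alpha = np.array([a0, a1], dtype=complex)
    results = {}
    for key, phi in bob_states(a0, a1).items():
        prob = np.vdot(phi, phi).real
        fixed = CORRECTIONS[key] @ phi / np.sqrt(prob)
        results[key] = (prob, bool(np.allclose(fixed, alpha)))
    return results
```

For any normalized (α_0, α_1), each of the four outcomes comes out with probability 1/4, so the two classical bits sent by Alice carry no information about the state itself.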



82 All four states (173) are eigenstates of both these operators, so that the measurements do not affect each other and may be done in any order.
Returning for a minute to practical quantum cryptography: since its two most common quantum key distribution protocols 83 require just a few simple quantum gates, whose experimental implementation is not a large obstacle, the main focus of the current effort is on decreasing single-photon dephasing in long optical fiber waveguides, 84 and hence increasing the maximum distance of quantum channels with sufficiently high qubit transfer fidelity. The recent progress has been impressive, with demonstrated lines longer than 100 km using either protocol, 85 and active plans for 560 km and 700 km landlines and several satellite-based systems. Let me hope that if not the author, then the reader of these notes will see this technology accepted for practical secure telecommunications.

8.6. Exercise problems 

8.1. For the singlet state and each triplet state of a two-electron system, evaluate the expectation value of the scalar product Ŝ1·Ŝ2, neglecting the direct spin-spin interaction. Compare the result with the scalar product of two classical vectors of magnitude ħ/2 each, being either parallel or antiparallel.

8.2. Use the perturbation theory to calculate the so-called hyperfine splitting of the ground energy of the hydrogen atom, due to the interaction between the spins of the nucleus (proton) and of the electron.

Hint: The proton's magnetic moment is described by a relation similar to Eq. (4.116), but with the positive sign, a very different g-factor, g_p ≈ 5.586, 86 and of course a different mass, m_p ≈ 1.673×10^-27 kg.

8.3. Discuss the coefficients ±1/√2 that participate in Eqs. (8.19) and (8.21), in terms of the Clebsch-Gordan coefficients (see Sec. 5.7).

8.4. Write down the simplest model Hamiltonians of the following systems, in terms of the second quantization formalism:

(i) a system of two weakly coupled quantum wells, taking into account pair on-site interactions (additional energy J per each pair of particles in the same quantum well), and

(ii) the same for the motion in a periodic 1D potential, in the tight-binding limit.

8.5. For each of the Hamiltonians composed in Problem 4, derive the Heisenberg equations of motion for particle creation operators, for (i) bosons, and (ii) fermions.



83 BB84, suggested in 1984 by C. Bennett and G. Brassard, and EPRBE, suggested in 1991 by A. Ekert. For details, see, e.g., either Sec. 12.6 in Nielsen and Chuang, or the review by N. Gisin et al., Rev. Mod. Phys. 74, 145 (2002).

84 For their discussion see, e.g., EM Sec. 7.8.

85 See P. Hiskett et al., New J. Phys. 8, 193 (2006), and R. Ursin et al., Nature Physics 3, 481 (2007).

86 The positive sign may be readily interpreted as the result of the positive electric charge q = e of the proton, 
while the anomalously large experimental value of its g-factor may be qualitatively understood as a result of the 
three-quark structure of this composite particle. (The exact quantitative calculation of g p still remains a challenge 
for quantum chromodynamics.) 
8.6. Explain in detail why the Hartree-Fock approximation (118), applied to the helium atom, gives "correct" 87 expressions (31) for the ground singlet state, and Eqs. (44)-(45) with the minus sign in Eq. (44) for the excited triplet state, but cannot describe result (44)-(45) with the plus sign, for the excited singlet state.

8.7. Find a time-independent Hamiltonian that may cause the qubit evolution described by Eq. (147). Discuss the result and its relation to the time-dependent Hamiltonian (6.86).



87 Correct in the sense of the 1st order of the perturbation theory.
Chapter 9. Introduction to Relativistic Quantum Mechanics

This chapter gives a brief introduction to relativistic quantum mechanics. It starts with a discussion of the basic elements of the quantum theory of the electromagnetic field (quantum electrodynamics, QED), including the quantization scheme, photon statistics, radiative atomic transitions, the spontaneous and stimulated radiation, and the so-called cavity QED. Then I will briefly review the relativistic quantum theory of particles with nonvanishing rest mass, notably Dirac's theory of spin-½ particles, and mark the point of entry into the most complete relativistic quantum theory - the quantum field theory, QFT - which is beyond the scope of these notes. 1



9.1. Electromagnetic field quantization 

Classical mechanics tells us 2 that the relativistic relation between momentum p and energy E of 
a free particle with rest mass m may be simplified in two limits, non-relativistic and ultra-relativistic: 



$$ E = \left[(pc)^2 + (mc^2)^2\right]^{1/2} \to \begin{cases} mc^2 + p^2/2m, & \text{for } p \ll mc, \\ pc, & \text{for } p \gg mc. \end{cases} \qquad (9.1) $$



In both limits, the transfer from classical to quantum mechanics is easier than in the general case. Since all the previous parts of this course were devoted to the first, non-relativistic limit, I will now jump to a brief discussion of the ultra-relativistic limit p ≫ mc, for a particular but very important system - the electromagnetic field. Since the excitations of this field, called photons, are currently believed to have zero rest mass m, 3 the ultra-relativistic limit is valid for any photon energy E, and the quantization scheme is rather straightforward.
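A quick numerical look at the two limits of Eq. (9.1) shows how fast each approximation converges. This is a minimal sketch in units where m = c = 1, an arbitrary choice made only for illustration.

```python
import math

def energy(p, m, c=1.0):
    """Exact relativistic energy, Eq. (9.1)."""
    return math.sqrt((p * c) ** 2 + (m * c ** 2) ** 2)

def energy_nonrel(p, m, c=1.0):
    """Non-relativistic limit (p << mc): rest energy plus p^2/2m."""
    return m * c ** 2 + p ** 2 / (2.0 * m)

def energy_ultrarel(p, c=1.0):
    """Ultra-relativistic limit (p >> mc); exact for zero rest mass."""
    return p * c

if __name__ == "__main__":
    for p in (1e-3, 1e-1, 1e1, 1e3):
        exact = energy(p, 1.0)
        print(f"p = {p:g}: non-relativistic error {energy_nonrel(p, 1.0) / exact - 1:+.2e}, "
              f"ultra-relativistic error {energy_ultrarel(p) / exact - 1:+.2e}")
```

For p ≪ mc the relative error of the first limit scales roughly as (p/mc)^4/8, and for p ≫ mc that of the second one as -(mc/p)^2/2; for m = 0 the second form is exact at any momentum, which is the photon case discussed below.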

As usual, the quantization has to be based on the classical theory of the system, in this case the Maxwell equations. As the simplest case, let us consider the electromagnetic field in a free-space volume limited by ideal walls that reflect incident waves perfectly. 4 Inside the volume, the Maxwell equations may be reduced to a simple wave equation 5 for the electric field

$$ \nabla^2 \mathscr{E} - \frac{1}{c^2}\frac{\partial^2 \mathscr{E}}{\partial t^2} = 0, \qquad (9.2) $$



and an absolutely similar equation for the magnetic field ℬ. We may look for the general solution of Eq. (2) in the variable-separating form



1 Note that some material of this chapter is frequently taught as a part of the QFT. I will focus on a few most 
important results that may be obtained without starting heavy QFT engines. 

2 See, e.g., EM Chapter 9. 

3 By now this fact has been verified experimentally with an accuracy of at least ~10^-22 m_e - see S. Eidelman et al., Phys. Lett. B 592, 1 (2004).

4 In the case of finite energy absorption in the walls, or in the wave propagation media (say, described by complex constants ε and μ), the system would not be energy-conserving (Hamiltonian), i.e. would interact with the dissipative environment. Specific cases of such interaction will be considered in Sections 2 and 3 below.

5 See, e.g., EM Eq. (7.3), for the particular case ε = ε0, μ = μ0, v² = 1/εμ = 1/ε0μ0 = c².
$$ \mathscr{E}(\mathbf{r},t) = \sum_j p_j(t)\,\mathbf{e}_j(\mathbf{r}). \qquad (9.3) $$



Physically, each term of this sum is a standing wave whose spatial distribution and polarization ("mode") are described by the vector function e_j(r), and the temporal dynamics, by the function p_j(t). Plugging an arbitrary term of this sum into Eq. (2), and separating variables exactly as we did, e.g., for the Schrödinger equation in Sec. 1.4, we get



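$$ \frac{\nabla^2 \mathbf{e}_j}{\mathbf{e}_j} = \frac{1}{c^2}\frac{\ddot{p}_j}{p_j} = \text{const} \equiv -k_j^2, \qquad (9.4) $$

so that the spatial distribution of the mode satisfies the 3D Helmholtz equation:

$$ \nabla^2 \mathbf{e}_j + k_j^2\,\mathbf{e}_j = 0. \qquad (9.5) $$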

The set of solutions of this equation, with appropriate boundary conditions, determines the set of functions e_j and simultaneously the spectrum of wave number moduli k_j. The latter values determine the mode eigenfrequencies, following from Eq. (4):

$$ \ddot{p}_j + \omega_j^2 p_j = 0, \quad \text{with } \omega_j = c k_j. \qquad (9.6) $$



There is a big philosophical difference between the approaches to equations (5) and (6), despite their common origin (4). The first (Helmholtz) equation may be rather difficult to solve in realistic geometries, 6 but it remains intact in quantum theory, with the scalar components of the vector functions e_j(r) still treated (at each point r) as c-numbers. In contrast, Eq. (6) is readily solvable (giving sinusoidal oscillations with frequency ω_j), but this is exactly where we can make the transfer to quantum mechanics, because we already know how to quantize a mechanical 1D harmonic oscillator that obeys, in classical mechanics, the same equation.
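As a toy illustration of how the boundary conditions fix the spectrum k_j of Eq. (5), and hence ω_j = ck_j, consider a 1D analog: one field component vanishing at two ideal walls separated by distance L, with the Helmholtz equation discretized by finite differences. The 1D geometry, the grid size, and the function name are assumptions of this sketch; for this case the exact answer is k_n = nπ/L.

```python
import numpy as np

def cavity_modes(length, n_grid=400, n_modes=4, c=1.0):
    """Lowest k_n and omega_n = c k_n of e'' + k^2 e = 0 with e(0) = e(length) = 0."""
    h = length / (n_grid + 1)                       # spacing of the interior grid points
    # Finite-difference matrix of -d^2/dx^2, with the field vanishing at both walls
    lap = (2.0 * np.eye(n_grid) - np.eye(n_grid, k=1) - np.eye(n_grid, k=-1)) / h**2
    k = np.sqrt(np.linalg.eigvalsh(lap)[:n_modes])  # eigvalsh returns ascending eigenvalues
    return k, c * k                                 # omega_j = c k_j, as in Eq. (9.6)
```

The discretization error of each k_n decreases as the square of the grid spacing, so refining the grid brings the numbers ever closer to nπ/L.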

As usual, we need to start with the appropriate Hamiltonian corresponding to the classical Hamiltonian function H of the proper set of generalized coordinates and momenta. The electromagnetic field's Hamiltonian function (which in this case coincides with the field's energy) is 7



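$$ H = \int_V \left(\frac{\varepsilon_0 \mathscr{E}^2}{2} + \frac{\mathscr{B}^2}{2\mu_0}\right) d^3r. \qquad (9.7) $$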

Let us represent the magnetic field in a form similar to Eq. (3),

$$ \mathscr{B}(\mathbf{r},t) = \sum_j \omega_j q_j(t)\,\mathbf{b}_j(\mathbf{r}). \qquad (9.8) $$



Since, according to the Maxwell equations, in our case the magnetic field satisfies an equation similar to Eq. (2), the time-dependent amplitude q_j of each of its modes obeys an equation similar to Eq. (6), i.e. also changes in time sinusoidally, with the same frequency ω_j. Plugging Eqs. (3) and (8) into Eq. (7), we may recast it as



6 See, e.g., various problems discussed in EM Chapter 7, especially in Sec. 7.9. 

7 See, e.g., EM Sec. 9.8, in particular Eq. (9.225). I am using SI units, with ε0μ0 = 1/c²; in the Gaussian units, coefficients ε0 and μ0 disappear, but there is an additional common factor 1/4π in the equation for energy. If we modify the normalization conditions accordingly, all the subsequent results look similar in any system of units.
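$$ H = \sum_j \left[\frac{p_j^2}{2}\int_V \varepsilon_0\,\mathbf{e}_j^2\,d^3r + \frac{\omega_j^2 q_j^2}{2}\int_V \frac{\mathbf{b}_j^2}{\mu_0}\,d^3r\right]. \qquad (9.9) $$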



Since the distribution of constant factors between the two multiplication operands in each term of Eq. (3) is arbitrary, we may fix it by requiring the first integral in Eq. (9) to equal 1. It is straightforward to check that, according to the Maxwell equations, which give a specific relation between vectors ℰ and ℬ, 8 this normalization makes the second integral in Eq. (9) equal to 1 as well, and Eq. (9) becomes

$$ H_j = \frac{p_j^2}{2} + \frac{\omega_j^2 q_j^2}{2}. \qquad (9.10) $$



Now we can carry out the standard quantization procedure, namely declare H_j, p_j, and q_j the quantum-mechanical operators related exactly as in Eq. (10):

$$ \hat{H}_j = \frac{\hat{p}_j^2}{2} + \frac{\omega_j^2 \hat{q}_j^2}{2}. \qquad (9.11) $$

We see that this Hamiltonian coincides with that of a 1D harmonic oscillator with the mass m_j formally equal to 1, 9 and the eigenfrequency equal to ω_j. Now, in order to plug Eq. (11) into Eq. (4.199) for the time evolution of the Heisenberg-picture operators p̂_j and q̂_j, we need to know the commutation relation between these operators. For that, returning to the classical case, let us calculate the Poisson bracket (4.204) for the "functions" A = q_j' and B = p_j''.



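$$ \{q_{j'}, p_{j''}\} \equiv \sum_j \left(\frac{\partial q_{j'}}{\partial p_j}\frac{\partial p_{j''}}{\partial q_j} - \frac{\partial q_{j'}}{\partial q_j}\frac{\partial p_{j''}}{\partial p_j}\right). \qquad (9.12a) $$

Since in the classical Hamiltonian mechanics all generalized coordinates q_j and momenta p_j have to be considered independent arguments of H, the first term of each pair in the sum (12a) always vanishes, while the second one is nonvanishing (and equals -1) only for j = j' = j'', i.e. only if j' = j'', so that

$$ \{q_{j'}, p_{j''}\} = -\delta_{j'j''}. \qquad (9.12b) $$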



Hence, according to the general quantization rule (4.205), the commutation relation of the operators corresponding to q_j' and p_j'' is

$$ [\hat{q}_{j'}, \hat{p}_{j''}] = i\hbar\,\delta_{j'j''}, \qquad (9.13) $$

i.e. it is exactly the same as for the usual Cartesian components of the radius-vector and momentum of a mechanical particle.

As the reader already knows, Eqs. (11) and (13) open for us several alternative ways to proceed:



8 See, e.g., EM Eq. (7.6). 

9 With different normalizations of functions e_j(r) and b_j(r), we could readily arrange any value of m_j, and the choice corresponding to m_j = 1 is the best one just for the notation simplicity. Note also that I am using the notation q_j instead of x_j for the generalized coordinate of the field oscillator, in order to emphasize the difference between the former variable, defined by Eq. (8), and one of the Cartesian coordinates, i.e. one of the arguments of the c-number functions e and b.
(i) Use the Schrödinger-picture wave mechanics based on wavefunctions ψ(q_j, t). As we know from Sec. 2.10, this way is inconvenient for most tasks, because the eigenfunctions of the harmonic oscillator are rather clumsy.

(ii) A substantially better way is to write the equations of time evolution of the Heisenberg-picture operators q̂_j(t) and p̂_j(t).

(iii) An even more convenient approach is to use equations similar to Eqs. (5.99) to decompose operators q̂_j(t) and p̂_j(t) into the creation-annihilation operators â_j† and â_j, and work with these operators using either the Schrödinger or the Heisenberg picture, depending on the problem.

I will mostly use the last route. Replacing m with m_j = 1, and ω_0 with ω_j, the last forms of Eqs. (5.98) become



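$$ \hat{a}_j = \left(\frac{\omega_j}{2\hbar}\right)^{1/2}\left(\hat{q}_j + i\frac{\hat{p}_j}{\omega_j}\right), \qquad \hat{a}_j^\dagger = \left(\frac{\omega_j}{2\hbar}\right)^{1/2}\left(\hat{q}_j - i\frac{\hat{p}_j}{\omega_j}\right), \qquad (9.14) $$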



and due to Eq. (13), the creation-annihilation operators obey a commutation relation similar to Eq. (5.101),

$$ [\hat{a}_j, \hat{a}_{j'}^\dagger] = \delta_{jj'}, \qquad (9.15) $$



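so that, according to Eqs. (3) and (8), the quantum-mechanical operators corresponding to the electric and magnetic fields are

$$ \hat{\mathscr{E}}(\mathbf{r},t) = \sum_j i\left(\frac{\hbar\omega_j}{2}\right)^{1/2}\mathbf{e}_j(\mathbf{r})\left(\hat{a}_j^\dagger - \hat{a}_j\right), \qquad (9.16a) $$

$$ \hat{\mathscr{B}}(\mathbf{r},t) = \sum_j \left(\frac{\hbar\omega_j}{2}\right)^{1/2}\mathbf{b}_j(\mathbf{r})\left(\hat{a}_j^\dagger + \hat{a}_j\right), \qquad (9.16b) $$

and Eq. (11) for the j-th mode's Hamiltonian becomes

$$ \hat{H}_j = \hbar\omega_j\left(\hat{a}_j^\dagger\hat{a}_j + \frac{1}{2}\right) = \hbar\omega_j\left(\hat{n}_j + \frac{1}{2}\right), \quad \text{with } \hat{n}_j = \hat{a}_j^\dagger\hat{a}_j, \qquad (9.17) $$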



absolutely similar to Eq. (5.505) for a mechanical oscillator.

Now comes a very important conceptual step. From Sec. 5.4 we know that the eigenstates (Fock states |n_j⟩) of Hamiltonian (17) have energies

$$ E_j = \hbar\omega_j\left(n_j + \frac{1}{2}\right), \quad n_j = 0, 1, 2, \ldots, \qquad (9.18) $$

and, according to Eq. (5.115), operators â_j and â_j† act on the eigenkets of these states as

$$ \hat{a}_j|n_j\rangle = n_j^{1/2}|n_j - 1\rangle, \qquad \hat{a}_j^\dagger|n_j\rangle = (n_j + 1)^{1/2}|n_j + 1\rangle, \qquad (9.19) $$
regardless of the quantum states of other modes (frequently called field oscillators). These rules coincide with definitions (8.56) and (8.60) of bosonic creation-annihilation operators, and hence their action may be considered as the creation/annihilation of certain bosons. Such a "particle" (actually, an excitation of an electromagnetic field oscillator) is exactly what is, strictly speaking, called a photon. Note immediately that, according to Eq. (16), such an excitation does not change the spatial distribution of the j-th mode of the field. So, such a "global" photon is an excitation created simultaneously at all points of the field confinement region.
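Relations (9.11)-(9.19) can be checked with explicit matrices in a truncated Fock basis |0⟩, ..., |N-1⟩. This is a minimal sketch assuming NumPy and units with ħ = 1; the truncation is a numerical convenience (not part of the theory) that spoils [â, â†] = 1 and the top diagonal element of Ĥ, so the checks below skip the last basis state.

```python
import numpy as np

def ladder_ops(n_max):
    """Matrices of a and a^dagger in the Fock basis |0>, ..., |n_max - 1>."""
    a = np.diag(np.sqrt(np.arange(1, n_max)), k=1)   # a|n> = sqrt(n)|n-1>, Eq. (9.19)
    return a, a.conj().T

def field_oscillator(n_max, omega, hbar=1.0):
    """Build q and p from a, a^dagger (inverting Eq. (9.14)), then H of Eq. (9.11)."""
    a, adag = ladder_ops(n_max)
    q = np.sqrt(hbar / (2 * omega)) * (a + adag)
    p = 1j * np.sqrt(hbar * omega / 2) * (adag - a)
    h = p @ p / 2 + omega**2 * (q @ q) / 2           # mode Hamiltonian with m_j = 1
    return a, adag, h
```

Apart from the truncated corner, Ĥ comes out diagonal in the Fock basis with the eigenvalues ħω(n + 1/2) of Eq. (9.18), and a single application of â† raises the photon number by one with the factor (n + 1)^1/2.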

If this picture is too contrary to the intuitive image of a particle, please recall that we had a similar situation in Chapter 2 with eigenstates of the nonrelativistic Schrödinger equation: they represented standing de Broglie waves existing simultaneously at all points of the particle confinement region. A (partial :-) reconciliation with the classical picture of a moving particle might be obtained by using the linear superposition principle to assemble a quasi-localized wave packet of sinusoidal waves with close wave numbers. Very similarly, we may form a quasi-localized wave packet using a linear superposition of the "global" photons with close values of k_j (and hence ω_j). An additional simplification here is that the dispersion relation for electromagnetic waves is linear:

$$ \frac{d\omega_j}{dk_j} = c = \text{const}, \quad \text{i.e.} \quad \frac{d^2\omega_j}{dk_j^2} = 0, \qquad (9.20) $$

so that, according to Eq. (2.39a), the electromagnetic wave packets (localized photons) do not spread out during their propagation.
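The difference between a linear dispersion relation like (9.20) and a quadratic one may be seen numerically. The sketch below assumes NumPy, units with c = 1, and a periodic grid large enough for the packet never to wrap around; it propagates the same Gaussian packet exactly in Fourier space, once with ω = ck and once with ω = k²/2 (a free Schrödinger particle with ħ = m = 1, chosen here only for contrast), and compares the rms widths of |ψ|².

```python
import numpy as np

def packet_width(omega_of_k, t, n=4096, length=400.0, k0=5.0, sigma=2.0):
    """RMS width of |psi(x, t)|^2 for a Gaussian packet evolved with dispersion omega(k)."""
    x = np.linspace(-length / 2, length / 2, n, endpoint=False)
    psi0 = np.exp(-x**2 / (4 * sigma**2) + 1j * k0 * x)   # |psi0|^2 has rms width sigma
    k = 2 * np.pi * np.fft.fftfreq(n, d=x[1] - x[0])      # angular wave numbers of the grid
    psi_t = np.fft.ifft(np.fft.fft(psi0) * np.exp(-1j * omega_of_k(k) * t))
    w = np.abs(psi_t) ** 2
    w /= w.sum()                                          # normalized probability weights
    mean = np.sum(x * w)
    return np.sqrt(np.sum((x - mean) ** 2 * w))

def linear(k):
    """Photon-like dispersion omega = c k, with c = 1."""
    return k

def quadratic(k):
    """Schrodinger-like dispersion omega = k^2 / 2 (hbar = m = 1), for contrast only."""
    return k**2 / 2
```

With the defaults, the linearly dispersing packet keeps its width of 2 while moving, whereas the quadratic one spreads to 2[1 + (t/8)²]^1/2, i.e. to about 5.4 at t = 20.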

The next important conceptual issue is that of the ground-state energy. Equation (18) implies that the total ground-state (i.e., the lowest) energy of the field is

$$ E_g = \sum_j E_j\big|_{n_j=0} = \sum_j \frac{\hbar\omega_j}{2}. \qquad (9.21) $$

This sum diverges at high frequencies for any realistic model of the field-confining volume - either infinite or not. Any attempt to dismiss this paradox by declaring the zero-point energy unobservable, and hence non-existing, fails due to several experimental facts.

First of all, the ground-state "fluctuations" (sometimes called "quantum noise") can be directly observed - see Sec. 7.5 and in particular the literature cited therein. Second, there is the Casimir effect. 10 The simplest manifestation of the effect involves two parallel plates separated by a vacuum gap of thickness d ≪ A^1/2, where A is the plate area (Fig. 1). Rather counter-intuitively, the plates attract each other with a force proportional to the area A, and rapidly increasing as the gap d decreases.



Fig. 9.1. Generic geometry of the Casimir effect.



10 It was predicted in 1948 by H. Casimir and D. Polder, and confirmed semi-quantitatively in experiments by M. Sparnaay, Nature 180, 334 (1957) and others. A decisive error bar reduction (to about 5%), providing a quantitative confirmation of the Casimir formula (23), was achieved by S. Lamoreaux, Phys. Rev. Lett. 78, 5 (1997) and U. Mohideen and A. Roy, Phys. Rev. Lett. 81, 4549 (1998).
The effect's explanation is that the energy of each electromagnetic field mode, including its ground-state energy, is intimately related to the pressure

$$ P_j = -\frac{\partial \langle E_j \rangle}{\partial V}, \qquad (9.22) $$

exerted by the field on the walls constraining it to volume V. While the field's pressure on the external surfaces of the plates is due to sum (21) over all free-space modes, with arbitrary values of k_z (the z-component of the wave vector k_j), between the plates the spectrum of k_z is limited to multiples of π/d, so that the pressure on the internal surfaces is lower. The net pressure may be found as the sum of contributions (22) from all "missing" low-frequency modes in the gap. The calculations are rather simple if the plates are made of an ideal conductor (which provides boundary conditions E_τ = 0 and B_n = 0 on the plate surfaces), and the result is 11

$$ P = -\frac{\pi^2 \hbar c}{240\, d^4}. \qquad (9.23) $$

Note that for this summation, the high-frequency divergence of Eq. (21) is not important, because it participates in the forces exerted on all surfaces of each plate, and hence cancels out from the net pressure. In this way, the Casimir effect not only gives a confirmation of Eq. (21), but also teaches us an important lesson on how to deal with the divergence of this sum at ω_j → ∞: just get accustomed to the idea that the divergence exists, and ignore the fact while you can. However, for more complex tasks of quantum electrodynamics (and quantum theory of any other field) this approach becomes impossible, and then more complex, renormalization techniques become necessary. For their study, I have to refer the reader to a quantum field theory course - see the literature cited at the end of this chapter.
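To get a feeling for the magnitude of the effect, the ideal-conductor formula P = -π²ħc/(240d⁴) may be evaluated numerically. This is a sketch assuming that formula and SI values of ħ and c; as footnote 11 warns, real metals deviate noticeably from it for d below ~1 μm.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s (CODATA 2018)
C = 2.99792458e8         # speed of light, m/s (exact by definition)

def casimir_pressure(d):
    """Ideal-conductor Casimir pressure in Pa for a gap d in meters; negative means attraction."""
    return -math.pi**2 * HBAR * C / (240.0 * d**4)

if __name__ == "__main__":
    for d in (1e-7, 1e-6, 1e-5):
        print(f"d = {d:.0e} m: P = {casimir_pressure(d):.3e} Pa")
```

At d = 1 μm this gives about -1.3 mPa, i.e. roughly 0.13 μN of attraction for plates of 1 cm² area, while the 1/d⁴ law makes the pressure 10⁴ times larger at d = 100 nm.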






9.2. Photon statistics 

As a matter of principle, the Casimir effect may be used to measure not only the free-space electromagnetic field, but also the field arriving from local sources - lasers, etc. However, usually this is done by simpler detectors in which the absorption of a photon by a single atom leads to its ionization. This ionization, i.e. the emission of a free electron, triggers a chain reaction (e.g., an electric discharge in a Geiger-type counter) that may readily be registered by appropriate electronic circuitry. In order to discuss the statistics of such photon counts, it is sufficient to consider the field interaction with just one,



11 For realistic metals, the reduction of d below ~1 μm causes significant deviations from this simple model, and hence from Eq. (23). The reason is that at the important frequencies ω ~ c/d, the depth of field penetration into the metal (see, e.g., EM Secs. 2.1 and 6.2) becomes comparable with d, and a theory of the Casimir effect has to involve a certain model of field penetration. (It is curious that in-depth analyses of this problem, pioneered in 1956 by E. Lifshitz, have revealed a deep relation between the Casimir effect and the long-range London dispersion forces, which were the subject of Problems 3.7, 5.10 and 6.8 - for a review see, e.g., either I. Dzhyaloshinskii et al., Sov. Phys. Uspekhi 4, 153 (1961), or K. Milton, The Casimir Effect, World Scientific, 2001.) Recent experiments in the 100 nm - 2 μm range of distances d, with accuracy better than 1%, have even allowed distinguishing between alternative approximate models of field penetration - see D. Garcia-Sanchez et al., Phys. Rev. Lett. 109, 027202 (2012).
"trigger" atom. The atom's size a is typically much smaller than the radiation wavelength λ_j = 2π/k_j, so that their interaction is adequately described in the electric dipole approximation,

$$ \hat{H}_{\rm int} = -\hat{\mathscr{E}}(\mathbf{r},t)\cdot\hat{\mathbf{d}}, \qquad (9.24) $$



where d̂ is the dipole moment's operator. 12 In Sec. 6.5 we have already developed an approach suitable for the analysis of this problem, based on the Golden Rule - see Fig. 6.14 and Eq. (6.152). 13 In our current case, we may associate system b with the "trigger atom" (whose ionized states form a continuous spectrum), and hence operator d̂ in Eq. (24) with operand B̂ in Eq. (6.148), while the electromagnetic field is represented by system a, and its electric field operator ℰ̂ is associated with operand Â in that relation. Let us assume, for simplicity, that our field consists of only one mode e(r). 14 Then we can keep only one term in Eq. (16a), and drop index j, so that Eq. (6.152), for the transition from a certain initial state labeled "ini" to a final state "fin", may be rewritten as



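$$ \Gamma = \frac{2\pi}{\hbar}\,\frac{\hbar\omega}{2}\left|\langle \text{fin}|\left[\hat{a}^\dagger(t) - \hat{a}(t)\right] e(\mathbf{r})|\text{ini}\rangle\right|^2 \left|\langle \text{fin}|\hat{\mathbf{d}}(t)\cdot\mathbf{n}_e|\text{ini}\rangle\right|^2 \rho_f, \qquad (9.25) $$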



where e(r) is the local magnitude of the vector e(r), and n_e = e(r)/e(r) is its local direction. 15 As a reminder, in the Heisenberg picture of quantum mechanics, the initial and final states are time-independent, while the creation-annihilation operators are functions of time. In this Golden Rule formula, as in any perturbation result, this time dependence has to be calculated ignoring the perturbation - in this case the field-atom interaction. For the field's creation-annihilation operators, this dependence coincides with that of the usual 1D oscillator - see Eq. (5.171), in which ω_0 should now be replaced with ω:



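$$ \hat{a}(t) = \hat{a}(0)e^{-i\omega t}, \qquad \hat{a}^\dagger(t) = \hat{a}^\dagger(0)e^{+i\omega t}. \qquad (9.26) $$

Hence Eq. (25) becomes

$$ \Gamma = \pi\omega\left|\langle \text{fin}|\left[\hat{a}^\dagger(0)e^{i\omega t} - \hat{a}(0)e^{-i\omega t}\right] e(\mathbf{r})|\text{ini}\rangle\right|^2 \left|\langle \text{fin}|\hat{\mathbf{d}}(t)\cdot\mathbf{n}_e|\text{ini}\rangle\right|^2 \rho_f. \qquad (9.27a) $$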



Now let us multiply the first bra-ket by exp{iωt}, and the second one by exp{-iωt}; since both factors have unit modulus, this does not change Γ:



12 As a reminder: this relation, with the single-particle expression d = qr, has already been used several times - see, e.g., Eqs. (6.32) and (6.149). In contrast to the former of those cases, now we have to account for the quantum nature of the electromagnetic field ℰ, so in Eq. (24) it is represented by the (vector) operator (16a).

13 Please note that (as was promised) we have gradually slipped to the analysis of open, irreversible systems, with the detector(s) playing the role of a continuous-spectrum environment for the quantized electromagnetic field.

14 In a multimode field, the modes are typically incoherent, so that the total transition rate may be calculated as the sum of the partial rates of each mode - as we will do for a certain case below.

15 By the way, this expression shows that for the single-particle transitions from the ground state to the n-th Fock state, the absorption rate is indeed proportional to the oscillator strength f_n = (2m/ħ²)(E_n - E_0)|⟨n|x|0⟩|² of the transition, where x is the particle's coordinate in the direction of the external field. As was discussed in Chapter 5, the strengths obey the sum rule Σ_n f_n = 1.
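$$ \Gamma = \pi\omega\left|\langle \text{fin}|\left[\hat{a}^\dagger(0)e^{2i\omega t} - \hat{a}(0)\right] e(\mathbf{r})|\text{ini}\rangle\right|^2 \left|\langle \text{fin}|\hat{\mathbf{d}}(t)\cdot\mathbf{n}_e e^{-i\omega t}|\text{ini}\rangle\right|^2 \rho_f. \qquad (9.27b) $$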



The physical sense of this mathematically trivial operation is that at resonant photon absorption, only the annihilation operator gives a significant time-averaged contribution to the first bra-ket. (Similarly, according to Eq. (4.199), the Heisenberg operator of the dipole moment, corresponding to the increase of the atom's energy, has only Fourier components that differ from ω by no more than ~Γ ≪ ω, so that its time dependence compensates the additional factor in the second bra-ket of Eq. (27b); as a result, this bra-ket is also slowly varying and has a substantial time average.) Hence, we can neglect the fast-evolving term in the first bra-ket, whose average over a time interval ~1/Γ is very close to zero. 16
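The neglect of the fast-evolving term may be quantified: the average of exp{2iωt} over an interval T = 1/Γ equals (e^{2iωT} - 1)/(2iωT), whose modulus, |sin ωT|/(ωT), never exceeds Γ/ω. The sketch below, assuming NumPy and a dense uniform time grid (both choices made only for this illustration), checks that bound numerically and contrasts it with a non-oscillating term, whose average is 1.

```python
import numpy as np

def phase_average(omega, gamma, n=200_001):
    """|time average of exp(2i*omega*t)| over 0 <= t <= 1/gamma, sampled on a uniform grid."""
    t = np.linspace(0.0, 1.0 / gamma, n)
    return abs(np.mean(np.exp(2j * omega * t)))

def phase_average_exact(omega, gamma):
    """Closed form |sin(omega T)| / (omega T), with T = 1/gamma."""
    x = omega / gamma
    return abs(np.sin(x)) / x
```

For Γ/ω = 10^-3, for example, the oscillating term averages to less than 10^-3 of the resonant one, which is why it may be dropped.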

Now let us assume that we use the same detector, characterized by the same second bra-ket and
the same state density $\rho_\text{fin}$, to measure various electromagnetic fields, or just the same field at
different points $\mathbf r$. Then we are interested only in the behavior of the first, field-related factor, and may
write

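$$\Gamma \propto \left|\langle\text{fin}|\hat a\,e(\mathbf r)|\text{ini}\rangle\right|^2 = \langle\text{fin}|\hat a\,e(\mathbf r)|\text{ini}\rangle^*\langle\text{fin}|\hat a\,e(\mathbf r)|\text{ini}\rangle = \langle\text{ini}|\hat a^\dagger e^*(\mathbf r)|\text{fin}\rangle\langle\text{fin}|\hat a\,e(\mathbf r)|\text{ini}\rangle, \tag{9.28}$$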

where the creation-annihilation operators are assumed to be taken at the initial moment (i.e., in the
Schrödinger picture), and the initial and final states are those of the field alone. As we know, any 1D
harmonic oscillator (and hence the electromagnetic field oscillator) has many equidistant levels, so even
if it initially was in a certain state, it may undergo transitions to several different final Fock
states. If we want to calculate the total rate, we may sum the transition rates into all final states. Then,
since these states form a complete orthonormal set, we may use the closure condition (4.44) to get




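$$\Gamma \propto \sum_\text{fin}\langle\text{ini}|\hat a^\dagger e^*(\mathbf r)|\text{fin}\rangle\langle\text{fin}|\hat a\,e(\mathbf r)|\text{ini}\rangle = \langle\text{ini}|\hat a^\dagger\hat a|\text{ini}\rangle\,e^*(\mathbf r)e(\mathbf r) = \langle n\rangle\,e^*(\mathbf r)e(\mathbf r). \tag{9.29}$$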



Let us apply this formula to several possible quantum states of the field mode. 

(i) First, as a sanity check, the ground initial state (n = 0) gives no photon counts at all. The 
interpretation is easy: the ground state cannot emit a photon that would trigger an atom in the counter. 
Again, this does not mean that the ground-state motion is not observable (if you still think so, please 
review the Casimir effect discussion in the last section), just that it cannot ionize an atom in the detector 
- because it does not have any spare energy for doing that. 

(ii) All other pure states (Fock, Glauber, squeezed, etc.) of the field oscillator give the same
counting rate, provided that their $\langle n\rangle$ is the same. This result may be less evident if we apply Eq. (29) to
an interference of two light beams from the same source (say, in the double-slit or the Bragg-scattering
configurations). In this case we may represent the spatial distribution of the field as a sum



$$e(\mathbf r) = e_1(\mathbf r) + e_2(\mathbf r). \tag{9.30}$$



Here each term describes one possible wave path, so that the field product in Eq. (29) may be a rapidly
changing function of the detector position. For this configuration, our result (29) means that the
interference pattern (and its contrast) is independent of the particular state of the electromagnetic
field's mode.

16 This is essentially the same rotating-wave approximation (RWA) that was already used in Sec. 6.3 - see the
transition from Eq. (6.90) to the first of Eqs. (6.94).

(iii) Surprisingly, the last statement is also valid for a classical mixture of different
eigenstates of the same field mode, for example for its thermal-equilibrium state. Indeed, in this case we
need to average Eq. (29) over the corresponding classical ensemble, but this would only change the
meaning of the average $\langle n\rangle$ in that equation; the field part describing the interference pattern is not
affected.

The last result may look a bit counter-intuitive, because common sense tells us that the
stochasticity associated with thermal equilibrium has to suppress the interference pattern's contrast. These
expectations are (partly :-) justified, because a typical thermal source of radiation produces many field
modes $j$, rather than the single mode we have analyzed. These modes may have different wave numbers $k_j$ and
hence different field distribution functions $e_j(\mathbf r)$, resulting in shifted interference patterns. Their
summation indeed smears the interference, suppressing its contrast.
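The state-independence claimed in items (i)-(iii) is easy to check numerically. The following is a minimal Python sketch, our own illustration rather than anything from the text: it assumes a Fock basis truncated at `N` levels, and the helper names `annihilation`, `fock`, `glauber`, and `counting_rate_factor` are hypothetical. It evaluates the only state-dependent factor of Eq. (29), $\langle\hat a^\dagger\hat a\rangle$, for the vacuum, a Fock state, and a Glauber state.

```python
import numpy as np
from math import factorial

def annihilation(n_max):
    """Truncated annihilation operator in the Fock basis: a|n> = sqrt(n)|n-1>."""
    return np.diag(np.sqrt(np.arange(1, n_max)), k=1)

def fock(n, n_max):
    """Fock state |n> as a vector in the truncated basis."""
    psi = np.zeros(n_max, dtype=complex)
    psi[n] = 1.0
    return psi

def glauber(alpha, n_max):
    """Glauber state |alpha> expanded in the truncated Fock basis."""
    amps = np.array([complex(alpha) ** k / np.sqrt(float(factorial(k)))
                     for k in range(n_max)])
    return np.exp(-abs(alpha) ** 2 / 2) * amps

def counting_rate_factor(psi, a):
    """<psi|a^dagger a|psi>: the state-dependent factor <n> of the single-detector rate (29)."""
    return np.vdot(psi, a.conj().T @ a @ psi).real

N = 60                      # truncation (assumption: ample for |alpha|^2 = 4)
a = annihilation(N)
print(counting_rate_factor(fock(0, N), a))       # vacuum: no counts
print(counting_rate_factor(fock(4, N), a))       # Fock state with n = 4
print(counting_rate_factor(glauber(2.0, N), a))  # Glauber state with <n> = |alpha|^2 = 4
```

The Fock state with $n = 4$ and the Glauber state with $\alpha = 2$ give the same factor, so a single detector cannot tell them apart.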

So a single photon detector is not a suitable tool for distinguishing different quantum
states of electromagnetic field modes. This task, however, may be achieved using the photon
counting correlation technique shown in Fig. 2.¹⁷



[Fig. 9.2 schematic: light source; semi-transparent mirror splitting the beam between detector 1 and detector 2; controllable time delay; count statistics calculation.]

Fig. 9.2. Photon counting correlation measurements. (The intensities of the
split beams should be comparable, but not necessarily equal.)



In this experiment, the counter rate correlation may be characterized by the so-called second-order
correlation function¹⁸ of the counting rates,

$$g^{(2)}(\tau) \equiv \frac{\langle\Gamma_1(t)\,\Gamma_2(t-\tau)\rangle}{\langle\Gamma_1(t)\rangle\langle\Gamma_2(t-\tau)\rangle}, \tag{9.31}$$




17 It was pioneered as early as the mid-1950s (i.e., before the advent of lasers!) by R. Hanbury Brown and R.
Twiss. Their first experiment was also remarkable for the rather unusual light source they used: the star Sirius! (It
was a part of an attempt to improve astrophysical interferometry techniques.)

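18 The reader may be interested in what the first-order correlation function is. It is usually defined as

$$g^{(1)}(\tau) = \frac{\langle\hat{\mathcal E}(\mathbf r_1,t)\,\hat{\mathcal E}^\dagger(\mathbf r_2,t-\tau)\rangle}{\left[\langle\hat{\mathcal E}(\mathbf r_1,t)\,\hat{\mathcal E}^\dagger(\mathbf r_1,t)\rangle\,\langle\hat{\mathcal E}(\mathbf r_2,t)\,\hat{\mathcal E}^\dagger(\mathbf r_2,t)\rangle\right]^{1/2}}.$$

In the single-mode case and the rotating-wave approximation, this function is proportional to the c-number
product $e(\mathbf r_1)e^*(\mathbf r_2)$, with all creation-annihilation operators cancelled, i.e., it is suitable for characterizing
interference patterns (30), but not the quantum state of the electromagnetic field.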
where the averaging may be carried out either over many similar experiments, or over time $t$, due to the
ergodicity of the experiment (with a stationary light source). Using the normalized correlation function
(31) is very convenient, because the characteristics of the detectors and the beam splitter drop out of this
fraction.
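With a stationary source, the averages in Eq. (31) are time averages over the detectors' count records. Below is a minimal Python sketch of such an estimator. It is our own illustration: the function name `g2_from_counts`, the binning of counts into equal time bins (so that the delay $\tau$ becomes an integer lag), and both toy sources are assumptions. The second toy source drives both detectors with a common, exponentially distributed intensity that is uncorrelated between bins, which is a crude stand-in for a thermal source.

```python
import numpy as np

def g2_from_counts(c1, c2, max_lag):
    """Estimate g2(k) = <c1(t) c2(t - k)> / (<c1><c2>) from binned count records, cf. Eq. (31).

    c1, c2 -- equal-length arrays of photon counts per time bin from detectors 1 and 2;
    the lag k (in bins) plays the role of the delay tau. Time averaging replaces
    ensemble averaging, which assumes a stationary (ergodic) source.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    m1, m2 = c1.mean(), c2.mean()
    g = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        # pair c1 at time t with c2 at time t - k, over the overlapping part of the records
        g[k] = np.mean(c1[k:] * c2[:len(c2) - k]) / (m1 * m2)
    return g

rng = np.random.default_rng(seed=0)
bins = 200_000

# Independent detectors with a constant mean rate: g2 close to 1 at all lags
g_indep = g2_from_counts(rng.poisson(0.5, bins), rng.poisson(0.5, bins), max_lag=3)

# Common fluctuating intensity, uncorrelated between bins (hypothetical toy source):
# g2(0) close to 2, g2(k > 0) close to 1
intensity = rng.exponential(1.0, bins)
g_thermal = g2_from_counts(rng.poisson(intensity), rng.poisson(intensity), max_lag=3)
print(g_indep)
print(g_thermal)
```

The detector efficiencies (here, the overall scale of the count rates) indeed drop out of the normalized ratio.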

Very unexpectedly for the mid-1950s, Hanbury Brown and Twiss discovered that the correlation
function depends on the time delay $\tau$ in the way shown schematically by the solid line in Fig. 3. It is evident
from Eq. (31) that if the counting events are completely independent, $g^{(2)}(\tau)$ should equal 1, which is
always the case in the limit $\tau \to \infty$. Hence, the observed behavior at $\tau \to 0$ corresponds to a positive
correlation of detector counts at small time delays, i.e., to a higher probability of the nearly simultaneous
arrival of photons at both counters. This effect is called photon bunching.



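[Fig. 9.3 plot: $g^{(2)}(\tau)$ versus $\tau$, with levels 1 and 2 marked on the vertical axis; one dashed curve is labeled $n = 1$.]

Fig. 9.3. Photon bunching (solid line) and
antibunching for various $n$ (dashed lines). The
lines approach the level $g^{(2)} = 1$ at $\tau \to \infty$ (on a
time scale depending on the light source).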



Let us use our simple single-mode model to analyze this experiment. Now the elementary
quantum process, characterized by the numerator of Eq. (31), is the correlated triggering of two
counters, at two space-time points $\{\mathbf r_1, t\}$ and $\{\mathbf r_2, t-\tau\}$, by the same field mode, so that we need to
make the following replacement in the first of Eqs. (25):

$$\hat{\mathcal E}(\mathbf r, t) \to \text{const}\times\hat{\mathcal E}(\mathbf r_1, t)\,\hat{\mathcal E}(\mathbf r_2, t-\tau). \tag{9.32}$$

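Repeating all the manipulations done in the single-counter case, we get

$$\langle\Gamma_1(t)\,\Gamma_2(t-\tau)\rangle \propto \langle\text{ini}|\hat a^\dagger(t)\hat a^\dagger(t-\tau)\hat a(t-\tau)\hat a(t)|\text{ini}\rangle\,e^*(\mathbf r_1)e^*(\mathbf r_2)e(\mathbf r_1)e(\mathbf r_2). \tag{9.33}$$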

Plugging this expression, as well as Eq. (29) for single-counter rates, into Eq. (31), we see that the field
distribution factors (as well as the detector-specific bra-kets and the density of states $\rho_\text{fin}$) cancel, giving
a very simple final expression

$$g^{(2)}(\tau) = \frac{\langle\hat a^\dagger(t)\hat a^\dagger(t-\tau)\hat a(t-\tau)\hat a(t)\rangle}{\langle\hat a^\dagger(t)\hat a(t)\rangle^2}, \tag{9.34}$$

where the averaging should be carried out, as before, over the initial state of the field. Still, the
calculation of this expression for arbitrary $\tau$ may be quite complex, because the relaxation of the
correlation function to its asymptotic value $g^{(2)}(\infty)$ is in many cases due to the interaction of the light
source with its environment, and hence requires the open-system techniques discussed in
Chapter 7. However, the zero-delay value $g^{(2)}(0)$ may be calculated in a straightforward way, because
the time arguments of all operators are equal, so that we may write
$$g^{(2)}(0) = \frac{\langle\hat a^\dagger\hat a^\dagger\hat a\hat a\rangle}{\langle\hat a^\dagger\hat a\rangle^2}. \tag{9.35}$$



Let us evaluate this ratio for the simplest states of the field. (Remember, we are working in the
Schrödinger picture now.)

(i) The $n$-th Fock state. In this case, it is convenient to act by the annihilation operators upon the ket-vectors,
and by the creation operators upon the bra-vectors, using Eq. (19):



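$$g^{(2)}(0) = \frac{\langle n|\hat a^\dagger\hat a^\dagger\hat a\hat a|n\rangle}{\langle n|\hat a^\dagger\hat a|n\rangle^2} = \frac{\langle n-2|[n(n-1)]^{1/2}[n(n-1)]^{1/2}|n-2\rangle}{n^2} = \frac{n(n-1)}{n^2} = 1 - \frac{1}{n} < 1. \tag{9.36}$$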



We see that the correlation function at small delays is suppressed rather than enhanced - see the dashed
lines in Fig. 3. This photon antibunching effect has a very simple explanation: a single photon emitted by
the wave source may be absorbed by just one of the detectors. For the initial state $n = 1$, this is the only
option, and it is very natural that Eq. (36) predicts no simultaneous counts at $\tau = 0$. Despite this
theoretical simplicity, reliable observations of antibunching were not carried out until 1977,¹⁹
due to the experimental difficulty of creating Fock states of electromagnetic field oscillators - see Sec. 4
below.
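Eq. (36) can be cross-checked by brute-force matrix algebra. In this minimal Python sketch (our own illustration; the truncation `N` and the helper names `annihilation` and `g2_zero` are assumptions), the operator products of Eq. (35) are built from a truncated annihilation matrix and compared with $1 - 1/n$.

```python
import numpy as np

def annihilation(n_max):
    """Truncated annihilation operator: a|n> = sqrt(n)|n-1>."""
    return np.diag(np.sqrt(np.arange(1, n_max)), k=1)

def g2_zero(psi, a):
    """Zero-delay correlation function (35) of a pure state psi: <a†a†aa>/<a†a>^2."""
    ad = a.conj().T
    numerator = np.vdot(psi, ad @ ad @ a @ a @ psi).real
    mean_n = np.vdot(psi, ad @ a @ psi).real
    return numerator / mean_n ** 2

N = 20                      # truncation, far above the Fock states used below
a = annihilation(N)
g_fock = {}
for n in range(1, 6):
    psi = np.zeros(N)
    psi[n] = 1.0
    g_fock[n] = g2_zero(psi, a)   # expected: 1 - 1/n, Eq. (36)
print(g_fock)
```

For $n = 1$ the numerator vanishes identically, reproducing the absence of coincident counts.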

(ii) The Glauber state $\alpha$. A similar procedure, but now using Eq. (5.155) and its Hermitian
conjugate, $\langle\alpha|\hat a^\dagger = \langle\alpha|\alpha^*$, yields



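$$g^{(2)}(0) = \frac{\langle\alpha|\hat a^\dagger\hat a^\dagger\hat a\hat a|\alpha\rangle}{\langle\alpha|\hat a^\dagger\hat a|\alpha\rangle^2} = \frac{\alpha^*\alpha^*\alpha\alpha}{(\alpha^*\alpha)^2} = 1, \tag{9.37}$$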



for any parameter $\alpha$. We see that the result is very different from that for the Fock states, unless $n \to \infty$ in
the latter case. (We know that the Fock and Glauber properties should also coincide for the ground
state, but in that state the correlation function's value is undefined, because there are no photon counts at
all.)
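The same brute-force check works for Eq. (37). This minimal Python sketch (our own illustration; the truncation `N = 60` and the helper names are assumptions) expands a few Glauber states, including one with a complex $\alpha$, in a truncated Fock basis and evaluates Eq. (35).

```python
import numpy as np
from math import factorial

def annihilation(n_max):
    """Truncated annihilation operator: a|n> = sqrt(n)|n-1>."""
    return np.diag(np.sqrt(np.arange(1, n_max)), k=1)

def glauber(alpha, n_max):
    """Glauber state |alpha> in the truncated Fock basis."""
    amps = np.array([complex(alpha) ** k / np.sqrt(float(factorial(k)))
                     for k in range(n_max)])
    return np.exp(-abs(alpha) ** 2 / 2) * amps

def g2_zero(psi, a):
    """Zero-delay correlation function (35): <a†a†aa>/<a†a>^2."""
    ad = a.conj().T
    return np.vdot(psi, ad @ ad @ a @ a @ psi).real / np.vdot(psi, ad @ a @ psi).real ** 2

N = 60   # truncation (assumption: the Poisson tails beyond it are negligible here)
a = annihilation(N)
g_glauber = {alpha: g2_zero(glauber(alpha, N), a) for alpha in (0.5, 1.0 + 1.0j, 2.0)}
print(g_glauber)   # each value should be 1 to numerical accuracy, Eq. (37)
```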

(iii) Classical mixture. From Chapter 7, we know that such ensembles cannot be described by
single state vectors, and require the density matrix $\hat w$ for their description. In particular, we can use the
key Eq. (7.5) to write



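$$g^{(2)}(0) = \frac{\text{Tr}\left(\hat w\,\hat a^\dagger\hat a^\dagger\hat a\hat a\right)}{\left[\text{Tr}\left(\hat w\,\hat a^\dagger\hat a\right)\right]^2}. \tag{9.38}$$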



The calculation is easy for an ensemble in thermodynamic equilibrium, because here the density
matrix is diagonal in the basis of Fock states $n$ - see Eqs. (7.23)-(7.25):

$$w_{nn'} = W_n\delta_{nn'}, \qquad W_n = \frac{\lambda^n}{\sum_{n=0}^{\infty}\lambda^n}, \qquad\text{where } \lambda \equiv \exp\left\{-\frac{\hbar\omega}{k_\text{B}T}\right\}. \tag{9.39}$$

19 H. J. Kimble et al., Phys. Rev. Lett. 39, 691 (1977). For a detailed review of photon antibunching, see, e.g., H.
Paul, Rev. Mod. Phys. 54, 1061 (1982).



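So, for the operators in the numerator and denominator of Eq. (38) we also need just the diagonal terms
of the operator products that have already been calculated - see Eq. (36). As a result, we get

$$g^{(2)}(0) = \frac{\sum_{n=0}^{\infty}W_n\,n(n-1)}{\left(\sum_{n=0}^{\infty}W_n\,n\right)^2} = \frac{\left(\sum_{n=0}^{\infty}\lambda^n n(n-1)\right)\left(\sum_{n=0}^{\infty}\lambda^n\right)}{\left(\sum_{n=0}^{\infty}\lambda^n n\right)^2}. \tag{9.40}$$

One of these sums is just the geometric progression,

$$\sum_{n=0}^{\infty}\lambda^n = \frac{1}{1-\lambda}, \tag{9.41}$$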



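and the remaining two sums may be readily calculated by its differentiation over the parameter $\lambda$:

$$\sum_{n=0}^{\infty}\lambda^n n = \lambda\frac{d}{d\lambda}\sum_{n=0}^{\infty}\lambda^n = \lambda\frac{d}{d\lambda}\frac{1}{1-\lambda} = \frac{\lambda}{(1-\lambda)^2},$$
$$\sum_{n=0}^{\infty}\lambda^n n(n-1) = \lambda^2\frac{d^2}{d\lambda^2}\sum_{n=0}^{\infty}\lambda^n = \lambda^2\frac{d^2}{d\lambda^2}\frac{1}{1-\lambda} = \frac{2\lambda^2}{(1-\lambda)^3}, \tag{9.42}$$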



and for the correlation function we get an extremely simple result, independent of the parameter $\lambda$ and hence
of temperature:

$$g^{(2)}(0) = \frac{\left[2\lambda^2/(1-\lambda)^3\right]\left[1/(1-\lambda)\right]}{\left[\lambda/(1-\lambda)^2\right]^2} = 2. \tag{9.43}$$
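The temperature independence in Eq. (43) can also be seen by evaluating the sums of Eq. (40) directly. In this minimal Python sketch (our own illustration; the function name `thermal_g2_zero` and the cutoff `n_max` of the infinite sums are assumptions), the parameter is $x = \hbar\omega/k_\text{B}T$, so that $\lambda = e^{-x}$ as in Eq. (39).

```python
import numpy as np

def thermal_g2_zero(x, n_max=5000):
    """g2(0) of a single thermal mode, from the sums of Eq. (40).

    x = hbar*omega/(k_B*T), so lambda = exp(-x) as in Eq. (39); the infinite
    sums are truncated at n_max (an assumption, adequate unless x is tiny).
    """
    lam = np.exp(-x)
    n = np.arange(n_max)
    weights = lam ** n                        # unnormalized Gibbs weights; normalization cancels
    s0 = weights.sum()                        # Eq. (41): 1/(1 - lambda)
    s1 = (weights * n).sum()                  # Eq. (42): lambda/(1 - lambda)^2
    s2 = (weights * n * (n - 1)).sum()        # Eq. (42): 2 lambda^2/(1 - lambda)^3
    return s2 * s0 / s1 ** 2

results = {x: thermal_g2_zero(x) for x in (0.01, 0.1, 1.0, 3.0)}
print(results)   # every value should be close to 2, Eq. (43)
```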



This is exactly the photon bunching effect first observed by Hanbury Brown and Twiss (Fig. 3).
We see that, in contrast to antibunching, this is an essentially classical (statistical) effect. Indeed, Eq.
(43) allows a purely classical proof. In the classical theory, the counting rate is proportional to the wave
intensity $I$, so that Eq. (31) is reduced to



$$g^{(2)}(0) = \frac{\langle I^2\rangle}{\langle I\rangle^2}, \qquad\text{with } I \propto E^2(t) \propto E_\omega E_\omega^*. \tag{9.44}$$



For a sinusoidal field, the intensity is constant, and $g^{(2)}(0) = 1$. (This is also evident from Eq. (37),
because the classical state may be considered as the Glauber state with $\alpha \to \infty$.) On the other hand, if the
intensity fluctuates (either in time, or from one experiment to another), the averages should be
calculated as



$$\langle I^N\rangle = \int w(I)\,I^N\,dI, \qquad\text{with } \int w(I)\,dI = 1, \tag{9.45}$$
where $w(I)$ is the probability density. For the classical (Boltzmann) statistics, the probability is an
exponential function of the electromagnetic field energy, and hence of its intensity:



$$w(I) = Ce^{-\beta I}, \qquad\text{where } \beta \propto 1/k_\text{B}T, \tag{9.46}$$



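so that Eqs. (45) yield:

$$\int_0^\infty C\exp\{-\beta I\}\,dI = 1, \text{ so that } C = \beta; \qquad \langle I\rangle = \beta\int_0^\infty Ie^{-\beta I}\,dI = \frac{1}{\beta}, \qquad \langle I^2\rangle = \beta\int_0^\infty I^2e^{-\beta I}\,dI = \frac{2}{\beta^2}. \tag{9.47}$$

Plugging these results into Eq. (44), we get $g^{(2)}(0) = 2$, in complete agreement with Eq. (43).²⁰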
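The classical argument of Eqs. (44)-(47) lends itself to a quick Monte Carlo check. This minimal Python sketch is our own illustration: the value `beta = 0.5`, the sample sizes, and the random seed are arbitrary assumptions. It draws intensities from $w(I) = \beta e^{-\beta I}$, compares the result with the constant intensity of a sinusoidal field, and forms $\langle I^2\rangle/\langle I\rangle^2$ for both.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
beta = 0.5   # hypothetical value of the parameter in Eq. (46), arbitrary intensity units

# Constant (sinusoidal-field) intensity: <I^2>/<I>^2 = 1
I_const = np.full(1000, 3.0)
g2_const = np.mean(I_const ** 2) / np.mean(I_const) ** 2

# Boltzmann-distributed intensity, w(I) = beta * exp(-beta * I): the ratio tends to 2
I_thermal = rng.exponential(scale=1 / beta, size=1_000_000)
g2_thermal = np.mean(I_thermal ** 2) / np.mean(I_thermal) ** 2
print(g2_const, g2_thermal)
```

The sampled ratio does not depend on `beta`, mirroring the temperature independence of Eq. (43).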



9.3. Spontaneous and stimulated emission 

In our simple model of photon counting, considered in the last section, the trigger atoms of the
photon counter absorbed light. Now let us have a look at the opposite process of spontaneous emission
of photons by an atom in an excited state, still using the same electric-dipole approximation for the
atom-to-field interaction. We may still use the Golden Rule for the model depicted in Fig. 6.14, but now
the roles have changed: we have to associate operator $A$ with the dipole moment of the atom, and
operator $B$ with the electric field, while the continuous spectrum of system $b$ represents the plurality of
the electromagnetic field modes into which the spontaneous radiation may go. Since now the
transition increases the energy of the electromagnetic field, after the multiplication of the field bra-ket
by $\exp\{-i\omega t\}$, we may keep only the photon creation operator, whose time evolution compensates this fast
"rotation". As a result, the Golden Rule takes the following form:



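$$\Gamma_s = \pi\omega\left|\langle\text{fin}|\hat a^\dagger|0\rangle\right|^2\left|\langle\text{fin}|\hat{\mathbf d}\cdot\mathbf e(\mathbf r)|\text{ini}\rangle\right|^2\rho_\text{fin}, \tag{9.48}$$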



where all operators and states are time-independent, and $\rho_\text{fin}$ is now the density of final states of the
electromagnetic field, which in this problem plays the role of the atom's environment. Here the
electromagnetic field has been assumed to be initially in the ground state, an assumption that will be
relaxed later in this section.

Relation (48), together with Eq. (19), shows that for the field's matrix element to be different
from zero, the final state of the field has to be the first excited Fock state, $n = 1$. (By the way, this is
exactly the most practicable way of generating an excited Fock state of a field oscillator, whose
existence was taken for granted in our discussion in Sec. 2.) With that, Eq. (48) yields



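$$\Gamma_s = \pi\omega\left|\langle\text{fin}|\hat{\mathbf d}\cdot\mathbf e(\mathbf r)|\text{ini}\rangle\right|^2\rho_\text{fin} = \pi\omega\left|\langle\text{fin}|\hat d|\text{ini}\rangle\right|^2 e_d^2(\mathbf r)\,\rho_\text{fin}, \tag{9.49}$$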



20 For some field states, including the squeezed ground states $\zeta$ discussed at the end of Sec. 5.5, the values $g^{(2)}(0)$ may
be even higher than 2 - the so-called super-bunching. Analysis of one particular case of super-bunching is offered
to the reader - see the exercise problem list.
where the density $\rho_\text{fin}$ of excited electromagnetic field states should be calculated at energy $\hbar\omega$, and $e_d$ is
the component of the vector $\mathbf e(\mathbf r)$ along the electric dipole direction.²¹ For plane waves, the calculation of
this density was our first step in this course - see Eq. (1.1).²² From it, we get



$$\rho_\text{fin} = \frac{dN}{dE} = \frac{V\omega^2}{\pi^2\hbar c^3}, \tag{9.50}$$



where the bounding volume $V$ should be large enough to ensure the spectrum's virtual continuity. Because
of that, in the normalization condition used to simplify Eq. (9), we may consider $e^2(\mathbf r)$ constant. Let us
represent the square of this vector as a sum of the squares of its three perpendicular components (one of them,
$e_d$, aligned with the dipole direction); due to space isotropy, we may write



$$e^2 = e_d^2 + e_{\perp 1}^2 + e_{\perp 2}^2 = 3e_d^2. \tag{9.51}$$

As a result, the normalization condition yields

$$e_d^2 = \frac{1}{3\varepsilon_0 V}, \tag{9.52}$$



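and Eq. (49) gives the famous (and very important) formula²³

$$\Gamma_s = \frac{1}{4\pi\varepsilon_0}\frac{4\omega^3}{3\hbar c^3}\left|\langle\text{fin}|\hat{\mathbf d}|\text{ini}\rangle\right|^2 \equiv \frac{1}{4\pi\varepsilon_0}\frac{4\omega^3}{3\hbar c^3}\langle\text{fin}|\hat{\mathbf d}|\text{ini}\rangle\cdot\langle\text{ini}|\hat{\mathbf d}|\text{fin}\rangle. \tag{9.53}$$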



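Leaving a comparison of this formula with the classical theory of radiation,²⁴ and the exact
evaluation of $\Gamma_s$ for a particular transition in the hydrogen atom, for the reader's exercises, let me just
estimate its order of magnitude. Assuming that $d \sim er_\text{B} = e\hbar^2/[m_e(e^2/4\pi\varepsilon_0)]$ and $\hbar\omega \sim E_\text{H} = m_e(e^2/4\pi\varepsilon_0)^2/\hbar^2$,
and taking into account the definition (6.62) of the fine structure constant $\alpha \approx 1/137$, we get

$$\frac{\Gamma_s}{\omega} \sim \frac{1}{4\pi\varepsilon_0}\frac{\omega^2 d^2}{\hbar c^3} \sim \left(\frac{e^2}{4\pi\varepsilon_0\hbar c}\right)^3 \equiv \alpha^3 \approx 4\times 10^{-7}. \tag{9.54}$$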
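The estimate (54) can be reproduced with actual numbers. This minimal Python sketch is our own illustration: it hard-codes SI values of the constants (CODATA 2018, typed in by hand), uses the same crude assumptions $d \sim er_\text{B}$ and $\hbar\omega \sim E_\text{H}$ as the text, and the function name `spontaneous_rate` is hypothetical. With these assumptions Eq. (53) gives exactly $\Gamma_s/\omega = (4/3)\alpha^3$. The last line evaluates the time scale $1/\Gamma$ for $\omega \sim 3\times 10^{15}\ \text{s}^{-1}$.

```python
import math

# CODATA 2018 values in SI units, typed in by hand (an assumption of this sketch)
e = 1.602176634e-19       # elementary charge, C
hbar = 1.054571817e-34    # reduced Planck constant, J s
c = 2.99792458e8          # speed of light, m/s
eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
m_e = 9.1093837015e-31    # electron mass, kg

k_e = e ** 2 / (4 * math.pi * eps0)   # e^2/(4 pi eps0), J m
alpha = k_e / (hbar * c)              # fine structure constant
r_B = hbar ** 2 / (m_e * k_e)         # Bohr radius, m
E_H = m_e * k_e ** 2 / hbar ** 2      # Hartree energy, J

def spontaneous_rate(omega, d):
    """Free-space rate (53) for a dipole matrix element of magnitude d (C m) at frequency omega (1/s)."""
    return 4 * omega ** 3 * d ** 2 / (4 * math.pi * eps0 * 3 * hbar * c ** 3)

omega_H = E_H / hbar
ratio = spontaneous_rate(omega_H, e * r_B) / omega_H   # equals (4/3) alpha^3 under these assumptions
tau = 1 / (alpha ** 3 * 3e15)                          # time scale 1/Gamma for omega ~ 3e15 1/s
print(1 / alpha, ratio, tau)
```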



This estimate says that the emission lines at atomic transitions are typically very sharp. With the
present-day availability of high-speed electronics, it also makes sense to evaluate the time scale $\tau = 1/\Gamma$
of the typical quantum transition: for a typical optical frequency $\omega \sim 3\times 10^{15}\ \text{s}^{-1}$, it is close to 1 ns. This is
exactly the time constant that determines the photon counting statistics of the emitted radiation - see
Fig. 3. Informally, one may say that this is the temporal scale of the photon spontaneously emitted by an
atom.²⁵

21 Here I have essentially smuggled back the sum over all electromagnetic field modes $j$ - see Eq. (16). Since in
the quasistationary approximation $k_j a \ll 1$, which is necessary for the interaction presentation by the electric
dipole formula, Eq. (24), matrix elements (49) are independent of $k_j$, the summation is reduced to the calculation
of the total $\rho_\text{fin}$ for all modes.

22 Note the essential dependence of the rate on the geometry; the following formulas of this section are for free 3D
space only. For a brief discussion of quantum effects in small-size, high-Q resonance cavities, see Sec. 4 below.

23 An equivalent expression was first obtained in 1930 by V. Weisskopf and E. Wigner, so that the whole
calculation is sometimes referred to as the Weisskopf-Wigner theory.

24 See, e.g., EM Sec. 8.2, in particular Eq. (8.28).

Note, however, that the above estimate of $\tau$ is only valid for a transition with a non-vanishing
dipole matrix element. If it equals zero (say, due to the initial and final state symmetry), the dipole
transitions are "forbidden". (A related common term is the transition selection rules.) The
transition may still take place due to a different, smaller interaction (say, via the magnetic dipole field of
the atom, or its electric quadrupole field²⁶), but would typically take much longer. In some cases the
increase of $\tau$ is rather dramatic (sometimes to hours!), and explains long-term luminescence - the term
used mostly when the initial atom excitation has a non-thermal origin, say a chemical reaction or the
absorption of external radiation with a higher frequency.

Now let us consider a more general case when the electromagnetic field is initially in an arbitrary
Fock state $n$, and from it may either gain energy from the atomic system (photon emission) or, vice versa,
give it back to the atom (photon absorption). For the photon emission rate, an evident generalization of
Eq. (48) gives



$$\Gamma_e \equiv \Gamma_{n\to\text{fin}} = \Gamma_s\frac{\left|\langle\text{fin}|\hat a^\dagger|n\rangle\right|^2}{\left|\langle 1|\hat a^\dagger|0\rangle\right|^2}, \tag{9.55}$$



where both bra-kets may be taken in the Schrödinger picture, and $\Gamma_s$ is the spontaneous emission rate
(53) of the same atomic system. This relation, with account of Eq. (19), shows that at photon
emission, the final field state "fin" has to be the Fock state with $n' = n + 1$, and that



$$\Gamma_e = (n+1)\,\Gamma_s. \tag{9.56}$$



Thus the initial field increases the photon emission rate; this effect is called the stimulated emission of 
radiation. Note that the spontaneous emission may be considered as a particular case of stimulated 
emission for n = 0, and interpreted as the emission stimulated by zero-point fluctuations of the 
electromagnetic field. 

On the other hand, in accordance with the arguments of Sec. 2, for the description of photon
absorption we need to replace the photon creation operator with the annihilation one, and get

$$\Gamma_a \equiv \Gamma_{n\to\text{fin}} = \Gamma_s\frac{\left|\langle\text{fin}|\hat a|n\rangle\right|^2}{\left|\langle 1|\hat a^\dagger|0\rangle\right|^2}. \tag{9.57}$$

According to this equation, the final state of the field at absorption is the Fock state with $n' = n - 1$, and
Eq. (57) yields²⁷

$$\Gamma_a = n\,\Gamma_s. \tag{9.58}$$

25 The scale $c\tau$ of the spatial extension of the corresponding wave packet is surprisingly macroscopic: for $\tau \sim 1$ ns
it is a few tens of centimeters. Such "human" size of the emitted photons makes the optical table the key component
of many optical experiments.

26 See, e.g., EM Sec. 8.9.

27 Relations (56) and (58) were conjectured, from very general arguments, by A. Einstein as early as 1916.

Results (56) and (58) are usually formulated in terms of the Einstein coefficients $A$
and $B$, defined in the way shown in Fig. 4, where the two energy levels are those of the atom, $\Gamma_a$ is the
rate of energy absorption from the electromagnetic field, and $\Gamma_e$ is that of the energy emission into the
field. In this notation, Eqs. (56) and (58) say

$$A_{21} = B_{21} = B_{12}, \tag{9.59}$$

because each of these coefficients equals the spontaneous emission rate $\Gamma_s$.



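[Fig. 9.4 diagram: atomic levels 1 and 2 separated by $\Delta E = \hbar\omega$, with the emission rate $\Gamma_e = A_{21} + B_{21}n$ indicated.]

Fig. 9.4. The Einstein coefficients on
the atomic energy spectrum diagram.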



It is curious that from this point, there is just one step to an alternative derivation of the Bose-Einstein
statistics for photons. Indeed, in thermodynamic equilibrium, the average probability flows
between levels 1 and 2 should be equal:

$$W_2\langle\Gamma_e\rangle = W_1\langle\Gamma_a\rangle, \tag{9.60}$$

where $W_1$ and $W_2$ are the probabilities for the atomic system to be on the corresponding levels, so that
Eqs. (56) and (58) yield

$$W_2\Gamma_s(1+\langle n\rangle) = W_1\Gamma_s\langle n\rangle, \qquad\text{i.e.}\qquad \frac{W_2}{W_1} = \frac{\langle n\rangle}{1+\langle n\rangle}. \tag{9.61}$$

But, on the other hand, for the atomic subsystem, only weakly coupled to its electromagnetic
environment, we ought to have the Gibbs distribution of probabilities:

$$\frac{W_2}{W_1} = \frac{\exp\{-E_2/k_\text{B}T\}}{\exp\{-E_1/k_\text{B}T\}} = \exp\left\{-\frac{\hbar\omega}{k_\text{B}T}\right\}. \tag{9.62}$$

Requiring Eqs. (61) and (62) to give the same result for the probability ratio, we get the Bose-Einstein
distribution for the electromagnetic field in equilibrium:

$$\langle n\rangle = \frac{1}{\exp\{\hbar\omega/k_\text{B}T\} - 1}, \tag{9.63}$$

the same as obtained in Sec. 7.1 by other means - see Eq. (7.26).
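The one-step derivation above amounts to inverting $r = \langle n\rangle/(1+\langle n\rangle)$ for $\langle n\rangle$. The following minimal Python sketch (our own illustration; both function names are hypothetical, and $x \equiv \hbar\omega/k_\text{B}T$) performs that inversion with the Gibbs ratio (62) and compares the result with the closed form (63).

```python
import math

def occupancy_from_detailed_balance(x):
    """<n> from equating Eq. (61), W2/W1 = <n>/(1 + <n>), with the Gibbs ratio (62), exp(-x).

    x = hbar*omega/(k_B*T); solving r = n/(1 + n) for n gives n = r/(1 - r).
    """
    r = math.exp(-x)
    return r / (1.0 - r)

def bose_einstein(x):
    """Closed form (63): 1/(exp(x) - 1); expm1 keeps precision at small x."""
    return 1.0 / math.expm1(x)

for x in (0.01, 0.5, 1.0, 5.0):
    print(x, occupancy_from_detailed_balance(x), bose_einstein(x))
```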

Another very important implication of Eqs. (56) and (58) is the possibility to achieve stimulated
emission coherence by level occupancy (or "population") inversion. Indeed, if $W_2 > W_1$, then
the net power flow from the atomic system into the electromagnetic field,

$$\text{power} = \hbar\omega\,\Gamma_s\left[W_2(\langle n\rangle + 1) - W_1\langle n\rangle\right], \tag{9.64}$$

may be positive. The necessary inversion may be produced in several ways, notably by intensive
quantum transitions to level 2 from an even higher level (which, in turn, is populated, e.g., by absorption
of external radiation, called pumping, at a higher frequency).
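The sign of the net power (64) can be checked in two regimes with a minimal Python sketch. This is our own illustration: the function name `net_emitted_power`, the unit values of $\hbar\omega$ and $\Gamma_s$, and the occupancies used below are hypothetical. In thermal equilibrium of a two-level system, with $\langle n\rangle$ given by Eq. (63), the net flow vanishes; with inversion it is positive.

```python
import math

def net_emitted_power(w1, w2, n_mean, gamma_s, hbar_omega):
    """Net power from the atomic system into the field mode, Eq. (64)."""
    return hbar_omega * gamma_s * (w2 * (n_mean + 1) - w1 * n_mean)

# Thermal equilibrium of a two-level system (hypothetical x = hbar*omega/(k_B*T) = 1)
x = 1.0
w2_eq = math.exp(-x) / (1 + math.exp(-x))
w1_eq = 1 - w2_eq
n_eq = 1 / math.expm1(x)
p_eq = net_emitted_power(w1_eq, w2_eq, n_eq, gamma_s=1.0, hbar_omega=1.0)   # vanishes: detailed balance

# Same field occupancy, with and without population inversion
p_inverted = net_emitted_power(0.3, 0.7, 10.0, gamma_s=1.0, hbar_omega=1.0)  # positive: net emission
p_normal = net_emitted_power(0.7, 0.3, 10.0, gamma_s=1.0, hbar_omega=1.0)    # negative: net absorption
print(p_eq, p_inverted, p_normal)
```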

A less obvious feature of the stimulated emission is spelled out by Eq. (55): it shows that
the final state of the field after the absorption of energy $\hbar\omega$ from the atom is a pure (coherent) Fock state
$(n + 1)$. Colloquially, one may say that the new, $(n + 1)$-st photon emitted from the atom is automatically
in phase with the $n$ photons that had been in the field mode initially.²⁸ The idea of stimulated emission
of coherent radiation using population inversion²⁹ was implemented in the early 1950s in the microwave
range (masers) and in 1960 in the optical range (lasers). Nowadays, lasers are ubiquitous and constitute
one of the cornerstones of our technological civilization.

A quantitative discussion of laser operation is beyond the framework of this course, and I have to
refer the reader to special literature,³⁰ and would like to mention only two key points:

(i) In a typical laser, each generated electromagnetic field mode is in the Glauber (rather than the
Fock) state, so that Eqs. (56) and (58) are applicable only if $n$ is averaged over the Fock-state
decomposition of the Glauber state - see Eq. (5.165).

(ii) Since in a typical laser $\langle n\rangle \gg 1$, its operation may be well described using quasi-classical
theories that use Eq. (64) to describe the electromagnetic energy balance (with the addition of a term
describing the energy loss due to field absorption in external components of the laser, including the
useful load), plus the equation describing the balance of occupancies $W_{1,2}$ due to all inter-level
transitions - similar to Eq. (60), but including also the contribution(s) from the particular population
inversion mechanism used in the laser. In this approach, the role of quantum mechanics is essentially
reduced to the calculation of the parameter $\Gamma_s$.

The role becomes more prominent if one needs to describe fluctuations of the laser field. Here
two approaches are possible, following the two options discussed in Chapter 7. If the fluctuations are
relatively small, one can linearize the Heisenberg equations of motion of the field oscillator operators
near their stationary-lasing "values", with the Langevin "forces" (also time-dependent operators) to
describe the fluctuation sources, and use these Heisenberg-Langevin equations to calculate the radiation
fluctuations, just as was described in Sec. 7.5. On the other hand, near the lasing threshold the field
fluctuations are relatively strong, smearing the phase transition between the no-lasing and lasing states.
Here the linearization is not an option, but one can use the density-matrix approach described in Sec.
7.6 for the fluctuation analysis.³¹



28 It is straightforward to show that this fact is also true if the field is initially in the Glauber state - which is more 
typical for lasers. 

29 This idea has been traced back at least to an obscure 1939 publication by V. Fabrikant. 

30 I can recommend, for example, P. W. Milonni and J. H. Eberly, Laser Physics, 2nd ed., Wiley, 2010, and a less
technical text by A. Yariv, Quantum Electronics, 3rd ed., Wiley, 1989.

31 This path has been developed (also in the mid-1960s) by several researchers, notably including M. Scully and
W. Lamb - see, e.g., M. Sargent III, M. Scully, and W. Lamb, Jr., Laser Physics, Westview, 1977. Note that
while the laser radiation fluctuations may look like a peripheral issue, pioneering research in that field
has led to the development of the general theory of open quantum systems (which was discussed in
Chapter 7), which has much broader applications.
9.4. Cavity QED

Now I have to mention, at least in passing, cavity quantum electrodynamics (usually called
cavity QED for short) - the art and science of creating and using entanglement between quantum states
of a single atomic system (an atom, an ion, a molecule, etc.) and the electromagnetic field in
a macroscopic volume called the resonant cavity (or just "resonator", or just "cavity"). This field is very
popular nowadays, especially in the context of the quantum computation and communication research
discussed in Sec. 8.5.³²

Let me start its discussion by noting that the narrative of the last two sections was based on an
implicit assumption that the energy spectrum of the electromagnetic field interacting with an atomic
system is essentially continuous. This assumption has justified the use of the Golden Rule, implying that the
emitted radiation is spread among many field modes, effectively losing its coherence with the initial
quantum state of the atom. However, this assumption becomes invalid if the electromagnetic field is
contained inside a relatively small volume, with a linear size comparable with the radiation wavelength.
Classical electrodynamics shows³³ that if the walls of such a cavity mostly reflect, rather than absorb,
radiation, so that in a crude approximation the power dissipation may be disregarded, then particular
solutions $\mathbf e_j(\mathbf r)$ of the Helmholtz equation (5) correspond to discrete, well-separated mode wave numbers
$k_j$ and hence well-separated eigenfrequencies $\omega_j$. Due to energy conservation, an atomic transition
corresponding to energy $\Delta E = |E_\text{ini} - E_\text{fin}|$ may be effective only if the corresponding quantum
oscillation frequency $\Omega = \Delta E/\hbar$ is close to one of $\omega_j$, and hence relatively far from other
eigenfrequencies.³⁴ As a result, the quantum states of a single atomic system and the resonant
electromagnetic mode may become entangled and stay coherent for a long time - typically until such
coherence is ruined by dephasing effects, as was discussed in Chapter 7.

A very popular approximation for the qualitative description of this effect is the so-called Rabi
model,³⁵ in which the atom is treated as a two-level system³⁶ interacting with a single electromagnetic
field mode of the resonant cavity. As the reader knows well from Chapters 4-6, any two-level ("spin-½")
system may be described by the Hamiltonian $\mathbf c\cdot\hat{\boldsymbol\sigma}$, and we may always select the state basis in which the
Hamiltonian is diagonal:

$$\hat H_\text{atom} = c\,\hat\sigma_z \equiv \frac{\hbar\Omega}{2}\hat\sigma_z, \tag{9.65}$$

where $\hbar\Omega = 2c$ is the energy difference between the eigenenergies in the absence of interaction with the
field. Next, according to Eq. (17), ignoring the constant ground-state energy $\hbar\omega/2$ (which may be added to
the final energy at the very end, if necessary), the contribution of a single mode of eigenfrequency $\omega$ to
the Hamiltonian is



32 This popularity was demonstrated, for example, by the 2012 Nobel Prize in Physics awarded to cavity QED 
experimentalists S. Haroche and D. Wineland. 

33 See, e.g., EM Sec. 7.9. 

34 On the contrary, if $\Omega$ is far from any $\omega_j$, the interaction is much suppressed; in particular, the spontaneous
emission rate may be much lower than that given by Eq. (53), so that this result is not as fundamental as it may
look.

35 After the pioneering work by I. Rabi in 1936-37. 

36 As was shown in Sec. 6.5, this model is justified, e.g., if transitions between all other energy level pairs have 
considerably different frequencies. 
$$\hat H_\text{cavity} = \hbar\omega\,\hat a^\dagger\hat a. \tag{9.66}$$



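Finally, according to Eq. (16a), in quantum electrodynamics the electric field of the mode may be
represented as

$$\hat{\mathcal E}(\mathbf r) = i\left(\frac{\hbar\omega}{2}\right)^{1/2}\mathbf e(\mathbf r)\left(\hat a - \hat a^\dagger\right), \tag{9.67}$$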



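so that in the electric-dipole approximation (24), the cavity-atom interaction may be represented as a
product of the field by one of the Cartesian components (say, $\hat\sigma_y$) of the "spin" operator:³⁷

$$\hat H_\text{int} = \text{const}\times\hat\sigma_y\hat{\mathcal E} = \text{const}\times\hat\sigma_y\times i\left(\frac{\hbar\omega}{2}\right)^{1/2}\left(\hat a - \hat a^\dagger\right) \equiv i\hbar\kappa\,\hat\sigma_y\left(\hat a - \hat a^\dagger\right), \tag{9.68}$$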



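where $\kappa$ is a coupling constant (with the dimension of frequency). The sum of these terms is called the
Rabi Hamiltonian,

$$\hat H = \frac{\hbar\Omega}{2}\hat\sigma_z + \hbar\omega\,\hat a^\dagger\hat a + i\hbar\kappa\,\hat\sigma_y\left(\hat a - \hat a^\dagger\right). \tag{9.69}$$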



Despite its apparent simplicity, using this Hamiltonian for calculations is not that easy. For
example, an exact quasi-analytical expression for its eigenenergies (as zeros of a Taylor series in
the parameter $\kappa$, with coefficients determined by a recurrence relation) was found only recently.³⁸ Only in the
case when the electromagnetic field is very intense, and hence may be treated as a classical one, are the
results following from Eq. (69) reduced to the Rabi oscillations discussed in Sec. 6.3.
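Although no closed-form solution is simple, the Rabi Hamiltonian (69) is easy to diagonalize numerically. The following minimal Python sketch is our own illustration and not the quasi-analytical method mentioned above: it assumes $\hbar = 1$, a Fock space truncated to `n_max` levels, and hypothetical parameter values. The function name `rabi_hamiltonian` is hypothetical. The sketch builds the matrix on the atom-field product space, checks convergence of the lowest eigenvalues against the truncation, and checks the uncoupled limit $\kappa = 0$, where the spectrum is $\pm\Omega/2 + n\omega$.

```python
import numpy as np

def rabi_hamiltonian(Omega, omega, kappa, n_max):
    """Rabi Hamiltonian (69) with hbar = 1, on the space (atom) x (field mode truncated to n_max levels).

    H = (Omega/2) sz + omega a†a + i kappa sy (a - a†). The truncation n_max is an
    assumption of this sketch; results must be checked for convergence in n_max.
    """
    sz = np.diag([1.0, -1.0]).astype(complex)
    sy = np.array([[0, -1j], [1j, 0]])
    a = np.diag(np.sqrt(np.arange(1, n_max)), k=1).astype(complex)
    ad = a.conj().T
    i2, i_f = np.eye(2), np.eye(n_max)
    return (0.5 * Omega * np.kron(sz, i_f)
            + omega * np.kron(i2, ad @ a)
            + 1j * kappa * np.kron(sy, a - ad))

# Weak coupling near resonance (hypothetical parameters, in units of omega)
H40 = rabi_hamiltonian(Omega=1.02, omega=1.0, kappa=0.05, n_max=40)
H60 = rabi_hamiltonian(Omega=1.02, omega=1.0, kappa=0.05, n_max=60)
E40 = np.linalg.eigvalsh(H40)
E60 = np.linalg.eigvalsh(H60)
print(E40[:6])   # lowest eigenenergies; they should not change when n_max grows
```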

In the opposite case, when the field oscillator is in an essentially quantum state, $\langle\hat a^\dagger\hat a\rangle \sim 1$, Eq.
(69) may be simplified in a different way, assuming that the frequencies $\Omega$ and $\omega$ are very close, and the
atom-to-cavity interaction is relatively weak, so that the magnitudes of the coupling constant $\kappa$ and the
detuning parameter (similar to the parameter $\Delta$ used in Sec. 6.5),

$$\xi \equiv \Omega - \omega, \tag{9.70}$$

are both much smaller than $\Omega \approx \omega$. To discuss this limit, it is convenient to use the spin ladder operators,
defined quite similarly to those of the orbital angular momentum - see Eqs. (5.182):



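$$\hat\sigma_\pm \equiv \hat\sigma_x \pm i\hat\sigma_y, \quad\text{so that}\quad \hat\sigma_x = \frac{\hat\sigma_+ + \hat\sigma_-}{2}, \quad \hat\sigma_y = \frac{\hat\sigma_+ - \hat\sigma_-}{2i}. \qquad (9.71)$$

From Eq. (4.105), it is easy to find the matrices of these operators (in the standard z-basis),

$$\sigma_+ = \begin{pmatrix}0 & 2\\ 0 & 0\end{pmatrix}, \qquad \sigma_- = \begin{pmatrix}0 & 0\\ 2 & 0\end{pmatrix}, \qquad (9.72)$$

and their commutation rules, which are naturally similar to Eqs. (5.183):

$$\left[\hat\sigma_+, \hat\sigma_-\right] = 4\hat\sigma_z, \qquad \left[\hat\sigma_z, \hat\sigma_\pm\right] = \pm 2\hat\sigma_\pm. \qquad (9.73)$$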
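The matrices (72) and the commutation rules (73) are easy to check numerically. The sketch below uses Python with NumPy, which is an illustrative choice. It builds σ̂± from the Pauli matrices (4.105) through Eq. (71), then verifies Eqs. (72) and (73).

```python
import numpy as np

# Pauli matrices in the standard z-basis, Eq. (4.105)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Ladder operators, Eq. (9.71)
sp = sx + 1j * sy
sm = sx - 1j * sy

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return a @ b - b @ a

# Eq. (9.72): each ladder matrix has a single nonzero element, equal to 2
assert np.allclose(sp, [[0, 2], [0, 0]]) and np.allclose(sm, [[0, 0], [2, 0]])
# Eq. (9.73)
assert np.allclose(comm(sp, sm), 4 * sz)
assert np.allclose(comm(sz, sp), 2 * sp) and np.allclose(comm(sz, sm), -2 * sm)
# Inverse relations in Eq. (9.71)
assert np.allclose((sp + sm) / 2, sx) and np.allclose((sp - sm) / 2j, sy)
print("Eqs. (9.71)-(9.73) verified")
```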



37 The exact choice of this component is not important, while the formulas simplify if it is proportional to either pure σ_x or pure σ_y.

38 D. Braak, Phys. Rev. Lett. 107, 100401 (2011).
In this notation, the Rabi Hamiltonian looks like

$$\hat H = \frac{\hbar\Omega}{2}\hat\sigma_z + \hbar\omega\,\hat a^\dagger\hat a + \frac{\hbar\kappa}{2}\left(\hat\sigma_+ - \hat\sigma_-\right)\left(\hat a - \hat a^\dagger\right), \qquad (9.74)$$

and it is straightforward to use Eqs. (4.199) and (73) to derive the Heisenberg-picture equations of motion for the involved operators. (Doing this, we have to remember that the operators of the "spin" subsystem, on one hand, and of the field mode, on the other hand, are defined in different Hilbert spaces and hence commute - at least at coinciding time moments.) The result (so far, exact!) is

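$$\dot{\hat a} = -i\omega\hat a + \frac{i\kappa}{2}\left(\hat\sigma_+ - \hat\sigma_-\right), \qquad \dot{\hat a}^\dagger = i\omega\hat a^\dagger + \frac{i\kappa}{2}\left(\hat\sigma_+ - \hat\sigma_-\right),$$
$$\dot{\hat\sigma}_\pm = \pm i\Omega\hat\sigma_\pm + 2i\kappa\left(\hat a - \hat a^\dagger\right)\hat\sigma_z, \qquad \dot{\hat\sigma}_z = i\kappa\left(\hat a^\dagger - \hat a\right)\left(\hat\sigma_+ + \hat\sigma_-\right). \qquad (9.75)$$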

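Now note that at negligible coupling, κ → 0, Eqs. (75) have very simple solutions,

$$\hat a(t) \propto e^{-i\omega t}, \qquad \hat a^\dagger(t) \propto e^{i\omega t}, \qquad \hat\sigma_\pm(t) \propto e^{\pm i\Omega t}, \qquad \hat\sigma_z(t) \approx {\rm const}, \qquad (9.76)$$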

and the small terms proportional to κ in the right-hand parts of Eqs. (75) cannot affect these time evolution laws dramatically, even if κ is not exactly zero (but small). Of those terms, the ones with frequencies close to the "basic" frequency of each variable act in resonance and hence may have a substantial impact on the system dynamics, while the non-resonant terms may be ignored. In this rotating-wave approximation (RWA), used several times before in this course, Eqs. (75) are reduced to a much simpler system of equations:

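$$\dot{\hat a} = -i\omega\hat a - \frac{i\kappa}{2}\hat\sigma_-, \qquad \dot{\hat a}^\dagger = i\omega\hat a^\dagger + \frac{i\kappa}{2}\hat\sigma_+,$$
$$\dot{\hat\sigma}_+ = i\Omega\hat\sigma_+ - 2i\kappa\,\hat a^\dagger\hat\sigma_z, \qquad \dot{\hat\sigma}_- = -i\Omega\hat\sigma_- + 2i\kappa\,\hat a\hat\sigma_z, \qquad \dot{\hat\sigma}_z = i\kappa\left(\hat a^\dagger\hat\sigma_- - \hat a\hat\sigma_+\right). \qquad (9.77)$$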



Alternatively, these equations of motion may be obtained from the Rabi Hamiltonian after it has been cleared of the terms proportional to σ̂+â† and σ̂−â. These terms oscillate fast and hence self-average to virtually zero:

$$\hat H = \frac{\hbar\Omega}{2}\hat\sigma_z + \hbar\omega\,\hat a^\dagger\hat a + \frac{\hbar\kappa}{2}\left(\hat\sigma_+\hat a + \hat\sigma_-\hat a^\dagger\right), \qquad\text{at } \kappa, |\xi| \ll \omega, \Omega. \qquad (9.78)$$



This is the famous Jaynes-Cummings Hamiltonian, 39 which is central to cavity QED and its applications. 40 In order to find its eigenstates and eigenenergies, let us note what happens at negligible interaction (κ → 0). In this limit the total energy E of the system is the sum of two independent contributions, from the atomic ("spin") and resonant-cavity subsystems, and its spectrum,



39 It was first proposed and analyzed in 1963 by two electronic engineers, E. Jaynes and F. Cummings, and it took the physics community a while to recognize and acknowledge the fundamental importance of that work.

40 In most applications, Hamiltonian (78) is augmented by additional term(s) describing, for example, incoming 
radiation and/or coupling to environment, say due to the electromagnetic energy loss in the cavity walls - see Eq. 
(7.68). 
$$E = \pm\frac{\hbar\Omega}{2} + \hbar\omega n', \qquad n' = 0, 1, 2, \ldots, \qquad (9.79)$$

(where n' is the number of photons in the cavity mode) consists 41 of close level pairs E_n ± ħξ/2 (Fig. 5), centered at the values

$$E_n = \hbar\omega\left(n - \frac{1}{2}\right), \qquad\text{with } n = 1, 2, \ldots \qquad (9.80)$$

(At the exact resonance ω = Ω, i.e. at ξ = 0, each pair merges into one doubly-degenerate level E_n.)
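The level structure described above, and the accuracy of the rotating-wave approximation, can be illustrated by diagonalizing numerically both the Rabi Hamiltonian (74) and the Jaynes-Cummings Hamiltonian (78) on a truncated Fock space. This is a sketch with ħ = 1. The cutoff `n_max = 30` and the parameter values are arbitrary illustrative choices, with κ ≪ |ξ| ≪ ω as assumed in Fig. 5.

```python
import numpy as np

def spin_cavity_operators(n_max):
    """Return sigma_z, sigma_+, sigma_-, a, a^dagger on the space spin (x) Fock,
    with the Fock space truncated at n_max photons (an illustrative cutoff)."""
    sz = np.diag([1.0, -1.0]).astype(complex)
    sp = np.array([[0, 2], [0, 0]], dtype=complex)   # sigma_+, Eq. (9.72)
    sm = sp.conj().T                                   # sigma_-
    a = np.diag(np.sqrt(np.arange(1, n_max + 1)), k=1).astype(complex)
    I2, If = np.eye(2), np.eye(n_max + 1)
    return (np.kron(sz, If), np.kron(sp, If), np.kron(sm, If),
            np.kron(I2, a), np.kron(I2, a.conj().T))

def rabi_and_jc_levels(Omega, omega, kappa, n_max=30):
    """Sorted eigenvalues (hbar = 1) of the Rabi Hamiltonian (9.74) and of its RWA form (9.78)."""
    Sz, Sp, Sm, A, Ad = spin_cavity_operators(n_max)
    H0 = (Omega / 2) * Sz + omega * (Ad @ A)
    H_rabi = H0 + (kappa / 2) * (Sp - Sm) @ (A - Ad)
    H_jc = H0 + (kappa / 2) * (Sp @ A + Sm @ Ad)
    return np.linalg.eigvalsh(H_rabi), np.linalg.eigvalsh(H_jc)

Omega, omega, kappa = 1.02, 1.0, 0.001        # xi = 0.02 >> kappa, as in Fig. 9.5
E_rabi, E_jc = rabi_and_jc_levels(Omega, omega, kappa)
print(E_jc[:5])                               # -Omega/2, then the pairs E_1 -/+ ~xi/2 and E_2 -/+ ~xi/2
print(np.max(np.abs(E_rabi[:5] - E_jc[:5])))  # small: the RWA error is of the order of kappa**2/omega
```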



Fig. 9.5. Energy spectrum of the Jaynes-Cummings Hamiltonian at κ ≪ |ξ| (panels: "spin-1/2", cavity, total system). Levels shown: the ground level −ħΩ/2; the pair −ħΩ/2 + ħω = E_1 − ħξ/2 and +ħΩ/2 = E_1 + ħξ/2, with E_1 = ħω/2; and the pair −ħΩ/2 + 2ħω = E_2 − ħξ/2 and +ħΩ/2 + ħω = E_2 + ħξ/2, with E_2 = 3ħω/2.



Since at κ → 0 the two subsystems do not interact, the eigenstates corresponding to the sublevels of the n-th pair may be represented by products of their independent ket-vectors:

$$|+_n\rangle \equiv |\uparrow\rangle\otimes|n-1\rangle \quad\text{and}\quad |-_n\rangle \equiv |\downarrow\rangle\otimes|n\rangle. \qquad (9.81)$$



As we know from Chapter 6, a weak interaction leads to strong hybridization of quantum states with close energies (in this case, the two states (81) with the same n) and to their negligible mixing with other states. Hence, at 0 < κ ≪ ω ≈ Ω, a good approximation of an eigenstate with E ≈ E_n is given by a linear superposition of states (81):

$$|\alpha\rangle = c_+|\uparrow\rangle\otimes|n-1\rangle + c_-|\downarrow\rangle\otimes|n\rangle, \qquad (9.82)$$

with certain c-number coefficients c±. This relation describes the entanglement of the atomic eigenstates ↑ and ↓ with the Fock states n − 1 and n of the field mode, respectively.

Let me leave the (straightforward) calculation of the coefficients c± and of the eigenenergies of the two entangled states of each pair for the reader's exercise. This calculation shows, in particular, that at the exact resonance (ω = Ω), |c+| = |c−| = 1/√2 for both states of each pair. This fact may be interpreted as a (coherent!) equal sharing of an energy quantum ħω = ħΩ by the atom and the cavity.
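These statements can be checked numerically. The sketch below assumes ħ = 1. Within the n-th pair (81), the Hamiltonian (78) reduces to a 2×2 matrix with diagonal elements E_n ± ħξ/2 and off-diagonal element (ħκ/2)⟨↑|σ̂+|↓⟩⟨n−1|â|n⟩ = ħκ√n. The function name `jc_doublet` and the parameter values are illustrative.

```python
import numpy as np

def jc_doublet(n, Omega, omega, kappa):
    """Eigenenergies and eigenvectors (hbar = 1) of the Jaynes-Cummings Hamiltonian (9.78)
    within the n-th pair of states (9.81), basis {|up>|n-1>, |down>|n>}."""
    E_n = omega * (n - 0.5)              # Eq. (9.80)
    xi = Omega - omega                   # Eq. (9.70)
    # <up, n-1| (kappa/2) sigma_+ a |down, n> = (kappa/2) * 2 * sqrt(n)
    g = kappa * np.sqrt(n)
    H = np.array([[E_n + xi / 2, g],
                  [g, E_n - xi / 2]])
    return np.linalg.eigh(H)

energies, vectors = jc_doublet(n=3, Omega=1.0, omega=1.0, kappa=0.01)
print(energies)           # E_3 -/+ kappa*sqrt(3) at exact resonance
print(np.abs(vectors))    # every |c_+| and |c_-| equals 1/sqrt(2)
```

For arbitrary detuning, the same 2×2 diagonalization gives E = E_n ± (ħ/2)(ξ² + 4κ²n)^{1/2}. This reduces to E_n ± ħξ/2 at κ → 0, in agreement with Fig. 5.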

A by-product of the calculation of c± is the fact that the dynamics of state α described by Eq. (82) is similar to that of the generic two-level system that was repeatedly discussed in this course - for the first time in Sec. 2.6, and then in Chapters 4-6. In particular, suppose the composite system had been initially prepared in one component state, for example |↑⟩⊗|0⟩ (i.e. with the atom excited, while the cavity is in its



41 Besides the non-degenerate ground state level E_g = −ħΩ/2.
ground state), and allowed to evolve on its own. After some time interval, it may be found in the counterpart state |↓⟩⊗|1⟩, which includes the first excited Fock state n = 1 of the field mode. This is one more (resonant) version of the same method for the generation of Fock states of the electromagnetic field that was discussed in Sec. 3.
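Here is a minimal numerical illustration of this transfer. It assumes ħ = 1 and keeps only the n = 1 pair of states (81), where the Jaynes-Cummings Hamiltonian (78) reduces to a 2×2 matrix with off-diagonal element ħκ; the common energy E_1 only adds a global phase and is dropped. The function name and the value κ = 0.05 are arbitrary illustrative choices. At exact resonance, the probability of finding the state |↓⟩⊗|1⟩ is sin²(κt), so complete transfer occurs at t = π/2κ.

```python
import numpy as np

def swap_probability(t, kappa, xi=0.0):
    """Probability of finding |down>|1> at time t, if the system starts in |up>|0>.
    Uses the n = 1 block of the Jaynes-Cummings Hamiltonian (9.78), hbar = 1."""
    H = np.array([[xi / 2, kappa],
                  [kappa, -xi / 2]])      # basis {|up>|0>, |down>|1>}
    w, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T   # exp(-iHt)
    psi_t = U @ np.array([1.0, 0.0])
    return abs(psi_t[1]) ** 2

kappa = 0.05
print(swap_probability(np.pi / (2 * kappa), kappa))   # ~1: complete transfer
```

With nonzero detuning, the transfer is incomplete. The maximum probability is 4κ²/(ξ² + 4κ²), which matches the generic two-level behavior mentioned above.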

Unfortunately, my time devoted to cavity QED is over, and for further reading I have to refer the 
reader to special literature. 42 



9.5. The Klein-Gordon and relativistic Schrodinger equations 

Now let us discuss the basics of the relativistic quantum mechanics of particles with a nonvanishing rest mass m, i.e., in terms of Eq. (1), the intermediate range of energies E ~ mc², where p ~ mc. Historically, the first attempt 43 to extend non-relativistic wave mechanics into the relativistic energy range was based on the same transitions from classical observables to their quantum-mechanical operators as in the non-relativistic limit:

$$\mathbf p \to \hat{\mathbf p} = -i\hbar\nabla, \qquad E \to \hat H = i\hbar\frac{\partial}{\partial t}. \qquad (9.83)$$

Substituting these operators, acting on the Schrodinger-picture wavefunction Ψ(r,t), into the classical relation between the energy E and momentum p of a free particle leads to the following equations:



Table 9.1. Deriving the Klein-Gordon equation for a free relativistic particle.

                    | Non-relativistic limit | Relativistic case
Classical mechanics | $E = \frac{p^2}{2m}$ | $E^2 = c^2p^2 + (mc^2)^2$
Wave mechanics | $i\hbar\frac{\partial\Psi}{\partial t} = \frac{1}{2m}(-i\hbar\nabla)^2\Psi$ | $\left(i\hbar\frac{\partial}{\partial t}\right)^2\Psi = c^2(-i\hbar\nabla)^2\Psi + (mc^2)^2\Psi$



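The resulting equation for the non-relativistic limit is just the usual Schrodinger equation (1.28) for a free particle. Its relativistic generalization, usually rewritten as 45

$$\left(\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 + \mu^2\right)\Psi = 0, \qquad\text{with } \mu \equiv \frac{mc}{\hbar}, \qquad (9.84)$$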



42 I can recommend, for example, either C. Gerry and P. Knight, Introductory Quantum Optics, Cambridge U. 
Press, 2005 or G. Agarwal, Quantum Optics, Cambridge U. Press, 2012. 

43 This approach was suggested almost simultaneously in 1926-1927 by (at least) V. Fock, E. Schrodinger, O. 
Klein and W. Gordon, J. Kudar, T. de Donder and F.-H. van der Dungen, and L. de Broglie. 

44 Note that in the sense of Eq. (1), in the non-relativistic column of this table the energy is referred to the rest energy mc², while in the relativistic column it is referred to zero.

45 Note that μ is exactly the constant (with the reciprocal length dimensionality) that participates in Eq. (1.20b) describing the Compton effect for electrons.
is called the Klein-Gordon (or sometimes "Klein-Gordon-Fock") equation. The most fundamental solutions of this equation are the same plane, monochromatic waves

$$\Psi(\mathbf r, t) \propto \exp\{i(\mathbf k\cdot\mathbf r - \omega t)\} \qquad (9.85)$$

as in the non-relativistic case. Indeed, such waves are eigenstates of operators (83), with eigenvalues

$$\mathbf p = \hbar\mathbf k, \qquad E = \hbar\omega, \qquad (9.86)$$

so that their substitution into Eq. (84) immediately returns us to Eq. (1) with replacements (86):

$$E_\pm \equiv \hbar\omega_\pm = \pm\left[(\hbar ck)^2 + (mc^2)^2\right]^{1/2}. \qquad (9.87)$$

One may say that this dispersion relation is just a simple combination of the classical relation (1) with the same basic quantum-mechanical relations (86) as in the non-relativistic limit. Still, it draws our attention to the fact that the energy ħω, as a function of momentum ħk, has two branches rather than one, with E−(p) = −E+(p) - see Fig. 6a. Historically, this fact played a very important role in spurring the fundamental idea of particle-antiparticle pairs. In this idea (very similar to the concept of electrons and holes in semiconductors, which was discussed in Sec. 2.8), what we call the vacuum actually corresponds to all states of the lower branch, with energies E−(p) < 0, being filled, while the states on the upper branch, with energies E+(p) > 0, are empty. Then an externally supplied energy

$$\Delta E = E_+ - E_- = E_+ + (-E_-) \geq 2mc^2 > 0 \qquad (9.88)$$

may bring a particle from the lower branch to the upper one (Fig. 6b). The resulting excited state is interpreted as a combination of a particle with energy E+ and momentum p, and a "hole" (antiparticle) with positive energy (−E−) and momentum −p, so that in an external field the hole behaves as a particle with the opposite electric charge. This fundamental idea 46 led to the search for, and the discovery (in 1932) of, the positron - the electron's antiparticle with charge q = +e - and later of the antiproton and other antiparticles.
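The two branches (87) and the pair-creation threshold (88) can be tabulated in a few lines. This sketch uses dimensionless units as an illustrative assumption: momentum in units of mc and energy in units of mc². It minimizes E+(p2) − E−(p1) over all pairs of grid momenta.

```python
import numpy as np

# Dimensionless units (an illustrative assumption): momentum p = hbar*k in units of mc,
# energy in units of mc^2, so that Eq. (9.87) becomes E_pm(p) = +/- sqrt(p**2 + 1).
p = np.linspace(-3.0, 3.0, 601)
E_plus = np.sqrt(p**2 + 1.0)
E_minus = -E_plus                            # lower branch: E_-(p) = -E_+(p)

# Energy (9.88) needed to move a particle from momentum p1 on the lower branch
# to momentum p2 on the upper branch, for every pair (p1, p2) on the grid
delta_E = E_plus[None, :] - E_minus[:, None]
print(delta_E.min())                         # 2.0, i.e. 2mc^2, reached at p1 = p2 = 0
```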




Fig. 9.6. (a) Free-particle 
dispersion relation resulting from 
the Klein-Gordon and Dirac 
equations, and (b) creation of a 
particle-antiparticle pair from the 
vacuum. 



Turning to the more formal properties of the Klein-Gordon equation, it is easy to prove that its solutions satisfy the same continuity equation (1.52), with the probability current density j still given by Eq. (1.47), but with a different expression for the probability density:



46 Due to the same P. A. M. Dirac! 
$$w = \frac{i\hbar}{2mc^2}\left(\Psi^*\frac{\partial\Psi}{\partial t} - \Psi\frac{\partial\Psi^*}{\partial t}\right). \qquad (9.89)$$

(In the non-relativistic limit, Eq. (84) allows a reduction of this relation to Eq. (1.22): w = Ψ*Ψ.)
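For a plane wave (85), Eq. (89) gives w = (ħω/mc²)|Ψ|² = (E/mc²)|Ψ|². On the upper branch this tends to |Ψ|² in the non-relativistic limit, while on the lower branch it is negative. The sketch below checks this with a finite-difference time derivative. It assumes units with ħ = c = m = 1 (so μ = 1) and a 1D plane wave; the numeric values of k, x, t, and dt are arbitrary.

```python
import numpy as np

# Units chosen for illustration (an assumption): hbar = c = m = 1, so mu = 1 and
# the dispersion relation (9.87) reads omega = +/- sqrt(k**2 + 1).

def kg_density(psi, dpsi_dt):
    """Eq. (9.89) with hbar = m = c = 1: w = (i/2) (psi* dpsi/dt - psi dpsi*/dt)."""
    return (0.5j * (np.conj(psi) * dpsi_dt - psi * np.conj(dpsi_dt))).real

def plane_wave(k, omega, x, t):
    """1D version of the plane wave (9.85)."""
    return np.exp(1j * (k * x - omega * t))

k, x, t, dt = 0.7, 0.3, 1.1, 1e-5
results = {}
for branch in (+1, -1):
    omega = branch * np.sqrt(k**2 + 1.0)
    psi = plane_wave(k, omega, x, t)
    # central finite difference for the time derivative
    dpsi = (plane_wave(k, omega, x, t + dt) - plane_wave(k, omega, x, t - dt)) / (2 * dt)
    results[branch] = (kg_density(psi, dpsi), omega)
    print(branch, results[branch])   # w coincides with omega, i.e. with E/mc^2
```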

The Klein-Gordon equation may be readily generalized to describe a single particle moving in external fields. For example, the electromagnetic field effects on a particle with charge q may be described by the same replacement as in the non-relativistic limit (see Sec. 3.1): 47

$$\hat{\mathbf p} \to \hat{\mathbf P} - q\mathbf A(\mathbf r, t), \qquad \hat H \to \hat H - q\phi(\mathbf r, t), \qquad (9.90)$$

where P̂ = −iħ∇ is the canonical momentum operator (3.25). The vector and scalar potentials, A and φ, should be treated appropriately: as c-number functions if the electromagnetic field quantization is unimportant, or as operators (see Secs. 1-4 above) if it is.

However, the practical value of the relativistic Schrodinger equation is rather limited, because of 
two main reasons. First of all, it does not give the correct description of particles with spin. For example, 
for the hydrogen-like atom, i.e. the motion of an electron with electric charge -e in the Coulomb central 
field (3.182) of an immobile nucleus with charge +Ze, the equation may be solved exactly and yields the 
following spectrum of (doubly-degenerate) energy levels: 



$$E = mc^2\left[1 + \frac{Z^2\alpha^2}{\lambda^2}\right]^{-1/2}, \qquad\text{with}\quad \lambda = n - \left(l + \frac{1}{2}\right) + \left[\left(l + \frac{1}{2}\right)^2 - Z^2\alpha^2\right]^{1/2}, \qquad (9.91)$$

where n = 1, 2, ... and l = 0, 1, ..., n − 1 are the same quantum numbers as in the non-relativistic theory (see Sec. 3.6), and α ≡ e²/4πε0ħc ≈ 1/137 is the fine structure constant. The three leading terms of the Taylor expansion of this result in the small parameter Zα are as follows:



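$$E \approx mc^2\left[1 - \frac{Z^2\alpha^2}{2n^2} - \frac{Z^4\alpha^4}{2n^4}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right)\right]. \qquad (9.92)$$

The first of these terms is just the rest energy of the particle. The second term,

$$E_n = -mc^2\frac{Z^2\alpha^2}{2n^2} = -\frac{mZ^2e^4}{(4\pi\varepsilon_0)^2\hbar^2}\frac{1}{2n^2} \equiv -\frac{E_0}{2n^2}, \qquad\text{with } E_0 = Z^2E_H, \qquad (9.93)$$

reproduces the non-relativistic Bohr formula (3.191). Finally, the third term

$$-mc^2\frac{Z^4\alpha^4}{2n^4}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right) \equiv -\frac{2E_n^2}{mc^2}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right) \qquad (9.94)$$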



is just the kinetic-relativistic contribution (6.52) to the fine structure of the Bohr levels (93). However, as we already know from Sec. 6.3, for a spin-½ particle such as the electron, the spin-orbit interaction (6.56) gives an additional contribution of the same order to the fine structure. As a result, the net result, confirmed by experiment, is given by Eq. (6.60), i.e. it differs from Eq. (94). This is very natural, because the relativistic Schrodinger equation does not have the very notion of spin.
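The accuracy of the expansion (92) can be checked against the exact result (91). In the sketch below, the numeric value of α and the function names are illustrative choices. It evaluates both expressions in units of mc² and shows that their difference scales as (Zα)⁶, i.e. that Eq. (92) is correct through the order (Zα)⁴.

```python
import numpy as np

ALPHA = 1 / 137.035999   # fine-structure constant (approximate value)

def kg_exact(n, l, Z, alpha=ALPHA):
    """Klein-Gordon hydrogen-like levels, Eq. (9.91), in units of mc^2."""
    s = l + 0.5
    lam = n - s + np.sqrt(s**2 - (Z * alpha) ** 2)
    return (1 + (Z * alpha) ** 2 / lam**2) ** -0.5

def kg_three_terms(n, l, Z, alpha=ALPHA):
    """Three leading terms of the Taylor expansion, Eq. (9.92), in units of mc^2."""
    y = (Z * alpha) ** 2
    return 1 - y / (2 * n**2) - y**2 / (2 * n**4) * (n / (l + 0.5) - 0.75)

for Z in (3, 6, 12):
    residual = kg_exact(2, 1, Z) - kg_three_terms(2, 1, Z)
    # If Eq. (9.92) is right through order (Z alpha)^4, this ratio is nearly Z-independent
    print(Z, residual, residual / (Z * ALPHA) ** 6)
```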



47 After such a generalization, Eq. (84) is usually called the relativistic Schrodinger equation.
Second, even for massive spinless particles (such as the Higgs bosons), for which this equation is believed to be valid, the most important problems are related to particle interactions at high energies, of the order of ΔE ~ 2mc² (see Eq. (88)) and beyond. Because particle-antiparticle pairs may be created and annihilated at such energies, the number of particles participating in such interactions is typically considerable (and variable). An adequate description of the system is therefore given not by the relativistic Schrodinger equation (which is formulated in single-particle terms), but by quantum field theory, to which I will devote just a few sentences at the very end of this chapter.



9.6. Dirac's theory

The real breakthrough toward the quantum relativistic theory of electrons (and any spin-½ fermions) was achieved in 1928 by P. A. M. Dirac. For its time, the structure of his theory was highly nontrivial. It formally preserved, in the coordinate representation, the same Schrodinger-picture equation of quantum dynamics,

$$i\hbar\frac{\partial\Psi}{\partial t} = \hat H\Psi, \qquad (9.95)$$

but postulated that the wavefunction Ψ is not a scalar complex function of time and coordinates. Instead, it is a four-component column-vector (sometimes called the bispinor) of such functions, its Hermitian-conjugate bispinor Ψ† being a 4-component row-vector of their complex conjugates:

$$\Psi = \begin{pmatrix}\Psi_1(\mathbf r,t)\\ \Psi_2(\mathbf r,t)\\ \Psi_3(\mathbf r,t)\\ \Psi_4(\mathbf r,t)\end{pmatrix}, \qquad \Psi^\dagger = \left(\Psi_1^*(\mathbf r,t)\;\; \Psi_2^*(\mathbf r,t)\;\; \Psi_3^*(\mathbf r,t)\;\; \Psi_4^*(\mathbf r,t)\right), \qquad (9.96)$$



and that the Hamiltonian participating in Eq. (95) is a 4×4 matrix in the Hilbert space of bispinors. For a free particle, the postulated Hamiltonian looks extremely simple:

$$\hat H = c\hat{\boldsymbol\alpha}\cdot\hat{\mathbf p} + \hat\beta mc^2, \qquad (9.97)$$

where p̂ = −iħ∇ is the same 3D vector of momentum component operators as in the non-relativistic case, while the operators α̂ and β̂ may be presented in the following shorthand 2×2 form: 48



48 Note the amazing simplicity of the Hamiltonian (97). Suppose the time derivative participating in Eq. (95) and the three coordinate derivatives participating (via the momentum operator) in Eq. (97) are merged into one 4-vector operator {∇, ∂/∂(ict)} ≡ ∂/∂x_k. Then the Dirac equation (95) may be rewritten in an even simpler, manifestly Lorentz-invariant 4-vector form (with the implicit summation over the repeated index k = 1, ..., 4 - see, e.g., EM Sec. 9.4):

$$\left(\hat\gamma_k\frac{\partial}{\partial x_k} + \mu\right)\Psi = 0, \qquad\text{where}\quad \hat{\boldsymbol\gamma} \equiv \{\hat\gamma_1, \hat\gamma_2, \hat\gamma_3\} = -i\hat\beta\hat{\boldsymbol\alpha} = \begin{pmatrix}0 & -i\hat{\boldsymbol\sigma}\\ i\hat{\boldsymbol\sigma} & 0\end{pmatrix}, \qquad \hat\gamma_4 = \hat\beta,$$

and μ = mc/ħ, just as in Eq. (84). Note also that the Hamiltonian (97) is linear in momentum, while the non-relativistic Hamiltonian of a particle is quadratic in p. In my humble opinion, the Dirac theory (including the concept of antiparticles) is an eligible contender for the title of the most original and elegant physical idea of all time, despite such heavy contenders as the Newton laws, Maxwell equations, Gibbs statistical distribution, and Einstein's relativity.
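$$\hat{\boldsymbol\alpha} = \begin{pmatrix}0 & \hat{\boldsymbol\sigma}\\ \hat{\boldsymbol\sigma} & 0\end{pmatrix}, \qquad \hat\beta = \begin{pmatrix}\hat I & 0\\ 0 & -\hat I\end{pmatrix}. \qquad (9.98a)$$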



The operator α̂, composed of the Pauli vector operators σ̂, is also a vector in the usual 3D space, so that each of its 3 Cartesian components is a 4×4 matrix. The particular form of the 2×2 matrices corresponding to the operators σ̂ and Î in Eq. (98a) depends on the basis selected for the representation of the particle's spin states. For example, in the standard z-basis, the Cartesian components σ̂_x, σ̂_y, and σ̂_z of σ̂ are represented by the Pauli matrices (4.105), and the full matrix form of Eq. (98a) is



$$\alpha_x = \begin{pmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{pmatrix}, \quad \alpha_y = \begin{pmatrix}0&0&0&-i\\0&0&i&0\\0&-i&0&0\\i&0&0&0\end{pmatrix}, \quad \alpha_z = \begin{pmatrix}0&0&1&0\\0&0&0&-1\\1&0&0&0\\0&-1&0&0\end{pmatrix}, \quad \beta = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix}. \qquad (9.98b)$$



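(According to the second of Eqs. (98a), β has this form in any spin basis.) It is straightforward to use Eqs. (98) to verify that the matrices α_x, α_y, α_z, and β satisfy the following relations:

$$\alpha_x^2 = \alpha_y^2 = \alpha_z^2 = \beta^2 = I, \qquad (9.99)$$

$$\alpha_x\alpha_y + \alpha_y\alpha_x = \alpha_y\alpha_z + \alpha_z\alpha_y = \alpha_z\alpha_x + \alpha_x\alpha_z = \alpha_x\beta + \beta\alpha_x = \alpha_y\beta + \beta\alpha_y = \alpha_z\beta + \beta\alpha_z = 0, \qquad (9.100)$$

i.e. they anticommute.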
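The relations (99)-(100) can be verified mechanically. The sketch below uses NumPy's `np.block` to assemble the 4×4 matrices (98b) from the Pauli matrices, following the block form (98a). It then checks every square and every pairwise anticommutator.

```python
import numpy as np
from itertools import combinations

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
Z2, I2 = np.zeros((2, 2)), np.eye(2)

# Eq. (9.98a): alpha_j = [[0, sigma_j], [sigma_j, 0]], beta = [[I, 0], [0, -I]]
alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigma]
beta = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)

matrices = alpha + [beta]
squares_ok = all(np.allclose(m @ m, np.eye(4)) for m in matrices)              # Eq. (9.99)
anticommute_ok = all(np.allclose(a @ b + b @ a, 0)
                     for a, b in combinations(matrices, 2))                      # Eq. (9.100)
print(squares_ok, anticommute_ok)   # True True
```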

Acting essentially as in Sec. 4.1, but using relations (99)-(100), it is straightforward to show that any solution to the Dirac equation obeys the probability conservation law, i.e. the continuity equation (1.52), with the probability density

$$w = \Psi^\dagger\Psi, \qquad (9.101)$$

and the probability current

$$\mathbf j = \Psi^\dagger\left(c\hat{\boldsymbol\alpha}\right)\Psi, \qquad (9.102)$$



which look almost as in the non-relativistic theory - cf. Eqs. (1.22) and (1.47). Note, however, that these formulas use Hermitian conjugation instead of complex conjugation, in order to form the scalars w, j_x, j_y, and j_z from the 4-component vectors (96).

This qualified similarity extends to the fundamental, plane-wave solutions of the Dirac equation in free space. Indeed, let us plug such a solution, in the form

$$\Psi = u\,e^{i(\mathbf k\cdot\mathbf r - \omega t)}, \qquad u = \begin{pmatrix}u_1\\u_2\\u_3\\u_4\end{pmatrix}, \qquad (9.103)$$

into Eqs. (95) and (97). We get a system of 4 coupled, linear algebraic equations for 4 complex c-number amplitudes u_{1,2,3,4}. The condition of their consistency yields the same dispersion relation (87), i.e. the same two-branch diagram shown in Fig. 6, as follows from the Klein-Gordon equation. The difference is



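that plugging each value of ω, given by Eq. (87), back into the system of equations for the amplitudes u, we get two solutions for the vector u for each of the energy branches. In the standard spin z-basis, they may be presented as:

$$\text{for } E = E_+ > 0:\quad u_{+\uparrow} = c_{+\uparrow}\begin{pmatrix}1\\ 0\\ \dfrac{cp_z}{E_+ + mc^2}\\ \dfrac{c(p_x + ip_y)}{E_+ + mc^2}\end{pmatrix}, \qquad u_{+\downarrow} = c_{+\downarrow}\begin{pmatrix}0\\ 1\\ \dfrac{c(p_x - ip_y)}{E_+ + mc^2}\\ \dfrac{-cp_z}{E_+ + mc^2}\end{pmatrix}, \qquad (9.104a)$$

$$\text{for } E = E_- < 0:\quad u_{-\uparrow} = c_{-\uparrow}\begin{pmatrix}\dfrac{cp_z}{E_- - mc^2}\\ \dfrac{c(p_x + ip_y)}{E_- - mc^2}\\ 1\\ 0\end{pmatrix}, \qquad u_{-\downarrow} = c_{-\downarrow}\begin{pmatrix}\dfrac{c(p_x - ip_y)}{E_- - mc^2}\\ \dfrac{-cp_z}{E_- - mc^2}\\ 0\\ 1\end{pmatrix}, \qquad (9.104b)$$

where the coefficients c_{±↑} and c_{±↓} are normalization constants.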
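A numerical sketch of these statements follows. It assumes illustrative units c = m = 1, so that mc² = 1 and momentum is measured in units of mc; the momentum value is arbitrary. It diagonalizes the 4×4 matrix (97) for a fixed momentum and confirms the doubly-degenerate eigenvalues ±E+ of Eq. (87). It also checks that two of the columns (104) are eigenvectors.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
Z2, I2 = np.zeros((2, 2)), np.eye(2)
alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigma]       # Eq. (9.98a)
beta = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)

def dirac_hamiltonian(p):
    """Free-particle Dirac Hamiltonian (9.97), c = m = 1, for a plane wave (9.103) with momentum p."""
    return sum(pj * aj for pj, aj in zip(p, alpha)) + beta

p = np.array([0.3, -0.4, 1.2])
H = dirac_hamiltonian(p)
E_plus = np.sqrt(p @ p + 1.0)                  # Eq. (9.87)
print(np.linalg.eigvalsh(H))                   # -E_plus (twice), +E_plus (twice)

# Unnormalized columns (9.104a) and (9.104b), with E_- = -E_plus
px, py, pz = p
u_plus_up = np.array([1, 0, pz / (E_plus + 1), (px + 1j * py) / (E_plus + 1)])
u_minus_down = np.array([(px - 1j * py) / (-E_plus - 1), -pz / (-E_plus - 1), 0, 1])
print(np.allclose(H @ u_plus_up, E_plus * u_plus_up))         # True
print(np.allclose(H @ u_minus_down, -E_plus * u_minus_down))  # True
```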

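The simplest interpretation of these solutions is the following. Eq. (103) with the vectors u+ given by Eq. (104a) represents a spin-½ particle (say, an electron), while that with the vectors u− given by Eq. (104b) represents an antiparticle (a positron). The two solutions for each particle correspond to two opposite directions of spin, σ_z = ±1, i.e. S_z = ±ħ/2. This interpretation is indeed solid in the non-relativistic limit, when the two last components of vector (104a) and the two first components of vector (104b) are negligibly small:

$$u_{+\uparrow} \to \begin{pmatrix}1\\0\\0\\0\end{pmatrix}, \quad u_{+\downarrow} \to \begin{pmatrix}0\\1\\0\\0\end{pmatrix}, \quad u_{-\uparrow} \to \begin{pmatrix}0\\0\\1\\0\end{pmatrix}, \quad u_{-\downarrow} \to \begin{pmatrix}0\\0\\0\\1\end{pmatrix}, \qquad\text{at } \frac{p}{mc} \to 0. \qquad (9.105)$$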



In order to show this, let us use the Dirac equation to calculate the Heisenberg-picture law of time evolution of the operators of the Cartesian components of the orbital angular momentum L = r × p, for example of L_x = yp_z − zp_y. We take into account the fact that the operators (98a) commute with those of r and p, and also the Heisenberg commutation relations (2.14):

$$i\hbar\frac{d\hat L_x}{dt} = \left[\hat L_x, \hat H\right] = c\hat{\boldsymbol\alpha}\cdot\left[\left(\hat y\hat p_z - \hat z\hat p_y\right), \hat{\mathbf p}\right] = -i\hbar c\left(\hat\alpha_z\hat p_y - \hat\alpha_y\hat p_z\right), \qquad (9.106)$$



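with similar relations for the two other Cartesian components of the operator. Since the right-hand part of these equations is different from zero, the orbital momentum is generally not conserved - even for a free particle! Let us, however, consider the following vector operator,

$$\hat{\mathbf S} \equiv \frac{\hbar}{2}\begin{pmatrix}\hat{\boldsymbol\sigma} & 0\\ 0 & \hat{\boldsymbol\sigma}\end{pmatrix}, \qquad (9.107a)$$

whose Cartesian components, in the z-basis, are represented by 4×4 matrices

$$S_x = \frac{\hbar}{2}\begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}, \quad S_y = \frac{\hbar}{2}\begin{pmatrix}0&-i&0&0\\i&0&0&0\\0&0&0&-i\\0&0&i&0\end{pmatrix}, \quad S_z = \frac{\hbar}{2}\begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&1&0\\0&0&0&-1\end{pmatrix}, \qquad (9.107b)$$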



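and calculate the Heisenberg-picture law of time evolution of these components, for example

$$i\hbar\frac{d\hat S_x}{dt} = \left[\hat S_x, \hat H\right] = c\left[\hat S_x, \left(\hat\alpha_x\hat p_x + \hat\alpha_y\hat p_y + \hat\alpha_z\hat p_z\right)\right]. \qquad (9.108)$$

A direct calculation of the commutators of matrices (98) and (107) yields

$$\left[\hat S_x, \hat\alpha_x\right] = 0, \qquad \left[\hat S_x, \hat\alpha_y\right] = i\hbar\hat\alpha_z, \qquad \left[\hat S_x, \hat\alpha_z\right] = -i\hbar\hat\alpha_y, \qquad (9.109)$$

so that we finally get

$$i\hbar\frac{d\hat S_x}{dt} = i\hbar c\left(\hat\alpha_z\hat p_y - \hat\alpha_y\hat p_z\right), \qquad (9.110)$$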



with similar expressions for the other two components of the operator. Comparing this result with Eq. (106), we see that any Cartesian component of the operator (5.198),

$$\hat{\mathbf J} \equiv \hat{\mathbf L} + \hat{\mathbf S}, \qquad (9.111)$$

is an integral of motion, 49 so that this operator may be interpreted as representing the total angular momentum. Hence, operator (107) may be interpreted as the spin operator of a spin-½ particle (e.g., an electron). As follows from the last of Eqs. (107b), columns (105) represent the eigenkets of the z-component of that operator, with eigenvalues S_z = ±ħ/2, depending on the arrow index. So, the Dirac theory provides a justification for spin ½. Somewhat more humbly, it replaces the spin hypothesis with the assumption of a simpler (and hence more plausible), Lorentz-invariant Hamiltonian (97).

Note, however, that this is not true for the exact solutions (103)-(104). Generally, the eigenstates of the Dirac Hamiltonian are certain linear (coherent) superpositions of component wavefunctions describing the particle and its antiparticle, each with both directions of spin. This fact leads to several interesting effects, including the so-called Klein paradox at the reflection of a particle from a tunnel barrier. 50 It is curious that some of these effects may be reproduced in such non-relativistic systems as electrons moving in a 2D honeycomb lattice (e.g., in graphene), since they also feature a (locally) linear dispersion relation - see Eq. (3.122). 51
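The commutators (109) and the result (110) can be confirmed with explicit 4×4 matrices. In the sketch below, ħ = c = m = 1 is an illustrative assumption. Momentum components are treated as numbers; this is legitimate for this check because Ŝ commutes with p̂, but it cannot reproduce the orbital part (106). The momentum value is arbitrary.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
Z2, I2 = np.zeros((2, 2)), np.eye(2)
alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigma]           # Eq. (9.98a)
beta = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
S = [0.5 * np.block([[s, Z2], [Z2, s]]) for s in sigma]         # Eq. (9.107a), hbar = 1

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return a @ b - b @ a

# Eq. (9.109)
assert np.allclose(comm(S[0], alpha[0]), 0)
assert np.allclose(comm(S[0], alpha[1]), 1j * alpha[2])
assert np.allclose(comm(S[0], alpha[2]), -1j * alpha[1])

# Eq. (9.110) with c = m = 1: [S_x, H] = i (alpha_z p_y - alpha_y p_z); note [S_x, beta] = 0
p = np.array([0.3, -0.4, 1.2])
H = sum(pj * aj for pj, aj in zip(p, alpha)) + beta
assert np.allclose(comm(S[0], H), 1j * (alpha[2] * p[1] - alpha[1] * p[2]))

# S_z is diagonal: its eigenvalues on columns (9.105) are +1/2, -1/2, +1/2, -1/2
print(np.diag(S[2]).real)
```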



9.7. Low-energy limit 

The generalization of Dirac's theory to a particle with electric charge q, moving in a classically-described electromagnetic field, may be obtained using the same Eqs. (90). As a result, Eq. (95) becomes



49 It is straightforward to show that this result remains valid for a particle in the field of central potential U(r). 

50 See, e.g., A. Calogeracos and N. Dombey, Contemp. Phys. 40, 313 (1999).

51 For a review see, e.g., T. Robinson, Am. J. Phys. 80, 141 (2012). 
$$\left[\hat{\boldsymbol\alpha}\cdot c\left(-i\hbar\nabla - q\mathbf A\right) + \hat\beta mc^2 + \left(q\phi - \hat H\right)\right]\Psi = 0, \qquad (9.112)$$



where the Hamiltonian operator Ĥ is understood in the sense of Eq. (95), i.e. as the partial time derivative with multiplier iħ. Let us prepare this equation for a low-energy approximation by acting on its left-hand part with a similar square bracket (also an operator!), but with the opposite sign before the last parentheses. We use relations (99) and (100), together with the fact that the space- and time-independent operators α̂ and β̂ commute with the spin-independent functions A(r,t) and φ(r,t), as well as with the Hamiltonian operator iħ∂/∂t. The result is

$$\left\{c^2\left[\hat{\boldsymbol\alpha}\cdot\left(-i\hbar\nabla - q\mathbf A\right)\right]^2 + \left(mc^2\right)^2 + c\left[\hat{\boldsymbol\alpha}\cdot\left(-i\hbar\nabla - q\mathbf A\right), \left(q\phi - \hat H\right)\right] - \left(q\phi - \hat H\right)^2\right\}\Psi = 0. \qquad (9.113)$$

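A direct calculation of the first square bracket, using Eqs. (98) and (107), yields

$$\left[\hat{\boldsymbol\alpha}\cdot\left(-i\hbar\nabla - q\mathbf A\right)\right]^2 = \left(-i\hbar\nabla - q\mathbf A\right)^2 - 2q\hat{\mathbf S}\cdot\nabla\times\mathbf A. \qquad (9.114)$$

But according to the last of Eqs. (3.21), the last vector product in the right-hand part is just the magnetic field

$$\mathscr B = \nabla\times\mathbf A. \qquad (9.115)$$

Similarly, we may use the first of Eqs. (3.21), for the electric field,

$$\mathscr E = -\nabla\phi - \frac{\partial\mathbf A}{\partial t}, \qquad (9.116)$$

to simplify the commutator participating in Eq. (113):

$$\left[\hat{\boldsymbol\alpha}\cdot\left(-i\hbar\nabla - q\mathbf A\right), \left(q\phi - \hat H\right)\right] = -q\hat{\boldsymbol\alpha}\cdot\left[\hat H, \mathbf A\right] - i\hbar q\hat{\boldsymbol\alpha}\cdot\left[\nabla, \phi\right] = -i\hbar q\hat{\boldsymbol\alpha}\cdot\frac{\partial\mathbf A}{\partial t} - i\hbar q\hat{\boldsymbol\alpha}\cdot\nabla\phi = i\hbar q\hat{\boldsymbol\alpha}\cdot\mathscr E. \qquad (9.117)$$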



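As a result, Eq. (113) becomes

$$\left\{c^2\left(-i\hbar\nabla - q\mathbf A\right)^2 - \left[\left(q\phi - \hat H\right)^2 - \left(mc^2\right)^2\right] - 2qc^2\hat{\mathbf S}\cdot\mathscr B + i\hbar cq\hat{\boldsymbol\alpha}\cdot\mathscr E\right\}\Psi = 0. \qquad (9.118)$$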



So far, this is an exact result, equivalent to Eq. (112), but more convenient for an analysis of the low-energy limit. In this limit, not only the offset energy E − mc² (which is the energy used in non-relativistic quantum mechanics), but also the electrostatic energy of the particle, are much smaller than the rest energy mc². In this limit, the second and third terms of Eq. (118) almost cancel. Introducing the offset Hamiltonian

$$\tilde H \equiv \hat H - mc^2\hat I, \qquad (9.119)$$

we may approximate their difference, up to the first nonvanishing term, as

$$\left(q\phi - \hat H\right)^2 - \left(mc^2\right)^2\hat I = \left(q\phi - mc^2\hat I - \tilde H\right)^2 - \left(mc^2\right)^2\hat I \approx 2mc^2\left(\tilde H - q\phi\right). \qquad (9.120)$$

As a result, after division of all terms by 2mc², Eq. (118) may be approximated as

$$\tilde H\Psi = \left[\frac{\left(-i\hbar\nabla - q\mathbf A\right)^2}{2m} + q\phi - \frac{q}{m}\hat{\mathbf S}\cdot\mathscr B + \frac{i\hbar q}{2mc}\hat{\boldsymbol\alpha}\cdot\mathscr E\right]\Psi. \qquad (9.121)$$
Let us discuss this important result. The first two terms in the square brackets give the Hamiltonian (3.26) that was used extensively in Chapter 3 to discuss the non-relativistic motion of charged particles. Note again that the contribution of the vector-potential A to that Hamiltonian is essentially relativistic, in the following sense. Consider the magnetic interaction of two charged particles due to their orbital motion with speed v ≪ c: it is a factor of (v/c)² smaller than the electrostatic interaction of the particles. 52 We nevertheless discussed the effects of A in Chapter 3 because it was used there to describe external magnetic fields. This kept our analysis valid even when the field is strong because it is produced by relativistic effects, such as aligned spins in a permanent magnet.

The next, third term in the square brackets is also familiar to the reader. It was introduced informally in Sec. 4.1, and then formally in Sec. 4.4, to describe the effect of the magnetic field on the particle's spin - see Eqs. (4.3), (4.5), and (4.163). When justifying this form of interaction, I referred mostly to the results of Stern-Gerlach-type experiments, but it is extremely pleasing that this result 53 follows from such a fundamental relativistic treatment as Dirac's theory. As we already know from the discussion of the Zeeman effect in Sec. 6.4, the effects of the magnetic field on the orbital motion of an electron (described by its orbital angular momentum L) and on its spin S are of the same order, i.e. both are essentially relativistic effects.

Finally, the last term in the square brackets of Eq. (121) is also not quite new for us: in particular, it describes the spin-orbit interaction. Indeed, in the case of a classical, spherically-symmetric electric field ℰ with potential φ(r) = U(r)/q, the term may be reduced to Eq. (6.56b):

$$\hat H_{\rm so} = \frac{1}{2m^2c^2}\frac{1}{r}\frac{dU}{dr}\hat{\mathbf S}\cdot\hat{\mathbf L}. \qquad (9.122)$$



The proof of this correspondence requires a bit of additional work, 54 because in Eq. (121), the term 
responsible for the spin-orbit interaction acts on 4-component wavefunctions, while Hamiltonian (122) 
is supposed to act on non-relativistic wavefunctions with account of spin, whose coordinate 
representation is given by 2-component columns - spinors: 55 



52 This difference may be traced even by classical means - see, e.g., EM Sec. 5.1. 

53 With the g-factor still equal to exactly 2 - see Eq. (4.116) and its discussion. In order to describe the small deviation of g_e from 2, the electromagnetic field should be quantized (just as was done in Secs. 1-4). Its potentials A and φ, participating in Eq. (112), should then be treated as operators, rather than as the c-number functions assumed above. The calculation of this deviation is one of the basic problems of quantum field theory. Other important effects of electromagnetic interactions described by the theory are the so-called Lamb shift and the hyperfine structure of atomic levels - see the end of this chapter for references.

54 The only facts immediately evident from Eq. (121) are that the term we are discussing is proportional to the electric field, as required by Eq. (122), and that it is of the proper order of magnitude. Indeed, Eqs. (101)-(102) imply that in the Dirac theory, cα̂ plays the role of the velocity operator, so that the expectation values of the term are of the order of ħqvℰ/2mc². The expectation values of the operators participating in Hamiltonian (122) scale as S ~ ħ/2 and L ~ mvr, so the spin-orbit interaction energy has the same order of magnitude.

55 As a reminder, in this course the notion of spinor was introduced earlier for two-particle states - see Eq. (8.14). For a single particle, that definition is reduced to ⟨r|s⟩, whose representation in a particular spin-½ basis is a column similar to Eq. (123). Also note that spinors (123) may be expanded into a series over the spin-orbitals (8.117) discussed in Sec. 8.3. There, the index j numbers both the two directions of spin (i.e. the two components of the spinor's column) and the orbital eigenfunctions.
$$\psi \equiv \begin{pmatrix}\psi_\uparrow(\mathbf r)\\ \psi_\downarrow(\mathbf r)\end{pmatrix}. \qquad (9.123)$$



The simplest way to prove that the two formulas are identical is not to use Eq. (121) directly. Instead, we return to the Dirac equation (112), for the particular case of motion in a stationary electric field with no magnetic field, when Dirac's Hamiltonian is reduced to

$$\hat H = c\hat{\boldsymbol\alpha}\cdot\hat{\mathbf p} + \hat\beta mc^2 + U(\mathbf r). \qquad (9.124)$$

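Since this Hamiltonian is time-independent, we may look for its 4-component eigenfunctions in the form

$$\Psi(\mathbf r, t) = \begin{pmatrix}\psi_+(\mathbf r)\\ \psi_-(\mathbf r)\end{pmatrix}\exp\left\{-i\frac{E}{\hbar}t\right\}, \qquad (9.125)$$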



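where each of ψ± is a 2-component column of the type (123), representing two spin states of the particle (index +) and antiparticle (index −). Plugging Eq. (125) into the Dirac equation with Hamiltonian (124), and using Eq. (98a), we get the following system of two linear equations:

$$\left[E - mc^2 - U(\mathbf r)\right]\psi_+ - c\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\,\psi_- = 0, \qquad \left[E + mc^2 - U(\mathbf r)\right]\psi_- - c\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\,\psi_+ = 0. \qquad (9.126)$$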



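Expressing ψ− from the latter equation, and plugging the result into the former one, we get the following single equation for the particle's spinor:

$$\left[E - mc^2 - U(\mathbf r) - c^2\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\,\frac{1}{E + mc^2 - U(\mathbf r)}\,\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\right]\psi_+ = 0. \qquad (9.127)$$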



So far, this is an exact equation for the eigenstates and eigenvalues of Hamiltonian (124). It may be substantially simplified in the low-energy limit, when both the potential energy 56 and the non-relativistic eigenenergy

$$\tilde E \equiv E - mc^2 \tag{9.128}$$

are much less than $mc^2$. Indeed, in this case the denominator of the last term in the square brackets of Eq. (127) is close to $2mc^2$. Since $(\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p})^2 = \hat p^2$, with that replacement Eq. (127) is reduced to the non-relativistic Schrödinger equation, the same for both spin components of $\psi_+$, and hence giving spin-degenerate energy levels. In order to recover the small relativistic and spin-orbit effects, we need a slightly more accurate approximation:



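$$\frac{1}{E + mc^2 - U(\mathbf r)} = \frac{1}{2mc^2 + \tilde E - U(\mathbf r)} \equiv \frac{1}{2mc^2}\left[1 + \frac{\tilde E - U(\mathbf r)}{2mc^2}\right]^{-1} \approx \frac{1}{2mc^2}\left[1 - \frac{\tilde E - U(\mathbf r)}{2mc^2}\right], \tag{9.129}$$

in which Eq. (127) is reduced to

$$\left[\tilde E - U(\mathbf r) - \frac{\hat p^2}{2m} + \hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\,\frac{\tilde E - U(\mathbf r)}{(2mc)^2}\,\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\right]\psi_+ = 0. \tag{9.130}$$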



56 Strictly speaking, this requirement is imposed on the expectation values of $U(\mathbf r)$ in the eigenstates to be found.
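As follows from Eqs. (5.46)-(5.47), the operators of momentum and of a function of coordinates commute as

$$[\hat{\mathbf p}, U(\mathbf r)] = -i\hbar\nabla U, \tag{9.131}$$

so that the last term in the square brackets of Eq. (130) may be rewritten as

$$\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\,\frac{\tilde E - U(\mathbf r)}{(2mc)^2}\,\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p} = \frac{\tilde E - U(\mathbf r)}{(2mc)^2}\,\hat p^2 + \frac{i\hbar}{(2mc)^2}\left(\hat{\boldsymbol\sigma}\cdot\nabla U\right)\left(\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}\right). \tag{9.132}$$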



Since in the low-energy limit both terms in the right-hand part of this relation are much smaller than the three leading terms of Eq. (130), in the first of them we may replace the numerator with its non-relativistic value $\hat p^2/2m$. With this replacement, the term coincides with the first relativistic correction to the kinetic energy operator - see Eqs. (6.47) and (6.49a). The second term, proportional to the electric field $\boldsymbol{\mathscr E} = -\nabla\phi = -\nabla U/q$, may be transformed further, using the readily verifiable relation

$$(\hat{\boldsymbol\sigma}\cdot\nabla U)(\hat{\boldsymbol\sigma}\cdot\hat{\mathbf p}) = (\nabla U)\cdot\hat{\mathbf p} + i\,\hat{\boldsymbol\sigma}\cdot[(\nabla U)\times\hat{\mathbf p}]. \tag{9.133}$$
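Relation (133) is an instance of the general identity $(\boldsymbol\sigma\cdot\mathbf a)(\boldsymbol\sigma\cdot\mathbf b) = \mathbf a\cdot\mathbf b + i\boldsymbol\sigma\cdot(\mathbf a\times\mathbf b)$, valid when the components of $\mathbf a$ and $\mathbf b$ commute with the Pauli matrices; the order of the factors has to be kept, because $\nabla U$ and $\hat{\mathbf p}$ do not commute. The minimal sketch below checks the identity numerically for ordinary (commuting) random vectors, so it tests only the Pauli-matrix algebra, not the operator ordering; the helper name `sigma_dot` is ours.

```python
import numpy as np

# sigma[k] is the k-th Pauli matrix (k = x, y, z)
sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def sigma_dot(v):
    """Return the 2x2 matrix sigma . v for a 3-vector v of ordinary numbers."""
    return np.einsum("k,kij->ij", v, sigma)

rng = np.random.default_rng(0)
a = rng.normal(size=3)   # plays the role of grad U at some point
b = rng.normal(size=3)   # plays the role of (c-number) momentum components

lhs = sigma_dot(a) @ sigma_dot(b)
rhs = np.dot(a, b) * np.eye(2) + 1j * sigma_dot(np.cross(a, b))
print(np.max(np.abs(lhs - rhs)))   # at the level of floating-point round-off
```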

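Of the two terms in the right-hand part of Eq. (133), only the second one depends on spin, 57 giving the following spin-orbit interaction contribution to the Hamiltonian:

$$\hat H_{so} = \frac{\hbar}{(2mc)^2}\,\hat{\boldsymbol\sigma}\cdot[(\nabla U)\times\hat{\mathbf p}] = \frac{1}{2m^2c^2}\,\hat{\mathbf S}\cdot[(\nabla U)\times\hat{\mathbf p}]. \tag{9.134}$$

For a central electric field, with $\phi(\mathbf r) = \phi(r)$, the potential gradient has only one, radial component: $\nabla\phi = (d\phi/dr)\,\mathbf r/r = -\mathscr E\,\mathbf r/r$, and with the angular momentum definition $\hat{\mathbf L} \equiv \mathbf r\times\hat{\mathbf p}$, Eq. (134) is reduced to Eq. (122).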

As was shown in Sec. 6.3, the perturbative treatment of Eq. (122), together with the kinetic-relativistic correction (6.49), in the hydrogen-like atom problem, leads to the fine structure of each Bohr level $E_n$, given by Eq. (6.60):

$$\Delta E_{\rm fine} = -\frac{E_n^2}{2mc^2}\left(\frac{4n}{j + 1/2} - 3\right). \tag{9.135}$$



This result is confirmed by the remarkable fact that for the hydrogen-like atom problem, the Dirac equation may be solved exactly, without any approximations. I do not have time/space to reproduce the solution, 58 and will list just the final result for the energy spectrum:

$$E = mc^2\left\{1 + \frac{Z^2\alpha^2}{\left[n - (j + 1/2) + \left((j + 1/2)^2 - Z^2\alpha^2\right)^{1/2}\right]^2}\right\}^{-1/2}. \tag{9.136}$$

Here $n = 1, 2, \ldots$ is the same principal quantum number as in Bohr's theory, while $j$ is the quantum number specifying the eigenvalues (5.203) of the total angular momentum's square $J^2$ in units of $\hbar^2$, taking half-integer values: $j = l \pm 1/2 = 1/2, 3/2, 5/2, \ldots$ - see Eq. (5.215). Such a set of quantum numbers is rather natural, because due to the spin-orbit interaction, the orbital and spin angular momenta are not conserved separately, while their vector sum, $\mathbf J = \mathbf L + \mathbf S$, is conserved - in the absence of an external magnetic field. Each energy level (136) is doubly degenerate, with two eigenstates representing two directions of spin - i.e. two values of $l = j \pm 1/2$ at fixed $j$.

57 The first term gives a small, spin-independent shift of the energy spectrum, which is very difficult to verify experimentally.

58 Good descriptions of the solution are available in many textbooks (the older the better :-), for example see Sec. 53 in L. I. Schiff, Quantum Mechanics, 3rd ed., McGraw-Hill (1968).

Since, according to Eq. (1.9), the square of the fine-structure constant $\alpha = e^2/4\pi\varepsilon_0\hbar c$ may be presented as the ratio $E_H/mc^2$, the low-energy limit ($E - mc^2 \sim E_H \ll mc^2$) may be pursued by expanding Eq. (136) into a Taylor series in $(Z\alpha)^2 \ll 1$. The result,

$$E \approx mc^2\left[1 - \frac{Z^2\alpha^2}{2n^2} - \frac{Z^4\alpha^4}{2n^4}\left(\frac{n}{j + 1/2} - \frac{3}{4}\right)\right], \tag{9.137}$$



has the same structure, and allows the same interpretation as Eq. (92), but with the last term coinciding 
with Eq. (6.52) - and with experimental results. Historically, this correct description of the fine structure 
of atomic levels provided a decisive proof of Dirac's theory. 
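The statement that Eq. (137) is the low-energy expansion of Eq. (136) is easy to check numerically. The minimal sketch below evaluates both formulas for hydrogen ($Z = 1$), using the CODATA 2018 values of $\alpha$ and of the electron's rest energy (inputs not given in the text), and also computes the splitting between the $j = 3/2$ and $j = 1/2$ sublevels of the $n = 2$ level, which according to Eq. (137) should be close to $mc^2\alpha^4/32 \approx 4.5\times10^{-5}$ eV. Only the formulas of Dirac's theory are evaluated; the function names are ours.

```python
import math

ALPHA = 7.2973525693e-3    # fine-structure constant (CODATA 2018)
MC2_EV = 0.51099895000e6   # electron rest energy in eV (CODATA 2018)

def dirac_energy(n, j, Z=1):
    """Exact eigenenergy of Eq. (9.136), rest energy included, in eV."""
    k = j + 0.5
    za2 = (Z * ALPHA) ** 2
    denom = n - k + math.sqrt(k * k - za2)
    return MC2_EV * (1.0 + za2 / denom ** 2) ** -0.5

def expanded_energy(n, j, Z=1):
    """Low-energy expansion of Eq. (9.137), accurate to order (Z*alpha)**4, in eV."""
    za2 = (Z * ALPHA) ** 2
    return MC2_EV * (1.0 - za2 / (2 * n ** 2)
                     - za2 ** 2 / (2 * n ** 4) * (n / (j + 0.5) - 0.75))

for n, j in [(1, 0.5), (2, 0.5), (2, 1.5)]:
    # energies counted from mc^2, i.e. the non-relativistic eigenenergies
    print(n, j, dirac_energy(n, j) - MC2_EV, expanded_energy(n, j) - MC2_EV)

# splitting of the n = 2 level between its j = 3/2 and j = 1/2 sublevels
split = dirac_energy(2, 1.5) - dirac_energy(2, 0.5)
print(split)   # about 4.5e-5 eV, i.e. close to mc^2 * alpha**4 / 32
```

The residual difference between the two formulas is of the order of $mc^2(Z\alpha)^6$, i.e. below $10^{-7}$ eV for hydrogen.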

However, even such an impressive theory does not have too many direct applications. The main reason for that was already discussed briefly at the end of Sec. 5: due to the possibility of creation and annihilation of particle-antiparticle pairs at energies higher than $2mc^2$, the number of particles participating in high-energy interactions is not fixed. An adequate general description of such situations is given by quantum field theory, in which both the electromagnetic field and the wavefunction $\Psi$ are treated as interacting fields to be quantized - very much as the electromagnetic field was treated in Secs. 1-4 above, and with many parallels with the second quantization formalism outlined in Sec. 8.3. (The Dirac equation follows from quantum field theory in the single-particle approximation.) As was mentioned above on several occasions, quantum field theory is beyond the scope of this course, and I have to stop here, referring the interested reader to one of several excellent textbooks on this discipline. 59



9.8. Exercise problems

9.1. Prove the Casimir formula (23) for the attraction force $F = -PA$ between two perfectly conducting parallel plates of area $A$, separated by a narrow vacuum gap $d \ll A^{1/2}$.
Hint: You may like to use the Euler-Maclaurin formula. 60

9.2. Calculate the zero-delay value $g^{(2)}(0)$ of the second-order correlation function of a single-mode electromagnetic field in the so-called Schrödinger-cat state: a coherent superposition of two Glauber states, with equal amplitudes, equal but sign-opposite parameters $\alpha$, and a certain phase shift between them.



59 For a gradual introduction see, e.g., either L. Brown, Quantum Field Theory, Cambridge U. Press (1994) or R. Klauber, Student Friendly Quantum Field Theory, Sandtrove (2013); on the other hand, M. Srednicki, Quantum Field Theory, Cambridge U. Press (2007) and A. Zee, Quantum Field Theory in a Nutshell, 2nd ed., Princeton (2010), among many others, offer a steeper learning curve.

60 See, e.g., MA Eq. (2.12).
9.3. Calculate the zero-delay value $g^{(2)}(0)$ of the second-order correlation function of a single-mode electromagnetic field in the squeezed ground state defined by Eq. (5.172) of the lecture notes.

9.4. Calculate the rate of spontaneous photon emission (into unrestricted free space) by a hydrogen atom, initially in the 2p state ($n = 2$, $l = 1$) with $m = 0$. Would the result be different for $m = \pm 1$? For the 2s state ($n = 2$, $l = 0$, $m = 0$)? Discuss the relation between these quantum-mechanical results and those given by the classical theory of radiation.

9.5. Find the eigenstates and eigenvalues of the Jaynes-Cummings Hamiltonian (78), and discuss their behavior near the resonance point $\omega = \Omega$.

Chapter 10. Making Sense of Quantum Mechanics

This (very cryptic) chapter outlines the issues of quantum mechanics interpretation that are still a subject of debate - fortunately, without affecting practical applications of the quantum theory.



Only now, with a quantitative understanding of the principles of quantum mechanics, are we ready to proceed to the discussion of its interpretation 1 - the issue which is very closely related to the problems of measurement, already discussed in Sec. 7.7. As was mentioned in that section, the founding fathers of quantum mechanics did not leave much guidance on these topics, because in the first years after the advent of this exciting new theory they gave understandable preference to using it for deriving new particular results, and then were much distracted by the development of nuclear physics and its applications. This is why, after a very important but inconclusive discussion between A. Einstein and N. Bohr in the mid-1930s, the debates on quantum measurements and the related conceptual issues of quantum mechanics resumed only in the 1950s. They led to a key contribution by J. Bell in the early 1960s, and to important experimental work on testing Bell's inequalities (see below), but besides that work, the recent progress has been marginal, and the opinions of even prominent physicists on certain issues still differ very much.

Perhaps the central controversial issue is question (iii) posed in Sec. 7.7: what (if any :-) is the "real" state of a quantum-mechanical system before a nearly-perfect measurement giving a certain outcome? In order to be specific, let us focus again on the simplest example of Stern-Gerlach measurements of spin-½ particles - because of their physical transparency and technical simplicity. 2 As the reader knows very well by now, even in a pure quantum spin state (for example, $\uparrow$), i.e. the least uncertain state of the system, the results of Stern-Gerlach measurements of another spin component are still uncertain. Indeed, as we know from Sec. 4.4, the ket-vector of this state may be presented as

$$|\uparrow\rangle = \frac{1}{\sqrt2}\left(|\rightarrow\rangle + |\leftarrow\rangle\right), \tag{10.1}$$

so that the probabilities of measuring each of the values $S_x = +\hbar/2$ and $S_x = -\hbar/2$ equal 50%. So, did the spin have a certain value of $S_x$ a split second before the Stern-Gerlach measurement that gave a certain outcome, for example $S_x = +\hbar/2$? For a classical system, with perfect detectors, the answer is definitely yes. In this case, the pre-measurement probability of 50% just reflects the degree of our ignorance about the real state of the system, and the detector merely reveals it.

However, the situation in quantum mechanics is different, and such an interpretation is impossible, as was clearly shown in the famous EPR paper published in 1935 by A. Einstein, B. Podolsky, and N. Rosen. The original paper discussed thought experiments with a pair of 1D particles prepared in a quantum state in which both the sum of their momenta and the difference of their coordinates are exactly fixed: $p_1 + p_2 = 0$, $x_1 - x_2 = a$. 3 However, the discussion is usually recast into an equivalent Stern-Gerlach experiment, shown in Fig. 1a. A source emits rare pairs of spin-½ particles, propagating in opposite directions, with exactly zero net spin, but otherwise in random spin states. After the spatial separation of the particles has become sufficiently large (see below), the spin state of each of them is measured with a Stern-Gerlach detector, one of them (Fig. 1, detector SG₁) somewhat closer to the particle source, so that it makes the measurement first, at time $t_1 < t_2$.

1 I believe that another popular name for this group of issues, "foundations of quantum mechanics", is hardly appropriate. The only reliable foundation of physics (or any other genuine scientific discipline) is a set of experimental facts.

2 As was discussed in Sec. 7.7, Stern-Gerlach-type experiments may be readily made almost "perfect", i.e. virtually unaffected by instrument imperfections, provided that we do not care about the state of the particle after a single-shot measurement.



Fig. 10.1. (a) General scheme of two-particle Stern-Gerlach experiments: a source emitting particle pairs, with Stern-Gerlach detectors on both sides; (b) the orientation of the detectors assumed in the derivation of Bell's inequality (14).



First, let the detectors be oriented along the same direction, say axis z. Evidently, the probability of each detector giving either of the values $S_z = \pm\hbar/2$ is 50%. However, if the first detector has given the result $S_z = -\hbar/2$, then even before the second detector's measurement we know that it will give the result $S_z = +\hbar/2$ with 100% probability. So far, the result allows for a classical interpretation, just as for the single-particle measurements discussed in Secs. 2.5 and 7.7. Thus we may fancy that the second particle really has a definite spin before the measurement, and the first measurement has just removed our ignorance about that reality. In other words, the change of probability is due to the statistical ensemble's redefinition: the 50% probability belongs to the ensemble of all experiments, while the 100% probability, to the sub-ensemble of experiments with the $S_z = -\hbar/2$ outcome of the first measurement.

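However, let the source generate the particle pairs in the entangled, singlet state (8.19),

$$|s_{12}\rangle = \frac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right), \tag{10.2}$$

which certainly satisfies the above assumptions: the probability of each $S_z$ value of either particle is 50%, the sum of both $S_z$ is exactly zero, and if the first detector's result is $S_z = -\hbar/2$, then the state of the remaining particle is $\uparrow$, with zero uncertainty. Now let us use Eq. (1), and its counterpart for the vector $|\downarrow\rangle$, 4 to present the same initial state (2) in the form

$$|s_{12}\rangle = \frac{1}{\sqrt2}\left[\frac{1}{\sqrt2}\left(|\rightarrow\rangle + |\leftarrow\rangle\right)\frac{1}{\sqrt2}\left(|\rightarrow\rangle - |\leftarrow\rangle\right) - \frac{1}{\sqrt2}\left(|\rightarrow\rangle - |\leftarrow\rangle\right)\frac{1}{\sqrt2}\left(|\rightarrow\rangle + |\leftarrow\rangle\right)\right]. \tag{10.3}$$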



3 This is possible, because the corresponding operators commute: $[\hat p_1 + \hat p_2, \hat x_1 - \hat x_2] = [\hat p_1, \hat x_1] - [\hat p_2, \hat x_2] = 0$.

4 As a reminder, it differs from Eq. (1) only by the sign in the parentheses - see, e.g., Eqs. (4.123).
Opening the parentheses (without swapping the ket-vector order!), we get an expression similar to Eq. (2), but now for the x-basis:

$$|s_{12}\rangle = -\frac{1}{\sqrt2}\left(|\rightarrow\leftarrow\rangle - |\leftarrow\rightarrow\rangle\right). \tag{10.4}$$

Hence, if we use the first detector (the one closest to the particle source) to measure $S_x$ rather than $S_z$, then after it has given a certain result (say, $S_x = -\hbar/2$), we know for sure, before the second particle spin's measurement, that its $S_x$ component equals $+\hbar/2$.
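The basis change leading from Eq. (2) to Eq. (4) may be checked with a few lines of linear algebra. The minimal sketch below assumes the phase convention $|\rightarrow\rangle = (|\uparrow\rangle + |\downarrow\rangle)/\sqrt2$, $|\leftarrow\rangle = (|\uparrow\rangle - |\downarrow\rangle)/\sqrt2$, which is consistent with Eq. (1) and footnote 4; two-particle kets are built as Kronecker products with particle 1 first, so that the ket-vector order is preserved, as the text requires. The helper name `pair` is ours.

```python
import numpy as np

up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])
right = (up + down) / np.sqrt(2)   # S_x = +hbar/2 (assumed phase convention)
left = (up - down) / np.sqrt(2)    # S_x = -hbar/2

def pair(a, b):
    """Two-particle ket |a>|b>, with particle 1 written first."""
    return np.kron(a, b)

singlet_z = (pair(up, down) - pair(down, up)) / np.sqrt(2)         # Eq. (10.2)
singlet_x = -(pair(right, left) - pair(left, right)) / np.sqrt(2)  # Eq. (10.4)
print(np.allclose(singlet_z, singlet_x))   # True

# Perfect anticorrelation of S_x components in the same state:
prob_right_left = abs(pair(right, left) @ singlet_z) ** 2
prob_right_right = abs(pair(right, right) @ singlet_z) ** 2
print(prob_right_left, prob_right_right)   # 0.5 and 0.0, up to round-off
```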

So, depending on the experiment performed on the first particle, the second particle turns out to be in one of two states - either with a definite component $S_z$ or with a definite component $S_x$, in each case without any uncertainty. Evidently, this situation cannot be interpreted in classical terms if the particles do not interact during the measurements. A. Einstein was deeply unhappy with such a situation, because it did not satisfy a general requirement to any theory, which nowadays is called local reality. His definition of this requirement was as follows:

"The real factual situation of system 2 is independent of what is done with system 1 that is spatially separated from the former".

(The term "separated" in this sentence is somewhat vague, but from the context it is clear that Einstein meant the detector separation by a superluminal interval, i.e. by a distance

$$|\mathbf r_1 - \mathbf r_2| > c\,|t_1 - t_2|, \tag{10.5}$$

where the measurement time difference in the right-hand part includes the measurement duration.) In Einstein's view, since quantum mechanics does not satisfy the local reality condition, it cannot be considered a complete theory of Nature.

This situation naturally raises the question whether something (usually called hidden variables) may be added to the quantum-mechanical description in order to satisfy the local reality requirement. The first definite statement in this regard was J. von Neumann's "proof" 5 (first famous, then infamous :-) that such variables cannot be introduced; for a while his work satisfied quantum mechanics practitioners. 6 A major new contribution to the problem was made only in the 1960s by J. Bell. 7 First of all, he found an elementary (in his words, "foolish") error in von Neumann's logic, which voids his "proof". Second, he demonstrated that Einstein's local reality condition is incompatible with conclusions of quantum mechanics that had been, by that time, confirmed by too many experiments to be seriously questioned. Since no introduction of hidden variables can change this situation, in this sense such an introduction is impossible.

Let me describe a particular version of Bell's proof (suggested by E. Wigner), using the same EPR pair experiment (Fig. 1a), in which each SG detector may be oriented in any of 3 directions: a, b, or c - see Fig. 1b. As we know from Chapter 4, if a fully-polarized beam of spin-½ particles is passed through a Stern-Gerlach apparatus forming angle $\varphi$ with the polarization axis, the probabilities of the two alternative outcomes of the experiment are

$$W(\varphi_+) = \cos^2\frac{\varphi}{2}, \qquad W(\varphi_-) = \sin^2\frac{\varphi}{2}. \tag{10.6}$$

5 In his pioneering book: J. von Neumann, Mathematische Grundlagen der Quantenmechanik [Mathematical Foundations of Quantum Mechanics], Springer, 1932. (The first English translation was published only in 1955.)

6 Evidently, it would not satisfy A. Einstein, but reportedly he did not know about von Neumann's result before signing the EPR paper.

7 See, e.g., J. S. Bell, Rev. Mod. Phys. 38, 447 (1966), or J. S. Bell, Foundations of Physics 12, 158 (1982).



Let us use this formula to calculate all the joint probabilities of measurement outcomes, starting from detectors 1 and 2 oriented, respectively, in directions a and c. Since the angle between the negative direction of axis a and the positive direction of axis c is $\varphi_{a-,c+} = \pi - \varphi$ (see the dashed arrow in Fig. 1b), we get

$$W(a_+, c_+) = W(a_+)\,W(c_+|a_+) = \frac{1}{2}\cos^2\frac{\pi - \varphi}{2} = \frac{1}{2}\sin^2\frac{\varphi}{2}. \tag{10.7}$$

Absolutely similarly,

$$W(c_+, b_+) = W(c_+)\,W(b_+|c_+) = \frac{1}{2}\sin^2\frac{\varphi}{2}, \tag{10.8}$$

$$W(a_+, b_+) = W(a_+)\,W(b_+|a_+) = \frac{1}{2}\cos^2\frac{\pi - 2\varphi}{2} = \frac{1}{2}\sin^2\varphi. \tag{10.9}$$

Now note that for any angle $\varphi$ smaller than $\pi/2$ (as in the case shown in Fig. 1b),

$$\frac{1}{2}\sin^2\varphi > \frac{1}{2}\sin^2\frac{\varphi}{2} + \frac{1}{2}\sin^2\frac{\varphi}{2} \equiv \sin^2\frac{\varphi}{2}. \tag{10.10}$$

(Indeed, $\frac{1}{2}\sin^2\varphi = 2\sin^2\frac{\varphi}{2}\cos^2\frac{\varphi}{2}$, which exceeds $\sin^2\frac{\varphi}{2}$ as long as $\cos^2\frac{\varphi}{2} > 1/2$, i.e. $\varphi < \pi/2$. For example, for $\varphi \to 0$ the left-hand part of this relation tends to $\varphi^2/2$, while the right-hand part, to $\varphi^2/4$.) Hence the quantum-mechanical result gives, in particular,

$$W(a_+, b_+) > W(a_+, c_+) + W(c_+, b_+), \qquad \text{for } \varphi < \pi/2. \tag{10.11}$$
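Inequality (11) may also be checked numerically over the whole range of angles. The minimal sketch below encodes the quantum-mechanical joint probability of Eqs. (7)-(9) as a function of the angle between the two detector axes, assuming the geometry of Fig. 1b (axis c bisects the angle $2\varphi$ between axes a and b); the function name `w_quantum` is ours. It confirms that the violation holds for all $0 < \varphi < \pi/2$ and disappears beyond that value.

```python
import numpy as np

def w_quantum(angle):
    """Joint probability W(x+, y+) for the singlet pair, with 'angle' between axes x and y.

    Detector 1 gives x+ with probability 1/2; particle 2 is then polarized along -x,
    and Eq. (10.6) is applied with the angle (pi - angle) between -x and y.
    """
    return 0.5 * np.cos((np.pi - angle) / 2) ** 2

phi = np.linspace(0.01, np.pi - 0.01, 500)       # angle between a and c (= between c and b)
w_ab = w_quantum(2 * phi)                          # Eq. (10.9): a and b are 2*phi apart
w_ac_plus_cb = w_quantum(phi) + w_quantum(phi)     # Eqs. (10.7) and (10.8)
violates = w_ab > w_ac_plus_cb
print(phi[violates].max())   # just below pi/2, the edge of the range in Eq. (10.11)
```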



On the other hand, we may compose another inequality for the same probabilities without calculating them from any particular theory, using only the local reality assumption. Let us list all possible outcomes of detector measurements, taking into account the zero net spin:



Detector 1 results      Detector 2 results      Probability
$a_+, b_+, c_+$         $a_-, b_-, c_-$         $W_1$
$a_+, b_+, c_-$         $a_-, b_-, c_+$         $W_2$
$a_+, b_-, c_+$         $a_-, b_+, c_-$         $W_3$
$a_+, b_-, c_-$         $a_-, b_+, c_+$         $W_4$
$a_-, b_+, c_+$         $a_+, b_-, c_-$         $W_5$
$a_-, b_+, c_-$         $a_+, b_-, c_+$         $W_6$
$a_-, b_-, c_+$         $a_+, b_+, c_-$         $W_7$
$a_-, b_-, c_-$         $a_+, b_+, c_+$         $W_8$





From the local reality point of view, these measurement options are independent, so we may write:

$$W(a_+, c_+) = W_2 + W_4, \qquad W(c_+, b_+) = W_3 + W_7, \qquad W(a_+, b_+) = W_3 + W_4. \tag{10.12}$$
On the other hand, since no probability may be negative (by its very definition), we may always write

$$W_3 + W_4 \le (W_2 + W_4) + (W_3 + W_7). \tag{10.13}$$

Plugging into this inequality the values of its left-hand part and of the two parentheses, given by Eq. (12), we get

$$W(a_+, b_+) \le W(a_+, c_+) + W(c_+, b_+). \tag{10.14}$$



This is (one of several possible forms of) Bell's inequality, which has to be satisfied by any local-reality theory; it directly contradicts the quantum-mechanical result (11).
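The logic of Eqs. (12)-(14) does not depend on the particular values of $W_1, \ldots, W_8$, only on their non-negativity. The minimal sketch below is a hypothetical illustration, not part of the original argument: it samples random non-negative, normalized sets of $W_k$, encodes the rows of the table above by the signs of detector 1's results along a, b, and c, and confirms that inequality (14) is never violated, while the quantum-mechanical values of Eqs. (7)-(9) for $\varphi = \pi/4$ do violate it. The names `ROWS`, `AXIS`, and `joint` are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rows W_1 ... W_8 of the table, encoded by the signs of detector 1's results along (a, b, c);
# detector 2 always shows the opposite signs (zero net spin).
ROWS = [(sa, sb, sc) for sa in (+1, -1) for sb in (+1, -1) for sc in (+1, -1)]
AXIS = {"a": 0, "b": 1, "c": 2}

def joint(W, x, y):
    """W(x+, y+): detector 1 shows + along axis x, detector 2 shows + along axis y."""
    return sum(w for w, row in zip(W, ROWS)
               if row[AXIS[x]] == +1 and -row[AXIS[y]] == +1)

worst_slack = np.inf
for _ in range(10_000):
    W = rng.random(8)
    W /= W.sum()   # an arbitrary non-negative, normalized set W_1 ... W_8
    slack = joint(W, "a", "c") + joint(W, "c", "b") - joint(W, "a", "b")
    worst_slack = min(worst_slack, slack)
print(worst_slack >= 0)   # True: Eq. (10.14) holds in every sampled local model

# Quantum-mechanical values, Eqs. (10.7)-(10.9), for phi = pi/4
phi = np.pi / 4
qm_slack = 2 * (0.5 * np.sin(phi / 2) ** 2) - 0.5 * np.sin(phi) ** 2
print(qm_slack)   # negative, i.e. Eq. (10.14) is violated
```

Analytically, the slack computed in the loop is just $W_2 + W_7 \ge 0$, which is why no choice of the $W_k$ can violate Eq. (14).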

Though experimental tests of Bell's inequalities were started in the late 1960s, the interpretation of the first results was vulnerable to two criticisms:

(i) The detectors were not fast enough and not far enough apart to have relation (5) satisfied. This is why, as a matter of principle, there was a chance that information about one measurement had been transferred (by some, mostly implausible, means) to the particles before the second measurement - the so-called locality loophole.

(ii) Particle detection efficiencies were too low to have sufficiently small error bars for both parts of the inequality - the detection loophole.

Gradually, these loopholes have been closed. 8 As expected, substantial violations of Bell inequalities equivalent to Eq. (14) have been demonstrated, essentially rejecting any possibility of reconciling quantum mechanics with Einstein's local reality requirement.



10.2. Interpretations of quantum mechanics 

The fact that quantum mechanics is incompatible with local reality makes its reconciliation with our (classically bred) "common sense" rather challenging. Here is a brief list of the major interpretations of quantum mechanics that try to provide at least a partial reconciliation of this kind:

(i) The so-called Copenhagen interpretation, to which most physicists subscribe. This "interpretation" does not really interpret anything; it just states the internal randomness of measurement results in quantum mechanics, essentially saying: "Do not worry; this is just how it is; live with it". For me personally, this interpretation, at least in its most frequently repeated forms, has only one, rather pedagogical weakness: though it implies statistical ensembles (otherwise, how would you define the probability?), it does not put sufficient emphasis on their role, in particular on the possible ensemble redefinition as the only key point of human involvement in the measurement process. 9 Perhaps the most impressive objection to the Copenhagen interpretation was given by A. Einstein during his 1935 discussion with N. Bohr: "God does not play dice." OK, when Einstein speaks, we all should listen, but perhaps when God speaks (through the experimental results), we have to pay even more attention.

8 Important milestones on that way were experiments by A. Aspect et al., Phys. Rev. Lett. 49, 91 (1982) and M. Rowe et al., Nature 409, 791 (2001). A detailed review of the experimental situation was given, for example, by M. Genovese, Phys. Repts. 413, 319 (2005); see also more recent experiments by D. Matsukevich et al., Phys. Rev. Lett. 100, 150404 (2008) and D. Salart et al., Nature 454, 861 (2008). Presently, a low-noise demonstration of the Bell inequality violation has become a standard test in each experiment with entangled qubits used for quantum encryption research - see Sec. 8.5.

9 A detailed discussion of the statistical ensemble's role may be found, e.g., in L. Ballentine, Quantum Mechanics, World Scientific, 1998.

(ii) Non-local reality. After the dismissal of von Neumann's "proof" by J. Bell, to the best of my knowledge, there has been no proof that hidden parameters could not be introduced, provided that they do not imply local reality. Of constructive approaches, perhaps the most notable contribution was made by D. Bohm, 10 who developed L. de Broglie's interpretation of the wavefunction as a "pilot wave", making it quantitative. In the wave mechanics version of this concept, the wavefunction, governed by the Schrödinger equation, just guides a real, point-like classical particle whose coordinates serve as hidden variables. However, this concept does not satisfy the notion of local reality. Namely, the measurement of the particle's coordinate at a certain point $\mathbf r_1$ has to change the wavefunction instantly everywhere, including points $\mathbf r_2$ in the superluminal interval range (5). So, Bohm's hidden variables would hardly make A. Einstein happy. After having recognized this problem, D. Bohm abandoned his theory - in J. Bell's view, perhaps too early. To my personal taste, however, the assumption of such (in Einstein's words) "spooky action at a distance" is too large a sacrifice to save classical determinism.

(iii) The many-worlds interpretation, introduced in 1957 by H. Everett and popularized in the 1960s and 1970s by B. DeWitt. In this interpretation, all possible measurement outcomes do happen, splitting the Universe into the corresponding number of "parallel" Universes, so that from any one of them, other Universes, and hence other outcomes, cannot be observed. Let me leave to the reader an estimate of the rate at which the parallel Universes are being constantly generated (say, per second), taking into account that such generation should take place not only at explicit lab experiments, but at any irreversible process, such as the fission of any atomic nucleus or the absorption of a photon, everywhere in each Universe - whether its result is recorded or not. Even the main proponent of this interpretation, B. DeWitt, has confessed: "The idea is not easy to reconcile with common sense". I agree.

(iv) The quantum logic. In desperation, some physicists-turned-philosophers have decided to dismiss the very logic we are using, in science and elsewhere, so that a statement like "the Bell inequalities are violated" would not make any definite sense. OK, if we dismiss formal logic, I do not know how we can use any scientific theory and make any predictions - until the quantum logic experts tell us what to replace classical logic with. To the best of my knowledge, so far they have not done that, at least for the measurement process. I personally trust J. Bell's opinion: "It is my impression that the whole vast subject of Quantum Logic has arisen [. . .] from the misuse of a word."

The weakness of all interpretations of quantum mechanics is that, as far as I know, none of them has yet provided any suggestion of how this particular interpretation might be tested experimentally to exclude the other ones. On the positive side, there is a consensus that quantum mechanics makes correct, if sometimes probabilistic, predictions of all reliable experimental results we are aware of. Maybe this is not that bad for a scientific theory. 11



10 D. Bohm, Phys. Rev. 85, 165; 180 (1952).

11 If the reader is not satisfied with this "positivistic" approach, and wants to improve the situation, my earnest 
advice would be to start not from square one, but from reading what other (including some very clever!) people 
thought about it. A good starting point is the review collection by J. Wheeler and W. Zurek (eds.), Quantum 
Theory and Measurement, Princeton U. Press, 1983. 
Chapter 1. Review of Thermodynamics

This chapter starts from a brief discussion of the subject of statistical physics and thermodynamics, and 
the relation between these two disciplines. Then I proceed to a review of the basic notions and relations 
of thermodynamics. Most of this material is supposed to be known to the reader from his or her 
undergraduate studies, 1 so the discussion is rather brief. 

1.1. Introduction: Statistical physics and thermodynamics 

Statistical physics (alternatively called "statistical mechanics") and thermodynamics are two different approaches to the same goal: a description of the internal dynamics of large physical systems, notably those consisting of many, $N \gg 1$, identical particles - or other components. The traditional example of such a system is a human-scale portion of a gas, with the number $N$ of molecules of the order of the Avogadro number $N_A \sim 10^{24}$. 2 The "internal dynamics" is an (admittedly loose) term meaning all the physics unrelated to the motion of the system as a whole. The most important example of the internal dynamics is the thermal motion of atoms and molecules.

The motivation for the statistical approach to such systems is straightforward: even if the laws 
governing the dynamics of each particle and their interactions were exactly known, and we had infinite 
computing resources at our disposal, calculating the exact evolution of the system in time would be 
impossible, at least because it is completely impracticable to measure the exact initial state of each component, e.g., the initial position and velocity of each particle. The situation is further exacerbated by the phenomena of chaos and turbulence, 3 and by the quantum-mechanical uncertainty, 4 which do not allow the exact calculation of final positions and velocities of the component particles even if their initial state is known with the best possible precision. As a result, in most situations only statistical predictions about the behavior of such systems may be made, with probability theory becoming a major part of the mathematical tool arsenal.

However, the statistical approach is not as bad as it may look. Indeed, it is almost self-evident that any measurable macroscopic variable characterizing a stationary system of $N \gg 1$ particles as a whole (think, e.g., about the pressure $P$ of a gas contained in a fixed volume $V$) is almost constant in time. Indeed, we will see below that, besides certain exotic exceptions, the relative fluctuations - either in time, or among macroscopically similar systems - of such a variable are of the order of $1/N^{1/2}$, i.e. for $N \sim N_A$ are extremely small. As a result, the average values of macroscopic variables may characterize the state of the system rather well. Their calculation is the main task of statistical physics. (The analysis of fluctuations is also an important task, but due to their smallness, in most cases it may be based on perturbative approaches - see Chapter 5.)
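The estimate of relative fluctuations of the order of $1/N^{1/2}$ may be illustrated with the simplest toy model (our assumption, not a system discussed at this point of the notes): $N$ independent particles, each found in one of two states with probability 1/2. For this model, the relative r.m.s. fluctuation of the number of particles in one of the states is exactly $1/N^{1/2}$, and the minimal simulation below, over many macroscopically similar systems, reproduces this scaling. The function name `relative_fluctuation` is ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def relative_fluctuation(N, p=0.5, trials=2000):
    """Relative r.m.s. fluctuation of the number of particles in one state,
    for N independent two-state particles, estimated over 'trials' similar systems."""
    counts = rng.binomial(N, p, size=trials)   # one count per macroscopically similar system
    return counts.std() / counts.mean()

for N in [100, 10_000, 1_000_000]:
    print(N, relative_fluctuation(N), 1 / np.sqrt(N))
```

Extrapolating the same scaling to $N \sim N_A \sim 10^{24}$ gives relative fluctuations of the order of $10^{-12}$.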



1 For remedial reading, I can recommend, for example (in the alphabetical order): C. Kittel and H. Kroemer, Thermal Physics, 2nd ed., W. H. Freeman (1980); F. Reif, Fundamentals of Statistical and Thermal Physics, Waveland (2008); D. V. Schroeder, Introduction to Thermal Physics, Addison Wesley (1999).

2 See, e.g., Sec. 4 below. (Note that in these notes, the chapter number is dropped in references to figures, 
formulas, and sections within the same chapter.) 

3 See, e.g., CM Chapters 8 and 9. (Acronyms CM, EM, and QM refer to other of my lecture note series.) 

4 See, e.g., QM Chapter 1. 
Let us have a look at typical macroscopic variables that statistical physics and thermodynamics should operate with. Since I have already mentioned pressure $P$ and volume $V$, let me start with this famous pair. First of all, note that volume is an extensive variable, i.e. a variable whose value for a system consisting of several noninteracting (or weakly interacting) parts is the sum of those of its parts. On the other hand, pressure is an example of intensive variables, whose value is the same for different parts of a system - if they are in equilibrium. In order to understand why $P$ and $V$ form a natural pair of variables, let us consider the classical playground of thermodynamics, a portion of a gas contained in a cylinder closed with a movable piston of area $A$ (Fig. 1). Neglecting friction between the walls and the piston, and assuming that it is being moved slowly enough (so that the pressure $P$ is, at any instant, virtually the same for all parts of the volume), the elementary work of the external force $f = -PA$, compressing the gas, at a small piston displacement $dx$, is



$$d\mathscr W = f\,dx = \left(\frac{f}{A}\right)(A\,dx) = -P\,dV. \tag{1.1}$$



It is clear that the last expression is more general than the model shown in Fig. 1, and does not depend 
on the particular shape of the system surface. 5 



Fig. 1.1. Compressing a gas.
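Since Eq. (1) holds at each step of a sufficiently slow process, the total work of the external force on the gas along a finite path is the integral of $-P\,dV$. The minimal sketch below computes it by the trapezoidal rule for an equation of state $P(V)$ supplied as a function. Purely for illustration, it assumes the isothermal ideal-gas law $P = NT/V$ (with $T$ in energy units), which is not derived at this point of the notes; for compression from $V$ to $V/2$ the exact result is $NT\ln 2$. The function name `work_on_gas` is ours.

```python
import numpy as np

def work_on_gas(pressure, V_start, V_end, steps=100_001):
    """Work of the external force on the gas, W = -integral of P dV (Eq. (1.1)),
    computed with the trapezoidal rule along a path from V_start to V_end."""
    V = np.linspace(V_start, V_end, steps)
    P = pressure(V)
    return -np.sum(0.5 * (P[1:] + P[:-1]) * np.diff(V))

# Assumed example: isothermal ideal gas, P = N*T/V, with N*T = 1 (arbitrary units)
NT = 1.0
W = work_on_gas(lambda V: NT / V, V_start=2.0, V_end=1.0)   # compression to half the volume
print(W, NT * np.log(2.0))   # W > 0: the environment does positive work on the gas
```

The sign convention matches Eq. (1): at compression $dV < 0$, so the work done on the gas is positive.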



From the point of view of analytical mechanics, 6 V and (-P) form just one of many possible canonical pairs of generalized coordinates $q_j$ and generalized forces $f_j$, whose products $d\mathscr W_j = f_j\,dq_j$ give contributions to the total work of the environment on the system under analysis. For example, the reader familiar with the basics of electromagnetism knows that the elementary work of an electric field on a unit volume of a medium is 7



$$d\mathscr W = \boldsymbol{\mathscr E}\cdot d\mathbf D = \sum_{j=1}^{3}\mathscr E_j\,dD_j, \tag{1.2}$$

so that the role of generalized coordinates is played by the Cartesian components of the electric displacement $\mathbf D$, while the components of the electric field $\boldsymbol{\mathscr E}$ serve as the corresponding generalized forces. Similarly, the elementary work of the magnetic field $\mathbf H$ is 8



5 In order to prove that, it is sufficient to integrate the scalar product $d\mathscr{W} = d\mathbf{F}\cdot d\mathbf{r}$, with $d\mathbf{F} = -P\mathbf{n}\,d^2r$, where $d\mathbf{r}$ is
the surface displacement vector (see, e.g., CM Sec. 7.1), and $\mathbf{n}$ is the outer normal, over the surface.

6 See, e.g., CM Chapters 2 and 10. 

7 See, e.g., EM Eq. (3.82). 

8 See, e.g., EM Eq. (5.128). Note that Eqs. (2)-(3) are in SI units (used throughout this lecture series). In
Gaussian units, the right-hand sides of these relations have additional factors $1/4\pi$.

$$d\mathscr{W} = \mathbf{H}\cdot d\mathbf{B} = \sum_{j=1}^{3} H_j\,dB_j, \tag{1.3}$$

where $\mathbf{B}$ is the magnetic induction. This list may be extended to other interactions (such as gravitation,
surface tension in fluids, etc.). Following tradition, I will use the $\{-P, V\}$ pair in almost all the formulas
below, but the reader should remember that they are all valid for any other
pair $\{\mathscr{F}_j, q_j\}$.
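As a concrete check of Eq. (1) and its generalization to other canonical pairs, the short Python sketch below (standard library only; all names are illustrative, not from the text) sums $d\mathscr{W} = \mathscr{F}\,dq$ along a discretized quasi-static path with the trapezoidal rule. The test case is an assumption brought in for illustration: an ideal classical gas kept at constant temperature, with $P = NT/V$ ($T$ in energy units), for which the work done on the gas during compression from $V_1$ to $V_2$ should be $NT\ln(V_1/V_2)$.

```python
import math

def work_along_path(force, q_start, q_end, steps=10_000):
    """Sum dW = F(q) dq along a quasi-static path from q_start to q_end
    using the trapezoidal rule; `force` is the generalized force F(q)."""
    dq = (q_end - q_start) / steps
    total = 0.0
    for k in range(steps):
        q_a = q_start + k * dq
        q_b = q_a + dq
        total += 0.5 * (force(q_a) + force(q_b)) * dq
    return total

# Assumed example: isothermal ideal gas, N particles at temperature T
# (energy units), so that P(V) = N*T/V. The pair is {-P, V}, i.e. F = -P.
N, T = 1.0, 2.0
V1, V2 = 3.0, 1.0          # compression from V1 down to V2

W_numeric = work_along_path(lambda V: -N * T / V, V1, V2)
W_exact = N * T * math.log(V1 / V2)   # work done ON the gas, > 0 for compression

print(f"numeric: {W_numeric:.6f}, exact: {W_exact:.6f}")
```

Because the function only needs $\mathscr{F}(q)$, the same sketch applies to the electric and magnetic pairs of Eqs. (2)-(3), with the field components playing the role of the generalized forces.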

Again, the specific relations between the variables of each pair listed above are typically affected
by the statistics of the components (particles) of a body, but their definition is not based on statistics.
The situation is very different for a special pair of variables, temperature T and entropy S: although
these "sister variables" participate in many formulas of thermodynamics exactly like one
more canonical pair $\{\mathscr{F}_j, q_j\}$, the very existence of these two variables is due to statistics.
Temperature T is an intensive variable that characterizes the degree of thermal "agitation" of system
components. On the contrary, entropy S is an extensive variable that in most cases evades immediate
perception by human senses; it is a quantitative measure of disorder of the system, i.e. the degree of our
ignorance about its exact microscopic state. 9

The reason for the appearance of the {T, S} pair of variables in formulas is that the statistical
approach to large systems of particles brings some qualitatively new results, most notably the notion of
irreversible time evolution of collective (macroscopic) variables describing the system. On one hand,
such irreversibility looks absolutely natural in such phenomena as the diffusion of an ink drop in a glass
of water. In the beginning, the ink molecules are located in a certain small part of the system's volume, i.e.
to some extent ordered, while at the late stages of diffusion, the position of each molecule is essentially
random. However, on second thought, the irreversibility is rather surprising, 10 taking into account that
it takes place even if the laws governing the motion of the system's components are time-reversible - such
as Newton's laws or the basic laws of quantum mechanics. Indeed, if, at a late stage of the diffusion
process, we exactly reversed the velocities of all molecules simultaneously, the ink molecules would
again gather (for a moment) into the original spot. 11 The problem is that getting the information
necessary for the exact velocity reversal is not practicable. This example shows a deep connection
between statistical mechanics and information theory.
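The velocity-reversal thought experiment can be sketched numerically. The toy model below is an assumption for illustration only: non-interacting particles moving ballistically on a 1D ring of unit length. It is not chaotic, so it illustrates time reversibility itself, not the practical impossibility of an exact reversal. The particles start in a narrow "ink spot", spread over the ring, and gather back when all velocities are reversed.

```python
import random

def evolve(positions, velocities, t):
    """Free (time-reversible) motion on a ring of unit length."""
    return [(x + v * t) % 1.0 for x, v in zip(positions, velocities)]

def fraction_in_spot(positions, lo=0.45, hi=0.55):
    """Fraction of particles inside the initial 'ink spot' [lo, hi]."""
    return sum(lo <= x <= hi for x in positions) / len(positions)

random.seed(1)
n = 10_000
x0 = [random.uniform(0.45, 0.55) for _ in range(n)]   # ordered initial state
v = [random.uniform(-1.0, 1.0) for _ in range(n)]

x_late = evolve(x0, v, t=50.0)                       # spread over the ring
x_back = evolve(x_late, [-u for u in v], t=50.0)     # exact velocity reversal

print(fraction_in_spot(x0), fraction_in_spot(x_late), fraction_in_spot(x_back))
```

Only floating-point rounding separates `x_back` from `x0` here; in a real gas, the analogous "rounding" is our incomplete knowledge of the microscopic state.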

A quantitative discussion of the reversibility-irreversibility dilemma requires a strict definition of
the basic notion of statistical mechanics (and indeed of probability theory), the statistical ensemble, and
I would like to postpone it until the beginning of Chapter 2. In particular, in that chapter we will see that
the basic law of irreversible behavior is the increase of entropy S in any closed system. Thus, statistical
mechanics, without defying the "microscopic" laws governing the evolution of the system's components,



9 The notion of entropy was introduced into thermodynamics in the 1850s by R. Clausius, building on
earlier pioneering work by S. Carnot (see Sec. 7 below), as a variable related to "useful thermal energy" rather
than a measure of disorder. In the absence of any clue of its microscopic origins (which had to wait for decades 
until the works by L. Boltzmann and J. Maxwell), this was an amazing intellectual achievement. 

10 Indeed, as recently as the late 19th century, the very possibility of irreversible macroscopic behavior of
microscopically reversible systems was questioned by some serious scientists, notably by J. Loschmidt in 1876.

11 While quantum-mechanical effects, with their intrinsic uncertainty, are quantitatively important in this example, 
our qualitative discussion does not depend on them. A good example is the chaotic, but classical motion of a 
billiard ball on a 2D Sinai table - see CM Fig. 9.8. 
introduces on top of them some new "macroscopic" laws, intrinsically related to the evolution of
information, i.e. the degree of our knowledge of the microscopic state of the system. 

To conclude this brief discussion of variables, let me mention that as in all fields of physics, a 
very special role in statistical mechanics is played by energy E. In order to emphasize the commitment 
to disregard the motion of the system as a whole, in thermodynamics it is frequently called the internal 
energy, though for brevity, I will mostly skip the adjective. Its simplest example is the kinetic energy of
the thermal motion of molecules in a dilute gas, but in general E includes not only the individual
energies of all the system's components, but also their interactions with each other. Besides a few
pathological cases of very-long-range interactions (such as the Coulomb interactions in plasma with
uncompensated charge density), the interactions may be treated as local; in this case the internal energy
is proportional to N, i.e. is an extensive variable. As will be shown below, other extensive variables with
the dimension of energy are often very useful, including the (Helmholtz) free energy F, the Gibbs
energy G, enthalpy H, and grand potential $\Omega$. (The collective name for such variables is thermodynamic
potentials.)

Now, we are ready for a brief discussion of the relation between statistical physics and 
thermodynamics. While the task of statistical physics is to calculate the macroscopic variables discussed 
above, 12 using this or that particular microscopic model of the system, the main role of thermodynamics 
is to derive some general relations between the average values of the macroscopic variables (called 
thermodynamic variables) that do not depend on specific models. Surprisingly, it is possible to 
accomplish such a feat using a few either evident or very plausible general assumptions (sometimes 
called the laws of thermodynamics), which find their proof in statistical physics. 13 Such general relations 
allow us to reduce rather substantially the amount of calculations we have to do in statistical physics; in 
many cases it is sufficient to calculate from statistics just one or two variables, and then use 
thermodynamic relations to calculate all other properties of interest. Thus thermodynamics,
sometimes snubbed as a mere phenomenology, deserves every respect not only as a discipline which is, in a
certain sense, more general than statistical physics as such, but also as a very useful science. This is why
the balance of this chapter is devoted to a brief review of thermodynamics.



1.2. The 2nd law of thermodynamics, entropy, and temperature

Thermodynamics takes a phenomenological approach to entropy S, postulating that there is
such a unique extensive measure of disorder, and that in a closed system, 14 it may only grow in time,
reaching its constant (maximum) value at equilibrium: 15

$$dS \geq 0. \tag{1.4}$$

This postulate is called the 2nd law of thermodynamics - arguably its only substantial new law.



12 Several other quantities, for example the heat capacity C, may be obtained as partial derivatives of the basic
variables discussed above. Also, under certain conditions, the number of particles N in the system is not fixed and
may also be considered as an (extensive) variable.

13 Admittedly, some of these proofs are based on other (but deeper) postulates, for example the central statistical 
hypothesis - see Sec. 2.2. 

14 Defined as a system completely isolated from the environment, i.e. the system with its internal energy fixed. 

15 Implicitly, this statement also postulates the existence, in a closed system, of thermodynamic equilibrium, an 
asymptotically reached state in which all thermodynamic variables, including entropy, remain constant. 
Surprisingly, this law, together with the additivity of S in composite systems of non-interacting
parts (as an extensive variable), is sufficient for a formal definition of temperature, and a derivation of 
its basic properties that comply with our everyday notion of this variable. Indeed, let us consider a 
particular case: a closed system consisting of two fixed-volume subsystems (Fig. 2) whose internal 
relaxation is very fast in comparison with the rate of the thermal flow (i.e. the energy and entropy 
exchange) between the parts. In this case, on the latter time scale, each part is always in some quasi- 
equilibrium state, which may be described by a unique relation E(S) between its energy and entropy. 16 





Fig. 1.2. Composite thermodynamic system.



Neglecting the interaction energy between the parts (which is always possible at $N \gg 1$, in the
absence of long-range interactions), we may use the extensive character of variables E and S to write

$$E = E_1(S_1) + E_2(S_2), \qquad S = S_1 + S_2, \tag{1.5}$$

for the full energy and entropy of the system. Now let us calculate the following derivative: 

$$\frac{dS}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2}\frac{dE_2}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2}\frac{d(E - E_1)}{dE_1}. \tag{1.6}$$



Since the total energy E of the system is fixed and hence independent of its re-distribution
between the sub-systems, $dE/dE_1 = 0$, and we get

$$\frac{dS}{dE_1} = \frac{dS_1}{dE_1} - \frac{dS_2}{dE_2}. \tag{1.7}$$



According to the 2nd law of thermodynamics, when the two parts reach thermodynamic equilibrium,
the total entropy S reaches its maximum, so that $dS/dE_1 = 0$, and Eq. (7) yields

$$\frac{dS_1}{dE_1} = \frac{dS_2}{dE_2}. \tag{1.8}$$



Thus we see that if a thermodynamic system may be partitioned into weakly interacting
macroscopic parts, their derivatives dS/dE should be equal in equilibrium. The reciprocal of such a
derivative is called temperature. Taking into account that our analysis pertains to the situation (Fig. 2)
when both volumes are fixed, we may write this definition as

$$\frac{1}{T} \equiv \left(\frac{\partial S}{\partial E}\right)_V, \tag{1.9}$$


16 Here we rely heavily on a very important (and possibly the least intuitive) aspect of the 2nd law, namely that
the entropy is the unique measure of disorder, i.e. the only such measure which may affect the system's energy, or any
other thermodynamic variable.
with the subscript V meaning that volume is kept constant during differentiation. (Such notation is common and very
useful in thermodynamics, with its broad range of variables.) 
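The argument of Eqs. (5)-(9) can be checked numerically. As an assumption for illustration only, take two subsystems with model entropies $S_i(E_i) = \frac{3}{2}N_i\ln E_i$ (the energy dependence of a monatomic ideal classical gas at fixed volume, additive constants dropped), so that Eq. (9) gives $T_i = 2E_i/3N_i$. Scanning the partition of a fixed total energy $E$ and picking the maximum of $S = S_1 + S_2$ should then reproduce Eq. (8), i.e. $T_1 = T_2$.

```python
import math

# Assumed model: S_i(E_i) = (3/2) N_i ln E_i, additive constants dropped
N1, N2 = 100.0, 300.0
E_total = 800.0          # fixed total energy of the closed system

def total_entropy(E1):
    """S = S1(E1) + S2(E - E1), as in Eq. (5)."""
    E2 = E_total - E1
    return 1.5 * N1 * math.log(E1) + 1.5 * N2 * math.log(E2)

def temperature(E, N):
    """Eq. (9) for the assumed model: 1/T = dS/dE = 3N/(2E)."""
    return 2.0 * E / (3.0 * N)

# Brute-force scan of the energy split between the two parts
grid = [E_total * k / 100_000 for k in range(1, 100_000)]
E1_eq = max(grid, key=total_entropy)
T1 = temperature(E1_eq, N1)
T2 = temperature(E_total - E1_eq, N2)
print(f"E1 = {E1_eq:.3f}, T1 = {T1:.5f}, T2 = {T2:.5f}")
```

A brute-force scan is used deliberately: it finds the entropy maximum without assuming Eq. (8) in advance, so the equality of temperatures comes out as a result rather than an input.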

Note that according to Eq. (9), if temperature is measured in energy units (as I will do in this
course for brevity of notation), S is dimensionless. The transfer to the SI or Gaussian units, i.e. to
temperature $T_K$ measured in kelvins (not "degrees Kelvin", please!), is given by the relation $T = k_B T_K$, where
the Boltzmann constant $k_B \approx 1.38\times10^{-23}$ J/K $= 1.38\times10^{-16}$ erg/K. 17 In these units, the entropy becomes
dimensional: $S_K = k_B S$.
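A minimal sketch of the unit conversions mentioned above and in footnote 17. It uses the exact SI values $k_B = 1.380649\times10^{-23}$ J/K and 1 eV $= 1.602176634\times10^{-19}$ J (both fixed by the 2019 SI redefinition); the helper function names are hypothetical.

```python
K_B = 1.380649e-23        # Boltzmann constant, J/K (exact in SI since 2019)
EV = 1.602176634e-19      # electron-volt, J (exact in SI since 2019)

def kelvin_to_energy(T_K):
    """Temperature in energy units (joules): T = k_B * T_K."""
    return K_B * T_K

def entropy_si(S):
    """Dimensional entropy S_K = k_B * S (J/K) from dimensionless S."""
    return K_B * S

def celsius_from_kelvin(T_K):
    return T_K - 273.15

def fahrenheit_from_celsius(T_C):
    return 9.0 / 5.0 * T_C + 32.0

T_room = kelvin_to_energy(300.0)
print(f"300 K = {T_room:.4e} J = {T_room / EV * 1e3:.2f} meV")
print(f"300 K = {celsius_from_kelvin(300.0):.2f} C "
      f"= {fahrenheit_from_celsius(celsius_from_kelvin(300.0)):.2f} F")
```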

The definition of temperature, given by Eq. (9), is of course in sharp contrast with the popular
notion of T as a measure of the average energy per particle. However, as we will repeatedly see below,
in most cases these two notions may be reconciled. In particular, let us list some properties of T, which
are in accordance with our everyday notion of temperature:

(i) according to Eq. (9), temperature is an intensive variable (since both E and S are extensive), 
i.e., in a system of similar particles, independent of the particle number N; 

(ii) temperatures of all parts of a system are equal at equilibrium - see Eq. (8); 

(iii) in a closed system whose parts are not in equilibrium, thermal energy (heat) always flows
from a warmer part (with higher T) to the colder part.

In order to prove the last property, let us come back to the closed, composite system shown in 
Fig. 2, and consider another derivative: 

$$\frac{dS}{dt} = \frac{dS_1}{dt} + \frac{dS_2}{dt} = \frac{dS_1}{dE_1}\frac{dE_1}{dt} + \frac{dS_2}{dE_2}\frac{dE_2}{dt}. \tag{1.10}$$

If the internal state of each part is very close to equilibrium (as was assumed from the very beginning) at
each moment of time, we can use Eq. (9) to replace the derivatives $dS_{1,2}/dE_{1,2}$ with $1/T_{1,2}$ and get

$$\frac{dS}{dt} = \frac{1}{T_1}\frac{dE_1}{dt} + \frac{1}{T_2}\frac{dE_2}{dt}. \tag{1.11}$$

Since in a closed system $E = E_1 + E_2 = \text{const}$, these time derivatives are related as $dE_2/dt = -dE_1/dt$, and
Eq. (11) yields

$$\frac{dS}{dt} = \left(\frac{1}{T_1} - \frac{1}{T_2}\right)\frac{dE_1}{dt}. \tag{1.12}$$



But in accordance with the 2nd law of thermodynamics, this derivative cannot be negative: $dS/dt \geq 0$.
Hence,

$$\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\frac{dE_1}{dt} \geq 0. \tag{1.13}$$



17 For a more exact value of this and other constants, see appendix CA: Selected Physical Constants. Note that both
T and $T_K$ define the absolute (also called "thermodynamic") scale of temperature, in contrast to such artificial
temperature scales as degrees Celsius ("centigrades"), defined as $T_C = T_K - 273.15$, or degrees Fahrenheit:
$T_F = (9/5)T_C + 32$.
For example, if $T_1 > T_2$ (i.e. $1/T_1 < 1/T_2$), then $dE_1/dt \leq 0$, i.e. the warmer part gives energy to its colder
counterpart.

Note also that such a heat exchange, at fixed volumes $V_{1,2}$ and $T_1 \neq T_2$, increases the total
system entropy, without performing any "useful" mechanical work.
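Properties (ii) and (iii), together with Eq. (12), can be illustrated by a small simulation. Everything specific here is an assumption for illustration: two bodies with constant heat capacities $C_{1,2}$ (so that $E_i = C_iT_i$ and, from Eq. (9), $S_i = C_i\ln T_i$ + const), and a phenomenological heat-flow law $dE_1/dt = \kappa\,(T_2 - T_1)$ that is not derived in the text. The sketch checks that energy flows out of the warmer body, that the temperatures converge, and that the total entropy never decreases.

```python
import math

C1, C2 = 1.0, 3.0          # assumed constant heat capacities
kappa, dt = 0.5, 1e-3      # assumed heat-transfer coefficient and time step
T1, T2 = 4.0, 1.0          # body 1 starts warmer

def total_entropy(T1, T2):
    return C1 * math.log(T1) + C2 * math.log(T2)   # constants dropped

entropy_history = [total_entropy(T1, T2)]
E_total = C1 * T1 + C2 * T2
for _ in range(20_000):
    dE1 = kappa * (T2 - T1) * dt    # negative while body 1 is warmer
    T1 += dE1 / C1
    T2 -= dE1 / C2                  # closed system: dE2 = -dE1
    entropy_history.append(total_entropy(T1, T2))

print(f"T1 = {T1:.4f}, T2 = {T2:.4f}")
print("entropy never decreased:",
      all(b >= a - 1e-12 for a, b in zip(entropy_history, entropy_history[1:])))
```

Both temperatures approach $(C_1T_1 + C_2T_2)/(C_1 + C_2) = 1.75$, while the total energy stays fixed, as it should in a closed system.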



1.3. The 1st and 3rd laws of thermodynamics, and heat capacity

Now let us consider a thermally insulated system whose volume V may be changed by a
deterministic force - see, for example, Fig. 1. Such a system is different from the fully closed one,
because its energy E may be changed by the work of the external force - see Eq. (1):

$$dE = d\mathscr{W} = -P\,dV. \tag{1.14}$$

Let the volume change be so slow ($dV/dt \to 0$) that the system is virtually at equilibrium at any
instant. Such a slow process is called reversible, and in this particular case of a
thermally insulated system, it is also called adiabatic. If pressure P (or any generalized external force
$\mathscr{F}_j$) is deterministic, i.e. is a predetermined function of time independent of the state of the system under
analysis, it may be considered as coming from a fully ordered system, i.e. one having zero entropy,
with the total system completely closed. Since, according to the second of Eqs. (5), the entropy of the
total closed system should stay constant, S of the system under analysis should stay constant on its own.
Thus we arrive at a very important conclusion: in an adiabatic process, the entropy of a system cannot
change. 18 This means that we can use Eq. (14) to write



$$P = -\left(\frac{\partial E}{\partial V}\right)_S. \tag{1.15}$$
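Eqs. (9) and (15) can be exercised on an explicit $E(S, V)$. The function used below is an assumption brought in for illustration, not derived in this section: the standard form for a monatomic ideal classical gas, $E(S,V) = a N^{5/3} V^{-2/3} \exp(2S/3N)$, with an arbitrary constant $a$. Central finite differences then give $T = (\partial E/\partial S)_V$ and $P = -(\partial E/\partial V)_S$, which for this model should satisfy $E = \frac{3}{2}NT$ and $PV = NT$.

```python
import math

a, N = 1.0, 10.0    # assumed model constants

def energy(S, V):
    """Assumed E(S, V) of a monatomic ideal classical gas."""
    return a * N ** (5 / 3) * V ** (-2 / 3) * math.exp(2 * S / (3 * N))

def partial_derivative(f, x, y, wrt, h=1e-6):
    """Central finite-difference partial derivative of f(x, y)."""
    if wrt == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

S, V = 5.0, 2.0
E = energy(S, V)
T = partial_derivative(energy, S, V, wrt=0)     # Eq. (9):  T = (dE/dS)_V
P = -partial_derivative(energy, S, V, wrt=1)    # Eq. (15): P = -(dE/dV)_S

print(f"E = {E:.6f}, 1.5*N*T = {1.5 * N * T:.6f}")
print(f"P*V = {P * V:.6f}, N*T = {N * T:.6f}")
```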



Let us now consider an even more general thermodynamic system that may also exchange 
thermal energy ("heat") with the environment (Fig. 3). 



Fig. 1.3. General thermodynamic process involving both the mechanical work and heat
exchange with the environment.



For such a system, our previous conclusion about the entropy constancy is not valid, so that S, in
equilibrium, may be a function not only of energy E, but also of volume V. Let us resolve this relation
for energy: E = E(S, V), and write the general mathematical expression for the full differential of E as a
function of these two independent arguments:

$$dE = \left(\frac{\partial E}{\partial S}\right)_V dS + \left(\frac{\partial E}{\partial V}\right)_S dV. \tag{1.16}$$



18 A general (not necessarily adiabatic) process conserving entropy is sometimes called isentropic.
This formula, based on the stationary relation E = E(S, V), is evidently valid not only in
equilibrium, but also for very slow, reversible 19 processes. Now, using Eqs. (9) and (15), we may rewrite 
Eq. (16) as 



$$dE = T\,dS - P\,dV. \tag{1.17}$$


The second term in the right-hand part of this equation is just the work of the external force, so that due 
to the conservation of energy, 20 the first term has to be equal to the heat dQ transferred from the 
environment to the system (see Fig. 3): 



$$dE = dQ + d\mathscr{W}, \tag{1.18}$$

$$dQ = T\,dS. \tag{1.19}$$



The last relation, divided by T and then integrated along an arbitrary (but reversible!) process,

$$S = \int \frac{dQ}{T} + \text{const}, \tag{1.20}$$



is sometimes used as an alternative definition of entropy S - provided that temperature is defined not by
Eq. (9), but in some independent way. It is useful to recognize that entropy (like energy) may be defined
only up to an arbitrary constant, which does not affect any other thermodynamic observables. The common
convention is to take

$$S \to 0 \quad \text{at} \quad T \to 0. \tag{1.21}$$



This condition is sometimes called the 3rd law of thermodynamics, but it is important to realize that this
is just a convention rather than a real law. 21 Indeed, the convention corresponds well to the notion of the
full order at T = 0 in some systems (e.g., perfect crystals), but creates ambiguity for other systems, e.g.,
amorphous solids (like the usual glasses) that may remain, for "astronomic" times, highly disordered
even at $T \to 0$.
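As a numerical illustration of Eq. (20), the sketch below integrates $dQ/T$ along a reversible heating process, assuming for simplicity that each small portion of heat raises the temperature as $dQ = C\,dT$ with a constant $C$ (an assumption, not a general property). In that case the integral has the closed form $\Delta S = C\ln(T_2/T_1)$, which the code uses as a check; the arbitrary constant in Eq. (20) drops out of the difference.

```python
import math

def entropy_change(T_start, T_end, heat_per_dT, steps=100_000):
    """Integrate dS = dQ/T along a reversible path (midpoint rule),
    where heat_per_dT(T) gives dQ/dT at temperature T."""
    dT = (T_end - T_start) / steps
    dS = 0.0
    for k in range(steps):
        T_mid = T_start + (k + 0.5) * dT
        dS += heat_per_dT(T_mid) * dT / T_mid
    return dS

C = 2.5                        # assumed constant value of dQ/dT
T1, T2 = 1.0, 3.0
numeric = entropy_change(T1, T2, lambda T: C)
exact = C * math.log(T2 / T1)
print(f"numeric: {numeric:.8f}, exact: {exact:.8f}")
```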

Now let us discuss the notion of heat capacity that, by definition, is the ratio $dQ/dT$, where dQ is
the amount of heat that should be given to a system to raise its temperature by a small amount dT. 22
(This notion is very important, because it may be most readily measured experimentally.) The heat
capacity depends, naturally, on whether the heat dQ goes only into an increase of the internal energy dE



19 Let me emphasize that an adiabatic process is reversible, but not vice versa. 

20 Such conservation, expressed by Eqs. (18)-(19), is sometimes called the 1st law of thermodynamics. While it (in
contrast with the 2nd law) does not present any new law of nature on top of mechanics, and in particular was
already used de facto to write the first of Eqs. (5) and Eq. (14), such a grand name was quite justified in the
mid-19th century, when the mechanical nature of the internal energy (thermal motion) was not at all clear. In this
context, the names of two great scientists, J. von Mayer (who was first to conjecture the conservation of the sum
of the thermal and macroscopic mechanical energies in 1841), and J. Joule (who proved the conservation
experimentally two years later), have to be reverently mentioned.

21 Actually, the 3rd law (also called the Nernst theorem) as postulated by W. Nernst in 1912 was different - and
really meaningful: "It is impossible for any procedure to lead to the isotherm T = 0 in a finite number of steps." I
will discuss this postulate at the end of Sec. 6.

22 By this definition, the full heat capacity of a system is an extensive variable. The capacity per unit mass
or per particle (i.e., an intensive variable) is called the specific heat capacity or just the specific heat. Note,
however, that in some texts, the last term is used for the heat capacity of the system as a whole as well, so that
some caution is in order.
of the system (as it does if volume V is constant), or also into the mechanical work $(-d\mathscr{W})$ that may be
performed at expansion - as happens, for example, if pressure P, rather than volume V, is fixed (the so-called
isobaric process - see Fig. 4). 23



Fig. 1.4. The simplest implementation of an isobaric process.



Hence we should discuss two different quantities, the heat capacity at fixed volume,

$$C_V \equiv \left(\frac{dQ}{dT}\right)_V, \tag{1.22}$$

and the heat capacity at fixed pressure,

$$C_P \equiv \left(\frac{dQ}{dT}\right)_P, \tag{1.23}$$



and expect that for all "normal" (mechanically stable) systems, $C_P > C_V$. The difference between $C_P$ and
$C_V$ is rather minor for most liquids and solids, but may be very substantial for gases - see Sec. 4.



1.4. Thermodynamic potentials 

A technical disadvantage of Eqs. (22) and (23) is that dQ is not a differential of a function of
state of the system, 24 and hence (in contrast with temperature and pressure) does not allow an immediate
calculation of heat capacity, even if the relation between E, S, and V is known. For $C_V$ the situation is
immediately correctable, because at fixed volume, $d\mathscr{W} = -P\,dV = 0$ and hence, according to Eq. (18),
$dQ = dE$. Hence we may write

$$C_V = \left(\frac{\partial E}{\partial T}\right)_V, \tag{1.24}$$



23 A similar duality is possible for other pairs $\{q_j, \mathscr{F}_j\}$ of generalized coordinates and forces as well. For example,
if a long sample of a dielectric is placed into a parallel, uniform external electric field, the value of the field $\mathscr{E}$ is fixed, i.e.
does not depend on the sample's polarization. However, if a thin sheet of such material is perpendicular to the field,
then the value of the field D is fixed - see, e.g., EM Sec. 3.4.

24 The same is true for the work $\mathscr{W}$, and in some textbooks this fact is emphasized by using a special sign for
differentials of these variables. I do not do this in my notes, because both $d\mathscr{W}$ and dQ are still very much usual
differentials: for example, $d\mathscr{W}$ is the difference between the mechanical work which has been done over our
system by the end of the infinitesimal interval we are considering, and that done by the beginning of that interval.
so that in order to calculate $C_V$ from a certain statistical-physics model, we only need to calculate E as a
function of temperature and volume. 

If we want to write a similarly convenient expression for $C_P$, the best way is to introduce a new
notion of so-called thermodynamic potentials - whose introduction and effective use is perhaps one of
the most impressive formalisms of thermodynamics. For that, let us combine Eqs. (1) and (18) to write
the "1st law of thermodynamics" in its most common form

$$dQ = dE + P\,dV. \tag{1.25}$$

At an isobaric process (Fig. 4), i.e. at P = const, this expression is equivalent to

$$(dQ)_P = dE + d(PV) = d(E + PV)_P. \tag{1.26}$$

Thus, if we introduce a new function with the dimensionality of energy:

$$H \equiv E + PV, \tag{1.27}$$

called enthalpy (or, more rarely, the "heat function" or "heat contents"), 25 we may rewrite Eq. (23) as

$$C_P = \left(\frac{\partial H}{\partial T}\right)_P. \tag{1.28}$$



Comparing Eq. (28) with (24) we see that for the heat capacity, enthalpy H plays the same role at fixed 
pressure as the internal energy E plays at fixed volume. 
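Eqs. (24) and (28) reduce both heat capacities to ordinary partial derivatives. As an assumed example (gases are discussed in Sec. 4), take a monatomic ideal classical gas with $E(T,V) = \frac{3}{2}NT$ and $PV = NT$, so that $H(T,P) = E + PV = \frac{5}{2}NT$. Numerical differentiation should then give $C_V = \frac{3}{2}N$ and $C_P = \frac{5}{2}N$, and in particular $C_P > C_V$.

```python
N = 1.0   # number of particles (assumed); temperatures in energy units

def energy(T, V):
    """Assumed monatomic ideal gas: E depends on T only."""
    return 1.5 * N * T

def enthalpy(T, P):
    """H = E + PV with PV = N T, expressed via its natural pair (T, P)."""
    V = N * T / P
    return energy(T, V) + P * V

def d_dT(f, T, other, h=1e-5):
    """Central difference in T at a fixed second argument."""
    return (f(T + h, other) - f(T - h, other)) / (2 * h)

T, V, P = 2.0, 1.0, 2.0
C_V = d_dT(energy, T, V)      # Eq. (24): C_V = (dE/dT)_V
C_P = d_dT(enthalpy, T, P)    # Eq. (28): C_P = (dH/dT)_P
print(f"C_V = {C_V:.6f}, C_P = {C_P:.6f}, C_P - C_V = {C_P - C_V:.6f}")
```

The key design point is that each potential is written as a function of its own arguments: $E$ of $(T, V)$ for $C_V$, and $H$ of $(T, P)$ for $C_P$, so that "fixed volume" and "fixed pressure" are simply the second argument held constant.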

Now let us explore properties of the enthalpy for an arbitrary reversible process, i.e. lifting the 
restriction P = const, but still keeping definition (27). Differentiating it, we get 

$$dH = dE + P\,dV + V\,dP, \tag{1.29}$$

so that plugging in Eq. (17) for dE, we see that the two terms PdV cancel, yielding a very simple expression

$$dH = T\,dS + V\,dP. \tag{1.30}$$



This equation shows that if H has been found (say, experimentally measured or calculated for a certain 
microscopic model) as a function of entropy S and pressure P, we can find temperature T and volume V 
by simple partial differentiation: 



$$T = \left(\frac{\partial H}{\partial S}\right)_P, \qquad V = \left(\frac{\partial H}{\partial P}\right)_S. \tag{1.31}$$



The comparison of the first of these relations with Eq. (9) shows that not only for the heat capacity, but
for temperature as well, enthalpy plays the same role at fixed pressure as played by the internal energy
at fixed volume. Moreover, the comparison of the second of Eqs. (31) with Eq. (15) shows that the
transfer from E to H corresponds to a simple swap of (-P) and V in the expressions for the
differentials of these variables.
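To see Eqs. (30)-(31) at work, here is a sketch that builds $H(S, P)$ from an assumed $E(S, V)$ (the monatomic ideal-gas form $E = aN^{5/3}V^{-2/3}e^{2S/3N}$, not derived in this section) and then recovers $T$ and $V$ by differentiating $H$. For each $(S, P)$ the volume is found from Eq. (15), $P = -(\partial E/\partial V)_S = 2E/3V$, which for this model can be solved in closed form; the helper names are hypothetical.

```python
import math

a, N = 1.0, 10.0    # assumed model constants

def energy(S, V):
    return a * N ** (5 / 3) * V ** (-2 / 3) * math.exp(2 * S / (3 * N))

def volume(S, P):
    """Solve P = -(dE/dV)_S = 2E/(3V) for V (closed form for this model)."""
    return (2 * a / (3 * P) * N ** (5 / 3) * math.exp(2 * S / (3 * N))) ** 0.6

def enthalpy(S, P):
    V = volume(S, P)
    return energy(S, V) + P * V       # Eq. (27): H = E + PV

S, P, h = 5.0, 3.0, 1e-6
T_from_H = (enthalpy(S + h, P) - enthalpy(S - h, P)) / (2 * h)   # (dH/dS)_P
V_from_H = (enthalpy(S, P + h) - enthalpy(S, P - h)) / (2 * h)   # (dH/dP)_S

V = volume(S, P)
T_from_E = (energy(S + h, V) - energy(S - h, V)) / (2 * h)       # (dE/dS)_V
print(f"T: {T_from_H:.6f} vs {T_from_E:.6f};  V: {V_from_H:.6f} vs {V:.6f}")
```

The agreement of the two temperature values is the numerical counterpart of the statement that enthalpy at fixed pressure plays the role of energy at fixed volume.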

This success immediately raises the question whether we could develop it further by defining
other useful thermodynamic potentials - variables with the dimensionality of energy that would have



25 This function (as well as the Gibbs free energy G, see below) was introduced in 1875 by J. Gibbs, though
the term "enthalpy" was coined much later by H. Onnes.
similar properties, first of all a potential which would enable a similar swap of T and S in its full
differential. We already know that an adiabatic process is a reversible process with fixed entropy,
so that now we should analyze a reversible process with fixed temperature. Such an isothermal process
may be implemented, for example, by placing the system under consideration into thermal contact
with a much larger system (called either the heat bath, or "heat reservoir", or "thermostat") that remains
in thermodynamic equilibrium at all times - see Fig. 5.



Fig. 1.5. The simplest implementation of an isothermal process.



Due to its large size, the heat bath temperature T does not depend on what is being done with our
system, and if the change is being done slowly enough (i.e. reversibly), that temperature is also the
temperature of our system - see Eq. (8) and its discussion. Let us calculate the elementary work $d\mathscr{W}$ for
such a reversible isothermal process. According to the general Eq. (18), $d\mathscr{W} = dE - dQ$. Plugging in dQ
from Eq. (19), for T = const we get

$$(d\mathscr{W})_T = dE - T\,dS = d(E - TS) = dF, \tag{1.32}$$



where the following combination, 



$$F \equiv E - TS \tag{1.33}$$



is called the free energy (or the "Helmholtz free energy", or just the "Helmholtz energy" 26 ). Just as we 
have done for the enthalpy, let us establish properties of this new thermodynamic potential for an 
arbitrary (not necessarily isothermal) small reversible variation of variables, while keeping definition 
(33). Differentiating this relation and using Eq. (17), we get 



$$dF = -S\,dT - P\,dV. \tag{1.34}$$

Thus, if we know the function F(T, V), we can calculate S and P by simple differentiation:

$$S = -\left(\frac{\partial F}{\partial T}\right)_V, \qquad P = -\left(\frac{\partial F}{\partial V}\right)_T. \tag{1.35}$$
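A similar check for Eqs. (34)-(35). As an assumption for illustration, take the free energy of a monatomic ideal classical gas in the form $F(T,V) = -NT\ln(VT^{3/2}/N) - NT$ (temperature in energy units, mass-dependent constants dropped). Differentiating it numerically should give the pressure $P = NT/V$ and an entropy for which $E = F + TS = \frac{3}{2}NT$.

```python
import math

N = 10.0   # assumed particle number

def free_energy(T, V):
    """Assumed F(T, V) of a monatomic ideal gas (constants dropped)."""
    return -N * T * math.log(V * T ** 1.5 / N) - N * T

def d(f, x, y, wrt, h=1e-6):
    """Central finite-difference partial derivative."""
    if wrt == "T":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

T, V = 2.0, 5.0
S = -d(free_energy, T, V, "T")    # Eq. (35): S = -(dF/dT)_V
P = -d(free_energy, T, V, "V")    # Eq. (35): P = -(dF/dV)_T
E = free_energy(T, V) + T * S     # Eq. (33) inverted: E = F + TS
print(f"P = {P:.6f} (NT/V = {N * T / V:.6f}), "
      f"E = {E:.6f} (1.5NT = {1.5 * N * T:.6f})")
```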



It is easy to see that we can make the derivative system full and symmetric if we introduce one
more thermodynamic potential. Indeed, we have shown that each of the three already introduced
thermodynamic potentials (E, H, and F) has an especially simple full differential if it is considered as a



26 After H. von Helmholtz (1821-1894). The last term was recommended by the most recent (1988) IUPAC
decision, but I will use the first term, which prevails in physics literature. Its origin may stem from Eq. (32): F
may be interpreted as the part of the internal energy which is "free" to be transferred to mechanical work - at a
reversible, isothermal process only!
function of two canonical arguments: one of "thermal variables" (either S or T) and one of "mechanical
variables" (either P or V): 27 

$$E = E(S, V), \qquad H = H(S, P), \qquad F = F(T, V). \tag{1.36}$$

In this list of pairs of the 4 arguments, only one pair is missing: {T, P}. The thermodynamic function of this
pair, which gives the two other variables (S and V) by simple differentiation, is called the Gibbs energy (or
sometimes the "Gibbs free energy"): G = G(T, P). The way to define it symmetrically is evident
from the so-called circular diagram shown in Fig. 6.




Fig. 1.6. (a) Circular diagram and 
(b) its use for variable calculation. 
The thermodynamic potentials are 
shown in boldface, each flanked by 
its two canonical arguments. 



In this diagram, each thermodynamic potential is placed between its two canonical arguments - 
see Eq. (36). The left two arrows in Fig. 6a show the way the potentials H and F have been obtained 
from energy E - see Eqs. (27) and (33). This diagram hints that G has to be defined as shown by the 
right two arrows on that panel, i.e. as 



$$G \equiv E - TS + PV = H - TS = F + PV. \tag{1.37}$$



In order to verify this idea, let us calculate the full differential of this new potential, using, e.g., the last
form of Eq. (37) together with Eq. (34):

$$dG = dF + d(PV) = (-S\,dT - P\,dV) + (P\,dV + V\,dP) = -S\,dT + V\,dP, \tag{1.38}$$

so that if we know the function G(T, P), we can indeed readily calculate entropy and volume:

$$S = -\left(\frac{\partial G}{\partial T}\right)_P, \qquad V = \left(\frac{\partial G}{\partial P}\right)_T. \tag{1.39}$$
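The construction $G = F + PV$ of Eq. (37) is a Legendre transform, and for a mechanically stable system (with $F$ convex in $V$) it may equivalently be computed as $G(T,P) = \min_V\,[F(T,V) + PV]$, because the stationarity condition $(\partial F/\partial V)_T + P = 0$ is just the second of Eqs. (35). The sketch below uses an assumed ideal-gas free energy $F(T,V) = -NT\ln(VT^{3/2}/N) - NT$ (constants dropped), performs this minimization with a simple golden-section search, and checks the second of Eqs. (39), $V = (\partial G/\partial P)_T$, against $V = NT/P$. The function names are illustrative.

```python
import math

N = 10.0   # assumed particle number

def free_energy(T, V):
    """Assumed F(T, V) of a monatomic ideal gas (constants dropped)."""
    return -N * T * math.log(V * T ** 1.5 / N) - N * T

def golden_min(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol * (1 + abs(a) + abs(b)):
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    x = 0.5 * (a + b)
    return x, f(x)

def gibbs(T, P):
    """G(T, P) = min over V of F(T, V) + P V (Legendre transform)."""
    return golden_min(lambda V: free_energy(T, V) + P * V, 1e-6, 1e6)

T, P, h = 2.0, 4.0, 1e-4
V_star, G = gibbs(T, P)
V_from_G = (gibbs(T, P + h)[1] - gibbs(T, P - h)[1]) / (2 * h)   # (dG/dP)_T
print(f"V* = {V_star:.6f}, dG/dP = {V_from_G:.6f}, NT/P = {N * T / P:.6f}")
```

Writing $G$ as a minimum over $V$ also anticipates the discussion below, where $G$ is shown to be minimal at equilibrium for fixed $T$ and $P$.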



The circular diagram completed in this way is a good mnemonic tool for describing Eqs. (9),
(15), (31), (35), and (39), which express thermodynamic variables as partial derivatives of the
thermodynamic potentials. Indeed, the variable in any corner of the diagram may be found as a
derivative of either of the two potentials that are not its immediate neighbors, over the variable in the opposite
corner. For example, the red line in Fig. 6b corresponds to the second of Eqs. (39), while the blue line
corresponds to the second of Eqs. (31). Here, the derivatives giving the variables of the upper row (S and P) have to be



27 Note the similarity of this situation with that in analytical (classical) mechanics (see, e.g., CM Chapters 2 and
10): the Lagrangian function may be used to get simple equations of motion if it is expressed as a function of
generalized coordinates and velocities, while in order to use the Hamiltonian function in a similar way, it has to be
expressed as a function of the generalized coordinates and momenta.
taken with negative signs, while those giving the variables of the bottom row (V and T), with positive
signs. 28
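Footnote 28 mentions the Maxwell relations, which follow from the equality of mixed second derivatives of a thermodynamic potential. Applied to $F(T,V)$ via Eq. (35), this gives $(\partial S/\partial V)_T = (\partial P/\partial T)_V$. The sketch below checks this numerically for an assumed free energy; to show that nothing depends on the ideal-gas form, it uses a van der Waals-type model, $F = -NT\ln\big((V - Nb)T^{3/2}/N\big) - NT - aN^2/V$, with arbitrary constants $a$ and $b$ (an assumption, not taken from the text).

```python
import math

N, a, b = 10.0, 0.3, 0.05    # assumed model constants

def free_energy(T, V):
    """Assumed van der Waals-type F(T, V), additive constants dropped."""
    return -N * T * math.log((V - N * b) * T ** 1.5 / N) - N * T - a * N**2 / V

h = 1e-3   # finite-difference step

def entropy(T, V):       # Eq. (35): S = -(dF/dT)_V
    return -(free_energy(T + h, V) - free_energy(T - h, V)) / (2 * h)

def pressure(T, V):      # Eq. (35): P = -(dF/dV)_T
    return -(free_energy(T, V + h) - free_energy(T, V - h)) / (2 * h)

T, V = 2.0, 5.0
dS_dV = (entropy(T, V + h) - entropy(T, V - h)) / (2 * h)
dP_dT = (pressure(T + h, V) - pressure(T - h, V)) / (2 * h)
print(f"(dS/dV)_T = {dS_dV:.6f}, (dP/dT)_V = {dP_dT:.6f}, "
      f"N/(V - Nb) = {N / (V - N * b):.6f}")
```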

Now I have to justify the collective name "thermodynamic potentials" used for E, H, F, and G.
For that, let us consider an irreversible process, for example, a direct thermal contact of two bodies with
different initial temperatures. As we have seen in Sec. 2, in such a process, the entropy may grow even
without any external heat flow: dS > 0 at dQ = 0 - see Eq. (12). For a more general process with $dQ \neq 0$,
this means that entropy may grow faster than predicted by Eq. (19), which has been derived for a
reversible process, so that

$$dS \geq \frac{dQ}{T}, \tag{1.40}$$

with the equality approached in the reversible limit. Plugging Eq. (40) into Eq. (18) (which, being just 
the energy conservation law, remains valid for irreversible processes as well), we get 

$$dE \leq T\,dS - P\,dV. \tag{1.41}$$

Now we can use this relation to have a look at the behavior of other thermodynamic potentials in
irreversible situations, still keeping their definitions given by Eqs. (27), (33), and (37). Let us start from
the (very common) case when both temperature T and volume V are kept constant. If the process were
reversible, then according to Eq. (34), the full time derivative of free energy F would equal zero. Equation
(41) says that in an irreversible process this is not necessarily so: if dT = dV = 0, then

$$\frac{dF}{dt} = \frac{d}{dt}(E - TS) = \frac{dE}{dt} - T\frac{dS}{dt} \leq T\frac{dS}{dt} - T\frac{dS}{dt} = 0. \tag{1.42}$$

Hence, in the general (irreversible) situation, function F can only decrease, but not increase in time. This
means that F eventually approaches its minimum value F(T, V), which is given by the equations of
reversible thermodynamics.
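The monotonic decrease of $F$ in Eq. (42) can be illustrated with a simple assumed model: a body with constant heat capacity $C$ at fixed volume (so $E = CT_b$ and $S = C\ln T_b$ + const) relaxing toward a heat bath of fixed temperature $T$, with the phenomenological law $dE/dt = \kappa(T - T_b)$ (an assumption, not derived in the text). As in Eq. (42), $F = E - TS$ is evaluated with the fixed temperature $T$; analytically, $dF/dt = -\kappa(T - T_b)^2/T_b \leq 0$, and the simulation shows $F$ falling to its minimum $CT(1 - \ln T)$, reached at $T_b = T$.

```python
import math

C, kappa = 2.0, 2.0        # assumed heat capacity and heat-transfer coefficient
T_bath = 1.0               # fixed temperature of the heat bath
T_body, dt = 3.0, 1e-3     # body starts hotter than the bath

def free_energy(T_b):
    """F = E - T S with E = C T_b and S = C ln T_b (constant dropped),
    evaluated at the fixed bath temperature, as in Eq. (42)."""
    return C * T_b - T_bath * C * math.log(T_b)

history = [free_energy(T_body)]
for _ in range(20_000):
    T_body += kappa * (T_bath - T_body) * dt / C   # dE = C dT_b
    history.append(free_energy(T_body))

decreasing = all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
print(f"final T_body = {T_body:.6f}, F never increased: {decreasing}")
print(f"F_final = {history[-1]:.6f}, minimum C*T*(1 - ln T) = "
      f"{C * T_bath * (1 - math.log(T_bath)):.6f}")
```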

Thus in the case T = const, V = const, the free energy F, i.e. the difference E - TS, plays the role 
of the potential energy in the classical mechanics of dissipative processes: its minimum corresponds to 
the (in the case of F, thermodynamic) equilibrium of the system. This is one of the key results of 
thermodynamics, and I invite the reader to give it some thought. One of its possible handwaving 
interpretations is that the heat bath with fixed T > 0, i.e. with a substantial thermal agitation of its 
components, "wants" to impose thermal disorder in the system immersed in it by "rewarding" it (by 
lowering its F) for any increase of disorder. 

Repeating the calculation for the case T = const, P = const, it is easy to see that in this case the 
same role is played by the Gibbs energy: 

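$$\frac{dG}{dt} = \frac{d}{dt}(E - TS + PV) = \frac{dE}{dt} - T\frac{dS}{dt} + P\frac{dV}{dt} \leq \left(T\frac{dS}{dt} - P\frac{dV}{dt}\right) - T\frac{dS}{dt} + P\frac{dV}{dt} = 0, \qquad (1.43)$$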



28 There is also a wealth of other relations between thermodynamic variables that may be presented as second
derivatives of the thermodynamic potentials, including four Maxwell relations such as $(\partial S/\partial V)_T = (\partial P/\partial T)_V$, etc.
(They may be readily recovered from the well-known property of a function of two independent arguments, say,
f(x, y): $\partial(\partial f/\partial x)/\partial y = \partial(\partial f/\partial y)/\partial x$.) In this chapter, I will list only the thermodynamic relations that will be used
later in the course; a more complete list may be found, e.g., in Sec. 16 of the textbook by L. Landau and E. Lifshitz,
Statistical Physics, Part 1, 3rd ed., Pergamon, 1980 (and its later reprintings).
so that the thermal equilibrium now corresponds to the minimum of G rather than F. One can argue very
convincingly that the difference G - F = PV between these two potentials (also equal to H - E) has very
little to do with thermodynamics at all, because this difference exists (although not much advertised) in
classical mechanics as well. 29 Indeed, the difference may be generalized as $G - F = -\mathcal{F}_j q_j$, where $q_j$ is
any generalized coordinate and $\mathcal{F}_j$ is the corresponding generalized force - see Eq. (1) and its discussion.
In this case the minimum of F corresponds to the equilibrium of an autonomous system (with $\mathcal{F}_j = 0$),
while the equilibrium position of the same system under the action of an external force $\mathcal{F}_j$ is given by the
minimum of G. Thus the external force "wants" the system to yield to its effect, "rewarding" it by
lowering its G. (The analogy with the "disorder pressure" by a heat bath, discussed in the last paragraph,
is evident.)

For the two remaining thermodynamic potentials, E and H, calculations similar to Eqs. (42) and
(43) make less sense, because that would require taking S = const (with V = const for E, and P = const
for H), but it is hard to prevent the entropy from growing if initially it had been lower than its
equilibrium value, at least in the long term. 30 Thus the circular diagram is not so symmetric after
all: G and/or F are somewhat more useful for most practical calculations than E and H.

One more important conceptual question is why the main task of statistical physics should be the
calculation of thermodynamic potentials, rather than just a relation between P, V, and T. (Such a relation
is called the equation of state of the system.) Let us explore this issue using the example of an ideal
classical gas in thermodynamic equilibrium, whose equation of state should be well known to the
reader from undergraduate physics (in Chapter 3, it will be derived from statistics):

$$PV = NT, \qquad (1.44)$$



where N is the number of particles in volume V. 31 Let us try to use it for the calculation of all 
thermodynamic potentials, and all other thermodynamic variables discussed above. We may start, for 
example, from the calculation of the free energy F. Indeed, solving Eq. (44) for pressure, P = NT/V, and 
integrating the second of Eqs. (35), we get 



$$F = -\int P\,dV\Big|_{T=\text{const}} = -NT\int\frac{dV}{V} = -NT\int\frac{d(V/N)}{V/N} = -NT\ln\frac{V}{N} + Nf(T), \qquad (1.45)$$

where I have divided V by N in both instances just to present F as a manifestly extensive variable, in this
uniform system proportional to N. The integration "constant" f(T) is some function of temperature that
cannot be recovered from the equation of state. This function also affects all other thermodynamic
potentials, and the entropy. Indeed, using the first of Eqs. (35) together with Eq. (45), we get

$$S = -\left(\frac{\partial F}{\partial T}\right)_V = N\left[\ln\frac{V}{N} - \frac{df}{dT}\right], \qquad (1.46)$$



29 See, e.g., CM Sec. 1.5. 

30 There are a few practical systems, notably including the so-called magnetic refrigerators (to be discussed in
Chapter 4), in which the natural growth of S is so slow that the condition S = const may be closely approached.

31 This equation was first derived from experimental data by E. Clapeyron (in 1834) in the form $PV = nRT_K$,
where n is the number of moles in the gas sample, $T_K$ is the temperature in kelvins, and $R \approx 8.31$ J/(mole·K) is the so-called gas constant. This form
is equivalent to Eq. (44), taking into account that $R = k_B N_A$, where $N_A \approx 6.02\times10^{23}$ mole$^{-1}$ is the so-called
Avogadro number, i.e. the number of molecules per mole. (By definition of the mole, $N_A$ is just the reciprocal
mass, in grams, of a baryon - more exactly, by convention, of 1/12th of the mass of the carbon-12 atom.)
and now we can combine Eqs. (33) and (46) to calculate the (internal) energy,



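$$E = F + TS = \left[-NT\ln\frac{V}{N} + Nf\right] + T\left[N\ln\frac{V}{N} - N\frac{df}{dT}\right] = N\left(f - T\frac{df}{dT}\right), \qquad (1.47)$$

then use Eqs. (27), (44), and (47) to calculate the enthalpy,

$$H = E + PV = E + NT = N\left(f - T\frac{df}{dT} + T\right), \qquad (1.48)$$

and, finally, plug Eqs. (44) and (45) into Eq. (37) to calculate the Gibbs energy

$$G = F + PV = N\left(-T\ln\frac{V}{N} + f + T\right). \qquad (1.49)$$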



In particular, Eq. (47) describes a very important property of the ideal classical gas: its energy
depends only on temperature, not on volume or pressure. One might question whether the function f(T)
may be physically insignificant, just like the arbitrary constant that may always be added to the potential
energy in nonrelativistic mechanics. In order to address this concern, let us calculate, from Eqs. (24) and
(28), both heat capacities, which are readily measurable quantities:



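$$C_V = \left(\frac{\partial E}{\partial T}\right)_V = -NT\frac{d^2 f}{dT^2}, \qquad (1.50)$$

$$C_P = \left(\frac{\partial H}{\partial T}\right)_P = N\left(-T\frac{d^2 f}{dT^2} + 1\right) = C_V + N. \qquad (1.51)$$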



We see that the function f(T), or at least its second derivative, is measurable. 32 (In Chapter 3, we will
calculate this function for two simple "microscopic" models of the ideal classical gas.) The meaning of
this function is evident from the physical picture of the ideal gas: the pressure P exerted on the walls of the
containing volume is produced only by the translational motion of the gas molecules, while the
internal energy E of the gas (and hence its other thermodynamic potentials) may also include contributions from the internal
motion of the molecules - their rotations, vibrations, etc. Thus, the equation of state does not give the
full thermodynamic description of a system, while the thermodynamic potentials do.
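The chain of Eqs. (45)-(51) can be checked numerically. The Python sketch below is only an illustration: it assumes a hypothetical integration function $f(T) = -CT\ln T$ with C = 1.5 (not derived in the text; this particular choice happens to give $C_V = CN$), uses temperature in energy units as in Eq. (44), and replaces the partial derivatives with central finite differences. It confirms that E does not depend on V and that $C_P - C_V = N$, as Eq. (51) states.

```python
import math

N = 1000.0   # number of particles (illustrative value)
C = 1.5      # hypothetical constant in the assumed f(T) = -C*T*ln(T)


def f(T):
    """Assumed integration 'constant' f(T) of Eq. (45); not derived in the text."""
    return -C * T * math.log(T)


def deriv(func, x, h=1e-5):
    """Central finite-difference estimate of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def F(T, V):
    """Free energy of the ideal classical gas, Eq. (45)."""
    return -N * T * math.log(V / N) + N * f(T)


def S(T, V):
    """Entropy S = -(dF/dT) at constant V, i.e. Eq. (46) computed numerically."""
    return -deriv(lambda t: F(t, V), T)


def E(T, V):
    """Internal energy E = F + TS, Eq. (47)."""
    return F(T, V) + T * S(T, V)


def H(T, V):
    """Enthalpy H = E + PV = E + NT, Eq. (48)."""
    return E(T, V) + N * T


T0, V0, P0 = 2.0, 5000.0, 0.4   # sample state; P0 = N*T0/V0 by Eq. (44)

# C_V: differentiate E at fixed V; C_P: differentiate H along the isobar V = N*T/P0.
C_V = deriv(lambda t: E(t, V0), T0, h=1e-3)
C_P = deriv(lambda t: H(t, N * t / P0), T0, h=1e-3)

print("E(T0, V0)    =", E(T0, V0))
print("E(T0, 10*V0) =", E(T0, 10 * V0))   # same value: E depends on T only
print("C_V / N      =", C_V / N)          # about C for this assumed f(T)
print("C_P - C_V    =", C_P - C_V)        # about N, Eq. (51)
```

Replacing f(T) with another smooth function changes $C_V$ but, as footnote 32 notes, not the difference $C_P - C_V$.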



1.5. Systems with variable number of particles 

Now we have to consider one more important case, when the number N of particles in a system is
not rigidly fixed, but may change as a result of a thermodynamic process. Typical examples of such a
system are a gas sample separated from the environment by a penetrable partition (Fig. 7), and a gas in
contact with a liquid of the same molecules.



32 Note, however, that the difference $C_P - C_V = N$ (if temperature is measured in kelvins, $C_P - C_V = nR$) is
independent of f(T). (It is possible to show that the difference $C_P - C_V$ is fully determined by the equation of state
for any medium.)
Fig. 1.7. A system with a variable number of particles (schematic): the system exchanges particles (dN) with its environment.



Let us analyze this situation for the simplest case when all the particles are similar (though the
analysis may be readily extended to systems with particles of several sorts). In this case we can consider
N as an independent thermodynamic variable whose variation may change the energy E of the system, so
that (for a slow, reversible process) Eq. (17) should now be generalized as

$$dE = T\,dS - P\,dV + \mu\,dN, \qquad (1.52)$$



where $\mu$ is a new function of state, called the chemical potential. 33 Keeping the definitions of other
thermodynamic potentials, given by Eqs. (27), (33), and (37), intact, we see that the expressions for their
differentials should be generalized as

$$dH = T\,dS + V\,dP + \mu\,dN, \qquad (1.53a)$$

$$dF = -S\,dT - P\,dV + \mu\,dN, \qquad (1.53b)$$

$$dG = -S\,dT + V\,dP + \mu\,dN, \qquad (1.53c)$$

so that the chemical potential may be calculated as any of the following derivatives: 34

$$\mu = \left(\frac{\partial E}{\partial N}\right)_{S,V} = \left(\frac{\partial H}{\partial N}\right)_{S,P} = \left(\frac{\partial F}{\partial N}\right)_{T,V} = \left(\frac{\partial G}{\partial N}\right)_{T,P}. \qquad (1.54)$$

Despite their similarity, one of Eqs. (53)-(54) is more consequential than the others. Indeed, the
Gibbs energy G is the only thermodynamic potential that is a function of two intensive parameters, T
and P. However, like all thermodynamic potentials, G has to be extensive, so that in a system of similar
particles it has to be proportional to N:

$$G = Nf(T, P). \qquad (1.55)$$

Plugging this expression into the last of Eqs. (54), we see that $\mu$ equals f(T, P). In other words,

$$\mu = \frac{G}{N}, \qquad (1.56)$$

so that the chemical potential is just the Gibbs energy per particle.



33 This name, of historic origin, is a bit misleading: as evident from Eq. (52), $\mu$ has a clear physical sense of the
average energy cost of adding one more particle to a system of $N \gg 1$ particles.

34 Note that, strictly speaking, Eqs. (9), (15), (31), (35), and (39) should now be generalized by adding one more
lower index, N, to the corresponding derivatives.
In order to demonstrate how vital the notion of chemical potential may be, let us consider the
situation (parallel to that shown in Fig. 2) when a system consists of two parts, with equal pressure and 
temperature, that can exchange particles at a relatively slow rate (much slower than the speed of internal 
relaxation inside each of the parts). Then we can write two equations similar to Eq. (5): 



$$N = N_1 + N_2, \qquad G = G_1 + G_2, \qquad (1.57)$$

where N = const, and Eq. (56) may be used to describe each component of G:

$$G = \mu_1 N_1 + \mu_2 N_2. \qquad (1.58)$$

Plugging $N_2$ expressed from the first of Eqs. (57), $N_2 = N - N_1$, into Eq. (58), we see that

$$\frac{dG}{dN_1} = \mu_1 - \mu_2, \qquad (1.59)$$



so that the minimum of G is achieved at $\mu_1 = \mu_2$. Hence, under fixed temperature and
pressure, i.e. when G is the appropriate thermodynamic potential, the chemical potentials of the system
parts should be equal - the so-called chemical equilibrium.
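The equality of chemical potentials can also be seen by direct minimization. The sketch below departs from the argument above in one respect, stated here as an assumption: it keeps T and the two volumes fixed (hypothetical values $V_1 = 1000$, $V_2 = 3000$) and minimizes the total free energy built from the ideal-gas Eq. (45), so that $\mu$ is the third of the derivatives (54), $(\partial F/\partial N)_{T,V}$. A brute-force scan over integer $N_1$ is used only for transparency.

```python
import math

T = 2.0                  # common temperature (energy units, as in Eq. (44))
N_TOTAL = 1000           # total number of particles, N = N1 + N2 = const
V1, V2 = 1000.0, 3000.0  # hypothetical fixed volumes of the two parts


def f(T):
    """Assumed f(T) of Eq. (45); it is the same for both parts and cancels."""
    return -1.5 * T * math.log(T)


def free_energy(n, V):
    """Ideal-gas free energy, Eq. (45), for n particles in volume V."""
    return -n * T * math.log(V / n) + n * f(T)


def mu(n, V):
    """Chemical potential (dF/dN) at fixed T and V, obtained from Eq. (45)."""
    return T * math.log(n / V) + T + f(T)


# Total free energy for every possible split of the particles between the two parts.
total = {n1: free_energy(n1, V1) + free_energy(N_TOTAL - n1, V2) for n1 in range(1, N_TOTAL)}
n1_eq = min(total, key=total.get)   # split with the lowest total F
n2_eq = N_TOTAL - n1_eq
print(n1_eq, n2_eq, mu(n1_eq, V1), mu(n2_eq, V2))
```

The minimum falls at $N_1 = 250$, where both parts have the same density $N_i/V_i$ and hence equal chemical potentials.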

Later we will also run into cases when the volume V of a system, its temperature T, and the chemical
potential $\mu$ are all fixed. (The last condition may be readily implemented by allowing the system of
interest to exchange particles with a reservoir so large that its $\mu$ stays constant.) A thermodynamic
potential appropriate for this case may be obtained from the free energy F by subtracting the product
$\mu N$, resulting in the so-called grand thermodynamic potential (or the "Landau potential")



$$\Omega \equiv F - \mu N = F - \frac{G}{N}N = F - G = -PV. \qquad (1.60)$$

Indeed, for a reversible process, the full differential of this potential is

$$d\Omega = dF - d(\mu N) = (-S\,dT - P\,dV + \mu\,dN) - (\mu\,dN + N\,d\mu) = -S\,dT - P\,dV - N\,d\mu, \qquad (1.61)$$



so that if $\Omega$ has been calculated as a function of T, V, and $\mu$, other thermodynamic variables may be
found as



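$$S = -\left(\frac{\partial\Omega}{\partial T}\right)_{V,\mu}, \qquad N = -\left(\frac{\partial\Omega}{\partial\mu}\right)_{T,V}, \qquad P = -\left(\frac{\partial\Omega}{\partial V}\right)_{T,\mu}. \qquad (1.62)$$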



For an irreversible process, acting exactly as we have done with other potentials, it is
straightforward to prove that under fixed T, V, and $\mu$, $d\Omega/dt \leq 0$, so that the system's
equilibrium indeed corresponds to the minimum of the grand potential $\Omega$.
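Eqs. (60)-(62) can be checked on the ideal classical gas. Combining $\mu = G/N$ with Eq. (49) gives $N/V = \exp[(\mu - f(T) - T)/T]$, so that $\Omega = -PV = -NT = -TV\exp[(\mu - f(T) - T)/T]$. The Python sketch below uses the same hypothetical $f(T) = -CT\ln T$ (an assumption for illustration only) and finite differences in place of partial derivatives. It recovers N, the pressure given by Eq. (44), and the entropy given by Eq. (46).

```python
import math

C = 1.5   # hypothetical constant in the assumed f(T) = -C*T*ln(T)


def f(T):
    return -C * T * math.log(T)


def df_dT(T):
    return -C * math.log(T) - C


def omega(T, V, mu):
    """Ideal-gas grand potential Omega = -PV = -NT, written via
    N/V = exp[(mu - f(T) - T)/T], which follows from mu = G/N and Eq. (49)."""
    return -T * V * math.exp((mu - f(T) - T) / T)


def partial(func, args, i, h=1e-6):
    """Central finite difference of func with respect to its i-th argument."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (func(*up) - func(*dn)) / (2.0 * h)


N0, T0, V0 = 1000.0, 2.0, 5000.0
mu0 = -T0 * math.log(V0 / N0) + f(T0) + T0    # mu = G/N from Eq. (49)

S_num = -partial(omega, (T0, V0, mu0), 0)      # Eq. (62)
P_num = -partial(omega, (T0, V0, mu0), 1)
N_num = -partial(omega, (T0, V0, mu0), 2)

S_ref = N0 * (math.log(V0 / N0) - df_dT(T0))   # Eq. (46)
print(N_num, N0)
print(P_num, N0 * T0 / V0)                      # Eq. (44)
print(S_num, S_ref)
```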



1.6. Thermal machines 

In order to complete this brief review of thermodynamics, I cannot skip the topic of thermal
machines - not because it will be used much in this course, but mostly because of its practical and
historic significance. (Indeed, the whole field of thermodynamics was spurred by the famous 1824 work
by S. Carnot, which in particular gave an alternative, indirect form of the 2nd law of thermodynamics -
see below.)
Figure 8a shows the generic scheme of a thermal machine that may perform mechanical work on
the environment (in the notation of Eq. (1), equal to $-\mathcal{W}$) during each cycle of the expansion/compression
of the "working gas", by transferring different amounts of heat from a high-temperature heat bath ($Q_H$)
and to the low-temperature bath ($Q_L$). One relation between the three amounts $Q_H$, $Q_L$, and $\mathcal{W}$ is
immediately given by energy conservation (i.e. by "the 1st law of thermodynamics"):

$$Q_H - Q_L = -\mathcal{W}. \qquad (1.63)$$
From Eq. (1), the mechanical work during the cycle may be calculated as 

$$-\mathcal{W} = \oint P\,dV, \qquad (1.64)$$

i.e. it equals the area enclosed by the path of the representing point on the [P, V] plane - see Fig. 8b. 35 Hence, the
work depends on the exact form of the cycle, which in turn depends not only on $T_H$ and $T_L$, but also on
the working gas' properties.




Fig. 1.8. (a) The simplest implementation of a thermal machine, and (b) the graphic presentation of the
mechanical work it performs. On panel (b), the solid arrow indicates the heat engine cycle direction, and
the dashed arrow, the refrigerator cycle direction.

An exception to this rule is the famous Carnot cycle, consisting of two isothermal and two
adiabatic processes (all reversible!). In its heat engine's form, the cycle starts from an isothermal
expansion of the gas in contact with the hot bath (i.e. at $T = T_H$), followed by its additional adiabatic
expansion until T drops to $T_L$. Then an isothermal compression of the gas is performed in its contact
with the cold bath (at $T = T_L$), followed by its additional adiabatic compression to raise its temperature to
$T_H$ again, after which the cycle is repeated again and again. (Note that during this cycle the working gas
is never in contact with both heat baths simultaneously, thus avoiding the irreversible heat transfer
between them.) The cycle's shape on the [V, P] plane depends on the exact properties of the working gas
and may be rather complicated. However, since the entropy is constant in any reversible adiabatic process, the
Carnot cycle's shape on the [S, T] plane is always rectangular - see Fig. 9. 36



35 Note that the positive sign of the circular integral corresponds to the clockwise rotation of the point, so that the work
($-\mathcal{W}$) done by the working gas is positive for clockwise rotation (pertinent to heat engines) and negative in the
opposite case (implemented in refrigerators and heat pumps).

36 A cycle with an [S, T] shape very close to the Carnot (rectangular) one may be implemented in the already
mentioned magnetic (or "adiabatic-demagnetization") refrigeration, using the alignment of either atomic or
Fig. 1.9. Representation of the Carnot cycle (a) on the [S, T] plane and (b) on the [V, P] plane
(schematically). The meaning of the arrows is the same as in Fig. 8.



Since during each isotherm the working gas is brought into thermal contact only with the
corresponding heat bath, Eq. (19), dQ = T dS, may be immediately integrated to yield



$$Q_H = T_H(S_2 - S_1), \qquad Q_L = T_L(S_2 - S_1). \qquad (1.65)$$

Hence the ratio of these two heat flows is completely determined by their temperature ratio:

$$\frac{Q_H}{Q_L} = \frac{T_H}{T_L}, \qquad (1.66)$$



regardless of the working gas properties. Equations (63) and (66) are sufficient to find the ratio of the work
$-\mathcal{W}$ to either $Q_H$ or $Q_L$. For example, the main figure-of-merit of a thermal machine used as a heat
engine ($Q_H > 0$, $Q_L > 0$, $-\mathcal{W} = |\mathcal{W}| > 0$) is its efficiency



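$$\eta \equiv \frac{|\mathcal{W}|}{Q_H} = \frac{Q_H - Q_L}{Q_H} = 1 - \frac{Q_L}{Q_H} \leq 1. \qquad (1.67)$$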




For the Carnot cycle, Eq. (66) immediately yields the famous relation, 




$$\eta_\text{Carnot} = 1 - \frac{T_L}{T_H}, \qquad (1.68)$$



which shows that at a given $T_L$ (typically the ambient temperature, about 300 K), the efficiency may be
increased, ultimately to 1, by raising the temperature of the heat source.
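Eqs. (63)-(68) can be illustrated with a concrete working gas. The Python sketch below assumes (this is not in the text) a monatomic ideal classical gas with $\gamma = C_P/C_V = 5/3$, for which reversible adiabats obey $TV^{\gamma-1} = \text{const}$, and hypothetical values $T_H = 2$, $T_L = 1$ (energy units), $V_1 = 1$, $V_2 = 3$. It evaluates $-\mathcal{W} = \oint P\,dV$ numerically, leg by leg, with Simpson's rule, and compares it with $Q_H - Q_L$ and the resulting efficiency with Eq. (68).

```python
import math

N = 1.0              # number of particles (energy units with k_B = 1, as in Eq. (44))
GAMMA = 5.0 / 3      # assumed C_P/C_V of a monatomic ideal gas
T_H, T_L = 2.0, 1.0
V1, V2 = 1.0, 3.0    # hypothetical volumes at the ends of the hot isotherm

# Adiabats obey T * V**(GAMMA - 1) = const, which fixes the two remaining corners.
ratio = (T_H / T_L) ** (1.0 / (GAMMA - 1))
V3, V4 = V2 * ratio, V1 * ratio


def simpson(func, a, b, n=2000):
    """Composite Simpson rule for the integral of func from a to b (n even); b < a is allowed."""
    h = (b - a) / n
    s = func(a) + func(b)
    s += 4 * sum(func(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(func(a + k * h) for k in range(2, n, 2))
    return s * h / 3


def isotherm(T):
    """P(V) along an isotherm, from Eq. (44)."""
    return lambda V: N * T / V


def adiabat(T0, V0):
    """P(V) along the adiabat passing through the point (V0, T0)."""
    return lambda V: N * T0 * (V0 / V) ** (GAMMA - 1) / V


# -W = closed-contour integral of P dV, Eq. (64): hot isotherm, adiabat, cold isotherm, adiabat.
minus_W = (simpson(isotherm(T_H), V1, V2)
           + simpson(adiabat(T_H, V2), V2, V3)
           + simpson(isotherm(T_L), V3, V4)
           + simpson(adiabat(T_L, V4), V4, V1))

Q_H = N * T_H * math.log(V2 / V1)   # heat taken from the hot bath (isothermal: dE = 0)
Q_L = N * T_L * math.log(V3 / V4)   # heat given to the cold bath
eta = minus_W / Q_H

print("-W =", minus_W, "  Q_H - Q_L =", Q_H - Q_L)     # Eq. (63)
print("eta =", eta, "  1 - T_L/T_H =", 1 - T_L / T_H)  # Eq. (68)
```

Changing GAMMA or the volumes changes the cycle's shape on the [V, P] plane and the amount of work per cycle, but not the efficiency, in agreement with Eq. (68).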

On the other hand, if the cycle is reversed (see the dashed arrows in Figs. 8 and 9), the same
thermal machine may serve as a refrigerator, providing heat removal from the low-temperature bath
($Q_L < 0$) at the cost of consuming external work: $\mathcal{W} > 0$. This reversal does not affect the basic relation
(63), which may be used to calculate the relevant figure-of-merit, called the cooling coefficient of
performance, $(\text{COP})_\text{cooling}$:



$$(\text{COP})_\text{cooling} \equiv \frac{|Q_L|}{\mathcal{W}} = \frac{Q_L}{Q_H - Q_L}. \qquad (1.69)$$



nuclear spins by an external magnetic field. In such refrigerators (to be further discussed in the next chapter), the role
of the {-P, V} pair of variables is played by the {% B} pair - see Eq. (3).
Notice that this coefficient may readily be above unity; 37 in particular, for the Carnot cycle we may use
Eq. (66) (which is also unaffected by the cycle reversal) to get 

$$(\text{COP}_\text{cooling})_\text{Carnot} = \frac{T_L}{T_H - T_L}, \qquad (1.70)$$

so that $(\text{COP}_\text{cooling})_\text{Carnot}$ is larger than 1 at $T_H < 2T_L$, and even diverges when the temperature difference
$(T_H - T_L)$ sustained by the refrigerator tends to zero.

Since in the reversed cycle $Q_H = -\mathcal{W} + Q_L < 0$, it also provides heat flow into the hotter heat
bath, and thus may be used as a heat pump. However, the figure-of-merit appropriate for this application
is different:

$$(\text{COP})_\text{heating} \equiv \frac{|Q_H|}{\mathcal{W}} = \frac{Q_H}{Q_H - Q_L}, \qquad (1.71)$$



so that for the Carnot cycle 



$$(\text{COP}_\text{heating})_\text{Carnot} = \frac{T_H}{T_H - T_L}. \qquad (1.72)$$



Note that this COP is always larger than 1, meaning that the Carnot heat pump is always more
efficient than the direct conversion of work into heat (where $Q_H = -\mathcal{W}$ and $\text{COP}_\text{heating} = 1$), though
practical electricity-driven heat pumps are substantially more complex (and hence more expensive) than,
say, simple electric heaters. Such heat pumps, with typical $\text{COP}_\text{heating}$ values around 4 in summer and 2
in winter, are frequently used for heating large buildings.
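The Carnot figures of merit, Eqs. (68), (70), and (72), are simple enough to tabulate directly. The short Python sketch below (the temperature values are arbitrary examples) also makes explicit two identities that follow from these formulas: $(\text{COP}_\text{heating})_\text{Carnot} = 1/\eta_\text{Carnot} = 1 + (\text{COP}_\text{cooling})_\text{Carnot}$, and $(\text{COP}_\text{cooling})_\text{Carnot} > 1$ exactly when $T_H < 2T_L$.

```python
def carnot_efficiency(T_H, T_L):
    """Eq. (68): eta = 1 - T_L/T_H."""
    return 1.0 - T_L / T_H


def carnot_cop_cooling(T_H, T_L):
    """Eq. (70): |Q_L|/W = T_L/(T_H - T_L)."""
    return T_L / (T_H - T_L)


def carnot_cop_heating(T_H, T_L):
    """Eq. (72): |Q_H|/W = T_H/(T_H - T_L)."""
    return T_H / (T_H - T_L)


T_L = 300.0   # roughly ambient temperature, in kelvins (only ratios matter)
for T_H in (310.0, 450.0, 600.0, 900.0):
    print(T_H,
          carnot_efficiency(T_H, T_L),
          carnot_cop_cooling(T_H, T_L),
          carnot_cop_heating(T_H, T_L))
```

At $T_H = 2T_L = 600$ K the cooling COP equals exactly 1; closer temperatures give larger COPs, consistent with the divergence at $T_H \to T_L$ noted after Eq. (70).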

I have dwelt so long on the Carnot cycle because it has a remarkable property: the highest
possible efficiency of all heat-engine cycles. Indeed, in the Carnot cycle the transfer of heat between any
heat bath and the working gas is performed reversibly, when their temperatures are equal. If this is not
so, heat might flow from a hotter to a colder system without performing any work. Hence the result (68)
also yields the maximum efficiency of any heat engine. In particular, it shows that $\eta_\text{max} = 0$ at $T_H = T_L$,
i.e., no heat engine can perform any mechanical work in the absence of temperature gradients. 38 In some
alternative axiomatic systems of thermodynamics, this fact, i.e. the impossibility of the direct conversion
of heat to work, is postulated, and serves the role of the 2nd law.

Note also that according to Eq. (70), $(\text{COP}_\text{cooling})_\text{Carnot}$ tends to zero at $T_L \to 0$,
making it impossible to reach the absolute zero of temperature, and hence illustrating the meaningful
(Nernst's) formulation of the 3rd law of thermodynamics. Indeed, let us prescribe a certain (but very
large) heat capacity $C(T_L)$ to the low-temperature bath, and use the definition of this variable to write the
following evident expression for the (very small) change of its temperature as a result of a relatively small
number dn of similar refrigeration cycles:

$$C(T_L)\,dT_L = Q_L\,dn. \qquad (1.73)$$



37 This is why the term "cooling efficiency", used in some textbooks instead of $(\text{COP})_\text{cooling}$, may be misleading.

38 Such a hypothetical (and impossible!) heat engine, which would violate the 2nd law of thermodynamics, is
called the "perpetual motion machine of the 2nd kind" - in contrast to the "perpetual motion machine of the 1st
kind" that would violate the 1st law, i.e., the energy conservation.
Together with Eq. (66), this relation yields



$$C(T_L)\,dT_L = -\frac{|Q_H|}{T_H}T_L\,dn, \qquad (1.74)$$



so that if we perform many (n) cycles (with constant $Q_H$ and $T_H$), the initial and final values of $T_L$ obey
the following equation:

$$\int_{T_\text{ini}}^{T_\text{fin}}\frac{C(T)\,dT}{T} = -\frac{|Q_H|}{T_H}\,n. \qquad (1.75)$$



For example, if C(T) is a constant, Eq. (75) yields an exponential law,



$$T_\text{fin} = T_\text{ini}\exp\left\{-\frac{|Q_H|}{CT_H}\,n\right\}, \qquad (1.76)$$



with the absolute zero not reached at any finite n. Relation (75) proves the Nernst theorem if C(T) does
not vanish at $T \to 0$, but for such metastable systems as glasses the situation is more complicated. 39
Fortunately, this issue does not affect other aspects of statistical physics - at least those to be discussed
in this course.
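The approach of $T_L$ toward zero described by Eqs. (73)-(76) can be mimicked by applying Eq. (74) once per cycle. In the Python sketch below, the per-cycle update is a hypothetical discretization (reasonable when $|Q_H|/CT_H \ll 1$), the parameter values are arbitrary, and C is taken constant, as in Eq. (76).

```python
import math

Q_H_ABS = 1.0   # |Q_H| per cycle (hypothetical value, energy units)
T_H = 300.0     # hot-bath temperature, kept constant
C = 10.0        # constant heat capacity of the cold bath (assumption)
T_INI = 1.0     # initial cold-bath temperature


def cool(n_cycles, T_start=T_INI):
    """Apply Eq. (74) once per cycle: dT_L = -(|Q_H|/(C*T_H)) * T_L with dn = 1."""
    T = T_start
    for _ in range(n_cycles):
        T -= Q_H_ABS / (C * T_H) * T
    return T


def exponential_law(n, T_start=T_INI):
    """Eq. (76): T_fin = T_ini * exp(-|Q_H| n / (C T_H))."""
    return T_start * math.exp(-Q_H_ABS * n / (C * T_H))


for n in (0, 1000, 5000, 10000):
    print(n, cool(n), exponential_law(n))
```

The temperature decreases geometrically with n but stays positive for any finite number of cycles, as stated after Eq. (76).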



1.7. Exercise problems 
1.1. A gas has the following properties:

(i) $C_V = aT^b$, and

(ii) the work $\mathcal{W}$ needed for its isothermal compression from $V_2$ to $V_1$ equals $cT\ln(V_2/V_1)$,

where a, b, and c are constants. Calculate the equation of state, entropy S, and thermodynamic potentials
E, H, F, G, and $\Omega$ of the gas.



1.2. A vessel with an ideal classical gas of indistinguishable molecules is separated by a partition
so that the number of molecules in both parts is the same but their volumes are different. The gas is in
thermal equilibrium, and its pressure in one part is $P_1$, and in the other, $P_2$. Calculate the change of
entropy caused by a fast removal of the partition. Analyze the result.



1.3. For an ideal classical gas with temperature-independent specific heat, derive the relation
between P and V in adiabatic expansion/compression.



1.4. Two bodies, with negligible thermal expansion coefficients and constant heat capacities $C_1$ and
$C_2$, are placed into weak thermal contact at different initial temperatures $T_1$ and $T_2$. Calculate the
full change of entropy of the system before it reaches full thermal equilibrium.



39 For a detailed discussion see, e.g., J. Wilks, The Third Law of Thermodynamics, Oxford U. Press, 1961.
1.5. Two bodies have equal and constant heat capacities C, but different temperatures, $T_1$ and $T_2$.
Find the maximum mechanical work obtainable from this system, using a heat engine. 

1.6. Express the efficiency of a heat engine using the "Joule
cycle", which consists of two adiabatic and two isobaric processes (see
Fig. on the right), via the minimum and maximum values of pressure,
and compare the result with that for the Carnot cycle. Assume an ideal
classical working gas with constant $C_P$ and $C_V$.
Chapter 2. Principles of Physical Statistics

This chapter is the key part of this course. It starts with a brief discussion of such basic notions of
statistical physics as statistical ensembles, probability, and ergodicity. Then the so-called
microcanonical distribution postulate is formulated in parallel with the statistical definition of entropy.
The next step is the derivation of the Gibbs distribution, which is frequently considered the summit of
statistical physics, and of one more, the grand canonical distribution, which is more convenient for some tasks
- in particular, for the derivation of the Boltzmann, Fermi-Dirac, and Bose-Einstein statistics for systems
of independent particles.






2.1. Statistical ensembles and probability 

As has been already discussed in Sec. 1.1, statistical physics deals with systems in conditions
when either the unknown initial conditions, or the system complexity, or the laws of motion (as in the
case of quantum mechanics) do not allow a definite prediction of measurement results. The main
formalism for the analysis of such systems is probability theory, so let me start with a very brief
review of its basic concepts, using informal "physical" language - less rigorous but (hopefully) more
transparent than a standard mathematical treatment. 1

Consider $N \gg 1$ independent similar experiments carried out with apparently similar systems
(i.e. systems with identical macroscopic parameters such as volume, pressure, etc.), but still giving, for
any of the reasons outlined above, different results of measurements. Such a collection of experiments,
together with a fixed method of result processing, is a good example of a statistical ensemble. Let us
start from the case when the experiments may have M different discrete outcomes, and the numbers of
experiments giving the corresponding different results are $N_1, N_2, \ldots, N_M$, so that

$$\sum_{m=1}^{M} N_m = N. \qquad (2.1)$$
The probability of each outcome, for the given statistical ensemble, is then defined as 

$$W_m \equiv \lim_{N\to\infty}\frac{N_m}{N}. \qquad (2.2)$$



Though this definition is so close to our everyday experience that it is almost self-evident, a few remarks
may still be relevant.

First, probabilities $W_m$ depend on the exact statistical ensemble they are defined for, notably
including the method of result processing. As an example, consider standard coin tossing. For the
ensemble of all tossed coins, the probabilities of both the heads and tails outcomes equal 1/2. However,
nothing prevents us from defining another statistical ensemble as a set of coin-tossing experiments with
the heads-up outcome. Evidently, the probability of finding coins with tails up in this new ensemble is
not 1/2 but 0. Still, this set of experiments is not only legitimate but also a rather meaningful statistical



1 For the reader interested in reviewing a more rigorous approach, I can recommend, for example, Chapter 18 of 
the handbook by G. Korn and T. Korn - see MA Sec. 16(h). 



© 2013 K. Likharev 



Open online access under cc bv-nc-sa license 



Essential Graduate Physics 



SM: Statistical Mechanics 



ensemble; for example, the exact position and orientation of the tossed coins on the floor, within this 
restricted ensemble, may be rather random. 

Second, a statistical ensemble does not necessarily require N different physical systems, e.g., N
different coins. It is intuitively clear that tossing the same coin N times constitutes an ensemble with
similar statistical properties. More generally, a set of N experiments with the same system provides a
statistical ensemble equivalent to the set of experiments with N different systems, provided that the
experiments are kept independent, i.e. that outcomes of past experiments do not affect those of the
experiments to follow. Moreover, for most physical systems of interest any special preparation is
unnecessary, and N different experiments, separated by sufficiently long time intervals, form a "good"
statistical ensemble - the property called ergodicity. 2

Third, the reference to infinite N in Eq. (2) does not strip the notion of probability of its
practical relevance. Indeed, it is easy to prove (see Chapter 5) that, under very general conditions, at finite
but sufficiently large N, the numbers $N_m$ approach their average (or expectation) values 3

$$\langle N_m\rangle = W_m N, \qquad (2.3)$$

with the relative deviation scale decreasing as $1/\langle N_m\rangle^{1/2}$.
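The scaling of the relative deviation of $N_m$ from $\langle N_m\rangle$ can be checked with a simulated coin-tossing ensemble. The Python sketch below is illustrative only: it uses seeded pseudo-random numbers and an assumed number (100) of repeated ensembles to estimate the rms deviation. For the binomial statistics of a single outcome with probability p, the rms relative deviation is $[(1-p)/\langle N_m\rangle]^{1/2}$, so the product printed in the last column should stay near $(1-p)^{1/2} \approx 0.71$ for p = 1/2.

```python
import math
import random


def rms_relative_deviation(n_experiments, p=0.5, trials=100, seed=0):
    """Repeat an ensemble of n_experiments coin tosses `trials` times and return
    the rms of (N_m - <N_m>)/<N_m> for the 'heads' outcome."""
    rng = random.Random(seed)
    mean = p * n_experiments                  # <N_m> = W_m N, Eq. (3)
    sq = 0.0
    for _ in range(trials):
        n_heads = sum(rng.random() < p for _ in range(n_experiments))
        sq += ((n_heads - mean) / mean) ** 2
    return math.sqrt(sq / trials)


for n in (100, 1_000, 10_000):
    rel = rms_relative_deviation(n)
    print(n, rel, rel * math.sqrt(0.5 * n))   # last column: rel * <N_m>**0.5
```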

Now let me list those properties of probabilities that we will immediately need. First, dividing
Eq. (1) by N and taking the limit $N \to \infty$, we get the well-known normalization condition

$$\sum_{m=1}^{M} W_m = 1; \qquad (2.4)$$

just remember that it is true only if each experiment definitely yields one of the M outcomes.
Next, if we have an additive function of results, 

$$f = \frac{1}{N}\sum_{m=1}^{M} N_m f_m, \qquad (2.5)$$

where f m are some definite (deterministic) coefficients, we may define the statistical average (also called 
the expectation value) of the function as 

$$\langle f\rangle \equiv \lim_{N\to\infty}\frac{1}{N}\sum_{m=1}^{M} N_m f_m, \qquad (2.6)$$



2 The most popular counter-example of a non-ergodic system is an energy-conserving system of particles placed
in a potential that is a quadratic form of particle coordinates. The theory of oscillations tells us (see, e.g., CM Sec.
5.2) that this system is equivalent to a set of non-interacting harmonic oscillators. Each of these oscillators
conserves its own initial energy $E_j$ forever, so that the statistics of N measurements of one such system may differ
from that of N different systems with a random distribution of $E_j$, even if the total energy of the system, $E = \sum_j E_j$, is
the same. Such non-ergodicity, however, is a rather feeble phenomenon, and is readily destroyed by any of the
"mixing" mechanisms, such as weak interaction with the environment (leading, in particular, to oscillation damping),
nonlinear interaction of the components (see, e.g., CM Ch. 4), and chaos (CM Ch. 9), all of them strongly
enhanced by increasing the number of particles in the system, i.e. the number of its degrees of freedom. This is
why most real-life systems are ergodic; for those interested in non-ergodic exotics, I can recommend the
monograph by V. Arnold and A. Avez, Ergodic Problems of Classical Mechanics, Addison-Wesley, 1989.

3 Here, and everywhere in these notes, angle brackets ⟨...⟩ mean averaging over a statistical ensemble, which is
generally different from averaging over time - as will be the case in quite a few examples considered below.
so that using Eq. (3) we get

$$\langle f\rangle = \sum_{m=1}^{M} W_m f_m. \qquad (2.7)$$



Notice that Eq. (4) may be considered as the particular form of this general result, for all $f_m = 1$.
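A minimal Python sketch of Eqs. (6) and (7) for a discrete ensemble, using a fair die as a stand-in example (not from the text). Exact fractions are used for the probabilities so that the normalization check of Eq. (4) is exact; the last part estimates the same average from a finite simulated ensemble, which approaches the value given by Eq. (7) as N grows.

```python
import random
from fractions import Fraction


def expectation(W, f):
    """Eq. (7): <f> = sum over m of W_m * f_m, for normalized probabilities W_m."""
    if abs(sum(W) - 1) > 1e-12:
        raise ValueError("probabilities must satisfy the normalization condition, Eq. (4)")
    return sum(w * fm for w, fm in zip(W, f))


W = [Fraction(1, 6)] * 6       # illustrative ensemble: rolls of a fair die
faces = [1, 2, 3, 4, 5, 6]     # f_m = number of dots on face m

print(expectation(W, faces))    # 7/2
print(expectation(W, [1] * 6))  # 1, i.e. Eq. (4) recovered for all f_m = 1

# Ensemble average (1/N) * sum_m N_m f_m, as in Eq. (6), from N simulated rolls.
rng = random.Random(1)
N = 100_000
counts = [0] * 6
for _ in range(N):
    counts[rng.randrange(6)] += 1
ensemble_avg = sum(n_m * f_m for n_m, f_m in zip(counts, faces)) / N
print(ensemble_avg)             # close to 3.5
```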

Next, the spectrum of possible experimental outcomes is frequently continuous. (Think, for
example, about the positions of the marks left by bullets fired into a target from afar.) The above
formulas may be readily generalized to this case; let us start from the simplest situation when all
different outcomes may be described by one continuous variable q, which replaces the discrete index m
in Eqs. (1)-(7). The basic relation for this case is the self-evident fact that the probability dW of having
an outcome within a very small interval dq near point q is proportional to the magnitude of that interval:



$$dW = w(q)\,dq. \qquad (2.8)$$



Function w(q), which does not depend on dq, is called the probability density. Now all the above
formulas may be recast by replacing probabilities $W_m$ by products (8), and the summation over m, by
integration over q. In particular, instead of Eq. (4) the normalization condition now becomes



$$\int w(q)\,dq = 1, \qquad (2.9)$$



where the integration should be extended over the whole range of possible values of q. Similarly, instead
of Eq. (5), it is natural to consider a function f(q). Then instead of Eq. (7), the expectation value of the
function may be calculated as



$$\langle f\rangle = \int w(q)f(q)\,dq. \qquad (2.10)$$
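Eqs. (9) and (10) can be evaluated numerically for any given density. The Python sketch below assumes, purely for illustration, a Gaussian w(q) with center $q_0 = 1$ and width $\sigma = 0.5$, and integrates with Simpson's rule over $\pm 10\sigma$; it recovers the normalization and the expectation values $\langle q\rangle = q_0$ and $\langle q^2\rangle = q_0^2 + \sigma^2$.

```python
import math


def w(q, q0=1.0, sigma=0.5):
    """Illustrative probability density: a Gaussian centered at q0 with width sigma."""
    return math.exp(-((q - q0) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def integrate(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    s += 4 * sum(func(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(func(a + k * h) for k in range(2, n, 2))
    return s * h / 3


a, b = 1.0 - 10 * 0.5, 1.0 + 10 * 0.5               # +-10 sigma covers essentially all of w(q)
norm = integrate(w, a, b)                            # Eq. (9): should be 1
mean_q = integrate(lambda q: w(q) * q, a, b)         # Eq. (10) with f(q) = q
mean_q2 = integrate(lambda q: w(q) * q ** 2, a, b)   # Eq. (10) with f(q) = q**2
print(norm, mean_q, mean_q2)   # close to 1, q0 = 1.0, and q0**2 + sigma**2 = 1.25
```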



It is straightforward to generalize these formulas to the case of more variables. For example, results of 
measurements of a particle with 3 degrees of freedom may be described by the probability density w 
defined in the 6D space of its generalized radius-vector q and momentum p. As a result, the expectation 
value of a function of these variables may be expressed as a 6D integral 



$$\langle f\rangle = \int w(\mathbf{q},\mathbf{p})\,f(\mathbf{q},\mathbf{p})\,d^3q\,d^3p. \qquad (2.11)$$



Some systems considered in this course consist of components whose quantum properties
cannot be ignored, so let us discuss how ⟨f⟩ should be calculated in this case. If by $f_m$ we mean
measurement results, Eq. (7) (and its generalizations) of course remains valid, but since these numbers
themselves may be affected by the intrinsic quantum-mechanical uncertainty, it may make sense to have
a bit deeper look into this situation. Quantum mechanics tells us 4 that the most general expression for
the expectation value of an observable f in a certain ensemble of macroscopically similar systems is



$$\langle f\rangle = \sum_{m,m'} W_{mm'}f_{m'm} \equiv \mathrm{Tr}(\mathrm{W}\mathrm{f}). \qquad (2.12)$$



Here $f_{mm'}$ are the matrix elements of the quantum-mechanical operator $\hat f$ corresponding to the
observable f, in a full basis of orthonormal states m,



$$f_{mm'} = \langle m|\hat f|m'\rangle, \qquad (2.13)$$



4 See, e.g., QM Sec. 6.1. 
while the coefficients $W_{mm'}$ are the elements of the so-called density matrix W, which represents, in the same
basis, a density operator $\hat W$ describing the properties of this ensemble. Equation (12) is evidently more
general than Eq. (7), and is reduced to it only if the density matrix is diagonal:

$$W_{mm'} = W_m\delta_{mm'} \qquad (2.14)$$

(where S mm - is the Kronecker symbol), when the diagonal elements W m play the role of probabilities of 
the corresponding states. 
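A small numerical check of Eqs. (12)-(14). The Hermitian matrix standing in for $f_{mm'}$ is random and the probabilities $W_m$ are assumed values, both purely illustrative; the point is that for a diagonal density matrix the trace formula reduces to the classical average (7):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4  # size of the (illustrative) orthonormal basis {|m>}

# Random Hermitian matrix f_{mm'} = <m|f|m'>, standing in for some observable, Eq. (2.13)
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
f = (A + A.conj().T) / 2

# Diagonal density matrix, Eq. (2.14): W_{mm'} = W_m * delta_{mm'}
W_m = np.array([0.4, 0.3, 0.2, 0.1])
W = np.diag(W_m)

# Eq. (2.12), written both as the double sum and as the trace
double_sum = np.sum(W * f.T).real   # sum over m, m' of W_{mm'} f_{m'm}
trace_form = np.trace(W @ f).real   # Tr(W f)

# Eq. (7): classical average with f_m = f_mm
classical = float(np.sum(W_m * np.diag(f).real))
print(double_sum, trace_form, classical)
```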

Thus the largest difference between the quantum and classical description is the presence, in Eq. (12), of the off-diagonal elements of the density matrix. They have the largest values in the pure (also called "coherent") ensemble, in which the state of the system may be described with state vectors, e.g., the ket-vector

$$|\alpha\rangle = \sum_m \alpha_m |m\rangle, \tag{2.15}$$

where $\alpha_m$ are some complex coefficients. In this simple case, the density matrix elements are merely

$$W_{mm'} = \alpha_m \alpha_{m'}^*, \tag{2.16}$$

so that the off-diagonal elements are of the same order as the diagonal elements. For example, in the very important particular case of a two-level system, the pure-state density matrix is

$$\mathrm{W} = \begin{pmatrix} \alpha_1\alpha_1^* & \alpha_1\alpha_2^* \\ \alpha_2\alpha_1^* & \alpha_2\alpha_2^* \end{pmatrix}, \tag{2.17}$$

so that the product of its off-diagonal components is as large as that of the diagonal components. In the most important basis of stationary states, i.e. eigenstates of the system's time-independent Hamiltonian, coefficients $\alpha_m$ oscillate in time as 5

$$\alpha_m(t) = \alpha_m(0)\exp\left\{-i\frac{E_m}{\hbar}t\right\} = |\alpha_m|\exp\left\{-i\frac{E_m}{\hbar}t + i\varphi_m\right\}, \tag{2.18}$$

where $E_m$ are the corresponding eigenenergies, and $\varphi_m$ are constant phase shifts. This means that while the diagonal terms of the density matrix (16) remain constant, its off-diagonal components are oscillating functions of time:

$$W_{mm'} = \alpha_m\alpha_{m'}^* = |\alpha_m\alpha_{m'}|\exp\left\{i\frac{E_{m'} - E_m}{\hbar}t\right\}\exp\{i(\varphi_m - \varphi_{m'})\}. \tag{2.19}$$

Due to the extreme smallness of the Planck constant (on the human scale of things), minuscule random perturbations of the eigenenergies are equivalent to substantial random changes of the phase multiplier, so that the time average of any off-diagonal matrix element tends to zero. Moreover, even if our statistical ensemble consists of systems with exactly the same $E_m$, but different values $\varphi_m$ (which are typically hard to control at the initial preparation of the system), the average values of all $W_{mm'}$ (with $m \neq m'$) vanish again.



5 Here I use the Schrödinger picture of quantum mechanics, in which the matrix elements of operators do not evolve in time.
This is why, besides some very special cases, typical statistical ensembles of quantum particles are far from being pure, and in most cases (certainly including the thermodynamic equilibrium), a good approximation for their description is given by the opposite limit of the so-called classical mixture, in which all off-diagonal matrix elements of the density matrix equal zero, and its diagonal elements $W_{mm}$ are merely the probabilities $W_m$ of the corresponding eigenstates. In this case, for observables compatible with energy, Eq. (12) is reduced to Eq. (7), with $f_m$ being the eigenvalues of the variable $f$.



2.2. Microcanonical ensemble and distribution

Let us start the discussion of physical statistics with the simplest, microcanonical statistical ensemble 6 that is defined as a set of macroscopically similar closed (isolated) systems with virtually the same total energy $E$. Since in quantum mechanics the energy of a closed system is quantized, it is convenient to include into the ensemble all systems with energies $E_m$ within a narrow interval $\Delta E \ll E$ that is nevertheless much larger than the average distance $\delta E$ between the energy levels, so that the number $M$ of different quantum states within interval $\Delta E$ is large, $M \gg 1$. Such a choice of $\Delta E$ is only possible if $\delta E \ll E$; however, the reader should not worry too much about this condition, because the most important applications of the microcanonical ensemble are for very large systems (or very high energies) when the energy spectrum is very dense. 7



Fig. 2.1. Very schematic image of the microcanonical ensemble. (Actually, the ensemble deals with quantum states rather than energy levels. An energy level may be degenerate, i.e. correspond to several states.)



This ensemble serves as the basis for the formulation of a postulate which is most frequently 
called the microcanonical distribution (or sometimes the "main statistical hypothesis"): in the 
thermodynamic equilibrium, all possible states of the microcanonical ensemble have equal probability, 



$$W_m = \frac{1}{M} = \text{const}. \tag{2.20}$$



Though in some constructs of statistical mechanics this equality is derived from other axioms, which look more plausible to their authors, I believe that Eq. (20) may be taken as the starting point of statistical physics, supported "just" by the compliance of all its corollaries with experimental observations. 8

Note that postulate (20) sheds light on the nature of the macroscopic irreversibility of microscopically reversible (closed) systems: if such a system was initially in a certain state, its time



6 The terms "microcanonical", as well as "canonical" (see Sec. 4 below) are apparently due to J. Gibbs, and I 
could not find out his motivation for these names. ("Canonical" in the sense of "standard" or "common" is quite 
appropriate, but why "micro"?) 

7 Formally, the main result of this section, Eq. (20), is valid for any $M$ (including $M = 1$); it is just less informative for small $M$, and trivial for $M = 1$.

8 Though I have to move on, let me note that the microcanonical distribution (20) is a very nontrivial postulate, and my advice to the reader is to give some thought to this foundation of the whole building of statistical mechanics.
evolution with just minuscule interactions with the environment (which are necessary for reaching the thermodynamic equilibrium) would eventually lead to the uniform distribution of its probability among all states with essentially the same energy. Each of these states is not "better" than the initial one; rather, in a macroscopic system, there are just so many of these states that the chance to find the system in the initial state is practically nil - again, think about the ink drop diffusion into a glass of water.

Now let us find a suitable definition of entropy S of a microcanonical ensemble member - for 
now, in the thermodynamic equilibrium only. Since S is a measure of disorder, it should be related to the 
amount of information lost when the system goes from the full order to the full disorder, i.e. into the 
microcanonical distribution (20), or, in other words, the amount of information 9 necessary to find the 
exact state of your system in a microcanonical ensemble. 

In information theory, the amount of information necessary to make a definite choice between two options with equal probabilities (Fig. 2a) is defined as

$$I(2) = \log_2 2 = 1. \tag{2.21}$$

This unit of information is called a bit. Now, if we need to make a choice between 4 equally probable opportunities, it can be made in two similar steps (Fig. 2b), each requiring one bit of information, so that the total amount of information necessary for the choice is

$$I(4) = 2I(2) = 2 = \log_2 4. \tag{2.22}$$

An obvious extension of this process to the choice between $M = 2^m$ states gives

$$I(M) = mI(2) = m = \log_2 M. \tag{2.23}$$
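The halving procedure of Fig. 2.2 may be sketched as a bisection that asks yes/no questions until one of $M = 2^m$ equally probable options is singled out. The helper `questions_needed` is hypothetical, written only for this illustration; the last line converts one bit into the SI entropy units of footnote 10, using the exact SI value of $k_B$:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def questions_needed(M: int, target: int) -> int:
    """Number of yes/no questions ("is it in the lower half?") needed to isolate
    option `target` among M equally probable options, M being a power of 2."""
    lo, hi = 0, M          # the answer is known to lie in range(lo, hi)
    count = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target < mid:   # answer "yes": keep the lower half
            hi = mid
        else:              # answer "no": keep the upper half
            lo = mid
        count += 1
    return count


for m in range(1, 6):
    M = 2**m
    bits = {questions_needed(M, t) for t in range(M)}  # the same for every option
    print(M, bits, math.log2(M))                       # Eq. (2.23): I(M) = log2 M

# Footnote 10: S = ln M = ln 2 * I(M); one bit corresponds to k_B ln 2 in SI units
entropy_per_bit_SI = K_B * math.log(2)  # about 0.957e-23 J/K
```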



Fig. 2.2. "Logarithmic trees" of binary decisions for making a choice between (a) 2 and (b) 4 opportunities with equal probabilities.

This measure, if extended naturally to any integer M, is quite suitable for the definition of 
entropy at equilibrium, with the only difference that, following tradition, the binary logarithm is 
replaced with the natural one: 10 



9 I will rely on the reader's common sense and intuitive understanding of what information is, because in the formal information theory this notion is also essentially postulated - see, e.g., the wonderfully clear text by J. Pierce, An Introduction to Information Theory, Dover, 1980.

10 This is of course just the change of a constant factor: $S(M) = \ln M = \ln 2 \times \log_2 M = \ln 2 \times I(M) \approx 0.693\, I(M)$. A review of Chapter 1 shows that nothing in thermodynamics prevents us from choosing such a coefficient arbitrarily, with the corresponding change of the temperature scale - see Eq. (1.9). In particular, in the SI units, Eq. (24b) becomes $S = -k_B \ln W_m$, so that one bit of information corresponds to the entropy change $\Delta S = k_B \ln 2 \approx 0.693\, k_B \approx 0.957 \times 10^{-23}$ J/K. By the way, formula "$S = k \log W$" is engraved on the tombstone of L. Boltzmann (1844-1906), who was the first one to recognize this intimate connection between the entropy and probability.
$$S = \ln M. \tag{2.24a}$$

Using Eq. (20), we may recast this definition in the most frequently used form

$$S = \ln\frac{1}{W_m} = -\ln W_m. \tag{2.24b}$$



(Again, please note that Eq. (24) is valid in the thermodynamic equilibrium only!) 

Equation (24) satisfies the major condition for the entropy definition in thermodynamics, i.e. to be a unique characteristic of disorder. Indeed, according to Eq. (20), number $M$ (or any function of it) is the only possible measure characterizing the microcanonical distribution. We also need this function of $M$ to satisfy another requirement for the entropy, that of being an extensive thermodynamic variable, and Eq. (24) does satisfy this requirement as well. Indeed, mathematics says that for two independent systems the joint probability is just a product of their partial probabilities, $W = W_1W_2$, and hence, according to Eq. (24b), their entropies just add up: $S = -\ln(W_1W_2) = S_1 + S_2$.

Now let us see whether Eqs. (20) and (24) are compatible with the 2nd law of thermodynamics. For that, we need to generalize Eq. (24) for $S$ to an arbitrary state of the system (generally, out of thermodynamic equilibrium), with arbitrary state probabilities $W_m$. Let us first recognize that $M$ in Eq. (24) is just the number of possible ways to commit a particular system to a certain state $n$ ($n = 1, 2, \ldots, M$), in a statistical ensemble where each state is equally probable. Now let us consider a more general ensemble, still consisting of a large number $N \gg 1$ of similar systems, but with a certain number $N_m = W_mN \gg 1$ of systems in each of $M$ states, with $W_m$ not necessarily equal. In this case the evident generalization of Eq. (24) is that the entropy $S_N$ of the whole ensemble is

$$S_N = \ln M(N_1, N_2, \ldots), \tag{2.25}$$

where $M(N_1, N_2, \ldots)$ is the number of ways to commit a particular system to a certain state $n$, while keeping all numbers $N_m$ fixed. Such number $M(N_1, N_2, \ldots)$ is clearly equal to the number of ways to distribute $N$ distinct balls between $M$ different boxes, with the fixed number $N_m$ of balls in each box, but in no particular order within it. Comparing this description with the definition of the so-called multinomial coefficients, 11 we get

$$M(N_1, N_2, \ldots) = \binom{N}{N_1, N_2, \ldots, N_M} \equiv \frac{N!}{N_1!\,N_2!\cdots N_M!}, \quad \text{with } N = \sum_{m=1}^{M} N_m. \tag{2.26}$$

In order to simplify the resulting expression for $S_N$, we can use the famous Stirling formula in its crudest, de Moivre's form, 12 whose accuracy is suitable for most purposes of statistical physics:

$$\ln(N!)\big|_{N\to\infty} \approx N(\ln N - 1). \tag{2.27}$$

When applied to our current problem, this gives the following average entropy per system, 13



11 See, e.g., MA Eq. (2.3). Despite the intimidating name, Eq. (26) may be very simply derived. Indeed, $N!$ is just the number of all possible permutations of $N$ balls, i.e. the ways to place them in certain positions - say, inside $M$ boxes. Now in order to take into account that the particular order of the balls in each box is not important, that number should be divided by all numbers $N_m!$ of possible permutations of balls within each box - that's it.

12 See, e.g., MA Eq. (2.10).
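$$S \equiv \frac{S_N}{N} = \frac{1}{N}\ln\frac{N!}{N_1!\,N_2!\cdots N_M!} = \frac{1}{N}\left[\ln(N!) - \sum_{m=1}^{M}\ln(N_m!)\right] \approx \frac{1}{N}\left[N(\ln N - 1) - \sum_{m=1}^{M}N_m(\ln N_m - 1)\right] = \frac{1}{N}\left[N\ln N - \sum_{m=1}^{M}N_m\ln N_m\right] = -\sum_{m=1}^{M}\frac{N_m}{N}\ln\frac{N_m}{N}, \tag{2.28}$$

and since this result is only valid in the limit $N_m \to \infty$ anyway, we may use Eq. (2) to present it as

$$S = -\sum_{m=1}^{M} W_m \ln W_m = \sum_{m=1}^{M} W_m \ln\frac{1}{W_m}. \tag{2.29}$$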



This extremely important formula 14 may be interpreted as the average of the entropy values given by Eq. (24), weighted with the specific probabilities $W_m$ in accordance with the general formula (7). 15
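The crude Stirling step (27) inside Eq. (28) may be checked numerically. The sketch below evaluates the exact $(1/N)\ln[N!/(N_1!\cdots N_M!)]$ with `math.lgamma` (using $\ln n! =$ `lgamma(n + 1)`) for an assumed set of probabilities $W_m$, and compares it with formula (29) as $N$ grows:

```python
import math


def entropy_per_system_exact(counts: list[int]) -> float:
    """(1/N) ln[N! / (N_1! N_2! ... N_M!)], i.e. Eqs. (2.25)-(2.26) divided by N."""
    N = sum(counts)
    ln_multinomial = math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)
    return ln_multinomial / N


def shannon_entropy(probs: list[float]) -> float:
    """Eq. (2.29): S = -sum_m W_m ln W_m (terms with W_m = 0 contribute nothing)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


probs = [0.5, 0.25, 0.125, 0.125]  # an assumed set of probabilities W_m
S_limit = shannon_entropy(probs)    # equals 1.75 ln 2 for this choice
results = {}
for N in (8, 80, 800, 8000, 80000):
    counts = [round(p * N) for p in probs]  # N_m = W_m N (exact integers here)
    results[N] = entropy_per_system_exact(counts)
    print(N, results[N], S_limit)
```

The exact value stays below Eq. (29) and approaches it as $N$ grows, in line with the remark that Eq. (28) is only valid for $N_m \to \infty$.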

Now let us find what distribution of probabilities $W_m$ provides the largest value of entropy (29). The answer is almost evident from a single glance at Eq. (29). For example, if coefficients $W_m$ are constant (and hence equal to $1/M'$) for a subgroup of $M' \le M$ states and equal zero for all others, all $M'$ nonvanishing terms in the sum (29) are equal to each other, so that

$$S = M'\,\frac{1}{M'}\ln M' = \ln M', \tag{2.30}$$

so that the closer $M'$ to its maximum number $M$, the larger $S$. Hence, the maximum of $S$ is reached at the uniform distribution (20), where $S$ takes its equilibrium value (24).

In order to prove this important fact more strictly, let us find the maximum of the function given by Eq. (29). If its arguments $W_1, W_2, \ldots, W_M$ were completely independent, this could be done by finding the point (in the $M$-dimensional space of coefficients $W_m$) where all partial derivatives $\partial S/\partial W_m$ are equal to zero. However, since the probabilities are constrained by condition (4), the differentiation has to be carried out more carefully, taking into account this interdependence:

$$\left[\frac{\partial}{\partial W_m}S(W_1, W_2, \ldots)\right]_{\text{cond}} = \frac{\partial S}{\partial W_m} + \sum_{m' \neq m}\frac{\partial S}{\partial W_{m'}}\frac{\partial W_{m'}}{\partial W_m}. \tag{2.31}$$

At the maximum of function $S$, all such expressions should be equal to zero simultaneously. This condition may be presented as $\partial S/\partial W_m = \lambda$, where the so-called Lagrange multiplier $\lambda$ is independent of $m$. Indeed, at such a point Eq. (31) becomes



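$$\left[\frac{\partial}{\partial W_m}S(W_1, W_2, \ldots)\right]_{\text{cond}} = \lambda\left(\frac{\partial W_m}{\partial W_m} + \sum_{m' \neq m}\frac{\partial W_{m'}}{\partial W_m}\right) = \lambda\frac{\partial}{\partial W_m}\left(\sum_{m'} W_{m'}\right) = \lambda\frac{\partial(1)}{\partial W_m} = 0. \tag{2.32}$$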



13 Strictly speaking, I should use notation $\langle S \rangle$ here. However, following the style accepted in thermodynamics, I will drop the averaging sign until we really need it to avoid confusion. Again, this shorthand is not too bad, because the relative fluctuations of entropy (as those of any macroscopic variable) are very small at $N \gg 1$.

14 With the replacement of $\ln W_m$ with $\log_2 W_m$ (i.e. division by $\ln 2$), Eq. (29) is famous as the Shannon (or "Boltzmann-Shannon") formula for the average information $I$ per symbol in a long communication string using $M$ different symbols, with probability $W_m$ each.

15 In some textbooks, this simple argument is even accepted as the derivation of Eq. (29); however, it is evidently 
less strict than the one outlined above. 
For the particular expression (29), condition $\partial S/\partial W_m = \lambda$ yields

$$\frac{\partial S}{\partial W_m} = \frac{\partial}{\partial W_m}\left[-W_m\ln W_m\right] = -\ln W_m - 1 = \lambda. \tag{2.33}$$

Equation (33) may hold for all $m$ (and hence the entropy may reach its maximum value) only if $W_m$ is independent of $m$. Thus entropy (29) indeed reaches its maximum value (24) at equilibrium.
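A brute-force check of the maximum-entropy argument of Eqs. (30)-(33). Random normalized distributions, drawn (as an arbitrary choice) from a flat Dirichlet distribution so that condition (4) holds automatically, never exceed $\ln M$; at the uniform distribution the derivative $-\ln W_m - 1$ of Eq. (33) is indeed the same for all $m$:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 6  # number of states (an assumed value)


def S(W: np.ndarray) -> float:
    """Entropy (2.29) of a probability vector W with strictly positive entries."""
    return float(-np.sum(W * np.log(W)))


# 10,000 random probability vectors, each normalized per condition (4)
trials = rng.dirichlet(np.ones(M), size=10_000)
S_random = np.array([S(W) for W in trials])

W_uniform = np.full(M, 1.0 / M)
S_max = S(W_uniform)  # Eq. (2.24a): ln M

# Eq. (2.33): dS/dW_m = -ln W_m - 1 must be the same (= lambda) for all m
grad_uniform = -np.log(W_uniform) - 1.0
print(S_random.max(), S_max, np.log(M))
```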

To summarize, we see that definition (24) of entropy in statistical physics does fit all the requirements imposed on this variable by thermodynamics. 16 In particular, we have been able to prove the 2nd law of thermodynamics, starting from that definition and a more fundamental postulate (20). Now let me discuss one possible point of discomfort with that definition: it depends on the accepted energy interval of the microcanonical ensemble, for whose width $\Delta E$ no exact guidance is offered. However, if the interval $\Delta E$ contains many states, $M \gg 1$, then with a very small relative error (vanishing in the limit $M \to \infty$), $M$ may be presented as

$$M = g(E)\,\Delta E, \tag{2.34}$$

where $g(E)$ is the density of states of the system:

$$g(E) \equiv \frac{d\Sigma(E)}{dE}, \tag{2.35}$$

$\Sigma(E)$ being the total number of states with energies below $E$. (Note that the average interval $\delta E$ between energy levels, mentioned in the beginning of this section, is just $\delta E = \Delta E/M = 1/g$.) Plugging Eq. (34) into Eq. (24), we get

$$S = \ln M = \ln g(E) + \ln\Delta E, \tag{2.36}$$

so that the only effect of a particular choice of $\Delta E$ is an offset of entropy by a constant, and in Chapter 1 we have seen that such a shift does not affect any measurable quantity. Of course, Eq. (34), and hence Eq. (36), are only precise in the limit when the density of states $g(E)$ is so large that the range available for the appropriate choice of $\Delta E$,

$$g^{-1}(E) \ll \Delta E \ll E, \tag{2.37}$$

is sufficiently broad: $g(E)E = E/\delta E \gg 1$.

In order to get some feeling of the functions $g(E)$ and $S(E)$ and the feasibility of condition (37), and also to see whether the microcanonical distribution may be directly used for calculations of thermodynamic variables in particular systems, let us apply it to a microcanonical ensemble of many sets of $N \gg 1$ independent, similar harmonic oscillators with eigenfrequency $\omega$. (Please note that the requirement of a virtually fixed energy is applied, in this case, to the total energy $E_N$ of the set, rather than to a single oscillator, whose energy $E$ may be virtually arbitrary, though certainly less than $E_N \sim NE$.) Basic quantum mechanics 17 teaches us that the eigenenergies of such an oscillator form a discrete, equidistant spectrum:



16 This is not to say that these definitions are fully equivalent. Despite all the wealth of quantitative relations 
given by thermodynamics, it still leaves a substantial uncertainty in the definition of entropy (and hence 
temperature), while Eq. (24) narrows this uncertainty to an unsubstantial constant. 

17 See, e.g., QM Secs. 2.10 and 5.4.
$$E_m = \hbar\omega\left(m + \frac{1}{2}\right), \quad \text{where } m = 0, 1, 2, \ldots \tag{2.38}$$



If $\omega$ is kept constant, the zero-point energy $\hbar\omega/2$ does not contribute to any thermodynamic properties of the system and may be ignored, 18 so that for the sake of simplicity we may take that point as the energy origin, and replace Eq. (38) with $E_m = m\hbar\omega$. Let us carry out an approximate analysis of the system for the case when its average energy per oscillator,

$$E = \frac{E_N}{N}, \tag{2.39}$$

is much larger than the energy quantum $\hbar\omega$. For one oscillator, the number of states with energy $\varepsilon_1$ below a certain value $E_1 \gg \hbar\omega$ is evidently $\Sigma(E_1) \approx E_1/\hbar\omega$ (Fig. 3a). For two oscillators, all possible values of the total energy $(\varepsilon_1 + \varepsilon_2)$ below some level $E_2$ correspond to the points of a 2D square grid within the right triangle shown in Fig. 3b, giving $\Sigma(E_2) \approx (1/2)(E_2/\hbar\omega)^2$. For three oscillators, the possible values of the total energy $(\varepsilon_1 + \varepsilon_2 + \varepsilon_3)$ correspond to those points of the 3D cubic mesh that fit inside the right pyramid shown in Fig. 3c, giving $\Sigma(E_3) \approx (1/2)(1/3)(E_3/\hbar\omega)^3 = (1/3!)(E_3/\hbar\omega)^3$, etc.



Fig. 2.3. Calculating functions $\Sigma(E_N)$ for the systems of (a) one, (b) two and (c) three quantum oscillators.



An evident generalization of these formulas to arbitrary $N$ gives the number of states

$$\Sigma(E_N) \approx \frac{1}{N!}\left(\frac{E_N}{\hbar\omega}\right)^N, \tag{2.40}$$

where coefficient $1/N!$ has the geometrical meaning of the (hyper)volume of the $N$-dimensional right pyramid with unit sides. Differentiating Eq. (40), we get

$$g(E_N) = \frac{d\Sigma(E_N)}{dE_N} = \frac{1}{(N-1)!}\frac{E_N^{N-1}}{(\hbar\omega)^N}, \tag{2.41}$$

so that



18 Let me hope that the reader knows that the zero-point energy is experimentally measurable, for example using the famous Casimir effect - see, e.g., QM Sec. 9.1. In Sec. 5.6 below we will discuss another method of experimental observation of that energy.
$$S_N(E_N) = \ln g(E_N) + \text{const} = -\ln[(N-1)!] + (N-1)\ln E_N - N\ln(\hbar\omega) + \text{const}. \tag{2.42}$$

For $N \gg 1$ we can ignore the difference between $N$ and $(N-1)$ in both instances, and use the Stirling formula (27) to simplify this result as

$$S_N(E_N) - \text{const} \approx N\left(\ln\frac{E_N}{N\hbar\omega} + 1\right) \approx N\ln\frac{E}{\hbar\omega} = \ln\left[\left(\frac{E}{\hbar\omega}\right)^N\right]. \tag{2.43}$$



(The second approximation is only valid at very high $E/\hbar\omega$ ratios, when the logarithm in Eq. (43) is substantially larger than 1, i.e. it is rather crude. 19) Returning for a second to the density of states, we see that in the limit $N \to \infty$, it is exponentially large:

$$g(E_N) = e^{S_N} \approx \left(\frac{E}{\hbar\omega}\right)^N, \tag{2.44}$$

so that both conditions (37) may be satisfied within a very broad range of $\Delta E$.
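A short check of Eqs. (40) and (43) by exact counting, under the simplifying assumptions of the text (energies counted in units of $\hbar\omega$, zero-point energy dropped). The number of ways to distribute at most $n$ quanta among $N$ distinct oscillators is the binomial coefficient $C(n+N, N)$, a standard combinatorial result used here as the "exact" reference; $N = 100$ is an assumed value:

```python
import math


def ln_sigma_exact(n: int, N: int) -> float:
    """ln C(n + N, N): exact number of states of N oscillators with at most n quanta in total."""
    return math.lgamma(n + N + 1) - math.lgamma(n + 1) - math.lgamma(N + 1)


def ln_sigma_eq_2_40(n: int, N: int) -> float:
    """ln of Eq. (2.40), Sigma = (E_N / hbar omega)^N / N!, with E_N = n hbar omega."""
    return N * math.log(n) - math.lgamma(N + 1)


N = 100                                  # number of oscillators (assumed)
results = {}
for e in (10, 100, 1000, 10_000):        # E / (hbar omega), energy per oscillator
    n = N * e                            # total number of quanta
    exact = ln_sigma_exact(n, N) / N     # per-oscillator log-count, exact
    approx = ln_sigma_eq_2_40(n, N) / N  # per-oscillator log-count from Eq. (2.40)
    crude = math.log(e) + 1.0            # per-oscillator form of Eq. (2.43)
    results[e] = (exact, approx, crude)
    print(e, exact, approx, crude)
```

The exact count converges to Eq. (40) as $E/\hbar\omega$ grows. The remaining gap between the last two columns, about $\ln(2\pi N)/2N$, is the part of Stirling's formula dropped in Eq. (27); per oscillator, it vanishes as $N \to \infty$.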

Now we can use Eq. (43) to find all thermodynamic properties of the system, though only in the limit $E \gg \hbar\omega$. Indeed, according to thermodynamics (see Sec. 1.2), if the system volume and number of particles are fixed, the derivative $dS/dE$ is nothing more than the reciprocal temperature - see Eqs. (1.9) or (1.15). In our current case, we imply that the harmonic oscillators are distinct, for example by their spatial positions. Hence, even if we can speak of some volume of the system, it is certainly fixed. 20 Differentiating Eq. (43) over the energy $E_N$, we get

$$\frac{1}{T} \equiv \frac{dS_N}{dE_N} = \frac{N}{E_N} = \frac{1}{E}. \tag{2.45}$$

Reading this result backwards, we see that the average energy $E$ of a harmonic oscillator equals $T$ (i.e. $k_BT_K$ in SI units). As we will show in Sec. 5 below, this is the correct asymptotic form of the exact result, valid in our current limit $E \gg \hbar\omega$.
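The temperature (45) may also be obtained without approximations (40)-(43). In the sketch below (units $\hbar\omega = 1$, zero-point energy dropped; both are assumptions), the entropy is taken as the logarithm of the exact number $C(n+N-1, N-1)$ of states in which $N$ oscillators share exactly $n$ quanta, i.e. with $\Delta E = \hbar\omega$ in Eq. (36), and $dS_N/dE_N$ is replaced with a finite difference over one quantum:

```python
import math


def S_exact(n: int, N: int) -> float:
    """ln C(n + N - 1, N - 1): number of states of N oscillators sharing exactly n quanta."""
    return math.lgamma(n + N) - math.lgamma(n + 1) - math.lgamma(N)


N = 1000                                     # number of oscillators (assumed)
temperatures = {}
for E in (10, 100, 1000):                    # average energy per oscillator, units of hbar*omega
    n = E * N                                # total number of quanta, E_N / (hbar omega)
    inv_T = S_exact(n + 1, N) - S_exact(n, N)  # finite-difference dS_N/dE_N, Eq. (2.45)
    temperatures[E] = 1.0 / inv_T
    print(E, temperatures[E], temperatures[E] / E)
```

The ratio $T/E$ approaches 1 as $E/\hbar\omega$ grows, as Eq. (45) states; the small remaining offset comes from the finite $N$ and from quantum corrections beyond the limit $E \gg \hbar\omega$.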

Result (45) may be readily generalized. Indeed, in quantum mechanics a harmonic oscillator with eigenfrequency $\omega$ may be described by the Hamiltonian

$$\hat{H} = \frac{\hat{p}^2}{2m} + \frac{\kappa\hat{q}^2}{2}, \tag{2.46}$$

where $q$ is some generalized coordinate, $p$ the corresponding generalized momentum, $m$ is the oscillator's mass, 21 and $\kappa$ is the spring constant, so that $\omega = (\kappa/m)^{1/2}$. Since in thermodynamic equilibrium the density matrix is always diagonal (see Sec. 1 above) in the basis of stationary states $m$, quantum-mechanical averages of the kinetic and potential energies may be found from Eq. (7):



19 Let me offer a very vivid example of how slowly the logarithm function grows at large values of its argument: ln of the number of atoms in the visible Universe is less than 200.

20 For the same reason, the notion of pressure $P$ in such a system is not clearly defined, and neither are any thermodynamic potentials but $E$ and $F$.

21 Let me hope that using the same letter for the mass and the state number would not lead to reader's confusion. I 
believe that the difference between these uses is very clear from the context. 
$$\left\langle\frac{p^2}{2m}\right\rangle = \sum_{m=0}^{\infty} W_m\left\langle m\left|\frac{\hat{p}^2}{2m}\right|m\right\rangle, \qquad \left\langle\frac{\kappa q^2}{2}\right\rangle = \sum_{m=0}^{\infty} W_m\left\langle m\left|\frac{\kappa\hat{q}^2}{2}\right|m\right\rangle, \tag{2.47}$$

where $W_m$ is the probability to occupy the $m$-th energy level, and bra- and ket-vectors describe the stationary state corresponding to that level. 22 However, both classical and quantum mechanics teach us that for any $m$, the bra-kets under the sums in Eqs. (47), which present the average kinetic and potential energies of the oscillator on its $m$-th energy level, are equal to each other, and hence each of them is equal to $E_m/2$. Hence, even though we do not know the probability distribution $W_m$ yet (it will be calculated in Sec. 5 below), we may conclude that in the "classical limit" $T \gg \hbar\omega$,

$$\left\langle\frac{p^2}{2m}\right\rangle = \left\langle\frac{\kappa q^2}{2}\right\rangle = \frac{T}{2}. \tag{2.48}$$
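The statement that the kinetic and potential energies are equal on every level may be checked with a matrix sketch in units $\hbar = m = \omega = 1$ (so that $\kappa = 1$). The operators $\hat{q}$ and $\hat{p}$ are built from the standard ladder operator in a basis truncated at `n_max` levels; the truncation is an assumption of the sketch, and it corrupts only the top level, which is excluded:

```python
import numpy as np

n_max = 40                                      # basis truncation (an assumption of the sketch)
a = np.diag(np.sqrt(np.arange(1, n_max)), k=1)  # annihilation operator: a|m> = sqrt(m)|m-1>
q = (a + a.T) / np.sqrt(2)                      # coordinate operator, hbar = m = omega = 1
p = 1j * (a.T - a) / np.sqrt(2)                 # momentum operator

kinetic = (p @ p).real / 2                      # matrix of p^2/2m (its entries are real)
potential = (q @ q) / 2                         # matrix of kappa q^2/2

m_levels = np.arange(n_max - 1)                 # drop the top level, distorted by truncation
E_m = m_levels + 0.5                            # Eq. (2.38) in these units
K_m = np.diag(kinetic)[m_levels]                # <m|p^2/2m|m>
U_m = np.diag(potential)[m_levels]              # <m|kappa q^2/2|m>
print(np.allclose(K_m, E_m / 2), np.allclose(U_m, E_m / 2))
```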



Now let us consider a system with an arbitrary number of degrees of freedom, described by a more general Hamiltonian: 23

$$\hat{H} = \sum_j \hat{H}_j, \qquad \hat{H}_j = \frac{\hat{p}_j^2}{2m_j} + \frac{\kappa_j\hat{q}_j^2}{2}, \tag{2.49}$$

with (generally, different) eigenfrequencies $\omega_j = (\kappa_j/m_j)^{1/2}$. Since the "modes" (effective harmonic oscillators) contributing to this Hamiltonian are independent, result (48) is valid for each of the modes. This is the famous equipartition theorem: at thermal equilibrium with $T \gg \hbar\omega_j$, the average energy of each so-called half-degree of freedom (which are defined as variables $p_j$ or $q_j$ giving a quadratic term to the system's Hamiltonian) is equal to $T/2$. 24 In particular, for each Cartesian coordinate $q_j$ of a free-moving, non-interacting particle this theorem is valid for any temperature, because such coordinates may be considered as 1D harmonic oscillators with vanishing potential energy, i.e. $\omega_j = 0$, so that condition $T \gg \hbar\omega_j$ is fulfilled at any temperature.

At this point, a first-time student of thermodynamics should be very much relieved to see that the 
counter-intuitive thermodynamic definition (1.9) of temperature does indeed correspond to what we all 
have known about this notion from our kindergarten physics courses. 

I believe that our case study of quantum oscillator systems has been a fair illustration of both the strengths and weaknesses of the microcanonical ensemble approach. 25 On one hand, we could calculate virtually everything we wanted in the classical limit $T \gg \hbar\omega$, but calculations for arbitrary $T \sim \hbar\omega$, though possible, are difficult, because for that, all vertical steps of function $\Sigma(E_N)$ have to be carefully


22 Note again that though we have committed the energy $E_N$ of $N$ oscillators to be fixed (in order to apply Eq. (36), valid only for a microcanonical ensemble at thermodynamic equilibrium), a single oscillator's energy $E$ in our analysis may be arbitrary, within the limits $\hbar\omega \ll E \le E_N \sim NT$.

23 As a reminder, the Hamiltonian of any system whose classical Lagrangian function is an arbitrary quadratic form of its generalized coordinates and the corresponding generalized velocities may be brought to form (49) by an appropriate choice of "normal coordinates" $q_j$, which are certain linear combinations of the original coordinates - see, e.g., CM Sec. 5.2.

24 This also means that in the classical limit, the heat capacity of a system is equal to one half of the number of its half-degrees of freedom (in SI units, multiplied by $k_B$).

25 The reader is strongly urged to solve Exercise 2, whose task is to do a similar calculation for another key ("two-level") physical system, and to compare the results.
counted. In Sec. 4, we will see that other statistical ensembles are much more convenient for such calculations.

Let me conclude this discussion of entropy with a short notice on deterministic classical systems with a few degrees of freedom (and even simpler mathematical objects called "maps") that may exhibit essentially disordered behavior, called the deterministic chaos. 26 Such a chaotic system may be approximately characterized by an entropy defined similarly to Eq. (29), where $W_m$ are the probabilities to find it in different small regions of phase space, at well separated time intervals. On the other hand, one can use an equation slightly more general than Eq. (29) to define the so-called Kolmogorov (or "Kolmogorov-Sinai") entropy $K$ that characterizes the speed of loss of information about the initial state of the system, and hence what is called the "chaos depth". In the definition of $K$, the sum over $m$ is replaced with the summation over all possible permutations $\{m\} = m_0, m_1, \ldots, m_{N-1}$ of small space regions, and $W_m$ is replaced with $W_{\{m\}}$, the probability of finding the system in the corresponding regions $m$ at time moments $t_m$, with $t_m = m\tau$, in the limit $\tau \to 0$, with $N\tau = \text{const}$. For chaos in the simplest objects, 1D maps, $K$ is equal to the Lyapunov exponent $\lambda > 0$. 27 For systems of higher dimensionality, which are characterized by several Lyapunov exponents $\lambda$, the Kolmogorov entropy is equal to the phase-space average of the sum of all positive $\lambda$. These facts provide a much more practicable way of (typically, numerical) calculation of the Kolmogorov entropy than the direct use of its definition. 28
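As an illustration of the last statement, the sketch below estimates the Lyapunov exponent of a 1D map numerically, as the orbit average of $\ln|f'(x)|$. The logistic map $f(x) = rx(1-x)$ at $r = 4$ is chosen here only as a convenient example (it is not discussed in this section); for it, the exponent is known to be exactly $\ln 2$. The starting point and iteration counts are arbitrary assumptions:

```python
import math


def lyapunov_logistic(r: float = 4.0, x0: float = 0.1234,
                      n_transient: int = 1_000, n_steps: int = 50_000) -> float:
    """Orbit average of ln|f'(x)| for the logistic map f(x) = r x (1 - x)."""
    x = x0
    for _ in range(n_transient):  # let the orbit settle onto the attractor
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        total += math.log(abs(r * (1.0 - 2.0 * x)))  # ln|f'(x)|, with f'(x) = r(1 - 2x)
        x = r * x * (1.0 - x)
    return total / n_steps


lam = lyapunov_logistic()
print(lam, math.log(2))  # the estimate should be close to ln 2 = 0.693...
```

For this 1D map, the estimate is, by the statement above, also an estimate of the Kolmogorov entropy $K$, obtained without enumerating any sequences of phase-space regions.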



2.3. Maxwell's Demon, information, and computation 

Before proceeding to other statistical distributions, I would like to address one more popular concern about Eq. (24), the direct relation between the entropy and information. Some physicists are still uneasy with the fact that there is absolutely nothing more in entropy than the (deficit of) information, 29 though to the best of my knowledge, nobody has yet been able to suggest any experimentally verifiable difference between these two notions. Let me give one example of their direct relation, which is essentially a development of the thought experiment suggested by J. C. Maxwell as early as 1867.

Consider a volume containing just one molecule (considered as a point particle), and separated into two equal halves by a movable partition with a door that may be opened and closed at will, at no energy cost (Fig. 4a). If the door is open and the system is in thermodynamic equilibrium, we do not know on which side of the partition the molecule is. Here the disorder (and hence entropy) is largest, and there is no way to get, from a large ensemble of such systems, any useful mechanical energy.

Now, let us consider that we (as instructed by, in Lord Kelvin's formulation, an omniscient Maxwell's Demon) know on which side of the partition the molecule is currently located. Then we may



26 See, e.g., CM Chapter 9 and literature therein. 

27 For the definition of $\lambda$, see, e.g., CM Eq. (9.9).

28 For more discussion, see, e.g., either Sec. 6.2 of the monograph H. G. Schuster and W. Just, Deterministic Chaos, 4th ed., Wiley-VCH, 2005, or the monograph by Arnold and Avez, cited in Sec. 1.

29 While some of these concerns should be treated with due respect (because the ideas of entropy and disorder are indeed highly nontrivial), I have repeatedly run into rather shallow arguments which stemmed from arrogant contempt for information theory as an "engineering discipline", and unwillingness to accept the notion of information on an equal footing with those of space, time, and energy. Fortunately, most leading physicists are much more flexible, and there are even opposite extremes such as J. A. Wheeler's "it from bit" (i.e. matter from information) philosophy - to which I cannot subscribe either.
close the door, so that the molecule's impacts on the partition create, on the average, a pressure force $f$ directed toward the empty part of the volume (in Fig. 4b, the right one). Now we can get from the molecule some mechanical work, say by allowing force $f$ to move the partition to the right, and picking up the resulting mechanical energy by some deterministic external mechanism. After the partition has been moved to the right end of the volume, we can open the door again (Fig. 4c), equalizing the molecule's average pressure on both sides of the partition, and then slowly move the partition back to the middle of the volume, without doing any substantial work. With the kind help of Maxwell's Demon, we can repeat the cycle again and again, and hence make the system do unlimited mechanical work, fed "only" by information and thermal motion, and thus implementing the perpetual motion machine of the 2nd kind - see Sec. 1.6. The fact that such heat engines do not exist means that Maxwell's Demon does not either: getting any new information, at nonvanishing temperature (i.e. at thermal agitation of particles), has a finite energy cost.



Fig. 2.4. The Maxwell's Demon paradox: the volume with a single molecule (a) before and (b) after closing the door, and (c) after opening the door at the end of the expansion stage.



In order to evaluate this cost, let us calculate the maximum work per cycle done by 
Maxwell's heat engine (Fig. 4), assuming that it is constantly in thermal equilibrium with a heat bath of 
temperature $T$. Formula (21) tells us that the information supplied by the demon (which half of 
the volume contains the molecule) is exactly one bit, $I(2) = 1$. According to Eq. (24), this means that by 
getting this information we are reducing the entropy by $\Delta S_I = -\ln 2$. Now, it would be a mistake to plug this 
(negative) entropy change into Eq. (1.19). First, that relation is only valid for slow, reversible processes. 
Moreover (and more importantly), this equation, as well as its irreversible version (1.41), is only valid 
for a fixed statistical ensemble. The change $\Delta S_I$ does not belong to this category, and may be formally 
described by the change of the statistical ensemble - from the one consisting of all similar systems 
(experiments) with an unknown location of the molecule, to the new ensemble consisting of the systems 
with the molecule in a certain (in Fig. 4, the left) half. 30 

Now let us consider the slow expansion of the "gas" after the door has been closed. At this stage, 
we do not need the demon's help any longer (i.e. the statistical ensemble is fixed), and we can use 
relation (1.19). At the assumed isothermal conditions ($T$ = const), this relation may be integrated over 
the whole expansion process, giving $\Delta Q = T\Delta S$. At the final position, the system's entropy should be 
the same as initially, i.e. before the door was closed, because we again do not know where in the 
volume the molecule is. This means that the entropy was replenished, during the reversible expansion, 
from the heat bath, by $\Delta S_{II} = -\Delta S_I = +\ln 2$, so that $\Delta Q = T\Delta S_{II} = T\ln 2$. Since by the end of the whole cycle 
the internal energy $E$ of the system is the same as before, all this heat should have gone into the 
mechanical energy obtained during the expansion. Thus the obtained work per cycle (i.e. for each 
obtained information bit) is $T\ln 2$ ($k_B T_K \ln 2$ in SI units), about $3\times10^{-21}$ J at room temperature. This is 
exactly the minimum energy cost of getting one bit of new information about a system at temperature $T$. 

30 This procedure of redefining the statistical ensemble is the central point of the connection between the 
information theory and physics, and is crucial in particular for any (meaningful :-) discussion of measurements in 
quantum mechanics - see, e.g., QM Secs. 2.5 and 7.7. 

The smallness of that amount on the everyday human scale has left the Maxwell's Demon 
paradox an academic exercise for almost a century. However, its discussion resumed in the 1960s in the 
context of energy consumption in numerical calculations, motivated by the exponential (Moore's-law) 
progress of digital integrated circuits, which leads, in particular, to a fast reduction of the energy $\Delta E$ 
"spent" (turned into heat) per binary logic operation. In the current generations of semiconductor 
digital integrated circuits, $\Delta E$ is of the order of $10^{-16}$ J, 31 i.e. it still exceeds the room-temperature value 
$T\ln 2 = k_B T_K \ln 2 \approx 3\times10^{-21}$ J by more than 4 orders of magnitude. Still, some engineers believe that 
thermodynamics imposes an important lower limit on $\Delta E$, and hence presents an insurmountable obstacle 
to the future progress of computation, 32 so that the issue deserves a discussion. 
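The two energies compared above are easy to reproduce. The short Python sketch below evaluates $k_B T_K \ln 2$ and compares it with the $10^{-16}$ J per operation figure; the 300 K value of "room temperature" and the helper name `landauer_limit` are illustrative choices, not taken from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)


def landauer_limit(t_kelvin: float) -> float:
    """Minimal heat per erased bit, k_B * T_K * ln 2, in joules."""
    return K_B * t_kelvin * math.log(2)


e_min = landauer_limit(300.0)   # "room temperature", assumed here to be 300 K
e_cmos = 1e-16                  # order-of-magnitude CMOS value quoted in the text, J
ratio = e_cmos / e_min

print(f"T ln 2 at 300 K: {e_min:.2e} J")
print(f"CMOS / Landauer ratio: {ratio:.1e}, i.e. {math.log10(ratio):.1f} orders of magnitude")
```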

I hope that the reader of these notes understands that, in contrast to naive popular 
thinking, computers do not create any new information; all they can do is reshape (process) it, losing 
most of the input information on the go. Indeed, any digital computation algorithm may be decomposed into 
simple, binary logical operations, each of them performed by a certain logic circuit called a logic gate. 
Some of these gates (e.g., logical NOT performed by inverters, as well as memory READ and WRITE 
operations) do not change the amount of information in the computer. On the other hand, such 
information-irreversible logic gates as two-input NAND (or NOR, or XOR, etc.) actually erase one bit 
at each operation, because they turn two input bits into one output bit (Fig. 5a). 
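The distinction between the two kinds of gates can be checked mechanically: a gate is logically reversible if and only if its truth table is one-to-one. The sketch below (helper names are hypothetical; inputs are assumed to be uniformly random) also computes the drop of the Shannon entropy from input to output. It is zero for NOT and about 1.19 bits for NAND, i.e. roughly the one bit per two-input gate counted above.

```python
import math
from collections import Counter
from itertools import product


def truth_table(gate, n_inputs):
    """Map every tuple of input bits to the gate's output."""
    return {bits: gate(*bits) for bits in product((0, 1), repeat=n_inputs)}


def is_logically_reversible(table):
    """A gate is logically reversible iff distinct inputs always give distinct outputs."""
    return len(set(table.values())) == len(table)


def entropy_drop_bits(table):
    """Shannon entropy of the inputs minus that of the output, for uniformly random inputs."""
    n = len(table)
    h_in = math.log2(n)
    counts = Counter(table.values())
    h_out = -sum(c / n * math.log2(c / n) for c in counts.values())
    return h_in - h_out


def gate_not(a):
    return 1 - a


def gate_nand(a, b):
    return 1 - (a & b)


for name, table in (("NOT", truth_table(gate_not, 1)), ("NAND", truth_table(gate_nand, 2))):
    print(name, is_logically_reversible(table), round(entropy_drop_bits(table), 3))
```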



Fig. 2.5. Simple examples of (a) irreversible and (b) potentially reversible logic circuits. Each rectangle 
represents a circuit storing one bit of information. 



In 1961, R. Landauer arrived at the conclusion that each logic operation erasing one bit of information 
should turn into heat at least the energy 

$$\Delta E_{\min} = T\ln 2 \equiv k_B T_K \ln 2 . \qquad (2.51)$$

31 In the dominating CMOS technology, $\Delta E$ is close to twice the energy $CV^2/2$ of recharging the total capacitance 
$C$ of the transistor gate electrodes and the wires interconnecting the gates, by the voltage $V$ representing the binary 
unity. As the technology progresses, $C$ decreases in approximate proportion with the minimum feature size, 
resulting in an almost proportional decrease of $\Delta E$. (The voltage $V$ used has almost saturated at ~1 V - the value 
that stems from the ~1 eV bandgap of the semiconductor used, silicon.) 

32 Unfortunately, this delusion has resulted in a substantial and unjustified shift of electron device research 
resources toward using "non-charge degrees of freedom" (such as spin) - as if they do not obey the general laws 
of statistical physics! 



This result may be illustrated with the Maxwell's Demon machine shown in Fig. 4, operating as 
a heat pump. At the first stage, with the door closed, it uses external mechanical work $\Delta E = T\ln 2$ to reduce 
the volume in which the molecule is confined from $V$ to $V/2$, pumping heat $-\Delta Q = \Delta E$ into the heat 
bath. To model a logically irreversible logic gate, let us now open the door in the partition, and thus 
lose 1 bit of information about the molecule's position. Then we will never get the work $T\ln 2$ back, because 
moving the partition back to the right, with the door open, takes place at zero average pressure. Hence, Eq. 
(51) gives a fundamental limit for energy loss (per bit) in logically irreversible computation. 

However, in 1973 C. Bennett came up with convincing arguments that it is possible to avoid 
such energy loss by using only operations that are reversible not only physically, but also logically. 33 
For that, one has to avoid any loss of information, i.e. any erasure of intermediate results, for example in 
the way shown in Fig. 5b. (For that, gate F should be physically reversible, with no substantial static 
power consumption.) At the end of all calculations, after the result has been copied into a memory, the 
intermediate results may be "rolled back" through reversible gates to be eventually merged into a copy of 
the input data, again without erasing a single bit. The minimal energy dissipation at such reversible 
calculation tends to zero as the operation speed is decreased, so that the average energy loss per bit may 
be less than the perceived "fundamental thermodynamic limit" (51). 34 The price to pay for this ultralow 
dissipation is an enormous (exponential) complexity of the hardware necessary for storage of all 
intermediate results. However, by using irreversible gates sparingly, it may be possible to reduce the 
complexity dramatically, so that in the future mostly reversible computation may be able to reduce 
energy consumption in practical digital electronics. 35 

Before we leave Maxwell's Demon behind, let me use it to discuss, once more, the 
relation between the reversibility of the classical and quantum mechanics of Hamiltonian systems and 
the irreversibility possible in thermodynamics and statistical physics. In our (or rather Lord Kelvin's :-) 
gedanken experiment shown in Fig. 4, the laws of mechanics governing the motion of the molecule are 
reversible at all times. Still, during the partition's motion to the right, driven by the molecule's impacts, the entropy 
grows, because the molecule picks up heat $\Delta Q > 0$, and hence entropy $\Delta S = \Delta Q/T > 0$, from the heat 
bath. The physical mechanism of this irreversible entropy (read: disorder) growth is the interaction of 
the molecule with uncontrollable components of the heat bath, and the resulting loss of information 
about the motion of the molecule. Philosophically, the emergence of irreversibility in large systems is a 
strong argument against reductionism - a naive belief that, knowing the exact laws of Nature at one 
level of its complexity, we can readily understand all the phenomena on the higher levels of its 
organization. In reality, the macroscopic irreversibility of large systems is a wonderful example of a new 
law (in this case, the 2nd law of thermodynamics) that becomes relevant on a substantially new level of 
complexity - without defying the lower-level laws. Without such new laws, very little of the higher-level 
organization of Nature may be understood. 






33 C. Bennett, IBM J. Res. Devel. 17, 525 (1973); see also a later review: C. Bennett, Int. J. Theor. Phys. 21, 905 
(1982). To the best of my knowledge, a sub-$T\ln 2$ energy loss per logic step is still to be demonstrated 
experimentally, but at least one research team is closing in on this goal. 

34 Reversible computation may also overcome the perceived "fundamental quantum limit", $\Delta E\,\Delta t > \hbar$, where $\Delta t$ is 
the time scale of the binary logic operation - see K. Likharev, Int. J. Theor. Phys. 21, 311 (1982). 

35 The situation is rather different for quantum computation, which may be considered as a specific type of 
reversible but analog computation - see, e.g., QM Sec. 8.5 and references therein. 
2.4. Canonical ensemble and the Gibbs distribution 



As we have seen in Sec. 2, the microcanonical distribution may be directly used for solving some 
simple problems, 36 but a further development of this approach (also due to J. Gibbs) turns out to be 
much more convenient for calculations. Let us consider a statistical ensemble of similar systems, 
each in thermal equilibrium with a much larger heat bath of temperature $T$ (Fig. 6a). Such 
an ensemble is called canonical. 



Fig. 2.6. (a) System in a heat bath (a canonical ensemble member) and (b) energy spectrum of the 
composite system (including the heat bath). 



Next, it is intuitively evident that if the heat bath is sufficiently large, any thermodynamic 
variables characterizing the system under study should not depend on the heat bath's environment. In 
particular, we may assume that the heat bath is thermally insulated; then the total energy $E_\Sigma$ of the 
composite system (consisting of the system of our interest plus the heat bath) does not change in time. 
For example, if our system of interest is in a certain (say, the $m$-th) energy level, then 

$$E_\Sigma = E_m + E_{HB} \qquad (2.52)$$



is conserved. Now let us partition this canonical ensemble into much smaller sub-ensembles, each being 
a microcanonical ensemble of composite systems whose total energy $E_\Sigma$ is the same - as discussed in 
Sec. 2, within a certain small energy interval $\Delta E_\Sigma \ll E_\Sigma$. According to the microcanonical distribution, 
the probabilities to find the composite system, within this new ensemble, in any state are equal. Still, the heat 
bath energies $E_{HB} = E_\Sigma - E_m$ (Fig. 6b) of members of this microcanonical sub-ensemble may be different, 
due to the difference in $E_m$. 

The probability $W(E_m)$ to find the system of our interest (within the selected sub-ensemble) on 
some energy level $E_m$ is proportional to the number $\Delta M$ of such systems in the sub-ensemble. Due to the 
very large size of the heat bath in comparison with that of the system under study, the heat bath's density 
of states $g_{HB}$ is very high, and $\Delta E_\Sigma$ may be selected so that 

$$\frac{1}{g_{HB}} \ll \Delta E_\Sigma \ll \left|E_m - E_{m'}\right| \ll E_{HB}, \qquad (2.53)$$

where $m$ and $m'$ are any states of the system of our interest. As Fig. 6b shows, in this case we may write 
$\Delta M = g_{HB}(E_{HB})\Delta E_\Sigma$. As a result, within the microcanonical ensemble with the total energy $E_\Sigma$, 



36 See also exercise problems listed at the end of this chapter. 
$$W_m \propto \Delta M = g_{HB}(E_{HB})\,\Delta E_\Sigma = g_{HB}(E_\Sigma - E_m)\,\Delta E_\Sigma . \qquad (2.54)$$

Let us simplify this expression further, using the Taylor expansion with respect to the relatively 
small $E_m \ll E_\Sigma$. However, here we should be careful. As we have seen in Sec. 2, the density of states of 
a large system is an extremely rapidly growing function of energy, so that if we applied the Taylor 
expansion directly to Eq. (54), the Taylor series would converge for very small $E_m$ only. A much 
broader applicability range may be obtained by taking the logarithm of both parts of Eq. (54) first: 

$$\ln W_m = \text{const} + \ln\left[g_{HB}(E_\Sigma - E_m)\right] + \ln\Delta E_\Sigma = \text{const} + S_{HB}(E_\Sigma - E_m), \qquad (2.55)$$

where the second equality results from the application of Eq. (36) to the heat bath, and $\ln\Delta E_\Sigma$ has been 
incorporated into the constant. Now, we can Taylor-expand the (much smoother) function of energy 
in the right-hand part, and limit ourselves to the two leading terms of the series: 



$$\ln W_m \approx \text{const} + S_{HB}\Big|_{E_m=0} - \frac{dS_{HB}}{dE_{HB}}\Big|_{E_m=0} E_m . \qquad (2.56)$$



But according to Eq. (1.9), the derivative participating in this expression is nothing else than the 
reciprocal heat bath temperature, which (due to the large bath size) does not depend on whether $E_m$ is equal 
to zero or not. Since our system of interest is in thermal equilibrium with the bath, this is also the 
temperature $T$ of the system - see Eq. (1.8). Hence Eq. (56) is merely 

$$\ln W_m = \text{const} - \frac{E_m}{T}. \qquad (2.57)$$

This equation describes a substantial decrease of $W_m$ as $E_m$ is increased by several $T$, and hence our 
linear approximation (56) is virtually exact as soon as $E_{HB}$ is much larger than $T$ - the condition that is 
rather easy to satisfy, because, as we have seen in Sec. 2, the average energy of each particle is of the 
order of $T$. 
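Why one should expand $\ln g_{HB}$ rather than $g_{HB}$ itself can be seen in a toy model. The sketch below assumes, for illustration only, a heat bath with the power-law density of states $g_{HB}(E) \propto E^N$, so that $S_{HB} = N\ln E + \text{const}$ and $1/T = N/E_\Sigma$ at $E_m = 0$ (with $k_B = 1$). It compares the exact ratio $g_{HB}(E_\Sigma - E_m)/g_{HB}(E_\Sigma)$ with the first-order Taylor expansion of $g_{HB}$ and with Eq. (57).

```python
import math

# Toy heat bath (an assumption): g_HB(E) ∝ E**N, so that S_HB = N ln E + const.
N = 10_000
E_TOTAL = float(N)      # chosen so that T = E_TOTAL / N = 1
T = E_TOTAL / N


def exact_ratio(e_m):
    """g_HB(E_total - E_m) / g_HB(E_total) for the toy density of states."""
    return (1.0 - e_m / E_TOTAL) ** N


def linear_in_g(e_m):
    """First-order Taylor expansion of g_HB itself, normalized to g_HB(E_total)."""
    return 1.0 - N * e_m / E_TOTAL


def linear_in_ln_g(e_m):
    """First-order expansion of ln g_HB, i.e. Eq. (2.57): W_m ∝ exp(-E_m / T)."""
    return math.exp(-e_m / T)


for e_m in (0.0, 1.0, 2.0, 5.0):
    print(f"E_m = {e_m:3.0f}:  exact = {exact_ratio(e_m):.5f}   "
          f"linear in g = {linear_in_g(e_m):6.1f}   linear in ln g = {linear_in_ln_g(e_m):.5f}")
```

Already at $E_m = T$ the direct expansion of $g_{HB}$ drops to zero (and then gives negative "probabilities"), while the expansion of the logarithm stays accurate to about $0.1\%$ even at $E_m = 5T$.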

Now we should be careful again, because so far we have only derived Eq. (57) for a sub-ensemble 
with a fixed $E_\Sigma$. However, since the right-hand part of Eq. (57) includes only $E_m$ and $T$, which are 
independent of $E_\Sigma$, this relation is valid for all sub-ensembles of the canonical ensemble, and hence for 
the latter ensemble as a whole. 37 Hence for the total probability to find our system of interest in a state 
with energy $E_m$, in the canonical ensemble with temperature $T$, we can write 

$$W_m = \text{const}\times\exp\left\{-\frac{E_m}{T}\right\} = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\}. \qquad (2.58)$$



This is the famous Gibbs distribution (sometimes called the "canonical distribution"), 38 which is 
arguably the summit of statistical physics, 39 because it may be used for a straightforward (or 
at least conceptually straightforward :-) calculation of all statistical and thermodynamic variables. 



37 Another way to arrive at the same conclusion is to note that the entropy of the whole canonical ensemble with 
fixed $E_m$ has to be a sum of entropies of its microcanonical sub-ensembles (with different $E_\Sigma$), which participate 
in Eq. (55). As a result, the logarithm of the probability $W_m$ for our system of interest to have energy $E_m$ in the whole 
(canonical) ensemble is just a sum of Eqs. (57) for sub-ensembles with different $E_\Sigma$. 

38 The temperature dependence of the type $\exp\{-E/T\}$, especially when showing up in rates of certain events, e.g., 
chemical reactions, is also frequently called the Arrhenius law - after the chemist S. Arrhenius, who noticed this 
law in experimental data. In all cases I am aware of, the Gibbs distribution is the underlying reason for the 
Arrhenius law. 

Before I illustrate this, let me first calculate the coefficient $Z$ participating in Eq. (58) for the 
general case. Requiring, in accordance with Eq. (4), the sum of all $W_m$ to be equal to 1, we get 

$$Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (2.59)$$

where the summation is formally extended to all quantum states of the system, though in practical 
calculations the sum may be truncated to include only the states that are noticeably occupied. This 
apparently humble normalization coefficient $Z$ turns out to be so important for the relation between the 
Gibbs distribution (i.e. statistics) and thermodynamics that it has a special name - or actually, two 
names: either the statistical sum or the partition function. To demonstrate how important $Z$ is, let us use 
the general Eq. (29) for entropy to calculate its value for the particular case of the canonical ensemble, 
i.e. the Gibbs distribution of probabilities $W_m$: 

$$S = -\sum_m W_m \ln W_m = \frac{\ln Z}{Z}\sum_m \exp\left\{-\frac{E_m}{T}\right\} + \frac{1}{ZT}\sum_m E_m \exp\left\{-\frac{E_m}{T}\right\}. \qquad (2.60)$$

According to the general rule (7), the thermodynamic (i.e. ensemble-average) value $E$ of the internal 
energy of the system is 

$$E = \sum_m W_m E_m = \frac{1}{Z}\sum_m E_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (2.61a)$$

so that the second term in the right-hand part of Eq. (60) is just $E/T$, while the first term equals just $\ln Z$, 
due to the normalization condition (59). (As a parenthetic remark, using the notion of reciprocal 
temperature $\beta \equiv 1/T$, Eq. (61a), with account of Eq. (59), may also be rewritten as 

$$E = -\frac{\partial(\ln Z)}{\partial\beta}. \qquad (2.61b)$$

This formula is very convenient for calculations if our prime interest is the average energy $E$ rather than 
$F$ or $W_m$.) With these substitutions, Eq. (60) yields a very simple relation between the statistical sum and 
entropy: 

$$S = \frac{E}{T} + \ln Z. \qquad (2.62)$$

Using Eq. (1.33), we see that Eq. (62) gives a straightforward way to calculate the free energy $F$ of the 
system from nothing else than its statistical sum: 

$$F = E - TS = T\ln\frac{1}{Z}. \qquad (2.63)$$
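Relations (59)-(63) are easy to verify numerically for any discrete spectrum. The Python sketch below uses an arbitrary, made-up set of energy levels (in units with $k_B = 1$) and checks Eqs. (62) and (63), as well as Eq. (61b), the latter by a finite-difference derivative with respect to $\beta$.

```python
import math


def log_partition(levels, beta):
    """ln Z of Eq. (2.59) as a function of beta = 1/T."""
    return math.log(sum(math.exp(-beta * e) for e in levels))


def gibbs_thermo(levels, T):
    """Return Z, E, S, F for a discrete spectrum at temperature T (k_B = 1)."""
    weights = [math.exp(-e / T) for e in levels]
    Z = sum(weights)
    W = [w / Z for w in weights]                  # Gibbs distribution, Eq. (2.58)
    E = sum(w * e for w, e in zip(W, levels))     # Eq. (2.61a)
    S = -sum(w * math.log(w) for w in W)          # entropy, Eq. (2.29)
    F = -T * math.log(Z)                          # Eq. (2.63)
    return Z, E, S, F


levels = [0.0, 0.7, 1.3, 2.0, 3.1]   # illustrative spectrum, not from the text
T = 0.8
Z, E, S, F = gibbs_thermo(levels, T)

h = 1e-6   # finite-difference step in beta (an arbitrary numerical choice)
E_from_beta = -(log_partition(levels, 1 / T + h) - log_partition(levels, 1 / T - h)) / (2 * h)

print(f"S = {S:.10f},  E/T + ln Z = {E / T + math.log(Z):.10f}")   # Eq. (2.62)
print(f"F = {F:.10f},  E - TS     = {E - T * S:.10f}")             # Eq. (1.33)
print(f"E = {E:.8f},  -d(ln Z)/d(beta) = {E_from_beta:.8f}")       # Eq. (2.61b)
```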

39 This opinion is shared by several authoritative colleagues, including R. Feynman, who climbs onto this summit 
already by page 4 (!) of his brilliant book Statistical Mechanics, 2nd ed., Westview, 1998. (Despite its title, this 
monograph is a collection of lectures on a few diverse, mostly advanced topics of statistical physics, rather than a 
systematic course, so that unfortunately I cannot recommend it as a textbook.) 
Now, using the general thermodynamic relations (see especially the circular diagram shown in 
Fig. 1.7b, and its discussion), we can calculate all thermodynamic potentials of the system, and all other 
variables of interest. Let me only note that in order to calculate the pressure $P$ - e.g., from the second of Eqs. 
(1.35) - we would need to know the explicit dependence of $F$, and hence of the statistical sum $Z$, on the 
system volume $V$. This would require the calculation, by appropriate methods of either classical or 
quantum mechanics, of the volume dependence of the eigenenergies $E_m$. I will give numerous examples of 
such calculations later in the course. 40 

As the final note of this section, Eqs. (59) and (63) may be combined to give a very elegant 
expression, 

$$\exp\left\{-\frac{F}{T}\right\} = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (2.64)$$

which offers a convenient interpretation of free energy as a (rather specific) average of eigenenergies of 
the system. One more convenient formula may be obtained by using Eq. (64) to rewrite the Gibbs 
distribution (58) in the form 

$$W_m = \exp\left\{\frac{F - E_m}{T}\right\}. \qquad (2.65)$$

In particular, this expression shows that since all probabilities $W_m$ are below 1, $F$ is always 
lower than the lowest energy level. Also, note that the probabilities $W_m$ do not depend on the energy 
reference choice, i.e. on an arbitrary constant added to all $E_m$ (and hence to $E$ and $F$). 
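Both properties just stated can be confirmed with a few lines of code. The sketch below (illustrative levels, one of them doubly degenerate; $k_B = 1$) computes $F$ from Eq. (64) and the probabilities from Eq. (65), and then shifts all levels by a constant.

```python
import math


def free_energy(levels, T):
    """F from Eq. (2.64): exp(-F/T) = sum over states of exp(-E_m/T)."""
    return -T * math.log(sum(math.exp(-e / T) for e in levels))


def probabilities(levels, T):
    """Gibbs probabilities in the form of Eq. (2.65): W_m = exp((F - E_m)/T)."""
    F = free_energy(levels, T)
    return [math.exp((F - e) / T) for e in levels]


levels = [0.0, 0.5, 0.5, 2.0]   # illustrative spectrum; the 0.5 level is doubly degenerate
T = 1.0
shift = 10.0                    # arbitrary change of the energy reference

F = free_energy(levels, T)
W = probabilities(levels, T)
W_shifted = probabilities([e + shift for e in levels], T)

print(f"F = {F:.4f} < E_min = {min(levels)}")
print("W        :", [round(w, 6) for w in W])
print("W shifted:", [round(w, 6) for w in W_shifted])
```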




2.5. Harmonic oscillator statistics 

The last property may be immediately used in our first example of the Gibbs distribution's 
application to a particular, but very important system - the harmonic oscillator, for a more general case 
than was done in Sec. 2, namely for a "quantum oscillator" with an arbitrary relation between $T$ and 
$\hbar\omega$. 41 Let us consider a canonical ensemble of similar oscillators, each in contact with a heat bath of 
temperature $T$. Selecting the zero-point energy $\hbar\omega/2$ for the origin of $E$, the oscillator eigenenergies (38) 
become $E_m = m\hbar\omega$ ($m = 0, 1, \ldots$), so that the Gibbs distribution for the probabilities of these states is 

$$W_m = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\} = \frac{1}{Z}\exp\left\{-\frac{m\hbar\omega}{T}\right\}, \qquad (2.66)$$



40 In many multiparticle systems, the effect of an external field may be presented as a sum of its effects on each 
particle - frequently described by an interaction energy of the structure $-\sum_j f_j q_j^{(k)}$, where $q_j^{(k)}$ is a generalized coordinate 
of the $k$-th particle. Generally, this energy has to be included directly into the energies $E_m$ of particle states, used in $Z$, and 
hence in the free energy $F$ (63). In this case, the thermodynamic equilibrium corresponds to the minimum of $F$ - 
see Eq. (1.42). On the other hand, for "linear" systems (whose energy is a quadratic-homogeneous form of their 
generalized coordinates and velocities), equivalent results may be obtained by accounting for the interaction at the 
thermodynamic level, i.e. by subtracting the term $f_j\langle q_j\rangle = f_j N\langle q_j^{(k)}\rangle$ from the free energy $F$ calculated in the 
absence of the field, and then finding the equilibrium as a minimum of the resulting Gibbs energy $G$ - see Eq. 
(1.43). In this case, either approach is fine, provided only that the same interaction is not counted twice. 

41 A simpler task of making a similar calculation for another key quantum-mechanical object, the two-level 
system, is left for the reader's exercise. 
with the statistical sum 

$$Z = \sum_{m=0}^{\infty}\exp\left\{-\frac{m\hbar\omega}{T}\right\} = \sum_{m=0}^{\infty}\lambda^m, \quad \text{where } \lambda \equiv \exp\left\{-\frac{\hbar\omega}{T}\right\}. \qquad (2.67)$$

This series is just an infinite geometric progression ("geometric series"); summing it, 42 we get 

$$Z = \frac{1}{1 - \lambda} = \frac{1}{1 - e^{-\hbar\omega/T}}, \qquad (2.68)$$

so that the probability $W_m$ to find the oscillator at each energy level is 

$$W_m = \left(1 - e^{-\hbar\omega/T}\right)e^{-m\hbar\omega/T}. \qquad (2.69)$$



As Fig. 7a shows, the probability $W_m$ to find the oscillator in each particular state (except the ground 
one, with $m = 0$) vanishes in both the low- and high-temperature limits, and reaches its maximum value $W_m 
\sim 0.3/m$ at $T \sim m\hbar\omega$, so that the contribution $m\hbar\omega W_m$ of each level into the average oscillator energy $E$ is 
always smaller than $\hbar\omega$. 
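The estimate $W_m^{\max} \sim 0.3/m$ at $T \sim m\hbar\omega$ follows from Eq. (69): with $x \equiv e^{-\hbar\omega/T}$, the function $W_m = (1 - x)x^m$ peaks at $x = m/(m+1)$, i.e. at $T = \hbar\omega/\ln(1 + 1/m)$, where $W_m = m^m/(m+1)^{m+1}$. The sketch below (temperature in units of $\hbar\omega$; function names and scan grid are illustrative choices) compares this closed form with a brute-force scan over temperature.

```python
import math


def occupation(m, t):
    """W_m from Eq. (2.69), with t = T / (hbar*omega)."""
    x = math.exp(-1.0 / t)
    return (1.0 - x) * x**m


def peak_analytic(m):
    """Location t* and height of the maximum of W_m(T), from dW_m/dx = 0 at x = m/(m+1)."""
    return 1.0 / math.log1p(1.0 / m), m**m / (m + 1) ** (m + 1)


def peak_scan(m, t_lo=1e-2, t_hi=1e3, n=20_000):
    """Brute-force maximum of W_m over a log-spaced temperature grid."""
    grid = (t_lo * (t_hi / t_lo) ** (i / n) for i in range(n + 1))
    return max(((t, occupation(m, t)) for t in grid), key=lambda pair: pair[1])


for m in (1, 2, 5, 10, 30):
    t_star, w_star = peak_analytic(m)
    t_num, w_num = peak_scan(m)
    print(f"m = {m:2d}: max W_m = {w_star:.4f} (scan {w_num:.4f}) = {m * w_star:.3f}/m "
          f"at T = {t_star / m:.3f} m*hbar*omega")
```

The product $mW_m^{\max}$ grows from $0.25$ at $m = 1$ toward $1/e \approx 0.37$ at large $m$, which is the origin of the rough "$0.3/m$".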
Fig. 2.7. Statistical and thermodynamic parameters of a harmonic oscillator, as functions of temperature (horizontal axes: $T/\hbar\omega$). 



This average energy may be calculated in either of two ways: using Eq. (7), 

$$E = \sum_{m=0}^{\infty}E_m W_m = \left(1 - e^{-\hbar\omega/T}\right)\sum_{m=0}^{\infty}m\hbar\omega\, e^{-m\hbar\omega/T}, \qquad (2.70)$$

or (simpler) using Eq. (61b), as 

$$E = -\frac{\partial}{\partial\beta}\ln Z = \frac{\partial}{\partial\beta}\ln\left(1 - \exp\{-\beta\hbar\omega\}\right). \qquad (2.71)$$

42 See, e.g., MA Eq. (2.8b). 



Both methods give (of course) the same famous result, 43 

$$E = E(\omega, T) = \hbar\omega\,\frac{1}{e^{\hbar\omega/T} - 1}, \qquad (2.72)$$



which is valid for arbitrary temperature and plays a key role in many fundamental problems of 
physics. The red line in Fig. 7b shows this $E$ as a function of the normalized temperature. At low 
temperatures, $T \ll \hbar\omega$, the oscillator is predominantly in its lowest (ground) state, and its energy (on top 
of the constant zero-point energy $\hbar\omega/2$) is exponentially small: $E \approx \hbar\omega\exp\{-\hbar\omega/T\} \ll T, \hbar\omega$. On the 
other hand, in the high-temperature limit the energy tends to $T$. This is exactly the result (a particular 
case of the equipartition theorem) that was obtained in Sec. 2 from the microcanonical distribution. 
Please note how much simpler the calculation starting from the Gibbs distribution is, even for an 
arbitrary ratio $T/\hbar\omega$. 
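As a quick consistency check of Eqs. (70)-(72), the sketch below compares the direct sum (70) with the closed form (72) and prints both limits discussed above. Temperature is measured in units of $\hbar\omega$, and the truncation `m_max` of the series is an arbitrary numerical choice.

```python
import math


def energy_closed_form(t):
    """Eq. (2.72) in units of hbar*omega, with t = T/(hbar*omega)."""
    return 1.0 / math.expm1(1.0 / t)


def energy_direct_sum(t, m_max=10_000):
    """Eq. (2.70), with the series truncated at m = m_max."""
    x = math.exp(-1.0 / t)
    return (1.0 - x) * sum(m * x**m for m in range(m_max + 1))


for t in (0.1, 0.5, 1.0, 10.0, 100.0):
    print(f"T/hw = {t:6.1f}: Eq.(72) = {energy_closed_form(t):.6e}, "
          f"Eq.(70) = {energy_direct_sum(t):.6e}")

# Limits: E ≈ hw*exp(-hw/T) for T << hw, and E ≈ T - hw/2 for T >> hw.
print(energy_closed_form(0.1) / math.exp(-10.0), energy_closed_form(100.0) - 100.0)
```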

To complete the discussion of the thermodynamic properties of the harmonic oscillator, we can 
calculate its free energy using Eq. (63): 

$$F = T\ln\frac{1}{Z} = T\ln\left(1 - e^{-\hbar\omega/T}\right). \qquad (2.73)$$

Now the entropy may be found from thermodynamics: either from the first of Eqs. (1.35), $S = -(\partial F/\partial T)_V$, or 
(even more easily) from Eq. (1.33): $S = (E - F)/T$. Both relations give, of course, the same result: 

$$S = \frac{\hbar\omega}{T}\,\frac{1}{e^{\hbar\omega/T} - 1} - \ln\left(1 - e^{-\hbar\omega/T}\right). \qquad (2.74)$$



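Finally, since in the general case the dependence of the oscillator properties (essentially, $\omega$) on the volume 
$V$ is not specified in this problem, such variables as $P$, $\mu$, $G$, $W$, and $\Omega$ are not defined, and we may 
calculate only the average heat capacity $C$ per oscillator: 

$$C = \frac{\partial E}{\partial T} = \left(\frac{\hbar\omega}{T}\right)^2\frac{e^{\hbar\omega/T}}{\left(e^{\hbar\omega/T} - 1\right)^2} = \left[\frac{\hbar\omega/2T}{\sinh(\hbar\omega/2T)}\right]^2. \qquad (2.75)$$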



The calculated thermodynamic variables are shown in Fig. 7b. In the low-temperature limit ($T 
\ll \hbar\omega$), they all tend to zero. On the other hand, in the high-temperature limit ($T \gg \hbar\omega$), $F \to 
-T\ln(T/\hbar\omega) \to -\infty$, $S \to \ln(T/\hbar\omega) \to +\infty$, and $C \to 1$ (in SI units, $C \to k_B$). Note that the last limit is a 
direct corollary of the equipartition theorem: each of the two "half-degrees of freedom" of the oscillator 
gives, in the classical limit, a contribution $C = 1/2$ into its heat capacity. 
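The thermodynamic relations used above can also be checked numerically. In the sketch below ($k_B = 1$, temperature in units of $\hbar\omega$, finite-difference step chosen ad hoc), $S = -\partial F/\partial T$ and $C = \partial E/\partial T$ are evaluated by central differences and compared with Eqs. (74) and (75).

```python
import math

# Units: k_B = 1 and hbar*omega = 1, so t = T/(hbar*omega).


def free_energy(t):
    """Eq. (2.73): F = T ln(1 - exp(-hbar*omega/T))."""
    return t * math.log(-math.expm1(-1.0 / t))


def energy(t):
    """Eq. (2.72)."""
    return 1.0 / math.expm1(1.0 / t)


def entropy(t):
    """Eq. (2.74), written as S = (E - F)/T."""
    return (energy(t) - free_energy(t)) / t


def heat_capacity(t):
    """Eq. (2.75): C = [(hbar*omega/2T) / sinh(hbar*omega/2T)]^2."""
    y = 1.0 / (2.0 * t)
    return (y / math.sinh(y)) ** 2


def central_diff(f, t, h=1e-5):
    return (f(t + h) - f(t - h)) / (2.0 * h)


for t in (0.3, 1.0, 5.0):
    print(f"T/hw = {t}: S = {entropy(t):.8f} vs -dF/dT = {-central_diff(free_energy, t):.8f}; "
          f"C = {heat_capacity(t):.8f} vs dE/dT = {central_diff(energy, t):.8f}")
print("C at T = 1000 hw:", heat_capacity(1000.0))   # close to the classical value 1
```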

Now let us use Eq. (69) to discuss the statistics of the quantum oscillator described by 
Hamiltonian (46), in the coordinate representation. Again using the fact that the density matrix is 
diagonal at thermodynamic equilibrium, we may use a relation similar to Eqs. (47) to calculate the 
probability density to find the oscillator at coordinate q: 



43 It was first obtained in 1924 by S. Bose, and is frequently called the Bose distribution - a particular case of the 
Bose-Einstein distribution - to be discussed in Sec. 8 below. 
$$w(q) = \sum_{m=0}^{\infty}W_m w_m(q) = \sum_{m=0}^{\infty}W_m\left|\psi_m(q)\right|^2 = \left(1 - e^{-\hbar\omega/T}\right)\sum_{m=0}^{\infty}e^{-m\hbar\omega/T}\left|\psi_m(q)\right|^2, \qquad (2.76)$$

where $\psi_m(q)$ is the eigenfunction of the $m$-th stationary state of the oscillator. Since each $\psi_m(q)$ is 
proportional to the Hermite polynomial 44 that requires at least $m$ elementary functions for its 
representation, working out the sum in Eq. (76) is a bit tricky, 45 but the final result is rather simple: $w(q)$ 
is just a normalized Gaussian distribution (the "bell curve"), 

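$$w(q) = \frac{1}{(2\pi)^{1/2}\,\delta q}\exp\left\{-\frac{q^2}{2(\delta q)^2}\right\}, \qquad (2.77)$$

with $\langle q\rangle = 0$, and 

$$(\delta q)^2 \equiv \langle q^2\rangle = \frac{\hbar}{2m\omega}\coth\frac{\hbar\omega}{2T}. \qquad (2.78)$$

Since $\coth\xi$ tends to 1 at $\xi \to \infty$, and diverges as $1/\xi$ at $\xi \to 0$, Eq. (78) shows that the variance of the 
coordinate distribution is constant (and equal to that, $\hbar/2m\omega$, of the ground-state wavefunction $\psi_0$) at $T 
\ll \hbar\omega$, and grows as $T/m\omega^2$ at $T/\hbar\omega \to \infty$. 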
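Although working out the sum (76) analytically is "a bit tricky", it is easy to do numerically. The sketch below does it by brute force in units $\hbar = m = \omega = 1$, so that Eq. (78) reads $\langle q^2\rangle = \frac{1}{2}\coth\frac{1}{2t}$ with $t = T/\hbar\omega$. The eigenfunctions are generated by the standard three-term recurrence for normalized Hermite functions; the grid, the cutoff `n_max`, and the function names are arbitrary numerical choices.

```python
import math

# Units: hbar = m = omega = 1, so t = T/(hbar*omega) and Eq. (2.78) gives <q^2> = coth(1/2t)/2.


def eigenfunctions(q, n_max):
    """psi_0(q) ... psi_{n_max}(q): normalized oscillator eigenfunctions via a three-term recurrence."""
    psi = [math.pi ** -0.25 * math.exp(-0.5 * q * q)]
    if n_max >= 1:
        psi.append(math.sqrt(2.0) * q * psi[0])
    for n in range(1, n_max):
        psi.append(math.sqrt(2.0 / (n + 1)) * q * psi[n] - math.sqrt(n / (n + 1)) * psi[n - 1])
    return psi


def thermal_density(q, t, n_max=120):
    """w(q) from Eq. (2.76): sum over m of W_m |psi_m(q)|^2, with W_m from Eq. (2.69)."""
    x = math.exp(-1.0 / t)
    return sum((1.0 - x) * x**m * p * p for m, p in enumerate(eigenfunctions(q, n_max)))


def norm_and_variance(t, q_max=20.0, dq=0.025):
    """Integrate w(q) and q^2 w(q) over a uniform grid (w is negligible beyond |q| = q_max)."""
    n_points = int(round(2 * q_max / dq)) + 1
    qs = [-q_max + i * dq for i in range(n_points)]
    ws = [thermal_density(q, t) for q in qs]
    return sum(ws) * dq, sum(q * q * w for q, w in zip(qs, ws)) * dq


for t in (0.2, 1.0, 2.0):
    norm, var = norm_and_variance(t)
    print(f"T/hw = {t}: norm = {norm:.8f}, <q^2> = {var:.8f}, Eq. (78): {0.5 / math.tanh(0.5 / t):.8f}")
```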

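As a sanity check, we may use Eq. (78) to write the following expression, 

$$\langle U\rangle = \left\langle\frac{m\omega^2 q^2}{2}\right\rangle = \frac{\hbar\omega}{4}\coth\frac{\hbar\omega}{2T} \to \begin{cases}\hbar\omega/4, & \text{at } T \ll \hbar\omega,\\ T/2, & \text{at } \hbar\omega \ll T,\end{cases} \qquad (2.79)$$

for the average potential energy of the oscillator. In order to comprehend this result, let us notice that 
Eq. (72) for the average full energy $E$ was obtained by counting it from the ground-state energy $\hbar\omega/2$ of 
the oscillator. If we add this energy to the result, we get 

$$E + \frac{\hbar\omega}{2} = \frac{\hbar\omega}{e^{\hbar\omega/T} - 1} + \frac{\hbar\omega}{2} = \frac{\hbar\omega}{2}\coth\frac{\hbar\omega}{2T}. \qquad (2.80)$$

We see that for arbitrary temperature, $\langle U\rangle = (E + \hbar\omega/2)/2$, i.e. one half of this full average energy, as we have already concluded from Eq. (47). This means 
that the average kinetic energy, equal to $E + \hbar\omega/2 - \langle U\rangle$, is also the same: 

$$\left\langle\frac{p^2}{2m}\right\rangle = \left\langle\frac{m\omega^2 q^2}{2}\right\rangle = \frac{1}{2}\left(E + \frac{\hbar\omega}{2}\right) = \frac{\hbar\omega}{4}\coth\frac{\hbar\omega}{2T}. \qquad (2.81)$$

In the classical limit $T \gg \hbar\omega$, both energies are equal to $T/2$, returning us to the equipartition 
theorem result (48). 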

2.6. Two important applications 

The results of the previous section, especially Eq. (72), have innumerable applications in physics, 
but I will have time for a brief discussion of only two of them. 



44 See, e.g., QM Sec. 2.10. 

45 The calculation may be found, e.g., in QM Sec. 7.2. 
(i) Blackbody radiation. Let us consider a free-space volume $V$ limited by non-absorbing (i.e. 
ideally reflecting) walls. Electrodynamics tells us 46 that the electromagnetic field in such a cavity may be 
presented as a sum of "modes" with time evolution similar to that of the usual harmonic oscillator, and 
quantum mechanics says 47 that the energy of such an electromagnetic oscillator is quantized in accordance 
with Eq. (38), so that at thermal equilibrium its average energy is described by Eq. (72). If the volume $V$ is 
large enough, 48 the number of these modes within a small range $dk$ of the wavevector magnitude $k$ is 49 

$$dN = \frac{gV}{(2\pi)^3}\,d^3k = \frac{gV}{(2\pi)^3}\,4\pi k^2 dk, \qquad (2.82)$$

where for electromagnetic waves the degeneracy factor $g = 2$, due to their two different (e.g., linear) 
polarizations for the same wave vector $\mathbf{k}$. With the isotropic dispersion relation for waves in vacuum, $k 
= \omega/c$, the elementary volume $d^3k$ corresponding to a small interval $d\omega$ is a spherical shell of small 
thickness $dk = d\omega/c$, and Eq. (82) yields 

$$dN = \frac{2V}{(2\pi)^3}\,4\pi\,\frac{\omega^2 d\omega}{c^3} = V\frac{\omega^2}{\pi^2 c^3}\,d\omega. \qquad (2.83)$$

Using Eq. (72), we see that the spectral density of electromagnetic wave energy, per unit volume, is 

$$u(\omega) \equiv \frac{E(\omega, T)}{V}\frac{dN}{d\omega} = \frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{1}{e^{\hbar\omega/T} - 1}. \qquad (2.84)$$



This is the famous Planck's blackbody radiation law. 50 To understand why its name mentions 
radiation, let us consider a small planar part, of area $dA$, of a surface that completely absorbs 
electromagnetic waves incident from any direction. (Such a "perfect black body" approximation may be 
closely approached in special experimental structures, especially in limited frequency intervals.) Figure 
8 shows that if the arriving wave were planar, with the incidence angle $\theta$, then the power $d\mathcal{P}_\theta(\omega)$ absorbed 
by the surface within a small frequency interval $d\omega$ (i.e. the energy arriving at the surface within a unit time 
interval) would be equal to the radiation energy within the same frequency interval and inside a 
cylinder of height $c$, base area $dA\cos\theta$, and hence volume $dV = c\,dA\cos\theta$: 

$$d\mathcal{P}_\theta(\omega) = u(\omega)\,d\omega\,dV = u(\omega)\,d\omega\,c\,dA\cos\theta. \qquad (2.85)$$

Since the thermally induced field is isotropic, i.e. propagates equally in all directions, this result 
should be averaged over all solid angles within the polar angle interval $0 \le \theta \le \pi/2$: 

$$\frac{d\mathcal{P}(\omega)}{dA\,d\omega} = \frac{1}{4\pi}\oint\frac{d\mathcal{P}_\theta(\omega)}{dA\,d\omega}\,d\Omega = c\,u(\omega)\,\frac{1}{4\pi}\int_0^{\pi/2}\sin\theta\,d\theta\int_0^{2\pi}d\varphi\,\cos\theta = \frac{c}{4}\,u(\omega). \qquad (2.86)$$



46 See, e.g., EM Sec. 7.9. 

47 See, e.g., QM Sec. 9.1. 

48 In our current context, the volume should be much larger than $(c\hbar/T)^3$, where $c \approx 3\times10^8$ m/s is the speed of 
light. For room temperature ($T \approx k_B\times300\ \text{K} \approx 4\times10^{-21}$ J), that lower bound is of the order of $10^{-16}$ m$^3$. 

49 See, e.g., EM Sec. 7.9, or QM Sec. 1.6. 

50 Let me hope the reader knows that the law was first suggested in 1900 by M. Planck as an empirical fit for the 
experimental data on blackbody radiation, and this was the historic point at which the Planck constant $\hbar$ (or rather 
$h = 2\pi\hbar$) was introduced - see, e.g., QM Sec. 1.1. 
Hence Planck's formula (84), multiplied by $c/4$, gives the power absorbed by such a 
"blackbody" surface. But at thermal equilibrium, this absorption has to be exactly balanced by the 
surface's own radiation, due to its finite temperature $T$. 
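The factor $1/4$ in Eq. (86) comes purely from geometry: it equals $\frac{1}{4\pi}\int\cos\theta\,d\Omega$ over the hemisphere $0 \le \theta \le \pi/2$. A midpoint-rule sketch (the number of integration points is an arbitrary choice) confirms it:

```python
import math


def hemisphere_cos_average(n_theta=2000):
    """(1/4pi) * integral of cos(theta) dOmega over 0 <= theta <= pi/2, by the midpoint rule."""
    d_theta = (math.pi / 2) / n_theta
    theta_integral = sum(
        math.sin((i + 0.5) * d_theta) * math.cos((i + 0.5) * d_theta) for i in range(n_theta)
    ) * d_theta
    return theta_integral * 2 * math.pi / (4 * math.pi)   # the phi integral contributes 2*pi


print(hemisphere_cos_average())   # approaches 1/4 as n_theta grows
```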



Fig. 2.8. Calculating the relation between $d\mathcal{P}(\omega)$ and $u(\omega)d\omega$. 



I am confident that the reader is familiar with the main features of the Planck law (84), including 
its general shape (Fig. 9), with the low-frequency asymptote $u(\omega) \propto \omega^2$ (due to its historic significance 
bearing the special name of the Rayleigh-Jeans law), the exponential drop at high frequencies (the Wien 
law), and the resulting maximum of the function $u(\omega)$, reached at the frequency $\omega_{\max}$, 

$$\hbar\omega_{\max} \approx 2.82\,T, \qquad (2.87)$$

i.e. at the wavelength $\lambda_{\max} = 2\pi/k_{\max} = 2\pi c/\omega_{\max} \approx 2.23\,c\hbar/T$. Still, I cannot help mentioning two particular 
values corresponding to visible light ($\lambda_{\max} \sim 500$ nm) for the Sun's surface temperature $T_K \approx 6{,}000$ K, and to the 
mid-infrared range ($\lambda_{\max} \sim 10\ \mu$m) for the Earth's surface temperature $T_K \sim 300$ K. (These two commonly quoted 
values refer to the maximum of the spectral density per unit wavelength, reached at $\lambda \approx 1.27\,c\hbar/T$.) The balance of these 
two radiations, absorbed and emitted by the Earth, determines its surface temperature, and hence has 
key importance for all life on our planet. As one more example, the cosmic microwave background 
(CMB) radiation, closely following the Planck law with $T_K = 2.725$ K (and hence having the maximum 
density at $\lambda_{\max} \approx 1.9$ mm), and in particular its weak anisotropy, is a major source of data for all modern 
cosmology. 51 
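The numerical coefficients above follow from maximizing $x^3/(e^x - 1)$ (spectral density per unit frequency, Eq. (84)) or, for comparison, $x^5/(e^x - 1)$ (per unit wavelength), where $x = \hbar\omega/T = 2\pi c\hbar/\lambda T$. Setting the derivative to zero gives $n(1 - e^{-x}) = x$ with $n = 3$ or $5$. The sketch below solves this by bisection and converts the roots to wavelengths for the three temperatures mentioned in the text. The CODATA constant values are hard-coded, and the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34   # J*s
K_B = 1.380649e-23       # J/K
C = 2.99792458e8         # m/s


def peak_x(n, lo=1.0, hi=10.0, tol=1e-12):
    """Root of n*(1 - exp(-x)) = x, i.e. the maximum of x**n / (e**x - 1), by bisection."""
    def f(x):
        return n * (1.0 - math.exp(-x)) - x

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def peak_wavelength(t_kelvin, n):
    """lambda = 2*pi*c*hbar / (x_peak * T), with T expressed in energy units."""
    return 2 * math.pi * C * HBAR / (peak_x(n) * K_B * t_kelvin)


print(f"hbar*omega_max / T = {peak_x(3):.4f}")     # Eq. (2.87)
for label, t_k in (("Sun", 6000.0), ("Earth", 300.0), ("CMB", 2.725)):
    print(f"{label:5s}: per-frequency peak at {peak_wavelength(t_k, 3):.3e} m, "
          f"per-wavelength peak at {peak_wavelength(t_k, 5):.3e} m")
```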




51 For a recent popular book on this topic, see, e.g., S. Singh, Big Bang: The Origins of the Universe, 
HarperCollins, 2005. 
Now let us calculate the total energy $E$ of this radiation in some volume $V$. It may be found from 
Eq. (84) by integrating it over all frequencies: 52 

$$E = V\int_0^\infty u(\omega)\,d\omega = V\int_0^\infty\frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{d\omega}{e^{\hbar\omega/T} - 1} = V\frac{T^4}{\pi^2\hbar^3 c^3}\int_0^\infty\frac{\xi^3\,d\xi}{e^{\xi} - 1} = V\frac{\pi^2 T^4}{15\hbar^3 c^3}. \qquad (2.88)$$



(The last transition in Eq. (88) uses a table integral equal to $\Gamma(4)\zeta(4) = (3!)(\pi^4/90) = \pi^4/15$. 53) Using Eq. 
(86) to recast Eq. (88) into the total power radiated by a blackbody surface, we get the well-known 
Stefan-Boltzmann law 

$$\frac{d\mathcal{P}}{dA} = \frac{c}{4}\,\frac{E}{V} = \frac{\pi^2}{60\hbar^3 c^2}\,T^4 \equiv \sigma T_K^4, \qquad (2.89a)$$

where $\sigma$ is the Stefan-Boltzmann constant 

$$\sigma \equiv \frac{\pi^2}{60\hbar^3 c^2}\,k_B^4. \qquad (2.89b)$$
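Eqs. (88) and (89) can be checked in the same spirit. The sketch below evaluates the integral $\int_0^\infty \xi^3 d\xi/(e^\xi - 1)$ by the midpoint rule (truncated at an arbitrarily chosen $\xi = 60$, where the integrand is negligible) and compares it with $\pi^4/15$. It then computes $\sigma$ from Eq. (89b) using hard-coded CODATA values of $\hbar$, $k_B$, and $c$.

```python
import math

HBAR = 1.054571817e-34   # J*s
K_B = 1.380649e-23       # J/K
C = 2.99792458e8         # m/s


def bose_integral(x_max=60.0, n_steps=100_000):
    """Midpoint-rule estimate of the integral of x^3/(e^x - 1) from 0 to x_max."""
    h = x_max / n_steps
    return h * sum(((i + 0.5) * h) ** 3 / math.expm1((i + 0.5) * h) for i in range(n_steps))


sigma = math.pi**2 * K_B**4 / (60 * HBAR**3 * C**2)   # Eq. (2.89b), W m^-2 K^-4

print(f"integral = {bose_integral():.8f}, pi^4/15 = {math.pi**4 / 15:.8f}")
print(f"sigma = {sigma:.6e} W/(m^2 K^4)")
print(f"blackbody power at 300 K: {sigma * 300.0**4:.1f} W/m^2")
```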



By this time, the thoughtful reader should have an important concern ready: Eq. (84), and hence 
Eq. (88), are based on Eq. (72) for the average energy of each oscillator, counted from its ground energy 
$\hbar\omega/2$. However, the radiation power should not depend on the energy origin; why have we not included 
the ground energy of each oscillator into the integration (88), as we have done in Eq. (80)? The answer is 
that usual radiation detectors only measure the difference between the power $\mathcal{P}_{\text{in}}$ of the incident radiation 
(say, that of a blackbody surface with temperature $T$) and their own back-radiation, with the power 
corresponding to some effective temperature $T_d$ of the detector (Fig. 10). But however low $T_d$ is, the 
temperature-independent ground-state energy contribution $\hbar\omega/2$ to the back radiation is always there. 
Hence, the $\hbar\omega/2$ drops out from the difference, and cannot be detected - at least in this simple way. This 
is the reason why we had the right to ignore this contribution in Eq. (88) - very fortunately, because it 
would lead to the integral's divergence at its upper limit. However, let me repeat that the ground-state 
energy of the electromagnetic field oscillators is physically real, and can reveal itself in the Casimir 
effect and other experimentally observable phenomena. 



constant 







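Fig. 2.10. Generic scheme of the electromagnetic radiation power measurement. (Figure labels: incident power $\mathcal{P}_{\rm in} \propto [E(\omega,T) + \hbar\omega/2]\,d\omega$; detector back-radiation $\mathcal{P}_{\rm out} \propto [E(\omega,T_d) + \hbar\omega/2]\,d\omega$.)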



52 Note that the heat capacity $C_V \equiv (\partial E/\partial T)_V$, following from Eq. (88), is proportional to $T^3$ at any temperature, and hence does not obey the trend $C_V \to$ const at $T \to \infty$. This is the result of the unlimited growth, with temperature, of the number of thermally-excited field oscillators with $\hbar\omega < T$.

53 See, e.g., MA Eqs. (6.8b), (6.6b), and (2.7b). 



One more interesting result may be deduced from the free energy F of the electromagnetic 
radiation, which may be also calculated by integration of Eq. (73) over all the modes, with the 
appropriate weight: 



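$$F = \sum_{\text{modes}}T\ln\left(1 - e^{-\hbar\omega/T}\right) \to \int_0^\infty T\ln\left(1 - e^{-\hbar\omega/T}\right)\frac{dN}{d\omega}\,d\omega = \int_0^\infty T\ln\left(1 - e^{-\hbar\omega/T}\right)V\frac{\omega^2}{\pi^2c^3}\,d\omega. \tag{2.90}$$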

Presenting $\omega^2\,d\omega$ as $d(\omega^3)/3$, this integral may be readily worked out by parts, and reduced to a table integral similar to that in Eq. (88), yielding a surprisingly simple result:

$$F = -V\frac{\pi^2}{45\hbar^3c^3}\,T^4 \equiv -\frac{E}{3}. \tag{2.91}$$
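The integration by parts behind Eq. (91) is easy to check numerically. With $x \equiv \hbar\omega/T$, Eq. (90) is proportional to $\int x^2\ln(1 - e^{-x})\,dx$, while Eq. (88) is proportional to $\int x^3\,dx/(e^x - 1)$, with the same prefactor $VT^4/\pi^2\hbar^3c^3$. The sketch below evaluates both integrals with a composite Simpson rule (our choice, with an assumed upper cutoff x = 60) and confirms that their ratio is −1/3.

```python
import math

def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def free_energy_integrand(x):
    """x^2 ln(1 - e^{-x}), from Eq. (90); its limit at x -> 0 is 0."""
    return 0.0 if x == 0.0 else x * x * math.log(-math.expm1(-x))

def energy_integrand(x):
    """x^3 / (e^x - 1), from Eq. (88); its limit at x -> 0 is 0."""
    return 0.0 if x == 0.0 else x**3 / math.expm1(x)

X_CUT = 60.0   # assumed cutoff: both integrands are below 1e-20 in magnitude beyond it
i_f = simpson(free_energy_integrand, 0.0, X_CUT)
i_e = simpson(energy_integrand, 0.0, X_CUT)
print(f"F integral = {i_f:.8f}   (-pi^4/45 = {-math.pi**4 / 45:.8f})")
print(f"F/E = {i_f / i_e:.8f}")
```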



Now we can use the second of general thermodynamic equations (1.35) to calculate pressure: 



$$P = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{\pi^2}{45\hbar^3c^3}\,T^4 \equiv \frac{E}{3V}. \tag{2.92a}$$



This result might be, of course, derived by the integration of the expression for the forces exerted by each mode of the electromagnetic field on the walls confining it to volume V, 54 but notice how much simpler the thermodynamic calculation is. Rewritten in the form

$$PV = \frac{E}{3}, \tag{2.92b}$$



this result may be considered as the equation of state of the electromagnetic field, i.e. from the quantum-mechanical point of view, of the photon gas. As we will prove in the next chapter, the equation of state (1.44) of the ideal classical gas may be presented in a similar form, but with a coefficient generally different from that in Eq. (92). In particular, according to the equipartition theorem, for an ideal gas of nonrelativistic atoms whose internal degrees of freedom are in their ground state, so that their whole energy is that of three translational "half-degrees of freedom", $E = 3N(T/2)$, the factor before E is twice as large as in Eq. (92). On the other hand, a relativistic treatment of the classical gas shows that Eq. (92) is valid for any gas in the ultrarelativistic limit, $T \gg mc^2$, where m is the rest mass of the gas particle. Evidently, photons (i.e. particles with m = 0) satisfy this condition.

Finally, let me note that Eq. (92) allows the following interesting interpretation. The last of Eqs. (1.60), being applied to Eq. (92), shows that in this particular case the grand potential Ω equals (−E/3). But according to the definition of Ω, the first of Eqs. (1.60), this means that the chemical potential of the electromagnetic field excitations vanishes:

$$\mu = \frac{F - \Omega}{N} = 0. \tag{2.93}$$



In Sec. 8 below, we will see that the same result follows from Eq. (72) and the Bose-Einstein 
distribution, and discuss its physical sense. 

54 See, e.g., EM Sec. 9.8.

(ii) Specific heat of solids. The heat capacity of solids is readily measurable, and in the early 1900s its experimentally observed temperature dependence served as an important test for emerging quantum theories. However, theoretical calculation of $C_V$ is not simple, 55 even for insulators whose specific heat is due to thermally-induced vibrations of their crystal lattice alone. 56 Indeed, a solid may be treated as an elastic continuum only at relatively low frequencies. Such a continuum supports three different modes of mechanical waves with the same frequency ω, that obey similar, linear dispersion laws, $\omega = vk$, but the velocity $v = v_l$ of one of these modes (the longitudinal sound) is higher than that ($v_t$) of the two other modes (the transverse sound). 57 At such frequencies the wave mode density may be described by an evident modification of Eq. (83):



$$dN = V\frac{1}{(2\pi)^3}\left(\frac{1}{v_l^3} + \frac{2}{v_t^3}\right)4\pi\omega^2\,d\omega. \tag{2.94a}$$



For what follows, it is convenient to rewrite this relation in a form similar to Eq. (83): 



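$$dN = V\frac{3}{(2\pi)^3}\,\frac{4\pi\omega^2\,d\omega}{v^3}, \quad \text{with } v \equiv \left[\frac{1}{3}\left(\frac{1}{v_l^3} + \frac{2}{v_t^3}\right)\right]^{-1/3}. \tag{2.94b}$$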



However, wave theory shows 58 that as the frequency ω of a sound wave in a periodic structure is increased so that its half-wavelength π/k approaches the crystal period d, the dispersion law ω(k) becomes nonlinear before the frequency reaches a maximum at k = π/d. To make things even more complex, 3D crystals are generally anisotropic, so that the dispersion law is different in different directions of wave propagation. As a result, the exact statistics of thermally excited sound waves, and hence the heat capacity of crystals, is rather complex and specific for each particular crystal type.

In 1912, P. Debye suggested an approximate theory of the temperature dependence of the specific heat, which is in surprisingly good agreement with experiment for many insulators, including polycrystalline and amorphous materials. In his model, the linear (acoustic) dispersion law $\omega = vk$, with the effective sound velocity v defined by the latter of Eqs. (94b), is assumed to be exact all the way up to some cutoff frequency $\omega_D$, the same for all three wave modes. This cutoff frequency may be defined by the requirement that the total number of acoustic modes, calculated within this model from Eq. (94b),



$$N = V\frac{1}{(2\pi)^3}\frac{3}{v^3}\int_0^{\omega_D}4\pi\omega^2\,d\omega = V\frac{\omega_D^3}{2\pi^2v^3}, \tag{2.95}$$



is equal to the universal number N = 3nV of degrees of freedom (and hence of independent oscillation modes) in a system of nV elastically coupled particles, where n is the atomic density of the crystal, i.e. the number of atoms per unit volume. Within this model, Eq. (72) immediately yields the following expressions for the average energy and specific heat (in thermal equilibrium at temperature T):



$$E = V\frac{1}{(2\pi)^3}\frac{3}{v^3}\int_0^{\omega_D}\frac{\hbar\omega}{e^{\hbar\omega/T}-1}\,4\pi\omega^2\,d\omega = 3nVT\,D(x)\Big|_{x=T_D/T}, \tag{2.96}$$



55 Due to the small thermal expansion of solids, the difference between their $C_V$ and $C_P$ is small.

56 In good conductors (e.g., metals), specific heat is contributed (and at low temperatures, dominated) by free 
electrons - see Sec. 3.3 below. 

57 See, e.g., CM Sec. 7.7. 

58 See, e.g., CM Sec. 5.3, in particular Fig. 5.5 and its discussion. 
$$c_V \equiv \frac{C_V}{nV} = \frac{1}{nV}\left(\frac{\partial E}{\partial T}\right)_V = 3\left[D(x) - x\frac{dD(x)}{dx}\right]_{x=T_D/T}, \tag{2.97}$$



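where $T_D \equiv \hbar\omega_D$ is called the Debye temperature, 59 and

$$D(x) \equiv \frac{3}{x^3}\int_0^x\frac{\xi^3\,d\xi}{e^{\xi}-1} \to \begin{cases}1, & \text{at } x \to 0,\\ \pi^4/5x^3, & \text{at } x \to \infty,\end{cases} \tag{2.98}$$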



is the Debye function. Red lines in Fig. 11 show the temperature dependence of the specific heat $c_V$ (per atom) within the Debye model. At high temperatures, it approaches a constant value of 3, corresponding to the energy $E = 3nVT$, in accordance with the equipartition theorem for each of the 3 degrees of freedom of each atom. (This model-insensitive value of $c_V$ is known as the Dulong-Petit law.) In the opposite limit of low temperatures, the specific heat is much smaller:

$$c_V \approx \frac{12\pi^4}{5}\left(\frac{T}{T_D}\right)^3 \ll 1, \tag{2.99}$$



reflecting the reduction of the number of excited waves with $\hbar\omega < T$ as the temperature is decreased.
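The Debye model is easy to evaluate numerically. The sketch below is a minimal implementation under our own choices: the Debye function (98) is computed with a composite Simpson rule, and Eq. (97) is rewritten via the identity $D - x\,dD/dx = 4D(x) - 3x/(e^x - 1)$, which follows from differentiating the integral in (98). The grid size n = 2000 is an assumption that is adequate for the temperatures printed here.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def debye_function(x):
    """D(x) = (3/x^3) * int_0^x xi^3/(e^xi - 1) d xi, as in Eq. (98)."""
    def integrand(xi):
        return 0.0 if xi == 0.0 else xi**3 / math.expm1(xi)
    return 3.0 / x**3 * simpson(integrand, 0.0, x)

def debye_cv(t):
    """Specific heat per atom, Eq. (97), at reduced temperature t = T/T_D.

    Uses D(x) - x dD/dx = 4 D(x) - 3x/(e^x - 1), obtained by
    differentiating the integral definition of D(x).
    """
    x = 1.0 / t
    return 3.0 * (4.0 * debye_function(x) - 3.0 * x / math.expm1(x))

for t in (0.02, 0.1, 0.5, 1.0, 2.0):
    t3_law = 12 * math.pi**4 / 5 * t**3          # Eq. (99), valid only for t << 1
    print(f"T/T_D = {t:4}:  c_V = {debye_cv(t):.5f},  T^3 law = {t3_law:.5f}")
```

At $T \gg T_D$ the result approaches the Dulong-Petit value 3, and at $T \ll T_D$ it follows Eq. (99).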




Fig. 2.11. Temperature dependence of the specific heat in the Debye (red lines) and Einstein (blue lines) models. (Horizontal axis: $T/T_D$.)



As a historic curiosity, P. Debye's work followed one by A. Einstein, who had suggested (in 1907) a simpler model of crystal vibrations. In this model, all 3nV independent oscillatory modes of nV atoms of the crystal have approximately the same frequency, say $\omega_E$, and Eq. (72) immediately yields

$$E = 3nV\frac{\hbar\omega_E}{e^{\hbar\omega_E/T}-1}, \tag{2.100}$$

so that the specific heat is functionally similar to Eq. (75): 



59 In SI units, Debye temperatures $T_D$ are of the order of a few hundred K for most simple solids (e.g., close to 430 K for aluminum and 340 K for copper), with somewhat lower values for crystals with heavy atoms (~105 K for lead), and reach the highest value, ~2200 K, for diamond with its relatively light atoms and very stiff lattice.
$$c_V \equiv \frac{C_V}{nV} = 3\left(\frac{\hbar\omega_E/2T}{\sinh(\hbar\omega_E/2T)}\right)^2. \tag{2.101}$$



This dependence $c_V(T)$ is shown with blue lines in Fig. 11 (assuming, for the sake of simplicity, $\hbar\omega_E = T_D$). At high temperatures, this result does satisfy the universal Dulong-Petit law ($c_V = 3$), but at low temperatures Einstein's model predicts a much faster (exponential) drop of the specific heat as the temperature is reduced. (The difference between the Debye and Einstein models is not too spectacular on the linear scale, but in the log-log plot, shown on the right panel of Fig. 11, it is rather dramatic. 60) The Debye model is in much better agreement with experimental data for simple, monoatomic crystals, thus confirming the conceptual correctness of his wave-based approach.
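For comparison, a sketch of the Einstein result (101), with the Fig. 11 assumption $\hbar\omega_E = T_D$ built in as a default parameter (the parameter name `ratio` is ours). Since $[(x/2)/\sinh(x/2)]^2 = x^2e^{-x}/(1-e^{-x})^2$, at low temperatures this $c_V$ falls off as $3x^2e^{-x}$, with $x = \hbar\omega_E/T$, far below the Debye $T^3$ law (99).

```python
import math

def einstein_cv(t, ratio=1.0):
    """Specific heat per atom, Eq. (101), at reduced temperature t = T/T_D.

    ratio = hbar*omega_E / T_D; ratio = 1 matches the choice made for Fig. 2.11.
    math.sinh overflows for hbar*omega_E/2T above ~710, i.e. for very small t.
    """
    half_x = ratio / (2.0 * t)            # hbar*omega_E / 2T
    return 3.0 * (half_x / math.sinh(half_x)) ** 2

def debye_t3_law(t):
    """Low-temperature Debye asymptote, Eq. (99)."""
    return 12.0 * math.pi**4 / 5.0 * t**3

# Both formulas are compared only at T << T_D, where Eq. (99) applies
for t in (0.02, 0.05, 0.1):
    print(f"T/T_D = {t}:  Einstein {einstein_cv(t):.3e},  Debye T^3 law {debye_t3_law(t):.3e}")
```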

Note, however, that when a genius such as A. Einstein makes an error, there is probably some deep and important reason behind it. Indeed, crystals with the basic cell consisting of atoms of two or more types (such as NaCl, etc.) feature two or more separate branches of the dispersion law ω(k) - see, e.g., Fig. 12. 61



Fig. 2.12. Dispersion relation for longitudinal waves in a simple 1D model of a solid, with similar interparticle distances d, but alternating particle masses, plotted for a particular mass ratio r = 5. (Vertical axis: ω(k), arbitrary units; horizontal axis: kd/π; the lower curve is the "acoustic" branch, the upper one the "optical" branch.)



While the lower "acoustic" branch is virtually similar to those for monoatomic crystals, and may be approximated by the Debye model, ω = vk, reasonably well, the upper ("optical" 62) branch does not approach ω = 0 at any k. Moreover, for large values of the atom mass ratio r, the optical branches are almost flat, with virtually k-independent frequencies $\omega_0$ that correspond to simple oscillations of each light atom between its heavy counterparts. For thermal excitations of such oscillations, and their contribution to the specific heat, the Einstein model (with $\omega_E = \omega_0$) gives a very good approximation, so that the specific heat may be well described by a sum of the Debye and Einstein laws (97) and (101), with appropriate weights.



60 This is why there is a general "rule of thumb" in science: if you plot your data on a linear rather than log scale, you had better have a good excuse ready. (A valid excuse example: the variable you are plotting changes sign within the important range.)

61 This is the exact solution of a particular 1D model of such a crystal - see CM Chapter 5.

62 This term stems from the fact that at $k \to 0$, the mechanical waves corresponding to these branches have phase velocities $v_{\rm ph} \equiv \omega(k)/k$ that are much higher than that of the acoustic waves, and may approach the speed of light. As a result, these waves can strongly interact with electromagnetic (practically, optical) waves of the same frequency, while the acoustic waves cannot.
2.7. Grand canonical ensemble and distribution

As we have seen, the Gibbs distribution is a very convenient way to calculate statistical and thermodynamic properties of systems with a fixed number N of particles. However, for systems in which N may vary, another distribution is preferable for some applications. Several examples of such situations (as well as the basic thermodynamics of such systems) have already been discussed in Sec. 1.5. Perhaps even more importantly, statistical distributions for systems with variable N are also applicable to ensembles of independent particles on a certain single-particle energy level - see the next section.

With this motivation, let us consider what is called the grand canonical ensemble (Fig. 13). It is similar to the canonical ensemble discussed in the previous section (Fig. 6) in all aspects, except that now the system under study and the heat bath (in this case typically called the environment) may exchange not only heat but also particles. In all members of the ensemble, the environments are in both thermal and chemical equilibrium, and their temperatures T and chemical potentials μ are equal.



Fig. 2.13. Member of a grand canonical ensemble: the system under study exchanges heat (dQ, dS) and particles (dN) with its environment.



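Now let us assume that the system of interest is also in chemical and thermal equilibrium with its environment. Then using exactly the same arguments as in Sec. 4 (including the specification of a microcanonical sub-ensemble with fixed $E_\Sigma$ and $N_\Sigma$), we may generalize Eq. (55), taking into account that the entropy $S_{\rm env}$ of the environment is now a function not only of its energy $E_{\rm env} = E_\Sigma - E_{m,N}$, 63 but also of the number of particles $N_{\rm env} = N_\Sigma - N$, with $E_\Sigma$ and $N_\Sigma$ fixed:

$$\ln W_{m,N} \propto \ln M = \ln g_{\rm env}(E_\Sigma - E_{m,N}, N_\Sigma - N) + \ln\Delta E_\Sigma = S_{\rm env}(E_\Sigma - E_{m,N}, N_\Sigma - N) + \text{const} \approx S_{\rm env}\Big|_{E_\Sigma,N_\Sigma} - \frac{\partial S_{\rm env}}{\partial E_{\rm env}}\bigg|_{E_\Sigma,N_\Sigma}E_{m,N} - \frac{\partial S_{\rm env}}{\partial N_{\rm env}}\bigg|_{E_\Sigma,N_\Sigma}N + \text{const}. \tag{2.102}$$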



In order to simplify this relation, we may rewrite Eq. (1.52) in the equivalent form

$$dS = \frac{1}{T}dE + \frac{P}{T}dV - \frac{\mu}{T}dN. \tag{2.103}$$



Hence, if the entropy S of a system is expressed as a function of E, V, and N, then



63 The additional index in the new notation $E_{m,N}$ for the energy of the system of interest reflects the fact that its eigenvalue spectrum is generally dependent on the number N of particles in it.
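$$\left(\frac{\partial S}{\partial E}\right)_{V,N} = \frac{1}{T}, \quad \left(\frac{\partial S}{\partial V}\right)_{E,N} = \frac{P}{T}, \quad \left(\frac{\partial S}{\partial N}\right)_{E,V} = -\frac{\mu}{T}. \tag{2.104}$$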



Applying the first one and the last one of these relations to Eq. (102), and using the fact that, according to the discussion of Sec. 1.5, in equilibrium the temperatures T and chemical potentials μ of the system under study and its environment are equal, we get

$$\ln W_{m,N} = S_{\rm env}(E_\Sigma, N_\Sigma) - \frac{1}{T}E_{m,N} + \frac{\mu}{T}N + \text{const}. \tag{2.105}$$



Again, exactly as in the derivation of the Gibbs distribution in Sec. 4, we may argue that since $E_{m,N}$, T, and μ do not depend on the choice of the environment's size, i.e. on $E_\Sigma$ and $N_\Sigma$, the probability $W_{m,N}$ for a system to have N particles and be in the m-th quantum state in the whole grand canonical ensemble should also obey a relation similar to Eq. (105). As a result, we get the so-called grand canonical distribution:

$$W_{m,N} = \frac{1}{Z_G}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \tag{2.106}$$



Just as in the case of the Gibbs distribution, the constant $Z_G$ (most often called the grand statistical sum, but sometimes the "grand partition function") should be determined from the probability normalization condition, now with the summation of probabilities $W_{m,N}$ over all possible values of both m and N:

$$Z_G = \sum_{m,N}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \tag{2.107}$$

Now, using the general Eq. (29) to calculate the entropy for distribution (106) (exactly as we did for the canonical ensemble), we get the following expression,

$$S = -\sum_{m,N}W_{m,N}\ln W_{m,N} = \frac{\langle E\rangle}{T} - \frac{\mu\langle N\rangle}{T} + \ln Z_G, \tag{2.108}$$



which is evidently a generalization of Eq. (62). 64 We see that now the grand thermodynamic potential Ω (rather than the free energy F) may be expressed directly via the normalization coefficient $Z_G$:

$$\Omega \equiv F - \mu\langle N\rangle = \langle E\rangle - TS - \mu\langle N\rangle = T\ln\frac{1}{Z_G} = -T\ln\sum_{m,N}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \tag{2.109}$$



Finally, solving the last equality for $Z_G$, and plugging the result back into Eq. (106), we can rewrite the grand canonical distribution in the form

$$W_{m,N} = \exp\left\{\frac{\Omega + \mu N - E_{m,N}}{T}\right\}, \tag{2.110}$$

similar to Eq. (65) for the Gibbs distribution. Indeed, in the particular case when the number N of particles is fixed, $N = \langle N\rangle$, so that $\Omega + \mu N = \Omega + \mu\langle N\rangle \equiv F$, and Eq. (110) reduces to Eq. (65).
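Eqs. (106)-(110) can be checked by brute force on a tiny system whose states can all be enumerated. The toy model below is hypothetical (not from the text): three single-particle levels with energies 0, 0.5, and 1.3 (arbitrary units), each able to hold 0 or 1 particle, so that every state m is an occupation pattern with its own N and $E_{m,N}$. The sketch computes $Z_G$, the probabilities, the entropy (29), and Ω, and confirms both the identity (109) and the form (110) of the distribution to rounding accuracy.

```python
import itertools
import math

# Hypothetical toy system (not from the text): three single-particle levels,
# each holding 0 or 1 particle. Each state m is an occupation pattern, with
# its own particle number N and energy E_{m,N}.
LEVELS = [0.0, 0.5, 1.3]   # single-particle energies, arbitrary units
T, MU = 0.7, 0.4           # temperature and chemical potential, same units

states = []
for occupation in itertools.product((0, 1), repeat=len(LEVELS)):
    n = sum(occupation)
    e = sum(o * eps for o, eps in zip(occupation, LEVELS))
    states.append((n, e))

weights = [math.exp((MU * n - e) / T) for n, e in states]
z_g = sum(weights)                                   # Eq. (107)
probs = [w / z_g for w in weights]                   # Eq. (106)

mean_n = sum(p * n for p, (n, _) in zip(probs, states))
mean_e = sum(p * e for p, (_, e) in zip(probs, states))
entropy = -sum(p * math.log(p) for p in probs)       # Eq. (29) applied to (106)
omega = -T * math.log(z_g)                           # Eq. (109)

# Largest deviation of the probabilities from the form (110)
eq110_error = max(abs(p - math.exp((omega + MU * n - e) / T))
                  for p, (n, e) in zip(probs, states))

print("Omega from Z_G:     ", omega)
print("<E> - TS - mu<N>:   ", mean_e - T * entropy - MU * mean_n)
print("max |W - Eq. (110)|:", eq110_error)
```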



64 The average number of particles $\langle N\rangle$ is of course exactly what was called N in thermodynamics (see Ch. 1), but I need to keep this explicit notation here to make a clear distinction between this average value of the variable, and its particular values participating in Eqs. (102)-(110).
2.8. Systems of independent particles



Now let us apply the general statistical distributions discussed above to a simple but very important case when each system we are considering consists of many similar particles whose explicit (physical) interaction is negligible. As a result, each particular energy value $E_{m,N}$ of such a system may be presented as a sum of energies $\varepsilon_k$ of its particles, where the index k numbers single-particle energy levels (rather than those of the whole system, as the index m does).

Let us start with the classical limit. In classical mechanics, the quantization effects are negligible, i.e. there is a virtually infinite number of states k within each finite energy interval. However, it is convenient to keep, for the time being, the discrete-state language, with the understanding that the average number $\langle N_k\rangle$ of particles in each of these states, frequently called the state occupancy, is very small. In this case, we may apply the Gibbs distribution to the canonical ensemble of single particles, and hence use it with the substitution $E_m \to \varepsilon_k$, so that Eq. (58) becomes

$$\langle N_k\rangle = c\exp\left\{-\frac{\varepsilon_k}{T}\right\}, \tag{2.111}$$

where the constant c should be found from the normalization condition:

$$\sum_k\langle N_k\rangle = N. \tag{2.112}$$

This is the famous Boltzmann distribution. 65 Despite its superficial similarity to the Gibbs distribution (58), let me emphasize the conceptual difference between these two results. The Gibbs distribution describes the probability to find the whole system on energy level $E_m$, and it is always valid - more exactly, for a canonical ensemble of systems in thermodynamic equilibrium. On the other hand, the Boltzmann distribution describes the occupancy of an energy level of a single particle, and for systems of identical particles is valid only in the classical limit $\langle N_k\rangle \ll 1$, even if the particles do not interact directly.

The last fact may be surprising, because it may seem that once the particles of the system are independent, nothing prevents us from using the Gibbs distribution to derive Eq. (111), regardless of the value of $\langle N_k\rangle$. This is indeed true if the particles are distinguishable, i.e. may be distinguished from each other - say by their fixed spatial positions, or by the states of certain internal degrees of freedom (say, spin), or any other "pencil mark". However, it is an experimental fact that elementary particles of each particular type (say, electrons) are identical to each other, i.e. cannot be "pencil-marked". For such particles we have to be more careful: even if they do not interact explicitly, there is still some implicit dependence in their behavior, which is especially evident for the so-called fermions (fundamental particles with semi-integer spin): they obey the Pauli exclusion principle that forbids two identical particles to be in the same quantum state, even if they do not interact explicitly. 66

65 The distribution was first suggested in 1877 by the founding father of statistical physics, L. Boltzmann. For the particular case when ε is the kinetic energy of a free classical particle (and hence has a continuous spectrum), it is reduced to the Maxwell distribution - see Sec. 3.1 below.

66 See, e.g., QM Sec. 8.1.

Note that the term "the same quantum state" carries a heavy meaning load here. For example, if two particles are confined to stay in different spatial positions (say, reliably locked in different boxes), they are distinguishable even if they are internally identical. Thus the Pauli principle,
as well as other identity effects such as Bose-Einstein condensation, to be discussed in the next chapter, 
are important only when identical particles may move in the same spatial region. In order to describe 
this case, instead of "identical", it is much better to use a more precise (though ugly) term 
indistinguishable particles. 67 

In order to take these effects into account, let us examine the effects of nonvanishing occupancy $\langle N_k\rangle \sim 1$ on statistical properties of a system of many non-interacting but indistinguishable particles (at the first stage of calculation, either fermions or bosons) in equilibrium, and apply the grand canonical distribution (109) to a very interesting particular grand canonical ensemble: a subset of particles in the same quantum state k (Fig. 14).



Fig. 2.14. Grand canonical ensemble of particles in the same quantum state (with eigenenergy $\varepsilon_k$). (The figure shows single-particle energy levels $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_k, \ldots$ for particles numbered 1, 2, ..., j, ....)



In this ensemble, the role of the environment is played by the particles in all other states $k' \neq k$, because due to infinitesimal interactions, the particles may change their states. In equilibrium, the chemical potential μ and temperature T of the system should not depend on the state number k, but the grand thermodynamic potential $\Omega_k$ of the chosen particle subset may. Replacing N with $N_k$ - the particular (not average!) number of particles in the k-th state, and the particular energy value $E_{m,N}$ with $\varepsilon_kN_k$, we may reduce Eq. (109) to

$$\Omega_k = -T\ln\sum_{N_k}\exp\left\{\frac{\mu N_k - \varepsilon_kN_k}{T}\right\} = -T\ln\sum_{N_k}\left(\exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right)^{N_k}, \tag{2.113}$$



where the summation should be carried out over all possible values of $N_k$. For the final calculation of this sum, the elementary particle type becomes essential.

In particular, for fermions, obeying the Pauli principle, the numbers $N_k$ in Eq. (113) may take only two values, either 0 (state k is unoccupied) or 1 (the state is occupied), and the summation gives



67 This invites a natural question: what particles are "elementary enough" for the identity? For example, protons and neutrons have an internal structure, in some sense consisting of quarks and gluons; can they be considered elementary? Next, if protons and neutrons are elementary, are atoms? Molecules? What about really large molecules (such as proteins)? Viruses? The general answer to these questions, given by quantum mechanics (or rather experiment :-), is that any particles/systems, no matter how large and complex they are, are identical if they have exactly the same internal structure, and also are exactly in the same internal quantum state - for example, in the ground state of all their internal degrees of freedom.
$$\Omega_k = -T\ln\sum_{N_k=0,1}\left(\exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right)^{N_k} = -T\ln\left(1 + \exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right). \tag{2.114}$$



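Now the average occupancy may be calculated from the last of Eqs. (1.62) - in this case, with N replaced with $\langle N_k\rangle$:

$$\langle N_k\rangle = -\left(\frac{\partial\Omega_k}{\partial\mu}\right)_{T,V} = \frac{1}{e^{(\varepsilon_k-\mu)/T} + 1}. \tag{2.115}$$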



This is the famous Fermi-Dirac distribution, derived in 1926 independently by E. Fermi and P. Dirac. 
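The step from Eq. (114) to Eq. (115) is a single derivative, $\langle N_k\rangle = -\partial\Omega_k/\partial\mu$. As a sketch, the code below takes this derivative by a central finite difference (a numerical device of ours, with an assumed step $d\mu = 10^{-6}$) and compares the result with the closed-form Fermi-Dirac occupancy.

```python
import math

def omega_fermi(eps, mu, T):
    """Grand potential of a single fermion state, Eq. (114)."""
    return -T * math.log1p(math.exp((mu - eps) / T))

def fermi_dirac(eps, mu, T):
    """Average occupancy, Eq. (115)."""
    return 1.0 / (math.exp((eps - mu) / T) + 1.0)

def occupancy_from_omega(omega, eps, mu, T, d_mu=1e-6):
    """<N_k> = -(d Omega_k / d mu), by a central finite difference."""
    return -(omega(eps, mu + d_mu, T) - omega(eps, mu - d_mu, T)) / (2.0 * d_mu)

T, MU = 1.0, 0.0   # illustrative values, in units with k_B = 1
for eps in (-3.0, -1.0, 0.0, 1.0, 3.0):
    numeric = occupancy_from_omega(omega_fermi, eps, MU, T)
    print(f"eps = {eps:+.1f}:  -dOmega/dmu = {numeric:.8f},  "
          f"Eq. (115) = {fermi_dirac(eps, MU, T):.8f}")
```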

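On the other hand, bosons do not obey the Pauli principle, and for them the numbers $N_k$ can take any non-negative integer values. In this case, Eq. (113) turns into the following equality:

$$\Omega_k = -T\ln\sum_{N_k=0}^{\infty}\left(\exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right)^{N_k} = -T\ln\sum_{N_k=0}^{\infty}\lambda^{N_k}, \quad \text{with } \lambda \equiv \exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}. \tag{2.116}$$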



This sum is just the usual geometric progression again, which converges if λ < 1, giving

$$\Omega_k = -T\ln\frac{1}{1-\lambda} = -T\ln\frac{1}{1 - e^{(\mu-\varepsilon_k)/T}}, \quad \text{for } \mu < \varepsilon_k. \tag{2.117}$$



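In this case the average occupancy, again calculated using Eq. (1.62) with N replaced with $\langle N_k\rangle$, obeys the Bose-Einstein distribution,

$$\langle N_k\rangle = -\left(\frac{\partial\Omega_k}{\partial\mu}\right)_{T,V} = \frac{1}{e^{(\varepsilon_k-\mu)/T} - 1}, \quad \text{for } \mu < \varepsilon_k, \tag{2.118}$$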



which was derived in 1924 by S. Bose (for the particular case μ = 0) and generalized in 1925 by A. Einstein for an arbitrary chemical potential. In particular, comparing Eq. (118) with Eq. (72), we see that harmonic oscillator excitations, 68 each with energy ħω, may be considered as bosons, with zero chemical potential. We have already obtained this result (μ = 0) in a different way - see Eq. (93). Its physical interpretation is that the oscillator excitations may be created inside the system, so that there is no energy cost μ of moving them into the system from its environment.
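For bosons, the occupancy can also be obtained without the closed-form sum: weight each $N_k$ by $\lambda^{N_k}$, as in Eq. (116), and average directly. The sketch below truncates the geometric series at an assumed $N_k = 10^4$, which is harmless as long as λ < 1 is not extremely close to 1, and compares the result with Eq. (118). With μ = 0 it gives the occupancy $\langle E\rangle/\hbar\omega = 1/(e^{\hbar\omega/T}-1)$ implied by Eq. (72) for oscillator excitations.

```python
import math

def bose_by_summation(eps, mu, T, n_max=10_000):
    """<N_k> as a direct average over N_k = 0..n_max with weights lambda^N_k.

    Truncating at n_max is an approximation; it is accurate when
    lambda = exp((mu - eps)/T) < 1 and lambda**n_max is negligible.
    """
    lam = math.exp((mu - eps) / T)
    weights = [lam**n for n in range(n_max + 1)]
    z = sum(weights)                      # the geometric series of Eq. (116)
    return sum(n * w for n, w in enumerate(weights)) / z

def bose_einstein(eps, mu, T):
    """Closed form, Eq. (118); requires mu < eps."""
    return 1.0 / math.expm1((eps - mu) / T)

T, MU = 1.0, 0.0   # mu = 0, as for oscillator excitations (units with k_B = 1)
for eps in (0.05, 0.5, 2.0):
    print(f"eps = {eps}:  direct sum {bose_by_summation(eps, MU, T):.8f},  "
          f"Eq. (118) {bose_einstein(eps, MU, T):.8f}")
```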

The simple form of Eqs. (115) and (118), as well as the fact that they differ "only" by the sign before the unity in their denominators, is one of the most beautiful results of physics at large. This similarity should not disguise the fact that the energy dependences of $\langle N_k\rangle$, given by these two formulas, are rather different - see Fig. 15. In the Fermi-Dirac statistics, the average level occupancy is finite (and below 1) at any energy, while in the Bose-Einstein statistics it may be above 1, and even diverges at $\varepsilon_k \to \mu$. However, for both distributions, as the temperature is increased, the difference $(\varepsilon_k - \mu)$ eventually becomes much larger than T for all k. In this limit, $\langle N_k\rangle \ll 1$, and both distributions coincide with each other,



68 As the reader certainly knows, for the electromagnetic field oscillators, such excitations are called photons; for mechanical oscillation modes, phonons. It is important, however, not to confuse these mode excitations with the oscillators as such, and to be very careful in ascribing to them certain spatial locations - see, e.g., QM Sec. 9.1.
as well as with the Boltzmann distribution (111) with $c = \exp\{\mu/T\}$. The last distribution, therefore, serves as the high-temperature limit for quantum particles of both sorts.
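The convergence of both quantum distributions to the Boltzmann one can be seen in a few lines. The sketch below (ours) tabulates Eqs. (115), (118), and (111) with $c = \exp\{\mu/T\}$ as functions of $x \equiv (\varepsilon_k - \mu)/T$. Since $\langle N_k\rangle_{\rm FD}/e^{-x} = 1 - 1/(e^x+1)$ and $\langle N_k\rangle_{\rm BE}/e^{-x} = 1 + 1/(e^x-1)$, the relative deviation of either quantum result from the Boltzmann one is close to $e^{-x}$ at large x.

```python
import math

def occupancies(x):
    """FD, BE and Boltzmann occupancies at x = (eps_k - mu)/T; BE needs x > 0."""
    fermi = 1.0 / (math.exp(x) + 1.0)        # Eq. (115)
    bose = 1.0 / math.expm1(x)               # Eq. (118)
    boltzmann = math.exp(-x)                 # Eq. (111) with c = exp{mu/T}
    return fermi, bose, boltzmann

for x in (0.5, 2.0, 5.0, 10.0):
    fd, be, bz = occupancies(x)
    worst = max(abs(fd - bz), abs(be - bz)) / bz
    print(f"(eps-mu)/T = {x:4}:  FD {fd:.3e}  BE {be:.3e}  Boltzmann {bz:.3e}  "
          f"max relative deviation {worst:.1e}")
```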

A natural question now is how to find the chemical potential μ participating in Eqs. (115) and (118). In the grand canonical ensemble as such (Fig. 13), it is something imposed by the system's environment. However, both the Fermi-Dirac and Bose-Einstein distributions are also applicable to equilibrium systems with a fixed but large number N of particles. In these conditions, the role of the environment for some subset of $N' \ll N$ particles is played by the remaining $N - N'$ particles. In this case, μ may be found by calculating $\langle N\rangle$ from the corresponding distribution, and then requiring it to be equal to the genuine number of particles in the system. In the next section, we will perform such calculations for several particular systems.
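The procedure just described, choosing μ so that the calculated $\langle N\rangle$ equals the actual number of particles, amounts to a one-dimensional root search, because $\langle N\rangle$ grows monotonically with μ. In the sketch below the level set is hypothetical (20 equally spaced levels $\varepsilon_k = k$, in units with $k_B = 1$, holding N = 10 fermions), and bisection is our choice of root finder. For this particular symmetric example, particle-hole symmetry pins μ at 9.5 for any T, which makes a convenient check.

```python
import math

def fermi_dirac(eps, mu, T):
    """Eq. (115), returning 0 for very large (eps - mu)/T to avoid overflow."""
    x = (eps - mu) / T
    if x > 700.0:
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)

def total_number(levels, mu, T):
    """<N> = sum_k <N_k> for the Fermi-Dirac distribution."""
    return sum(fermi_dirac(eps, mu, T) for eps in levels)

def chemical_potential(levels, n_particles, T, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on mu; <N> increases monotonically with mu.

    Assumes 0 < n_particles < len(levels) and that [lo, hi] brackets the root.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_number(levels, mid, T) < n_particles:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: levels eps_k = k (k = 0..19) holding N = 10 fermions
LEVELS = [float(k) for k in range(20)]
for T in (0.1, 1.0, 5.0):
    mu = chemical_potential(LEVELS, 10, T)
    print(f"T = {T}: mu = {mu:.6f}, <N> = {total_number(LEVELS, mu, T):.9f}")
```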



Fig. 2.15. The Fermi-Dirac (blue line) and Bose-Einstein (red lines) distributions, and the Boltzmann distribution with $c = \exp\{\mu/T\}$ (black line). (Vertical axis: $\langle N_k\rangle$; horizontal axis: $(\varepsilon_k - \mu)/T$.)



For those and other applications, it will be convenient for us to have ready expressions for the entropy S of a general (i.e. not necessarily equilibrium) state of systems of independent Fermi or Bose particles, expressed not as a function of $W_m$ of the whole system, as Eq. (29) does, but as a function of the average occupancy numbers $\langle N_k\rangle$. For that, let us consider a composite system, consisting of $M \gg 1$ similar but distinct component systems, numbered by index m = 1, 2, ... M, with independent (i.e. not explicitly interacting) particles. We will assume that though in each of the M component systems the number $N_k^{(m)}$ of particles in its k-th quantum state may be different (Fig. 16), their total number $N_k^{\Sigma}$ in the composite system is fixed:



$$\sum_{m=1}^{M}N_k^{(m)} = N_k^{\Sigma} = \text{const}. \tag{2.119}$$

Fig. 2.16. Composite system with a certain distribution of $N_k^{\Sigma}$ particles in the k-th state between M component systems. (Figure labels: number of particles on the k-th single-particle energy level, $N_k^{(1)}, N_k^{(2)}, \ldots$; component system number.)
As a result, the total energy of the composite system is fixed as well,

$$\sum_{m=1}^{M}N_k^{(m)}\varepsilon_k = N_k^{\Sigma}\varepsilon_k = \text{const}, \tag{2.120}$$

so that an ensemble of many such composite systems (with the same k), in equilibrium, is microcanonical. According to Eq. (24a), the average entropy $S_k$ per component system may be calculated as

$$S_k = \lim_{M,\,N_k^{\Sigma}\to\infty}\frac{\ln M_k}{M}, \tag{2.121}$$

where $M_k$ is the number of possible different ways such a composite system (with fixed $N_k^{\Sigma}$) may be implemented.

Let us start the calculation of $M_k$ with Fermi particles - for which the Pauli principle is valid. Here the level occupancies $N_k^{(m)}$ may only be equal to 0 or 1, so that the distribution problem is solvable only if $N_k^{\Sigma} \le M$, and is evidently equivalent to the choice of $N_k^{\Sigma}$ balls (in arbitrary order) from the total number of M distinct balls. Comparing this formulation with the binomial coefficient definition, 69 we immediately have

$$M_k = {}^{M}C_{N_k^{\Sigma}} = \frac{M!}{(M - N_k^{\Sigma})!\,N_k^{\Sigma}!}. \tag{2.122}$$

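From here, using the Stirling formula (again, in its simplest form (27)), we get

$$S_k = -\langle N_k\rangle\ln\langle N_k\rangle - \left(1 - \langle N_k\rangle\right)\ln\left(1 - \langle N_k\rangle\right), \tag{2.123}$$

where

$$\langle N_k\rangle \equiv \lim_{M,\,N_k^{\Sigma}\to\infty}\frac{N_k^{\Sigma}}{M} \tag{2.124}$$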

is exactly the average occupancy of the k-th single-particle level in each system that was discussed earlier in this section. Since for a Fermi system $\langle N_k\rangle$ is always somewhere between 0 and 1, the entropy (123) is always positive.

In the Bose case, where the Pauli limitation is not valid, the number $N_k^{(m)}$ of particles on the k-th level in each of the systems is an arbitrary (non-negative) integer. Let us consider $N_k^{\Sigma}$ particles and (M − 1) partitions (shown by vertical lines in Fig. 16) between M systems as $(M - 1 + N_k^{\Sigma})$ similar mathematical objects ordered along one axis. Then $M_k$ may be calculated as the number of possible ways to distribute the (M − 1) indistinguishable partitions among these $(M - 1 + N_k^{\Sigma})$ distinct objects, i.e. as the following binomial coefficient: 70

$$M_k = {}^{M+N_k^{\Sigma}-1}C_{M-1} = \frac{(M - 1 + N_k^{\Sigma})!}{(M-1)!\,N_k^{\Sigma}!}. \tag{2.125}$$



69 See, e.g., MA Eq. (2.2).

70 See also MA Eq. (2.4).
Applying the Stirling formula (27) again, we get the following result,

$$S_k = -\langle N_k\rangle\ln\langle N_k\rangle + \left(1 + \langle N_k\rangle\right)\ln\left(1 + \langle N_k\rangle\right), \tag{2.126}$$



which again differs from the Fermi case (123) "only" by the signs in the second term, and is valid for any positive $\langle N_k\rangle$.
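The Stirling-formula steps leading to Eqs. (123) and (126) can be checked by evaluating $\ln M_k/M$ exactly for a large but finite M. The sketch below uses `math.lgamma` to compute the logarithms of the binomial coefficients (122) and (125) without forming huge factorials. M = 10⁶ is an assumed value, and the remaining differences, of order $(\ln M)/M$, are finite-size corrections that vanish in the limit (121).

```python
import math

def ln_binomial(n, k):
    """ln[n! / (k! (n - k)!)], via math.lgamma to avoid huge factorials."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fermi_entropy(nk):
    """Eq. (123); requires 0 < nk < 1."""
    return -nk * math.log(nk) - (1 - nk) * math.log(1 - nk)

def bose_entropy(nk):
    """Eq. (126); requires nk > 0."""
    return -nk * math.log(nk) + (1 + nk) * math.log(1 + nk)

M = 10**6   # number of component systems: large but finite (an assumption)
for nk in (0.1, 0.5, 0.9):
    n_sigma = round(nk * M)                                   # N_k^Sigma, cf. Eq. (124)
    s_fermi = ln_binomial(M, n_sigma) / M                     # Eqs. (121)-(122)
    s_bose = ln_binomial(M - 1 + n_sigma, n_sigma) / M        # Eqs. (121), (125)
    print(f"<N_k> = {nk}:  Fermi {s_fermi:.6f} vs {fermi_entropy(nk):.6f},  "
          f"Bose {s_bose:.6f} vs {bose_entropy(nk):.6f}")
```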

Expressions (123) and (126) are valid for an arbitrary (possibly non-equilibrium) case; they may 
be also used for an alternative derivation of the Fermi-Dirac (115) and Bose-Einstein (118) distributions 
valid in equilibrium. For that, we may use the method of Lagrange multipliers, requiring (just like it was 
done in Sec. 2) the total entropy of a system of TV independent, similar particles, 

S = 2>*, (2-127) 

k 

as a function of state occupancies (Nk), to attain its maximum, with the conditions of fixed total number 
of particles N and the total energy E: 

Y j {N k ) = N = const, = E = const • (2.128) 

k k 

The completion of this calculation is left for the reader's exercise.

In the classical limit, when the average occupancies ⟨N_k⟩ of all states are small, both the Fermi
and Bose expressions for S_k tend to the same limit

$$ S_k = -\langle N_k\rangle\ln\langle N_k\rangle, \quad \text{for } \langle N_k\rangle\ll 1. \qquad (2.129) $$



This expression, frequently referred to as the Boltzmann (or "classical") entropy, might also be obtained,
for arbitrary ⟨N_k⟩, directly from Eq. (29) by considering an ensemble of systems, each consisting of just
one classical particle, so that E_m → ε_k and W_m → ⟨N_k⟩. Let me emphasize again that for indistinguishable
particles, such identification is generally (i.e. at ⟨N_k⟩ ~ 1) illegitimate even if they do not interact
explicitly. As we will see in the next chapter, the indistinguishability affects statistical properties of even
classical particles.



2.9. Exercise problems 

2.1. Use the microcanonical distribution to calculate thermodynamic properties (including
entropy, all relevant thermodynamic potentials, and heat capacity) of an ensemble of similar two-level
systems, in thermodynamic equilibrium at temperature T that is comparable with the energy gap Δ. For
each variable, sketch its temperature dependence, and find its asymptotic values (or trends) in the
low-temperature and high-temperature limits.

Hint: The two-level system is generally defined as any system with just two relevant states
whose energies, say E_0 and E_1, are separated by a finite gap Δ ≡ E_1 - E_0. Its most popular (but not the
only!) example is a spin-1/2 particle, e.g., an electron, in an external magnetic field.



2.2. Solve Problem 1 using the Gibbs distribution. Also, calculate the probabilities of the
energy level occupation, and give physical interpretations of your results, in both temperature limits.



2.3. Find the average magnetic moment ⟨m⟩ of a spin-1/2 particle whose interaction with a
constant external magnetic field B is described by Hamiltonian

$$ \hat H = -\hat{\mathbf m}\cdot\mathbf B, \qquad \hat{\mathbf m} = \frac{\hbar e}{2m}\hat{\boldsymbol\sigma}, $$

where σ̂ is the Pauli vector operator, 71 in thermal equilibrium at temperature T.



2.4. Discuss the possibility of using a system of non-interacting spin-1/2 particles in a magnetic
field for refrigeration.

Hint: See a footnote in Sec. 1.6.



2.5. Use the microcanonical distribution to calculate the average entropy, energy, and pressure of
a single classical particle of mass m, with no internal degrees of freedom, free to move in volume V, at
temperature T.

Hint: Try to make a more accurate calculation than has been done in Sec. 2.2 for the system of N
harmonic oscillators. For that you will need to know the volume V_d of a d-dimensional hypersphere of
the unit radius. To avoid being too cruel, I am giving it to you:

$$ V_d = \pi^{d/2}\Big/\Gamma\left(\frac{d}{2}+1\right), $$

where Γ(x) is the gamma-function. 72
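The hint's formula is easy to sanity-check against the familiar low-dimensional cases: a segment of length 2, a disk of area π, and a ball of volume 4π/3. A minimal Python sketch, using only `math.gamma` from the standard library (the function name is illustrative):

```python
import math

def unit_ball_volume(d):
    """Volume of a d-dimensional hypersphere (ball) of unit radius: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in (1, 2, 3, 4):
    print(d, unit_ball_volume(d))
```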



2.6. Solve Problem 5 starting from the Gibbs distribution.



2.7. A quantum particle of mass m is confined to free motion along a 1D segment of length a.
Using any approach you like, find the average force the particle exerts on the walls of such a "1D quantum
well" in thermal equilibrium, and analyze its temperature dependence.

Hint: You may consider series

$$ \Theta(\xi) \equiv \sum_{n=1}^{\infty}e^{-\xi n^2} $$

as a known function of ξ. 73



71 See, e.g., QM Sec. 4.4. 

72 For its definition and main properties, see, e.g., MA Eqs. (6.6)-(6.7). 

73 Indeed, it may be reduced to the so-called elliptic theta-function θ_3(z, τ) for a particular case z = 0 - see, e.g.,
Sec. 16.27 in the Abramowitz-Stegun handbook cited in MA Sec. 16(ii). However, you do not need that (or any
other :-) handbook to solve this problem.
2.8. A quantum particle is free to move around a plane circle of radius r. Calculate its heat
capacity.



2.9. An LC circuit (see Fig. on the right) is at thermodynamic equilibrium with the environment.
Find the r.m.s. fluctuation δV ≡ ⟨V²⟩^{1/2} of the voltage across it, for an arbitrary ratio T/ħω, where
ω = (LC)^{-1/2} is the resonance frequency of this "tank circuit".

[Figure: an LC circuit with inductance L and capacitance C, and the voltage V across it.]

2.10 . Use the general Eq. (123) to re-derive the Fermi-Dirac distribution (115), valid in 
equilibrium. 
Chapter 3. Ideal and Not-So-Ideal Gases

In this chapter, the general approaches discussed in the previous chapters are applied to calculate 
statistical and thermodynamic properties of gases, i.e. collections of identical particles (say, atoms or
molecules) that are free to move inside a certain volume, either not interacting or weakly interacting 
with each other. 



3.1. Ideal classical gas 

Interactions of typical atoms and molecules are well localized, i.e. rapidly decreasing with
distance r between them, becoming negligible at a certain distance r_0. In a gas of N particles inside
volume V, the average distance ⟨r⟩ between the particles is of the order of (V/N)^{1/3}. As a result, if the gas
density n = N/V ~ ⟨r⟩^{-3} is much lower than r_0^{-3}, i.e. if n r_0^3 ≪ 1, the chance for its particles to approach
each other and interact is rather small. The model in which such interactions are completely ignored is
called the ideal gas.

Let us start with a classical ideal gas, which may be defined as the gas in whose behavior the 
quantum effects are negligible. As we saw in Sec. 2.8, the condition of that is to have the average 
occupancy of each quantum state low: 

$$ \langle N_k\rangle \ll 1. \qquad (3.1) $$

It may seem that we have already found properties of such a system, in particular the equilibrium
occupancy of its states - see Eq. (2.111):

$$ \langle N_k\rangle = \text{const}\times\exp\left\{-\frac{\varepsilon_k}{T}\right\}. \qquad (3.2) $$

In some sense it is true, but we still need, first, to see what exactly Eq. (2) means for the gas, i.e. a
system with an essentially continuous energy spectrum, and, second, to show that, rather surprisingly,
particles' indistinguishability affects some properties of even classical gases.

The first of these tasks is evidently the easiest for a gas out of external fields, and with no
internal degrees of freedom. 1 In this case ε_k is just the kinetic energy of the particle, which obeys the isotropic
and parabolic dispersion law

$$ \varepsilon_k = \frac{p^2}{2m} = \frac{p_1^2+p_2^2+p_3^2}{2m}. \qquad (3.3) $$

Now we have to use two facts from other fields of physics. First, in quantum mechanics, momentum p
is associated with wavevector k of the de Broglie wave, p = ħk. 2 Second, eigenvalues of k for any
waves (including de Broglie waves) in free space are uniformly distributed in the momentum space,
with a constant density of states, given by Eq. (2.82):



1 In more realistic cases when particles do have internal degrees of freedom, but they are in certain (say, ground)
quantum states, Eq. (3) is valid as well, with ε_k referred to the internal ground-state energy.

2 See, e.g., QM Sec. 1.2. 
$$ \frac{dN_\text{states}}{d^3k} = \frac{gV}{(2\pi)^3}, \quad \text{i.e.} \quad \frac{dN_\text{states}}{d^3p} = \frac{gV}{(2\pi\hbar)^3}, \qquad (3.4) $$

where g is the degeneracy of particle's internal states (say, for electrons, the spin degeneracy g = 2). 

Even regardless of the exact proportionality coefficient between dN_states and d³p, the very fact of
this proportionality means that the probability dW to find the particle in a small region d³p = dp_1dp_2dp_3
of the momentum space is proportional to the right-hand part of Eq. (2), with ε_k given by Eq. (3):

$$ dW = C\exp\left\{-\frac{p^2}{2mT}\right\}d^3p = C\exp\left\{-\frac{p_1^2+p_2^2+p_3^2}{2mT}\right\}dp_1\,dp_2\,dp_3. \qquad (3.5) $$



This is the famous Maxwell distribution. 3 The normalization constant C may be readily found
from the last form of Eq. (5), by requiring the integral of dW over all the momentum space to equal
1, and using the fact that the 1D integrals over each Cartesian component p_j of the momentum (j = 1, 2,
3) are all equal, and may be reduced to the well-known Gaussian integral: 4

$$ C = \left[\int_{-\infty}^{+\infty}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j\right]^{-3} = \left[(2mT)^{1/2}\int_{-\infty}^{+\infty}e^{-\xi^2}d\xi\right]^{-3} = (2\pi mT)^{-3/2}. \qquad (3.6) $$



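As a sanity check, let us use the Maxwell distribution to calculate the average energy
corresponding to each half-degree of freedom:

$$ \left\langle\frac{p_j^2}{2m}\right\rangle = \int\frac{p_j^2}{2m}\,dW = C^{1/3}\int_{-\infty}^{+\infty}\frac{p_j^2}{2m}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j = \frac{T}{\pi^{1/2}}\int_{-\infty}^{+\infty}\xi^2e^{-\xi^2}d\xi. \qquad (3.7) $$

The last integral 5 equals √π/2, so that, finally,

$$ \left\langle\frac{p_j^2}{2m}\right\rangle = \frac{T}{2}. \qquad (3.8) $$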



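This result is (fortunately :-) in agreement with the equipartition theorem (2.48). It also means that the
r.m.s. velocity of the particles is

$$ \delta v \equiv \langle v^2\rangle^{1/2} = \left\langle v_1^2+v_2^2+v_3^2\right\rangle^{1/2} = \left(3\langle v_j^2\rangle\right)^{1/2} = \left(3\frac{T}{m}\right)^{1/2}. \qquad (3.9) $$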
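Eqs. (6)-(9) can also be verified by direct numerical integration. The sketch below is only an illustration: it assumes arbitrary consistent units with k_B = 1 (T measured in energy units), picks m = 2 and T = 3 arbitrarily, and uses a plain midpoint rule (a helper named `midpoint`, introduced here) in place of the analytical Gaussian integrals.

```python
import math

def midpoint(f, a, b, n=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

m, T = 2.0, 3.0                     # arbitrary illustrative values (k_B = 1)
p_max = 12.0 * math.sqrt(m * T)     # the Gaussian is negligible beyond |p_j| = p_max

def boltzmann_factor(p):
    return math.exp(-p * p / (2.0 * m * T))

gauss = midpoint(boltzmann_factor, -p_max, p_max)   # the 1D integral in Eq. (3.6)
C = gauss ** -3                                      # Eq. (3.6)
mean_kinetic_1d = C ** (1.0 / 3.0) * midpoint(
    lambda p: p * p / (2.0 * m) * boltzmann_factor(p), -p_max, p_max)  # Eq. (3.7)
v_rms = math.sqrt(3.0 * 2.0 * mean_kinetic_1d / m)  # <v^2> = 3<v_j^2> = 6<p_j^2/2m>/m

print(C, (2.0 * math.pi * m * T) ** -1.5)   # normalization, Eq. (3.6)
print(mean_kinetic_1d, T / 2.0)             # equipartition, Eq. (3.8)
print(v_rms, math.sqrt(3.0 * T / m))        # r.m.s. velocity, Eq. (3.9)
```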



3 This formula was suggested by J. C. Maxwell as early as in 1860, i.e. well before the Boltzmann and Gibbs
distributions. Note also that the term "Maxwell distribution" is often associated with the distribution of particle's
momentum (or velocity) magnitude,

$$ dW = 4\pi Cp^2\exp\left\{-\frac{p^2}{2mT}\right\}dp = 4\pi Cm^3v^2\exp\left\{-\frac{mv^2}{2T}\right\}dv, \quad \text{with } 0\le p,v<\infty, $$

which immediately follows from Eq. (5) combined with the expression d³p = 4πp²dp due to the spherical
symmetry of the distribution in the momentum/velocity space.

4 See, e.g., MA Eq. (6.9b). 

5 See, e.g., MA Eq. (6.9c). 
For a typical gas (say, N_2), with m ≈ 28 m_p ≈ 4.7×10^{-26} kg, at room temperature (T = k_BT_K ≈ k_B×300
K ≈ 4×10^{-21} J), this velocity is about 500 m/s, comparable with the muzzle velocity of a typical handgun bullet. Still, it is
measurable using simple table-top equipment (say, a set of two concentric, rapidly rotating cylinders
with a thin slit collimating an atomic beam emitted at the axis) that was available already at the end of
the 19th century. Experiments using such equipment gave convincing confirmations of Maxwell's
theory.
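The numerical estimate above can be reproduced in a few lines. This sketch assumes the CODATA values of k_B and the proton mass, and approximates the N_2 mass as 28 m_p (using the exact molecular mass, about 28.01 atomic mass units, changes the velocity by well under 1%); it prints a value of roughly 515 m/s.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K (exact in SI)
M_P = 1.67262192369e-27   # proton mass, kg (CODATA 2018)

m_N2 = 28 * M_P                    # N2 molecule, approximated as 28 proton masses
T = K_B * 300.0                    # temperature in energy units, J
v_rms = math.sqrt(3 * T / m_N2)    # Eq. (3.9)
print(f"m = {m_N2:.2e} kg, T = {T:.2e} J, v_rms = {v_rms:.0f} m/s")
```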

This is all very simple (isn't it?), but actually the thermodynamic properties of a classical gas,
especially its entropy, are more intricate. To show that, let us apply the Gibbs distribution to gas
portions consisting of N particles each, rather than just one of them. If the particles are exactly similar,
the eigenenergy spectrum {ε_k} of each of them is also exactly the same, and each value E_m of the total
energy is just the sum of particular energies ε_{k(l)} of the particles, where k(l), with l = 1, 2, ... N, is the k-th
energy level of the l-th particle. Moreover, since the gas is classical, ⟨N_k⟩ ≪ 1, the probability of finding two or more
particles in the same state is negligible. As a result, we can use Eq. (2.59) to write

$$ Z = \sum_{k(1),\,k(2),\,\ldots,\,k(N)}\exp\left\{-\frac{\varepsilon_{k(1)}+\varepsilon_{k(2)}+\ldots+\varepsilon_{k(N)}}{T}\right\}, \qquad (3.10) $$

where the summation has to be carried over all possible states of each particle. Since the summation
over each set {k(l)} concerns only one of the operands of the product of exponents under the sum, it is
tempting to complete the calculation as follows:



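$$ Z \to Z_\text{dist} = \sum_{k(1)}\exp\left\{-\frac{\varepsilon_{k(1)}}{T}\right\}\cdot\sum_{k(2)}\exp\left\{-\frac{\varepsilon_{k(2)}}{T}\right\}\cdots\sum_{k(N)}\exp\left\{-\frac{\varepsilon_{k(N)}}{T}\right\} = \left(\sum_k\exp\left\{-\frac{\varepsilon_k}{T}\right\}\right)^N, \qquad (3.11) $$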



where the final summation is over all states of one particle. This formula is indeed valid for 
distinguishable particles. 6 However, if particles are indistinguishable (again, meaning that they are 
identical and free to move within the same spatial region), Eq. (11) has to be modified by what is called 
the correct Boltzmann counting: 



$$ Z = \frac{1}{N!}\left(\sum_k\exp\left\{-\frac{\varepsilon_k}{T}\right\}\right)^N, \qquad (3.12) $$



that considers all quantum states, differing only by particle permutations, as one. 



Now let us take into account that the fundamental relation (4) implies the following rule for the
replacement of a sum over quantum states with an integral in the classical limit - whose exact conditions
are still to be specified: 7

$$ \sum_k(\ldots) \to \int(\ldots)\,dN_\text{states} = \frac{gV}{(2\pi\hbar)^3}\int(\ldots)\,d^3p. \qquad (3.13) $$

In application to Eq. (12), this rule yields



6 Since each particle belongs to the same portion of gas, i.e. cannot be distinguished from others by its spatial 
position, this requires some internal "pencil mark", for example a specific structure or a specific quantum state of 
its internal degrees of freedom. 

7 As a reminder, we have already used this rule (twice) in Sec. 2.6, with particular values of g. 
$$ Z = \frac{1}{N!}\left\{\frac{gV}{(2\pi\hbar)^3}\left[\int_{-\infty}^{+\infty}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j\right]^3\right\}^N. \qquad (3.14) $$

The integral in square brackets is the same one as in Eq. (6), i.e. equal to (2πmT)^{1/2}, so that finally

$$ Z = \frac{1}{N!}\left[gV\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right]^N. \qquad (3.15) $$



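Now, assuming that N ≫ 1, 8 and applying the Stirling formula, we can calculate the gas' free energy,

$$ F = T\ln\frac{1}{Z} = -NT\ln\frac{V}{N} + Nf(T), \qquad (3.16a) $$

with

$$ f(T) \equiv -T\left\{\ln\left[g\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right]+1\right\}. \qquad (3.16b) $$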



The first of these relations is exactly Eq. (1.45), which was derived, in Sec. 1.4, from the equation
of state PV = NT, using thermodynamic identities. At that stage this equation of state was just
postulated, but now we can finally derive it by calculating pressure from the second of Eqs. (1.35) and
Eq. (16a):

$$ P = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{NT}{V}. \qquad (3.17) $$

So, the equation of state of the ideal classical gas, with density n = N/V, is indeed given by Eq. (1.44):

$$ P = \frac{NT}{V} = nT. \qquad (3.18) $$



Hence we may use Eqs. (1.46)-(1.51), derived from this equation of state, to calculate all other
thermodynamic variables of the gas. As one more sanity check, let us start with energy. Using Eq. (1.47)
with f(T) given by Eq. (16b), we immediately get

$$ E = N\left(f - T\frac{df}{dT}\right) = \frac{3}{2}NT, \qquad (3.19) $$



in full agreement with Eq. (8) and hence with the equipartition theorem. Much less trivial is the result
for entropy, which may be obtained by combining Eqs. (1.46) and (16):

$$ S = -\left(\frac{\partial F}{\partial T}\right)_V = N\left[\ln\frac{V}{N} - \frac{df(T)}{dT}\right]. \qquad (3.20) $$

8 For the opposite limit when N = g = 1, Eq. (15) yields the results obtained, by two alternative methods, in
Exercises 2.5 and 2.6. For N = 1, the "correct Boltzmann counting" factor N! equals 1, so that the particle
distinguishability effects vanish - naturally.
Formula (20), 9 in particular, provides the means to resolve the following gas mixing paradox
(sometimes called the "Gibbs paradox"). Consider two volumes, V_1 and V_2, separated by a partition,
each filled with the same gas, with the same density n, at the same temperature T. Now let us remove the
partition and let the gases mix; would the total entropy change? According to Eq. (20), it would not,
because the ratio V/N = 1/n, and hence the expression in square brackets, is the same in the initial and the
final state, so that the entropy is additive (extensive). This makes full sense if the gas particles in
both parts of the volume are identical, i.e. the partition's removal does not change our information about
the system. However, let us assume that all particles are distinguishable; then the entropy should clearly
increase, because the mixing would certainly decrease our information about the system, i.e. increase its
disorder. A quantitative description of this effect may be obtained using Eq. (11). Repeating for Z_dist all
the calculations made above for Z, we readily get a different formula for entropy:

$$ S_\text{dist} = N\left[\ln V - \frac{df(T)}{dT} - 1\right]. \qquad (3.21) $$

Notice that in contrast to the S given by Eq. (20), entropy S_dist is not proportional to N (at fixed
temperature T and density N/V). While for distinguishable particles this fact does not present any
conceptual problem, for indistinguishable particles it would mean that entropy was not an extensive
variable, i.e. would contradict the basic assumptions of thermodynamics. This fact emphasizes again the
necessity of the correct Boltzmann counting in the latter case.

Comparing Eqs. (20) and (21), we can calculate the change of entropy due to mixing of
distinguishable particles:

$$ \Delta S_\text{dist} = (N_1+N_2)\ln(V_1+V_2) - (N_1\ln V_1+N_2\ln V_2) = N_1\ln\frac{V_1+V_2}{V_1} + N_2\ln\frac{V_1+V_2}{V_2}. \qquad (3.22) $$

Note that for a particular case, V_1 = V_2 = V/2, Eq. (22) reduces to the simple result ΔS_dist = (N_1 + N_2) ln 2,
which may be readily understood from the point of view of information theory. Indeed, allowing each
particle of N = N_1 + N_2 to spread to a twice larger volume, we lose one bit of information per particle, i.e.
ΔI = (N_1 + N_2) bits for the whole system.

Let me leave it for the reader to show that result (22) is also valid if particles in each sub-volume
are indistinguishable from each other, but different from those in another sub-volume, i.e. for mixing of
two different gases. 10 However, it is certainly not applicable to the system where all particles are
identical, stressing again that the correct Boltzmann counting (12) does indeed affect entropy, even
though it is not essential for either the Maxwell distribution (5), or the equation of state (18), or average
energy (19).

9 The result presented by Eq. (20), with function f given by Eq. (16b), was obtained independently by O. Sackur
and H. Tetrode in 1911, i.e. well before the final formulation of quantum mechanics in the late 1920s.

10 By the way, if an ideal classical gas consists of particles of several different sorts, its full pressure is a sum of
independent partial pressures exerted by each component - the so-called Dalton law. While this fact was an
important experimental discovery in the early 1800s, from the point of view of statistical physics this is just a
straightforward corollary of Eq. (18), because in an ideal gas, the component particles do not interact.

In this context, one may wonder whether the change (22) (called the mixing entropy) is
experimentally observable. The answer is yes. For example, after free mixing of two different gases one
can use a thin movable membrane that is semipermeable, i.e. penetrable by particles of one type only, to
separate them again, thus reducing the entropy back to the initial value, and measure either the necessary
mechanical work A = TΔS_dist or the corresponding heat discharge into the heat bath. Practically,
measurements of this type are easier in weak solutions, 11 i.e. systems with a small concentration c ≪ 1 of
particles of one sort (solute) within much more abundant particles of another sort (solvent). The mixing
entropy also affects thermodynamics of chemical reactions in gases and liquids. 12 It is curious that
besides purely thermal measurements, mixing entropy in some conducting solutions (electrolytes) is also
measurable by a purely electrical method, called cyclic voltammetry, in which a low-frequency ac
voltage, applied between solid-state electrodes embedded in the solution, is used to periodically separate
different ions, and then mix them again. 13
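Eq. (22) is simple enough to evaluate directly. The sketch below (illustrative names; CODATA values of k_B and N_A) checks the equal-volume result ΔS_dist = (N_1 + N_2) ln 2, i.e. one bit per particle, and estimates the work A = TΔS_dist needed to separate one mole of each of two different gases, initially in equal volumes, at 300 K, assuming an ideal reversible isothermal separation. The result is about 3.5 kJ.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
N_A = 6.02214076e23    # Avogadro constant, 1/mol

def mixing_entropy_dist(N1, N2, V1, V2):
    """Eq. (3.22): entropy increase, in units of k_B, when N1 and N2 particles
    (distinguishable, or of two different gases) initially in V1 and V2 spread over V1 + V2."""
    V = V1 + V2
    return N1 * math.log(V / V1) + N2 * math.log(V / V2)

N1 = N2 = 1000
dS = mixing_entropy_dist(N1, N2, 1.0, 1.0)
print(dS, (N1 + N2) * math.log(2))                  # equal volumes: (N1 + N2) ln 2
print(dS / math.log(2))                             # the same in bits: one bit per particle
print(mixing_entropy_dist(N1, N2, 1.0, 3.0) > 0)    # unequal volumes: still positive

# Minimal work to separate 1 mol + 1 mol again at 300 K: A = T * dS, with T in joules
work = K_B * 300.0 * mixing_entropy_dist(N_A, N_A, 1.0, 1.0)
print(f"A = {work:.0f} J")
```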

Now let us briefly discuss two generalizations of our results for ideal classical gases. First, let us 
consider the ideal classical gas in an external field of potential forces. It may be described by replacing 
Eq. (3) with 

$$ \varepsilon_k = \frac{p_k^2}{2m} + U(\mathbf r_k), \qquad (3.23) $$

where r_k is the position of the particular particle, and U(r) is the potential energy per particle. In this
case, Eq. (4) is applicable only to small volumes, V → dV = d³r, whose linear size is much smaller than
the spatial scale of variations of macroscopic parameters of the gas - say, pressure. Hence, instead of Eq.
(5), we may only write the probability dW of finding the particle in a small volume d³r d³p of the
6-dimensional phase space:

$$ dW = w(\mathbf r,\mathbf p)\,d^3r\,d^3p, \qquad w(\mathbf r,\mathbf p) = \text{const}\times\exp\left\{-\frac{p^2}{2mT}-\frac{U(\mathbf r)}{T}\right\}. \qquad (3.24) $$

Hence, the Maxwell distribution of particle velocities is still valid at each point r, and a more interesting
issue here is the spatial distribution of the total density,

$$ n(\mathbf r) = N\int w(\mathbf r,\mathbf p)\,d^3p, \qquad (3.25) $$

of all gas particles, regardless of their momentum. For this variable, Eq. (24) yields 14

$$ n(\mathbf r) = n(0)\exp\left\{-\frac{U(\mathbf r)}{T}\right\}, \qquad (3.26) $$



11 It is interesting that statistical mechanics of weak solutions is very similar to that of ideal gases, with Eq. (18)
recast into the following formula (derived in 1885 by J. van't Hoff), PV = cNT, for the partial pressure of the
solute. One of its corollaries is that the net force (called the osmotic pressure) exerted on a semipermeable
membrane is proportional to the difference of solute concentrations it is supporting.

12 Unfortunately, I do not have time for even a brief introduction into this important field, and have to refer the 
interested reader to specialized textbooks - for example, P. A. Rock, Chemical Thermodynamics, University 
Science Books, 1983; or P. Atkins, Physical Chemistry, 5th ed., Freeman, 1994; or G. M. Barrow, Physical
Chemistry, 6th ed., McGraw-Hill, 1996.

13 See, e.g., either Chapter 6 in A. Bard and L. Faulkner, Electrochemical Methods, 2nd ed., Wiley, 2000 (which is a
good introduction to electrochemistry as a whole); or Sec. II.8.3.1 in F. Scholz (ed.), Electroanalytical Methods,
2nd ed., Springer, 2010.

14 In some textbooks, Eq. (26) is also called the Boltzmann distribution, though it certainly should be
distinguished from the more general Eq. (2.111).
where the potential energy reference is at the origin. As we will see in Chapter 6, in a non-uniform gas
the equation of state (18) is valid locally if particles' mean free path l is much smaller than the spatial
scale of changes of function n(r). 15 In this case, the local gas pressure may still be calculated from Eq.
(18):

$$ P(\mathbf r) = n(\mathbf r)T = P(0)\exp\left\{-\frac{U(\mathbf r)}{T}\right\}. \qquad (3.27) $$



An important example of application of Eq. (27) is an approximate description of the Earth's
atmosphere. At all heights h ≪ R_E ≈ 6×10^6 m above the Earth's surface (say, the sea level), we may
describe the Earth gravity effect by potential U = mgh, and Eq. (27) yields the so-called barometric
formula

$$ P(h) = P(0)\exp\left\{-\frac{h}{h_0}\right\}, \quad \text{with } h_0 \equiv \frac{T}{mg} = \frac{k_BT_K}{mg}. \qquad (3.28) $$

For the same N_2 (the main component of the atmosphere) at T_K = 300 K, h_0 ≈ 9 km. This gives the right
order of magnitude of the Earth atmosphere's thickness, though the exact law of pressure change differs
somewhat from Eq. (28) because of a certain drop of the absolute temperature T with height, by about
20% at h ~ h_0. 16
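The scale height in Eq. (28) can be estimated as follows. The sketch assumes a constant free-fall acceleration of 9.81 m/s² (named `G_ACC` here to avoid a clash with the degeneracy factor g) and again approximates the N_2 mass as 28 m_p. It gives h_0 of about 9 km and, as a check of the exponential law, P(h_0)/P(0) = 1/e.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
M_P = 1.67262192369e-27   # proton mass, kg (CODATA 2018)
G_ACC = 9.81              # free-fall acceleration, m/s^2 (assumed constant with height)

def scale_height(T_kelvin, m):
    """h0 = k_B T_K / (m g) from Eq. (3.28), in meters."""
    return K_B * T_kelvin / (m * G_ACC)

def barometric_pressure(h, P0, h0):
    """Isothermal barometric formula (3.28): P(h) = P(0) exp(-h/h0)."""
    return P0 * math.exp(-h / h0)

h0 = scale_height(300.0, 28 * M_P)   # N2 at 300 K
print(f"h0 = {h0 / 1e3:.1f} km")
print(barometric_pressure(h0, 1.0, h0))   # relative pressure at h = h0
```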

The second generalization I would like to mention is to particles with internal degrees of
freedom. Ignoring, for simplicity, the potential energy U(r), we may describe them by replacing Eq. (3)
with

$$ \varepsilon_k = \frac{p^2}{2m} + \varepsilon_k', \qquad (3.29) $$



where ε_k' describes the internal energy of the k-th particle. If the particles are similar, we may repeat all
above calculations, and see that all the results (including the Maxwell distribution) are still valid, with
the only exception of Eq. (16), which now becomes

$$ f(T) = -T\left\{\ln\left[\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\sum_{k'}\exp\left\{-\frac{\varepsilon_{k'}}{T}\right\}\right]+1\right\}. \qquad (3.30) $$



As we already know from Eq. (1.51), this change may affect both heat capacities of the gas, C_V and C_P,
but not their difference (equal to N).



3.2. Calculating μ

Now let us return to Eq. (3), i.e. neglect the external field effects, as well as thermal activation of 
the internal degrees of freedom, and discuss properties of ideal gases of indistinguishable quantum 



15 The mean free path may be defined by the geometric relation nσl = 1, where σ is the full cross-section of the
particle-particle scattering - see, e.g., CM Sec. 3.7.

16 The reason for the drop is that the atmosphere, including molecules such as H_2O, CO_2, etc., absorbs Sun's
radiation at wavelengths ~500 nm, much shorter than those of the back-radiation of the Earth surface, with the
spectrum centered at wavelength ~10 μm - see Eq. (2.87) and its discussion.



Chapter 3 



Page 7 of 30 



Essential Graduate Physics 



SM: Statistical Mechanics 



particles in more detail, paying special attention to the chemical potential μ - which, as you may recall,
was a somewhat mysterious aspect of the Fermi and Bose distributions.

Let us start from the classical gas, and recall the conclusion of thermodynamics that μ is the
Gibbs potential per particle - see Eq. (1.56). Hence we can calculate μ = G/N from Eqs. (1.49) and
(16b). The result,

$$ \mu = -T\ln\frac{V}{N} + f(T) + T = T\ln\left[\frac{N}{gV}\left(\frac{2\pi\hbar^2}{mT}\right)^{3/2}\right], \qquad (3.31) $$

which may be rewritten as

$$ \exp\left\{\frac{\mu}{T}\right\} = \frac{N}{gV}\left(\frac{2\pi\hbar^2}{mT}\right)^{3/2}, \qquad (3.32) $$



is very important, because it gives us some information about μ not only for a classical gas, but for
quantum (Fermi and Bose) gases as well. Indeed, we already know that for indistinguishable particles
the Boltzmann distribution (2.111) is valid only if ⟨N_k⟩ ≪ 1. Comparing this condition with quantum
statistics (2.115) and (2.118), we see that the condition of gas' classicity may be expressed as

$$ \exp\left\{\frac{\mu-\varepsilon_k}{T}\right\} \ll 1 \qquad (3.33) $$



for all ε_k. Since the lowest value of ε_k given by Eq. (3) is zero, Eq. (33) for a gas may be satisfied only if
exp{μ/T} ≪ 1. This means that the chemical potential of the classical gas has to be not just negative, but
also "strongly negative" in the sense

$$ -\mu \gg T. \qquad (3.34) $$

According to Eq. (32), this condition may be presented as

$$ T \gg T_0, \qquad (3.35) $$

with T_0 defined as

$$ T_0 \equiv \frac{\hbar^2n^{2/3}}{g^{2/3}m}. \qquad (3.36) $$



Condition (35) is very transparent physically: disregarding factor g (which is typically not
much larger than 1), it means that the average thermal energy of a particle (which is of the order of T)
has to be much larger than the energy of quantization of particle's motion at length r_A ≡ n^{-1/3}, the average
distance between the particles. An alternative form of this condition is

$$ r_A \gg g^{-1/3}r_c, \quad \text{with } r_c \equiv \frac{\hbar}{(mT)^{1/2}}. \qquad (3.37) $$



For a typical gas (say, N_2, with m ≈ 28 m_p ≈ 4.7×10^{-26} kg) at the standard room temperature (T =
k_B×300 K ≈ 4.1×10^{-21} J), r_c ~ 10^{-11} m, i.e. is significantly smaller than the physical size a ~ 3×10^{-10} m of
the molecule. This estimate shows that at room temperature, as soon as any practical gas is rarefied enough
to be ideal (r_A ≫ a), it is classical, i.e. the only way to observe the quantum effects in the translational
motion of molecules is a very deep refrigeration. According to Eq. (36), for the same nitrogen molecule,
taking r_A ~ 10³a ~ 3×10^{-7} m (to ensure that direct interaction effects are negligible), T_0 is only about
0.2 μK, i.e. well below 1 μK.
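The two estimates above, r_c from Eq. (37) and T_0 from Eq. (36), can be reproduced with the short sketch below. It assumes g = 1, m ≈ 28 m_p for N_2, a ~ 3×10^{-10} m, and r_A = 10³a, all order-of-magnitude inputs taken from the text, plus CODATA constants; it yields r_c of about 8×10^{-12} m and T_0 of about 0.2 μK.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
HBAR = 1.054571817e-34      # reduced Planck constant, J*s
M_P = 1.67262192369e-27     # proton mass, kg

m = 28 * M_P                    # N2 molecule
T = K_B * 300.0                 # room temperature in joules
r_c = HBAR / math.sqrt(m * T)   # Eq. (3.37)

def quantum_temperature(n, m, g=1):
    """Eq. (3.36): T0 = hbar^2 n^(2/3) / (g^(2/3) m), converted to kelvin."""
    return HBAR**2 * n ** (2.0 / 3.0) / (g ** (2.0 / 3.0) * m) / K_B

a = 3e-10                       # molecular size, m (order of magnitude)
r_A = 1e3 * a                   # mean spacing large enough to neglect interactions
n = r_A ** -3                   # gas density, 1/m^3
print(f"r_c = {r_c:.1e} m")
print(f"T0 = {quantum_temperature(n, m) * 1e6:.2f} microkelvin")
```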

In order to analyze quantitatively what happens with gases when T is reduced to such low values,
we need to calculate μ for an arbitrary ideal gas of indistinguishable particles. Let us use the lucky fact
that the Fermi-Dirac and the Bose-Einstein statistics may be represented with one formula:

$$ \langle N(\varepsilon)\rangle = \frac{1}{e^{(\varepsilon-\mu)/T}\pm1}, \qquad (3.38) $$

where (and everywhere in the balance of this section) the top sign stands for fermions and the lower one
for bosons, to discuss the fermionic and bosonic gases in the same breath.

If we deal with a member of the grand canonical ensemble (Fig. 13), in which μ is externally
fixed, we may apply Eq. (38) to calculate the average number N of particles in volume V. If the volume
is so large that N ≫ 1, we may use the general state counting rule (13):

$$ N = \frac{gV}{(2\pi\hbar)^3}\int\frac{d^3p}{e^{[\varepsilon(p)-\mu]/T}\pm1} = \frac{gV}{(2\pi\hbar)^3}\int_0^\infty\frac{4\pi p^2dp}{e^{[\varepsilon(p)-\mu]/T}\pm1}. \qquad (3.39) $$



In most practical cases, however, the number N of gas particles is fixed by particle confinement (i.e. the
gas portion under study is a member of the canonical ensemble - see Fig. 2.6), and hence μ rather than
N should be calculated. Here comes the main trick: if N is very large, the relative fluctuation of the
particle number is negligibly small (~ 1/N^{1/2} ≪ 1), and the relation between the average values of N and
μ should not depend on which of these variables is exactly fixed. Hence, Eq. (39), with μ having the sense
of the average chemical potential, should be valid even if N is exactly fixed, so that small fluctuations of
N are replaced with (equally small) fluctuations of μ. Physically (as was already mentioned in Sec. 2.8),
in this case the role of the μ-fixing environment for any gas sub-portion is played by the rest of the gas,
and Eq. (39) expresses the condition of self-consistency of such mutual particle exchange.

In this situation, Eq. (39) may be used for calculating the average μ as a function of two
independent parameters: N (i.e. the gas density n = N/V) and temperature T. For carrying out this
calculation, it is convenient to convert the right-hand part of Eq. (39) to an integral over particle's
energy ε(p) = p²/2m, so that p = (2mε)^{1/2}, and dp = (m/2ε)^{1/2}dε:

$$ N = \frac{gVm^{3/2}}{\sqrt2\,\pi^2\hbar^3}\int_0^\infty\frac{\varepsilon^{1/2}d\varepsilon}{e^{(\varepsilon-\mu)/T}\pm1}. \qquad (3.40) $$



This key result may be presented in two more convenient forms. First, Eq. (40), derived for our current
(3D, isotropic and parabolic-dispersion) approximation (3), is just a particular case of a general relation

$$ N = \int_0^\infty g_3(\varepsilon)\langle N(\varepsilon)\rangle d\varepsilon, \qquad (3.41) $$

where

$$ g_3(\varepsilon) \equiv \frac{dN_\text{states}}{d\varepsilon} \qquad (3.42) $$
is the temperature-independent density of all quantum states of a particle - regardless of whether they
are occupied or not. Indeed, according to the general Eq. (4), for our simple model (3),

$$ g_3(\varepsilon) = \frac{dN_\text{states}}{d\varepsilon} = \frac{d}{d\varepsilon}\left[\frac{gV}{(2\pi\hbar)^3}\frac{4\pi}{3}p^3\right] = \frac{4\pi gV}{3(2\pi\hbar)^3}\frac{d(p^3)}{d\varepsilon} = \frac{gVm^{3/2}}{\sqrt2\,\pi^2\hbar^3}\varepsilon^{1/2}, \qquad (3.43) $$

so that we return to Eq. (40). On the other hand, for some calculations, it is convenient to introduce a
dimensionless energy variable ξ ≡ ε/T to express Eq. (40) via a dimensionless integral:

$$ N = \frac{gV(mT)^{3/2}}{\sqrt2\,\pi^2\hbar^3}\int_0^\infty\frac{\xi^{1/2}d\xi}{e^{\xi-\mu/T}\pm1}. \qquad (3.44) $$



As a sanity check, in the classical limit (34), the exponent in the denominator of the fraction
under the integral is much larger than 1, and Eq. (44) reduces to

$$ N = \frac{gV(mT)^{3/2}}{\sqrt2\,\pi^2\hbar^3}\int_0^\infty\frac{\xi^{1/2}d\xi}{e^{\xi-\mu/T}} = \frac{gV(mT)^{3/2}}{\sqrt2\,\pi^2\hbar^3}e^{\mu/T}\int_0^\infty\xi^{1/2}e^{-\xi}d\xi, \quad \text{at } -\mu\gg T. \qquad (3.45) $$



By the definition of the gamma-function Γ(ξ), 17 this dimensionless integral is just Γ(3/2) = π^{1/2}/2, and we get

$$ \exp\left\{\frac{\mu}{T}\right\} = \frac{\sqrt2\,\pi^2\hbar^3}{gV(mT)^{3/2}}\cdot\frac{2}{\sqrt\pi}N = \frac{N}{gV}\left(\frac{2\pi\hbar^2}{mT}\right)^{3/2}, \qquad (3.46) $$

which is exactly the same result as given by Eq. (32), which has been obtained in a rather different way
- from the Boltzmann distribution and thermodynamic identities.

Unfortunately, in the general case of arbitrary μ the integral in Eq. (44) cannot be worked out
analytically. 18 The best we can do is to use temperature T_0, defined by Eq. (36), to rewrite Eq. (44) as

$$ \frac{T}{T_0} = \left[\frac{1}{\sqrt2\,\pi^2}\int_0^\infty\frac{\xi^{1/2}d\xi}{e^{\xi-\mu/T}\pm1}\right]^{-2/3}. \qquad (3.47) $$



We may use this relation to calculate the ratio T/T_0, and then the ratio μ/T_0 = (μ/T)×(T/T_0), as functions of μ/T
numerically, and then plot the results versus each other, thinking of the former ratio as the argument.

Figure 1 shows the resulting plot. It shows that at large temperatures, T ≫ T_0, the chemical
potential is negative and approaches the classical behavior given by Eq. (46) for both fermions and
bosons - just as we could expect. For fermions, the reduction of temperature leads to μ changing its sign
from negative to positive, and then approaching a constant positive value called the Fermi energy, ε_F ≈
7.595 T_0 at T → 0. On the contrary, the chemical potential of a gas of bosons stays negative, and turns
into zero at a certain critical temperature T_c ≈ 3.313 T_0. Both these limits, which are very important for
applications, may (and will be :-) explored analytically, but separately for each statistics.
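The numerical procedure just described may be sketched as follows. This is an illustration, not the code used to draw Fig. 1: it evaluates the dimensionless integral of Eq. (47) with a midpoint rule after the substitution ξ = t² (which removes the ξ^{-1/2}-type behavior of the Bose integrand at μ = 0), and then recovers the limiting values ε_F ≈ 7.595 T_0 and T_c ≈ 3.313 T_0, as well as the classical asymptote (46), which in terms of T_0 reads T/T_0 = 2π exp{-2μ/3T}. The function names are hypothetical.

```python
import math

def occupancy_integral(mu_over_T, sign, n_steps=100_000):
    """Dimensionless integral of Eq. (3.47):
    I = int_0^inf xi^(1/2) d(xi) / (exp(xi - mu/T) + sign),
    with sign = +1 for fermions and -1 for bosons (bosons need mu/T <= 0).

    The substitution xi = t^2 gives the integrand 2 t^2 / (exp(t^2 - mu/T) + sign),
    which stays finite as t -> 0 even for bosons at mu = 0; the midpoint rule
    never evaluates the endpoint t = 0 itself.
    """
    if sign < 0 and mu_over_T > 0:
        raise ValueError("for bosons, mu must not be positive")
    t_max = math.sqrt(max(mu_over_T, 0.0) + 50.0)  # beyond this the integrand is ~e^-50
    h = t_max / n_steps
    total = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * h
        x = t * t - mu_over_T
        # expm1 keeps the Bose denominator accurate when x is small
        denom = math.expm1(x) if sign < 0 else math.exp(x) + 1.0
        total += 2.0 * t * t / denom
    return total * h

def t_over_t0(mu_over_T, sign):
    """Eq. (3.47): T/T0 = [I / (sqrt(2) pi^2)]^(-2/3)."""
    return (occupancy_integral(mu_over_T, sign) / (math.sqrt(2.0) * math.pi ** 2)) ** (-2.0 / 3.0)

def mu_over_t0(mu_over_T, sign):
    """mu/T0 = (mu/T) * (T/T0): the ordinate of Fig. 3.1."""
    return mu_over_T * t_over_t0(mu_over_T, sign)

# A few points of the Fermi curve (argument mu/T)
for a in (-5.0, 0.0, 5.0):
    print(f"fermions, mu/T = {a:5.1f}: T/T0 = {t_over_t0(a, +1):8.3f}")

print("eps_F/T0 ~", mu_over_t0(100.0, +1))   # degenerate Fermi gas, mu/T >> 1
print("T_c/T0   ~", t_over_t0(0.0, -1))      # Bose gas at mu -> 0
print(t_over_t0(-10.0, +1), 2 * math.pi * math.exp(20.0 / 3.0))  # classical limit (46)
```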



17 See, e.g., MA Eq. (6.7a).

18 For reader's reference only: for the upper sign, the integral in Eq. (40) is a particular form (for s = 1/2) of a
special function called the complete Fermi-Dirac integral F_s, while for the lower sign, it is a particular case (for s
= 3/2) of another special function called the polylogarithm Li_s. (In what follows, I will not use these notations.)
[Figure: μ/T_0 versus T/T_0 for fermions and bosons.]

Fig. 3.1. Chemical potential of an ideal gas of N ≫ 1 indistinguishable quantum particles,
as a function of temperature (at fixed gas density n = N/V, which fixes parameter T_0 ∝ n^{2/3}),
for two different quantum statistics. The dashed line shows the classical approximation (46)
valid at T ≫ T_0.



Before doing that (in the next two sections), let me show that, rather surprisingly, for any (but 
nonrelativistic!) quantum gas, the product PV expressed in terms of energy, 



$$PV = \frac{2}{3}\,E, \qquad (3.48)$$



is the same as follows from Eqs. (18) and (19) for the classical gas, and hence does not depend on the particles' statistics. In order to prove this, it is sufficient to use Eqs. (2.114) and (2.117) for the grand thermodynamic potential of each quantum state, which may be conveniently represented by a single formula,



$$\Omega_k = \mp T\ln\left(1 \pm e^{(\mu-\varepsilon_k)/T}\right), \qquad (3.49)$$



and sum them over all states k, using the general summation formula (13). The result for the total grand 
potential of a 3D gas with the dispersion law (3) is 



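$$\Omega = \mp T\,\frac{gV}{(2\pi\hbar)^3}\int_0^\infty \ln\left(1 \pm e^{(\mu - p^2/2m)/T}\right)4\pi p^2\,dp. \qquad (3.50)$$

Working out this integral by parts, exactly as we did it with the one in Eq. (2.90), we get

$$\Omega = -\frac{2}{3}\,\frac{gVm^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\varepsilon^{3/2}\,d\varepsilon}{e^{(\varepsilon-\mu)/T}\pm 1} \equiv -\frac{2}{3}\int_0^\infty \varepsilon\, g_3(\varepsilon)\langle N(\varepsilon)\rangle\,d\varepsilon. \qquad (3.51)$$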



But the last integral is just the total energy E of the gas: 



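$$E = \frac{gV}{(2\pi\hbar)^3}\int_0^\infty \frac{p^2}{2m}\,\frac{4\pi p^2\,dp}{e^{[\varepsilon(p)-\mu]/T}\pm 1} = \frac{gVm^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\varepsilon^{3/2}\,d\varepsilon}{e^{(\varepsilon-\mu)/T}\pm 1} \equiv \int_0^\infty \varepsilon\, g_3(\varepsilon)\langle N(\varepsilon)\rangle\,d\varepsilon, \qquad (3.52)$$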



so that for any temperature and any particle type, Ω = -(2/3)E. But since, from thermodynamics, Ω = -PV, we have Eq. (48) proved. This universal relation will be repeatedly used below.
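The integration by parts leading to Eq. (51) can be checked numerically. The Python sketch below is a hedged illustration with three assumptions:

- Reduced units are used: T = 1, and the common prefactor gVm^{3/2}/(√2π²ħ³) is set to 1.
- The helper names are hypothetical.
- Ω and E are evaluated independently, from Eqs. (50) and (52), both rewritten over energy.

For both statistics the sketch compares Ω with -(2/3)E.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0


def omega_and_energy(mu_over_t, sign):
    """Return (Omega, E) in reduced units for an ideal quantum gas.

    Reduced units (an assumption of this sketch): T = 1, and the common prefactor
    g V m^{3/2} / (sqrt(2) pi^2 hbar^3) is set to 1. sign = +1 for fermions,
    -1 for bosons (which need mu_over_t < 0). Both integrals use eps = u**2.
    """
    a = mu_over_t
    u_max = math.sqrt(max(a, 0.0) + 60.0)
    # Eq. (3.50) over energy: Omega = -/+ int eps^{1/2} ln(1 +/- e^{mu - eps}) d eps
    omega = -sign * simpson(
        lambda u: 2.0 * u * u * math.log1p(sign * math.exp(a - u * u)), 0.0, u_max)
    # Eq. (3.52): E = int eps^{3/2} d eps / (e^{eps - mu} +/- 1)
    energy = simpson(lambda u: 2.0 * u ** 4 / (math.exp(u * u - a) + sign), 0.0, u_max)
    return omega, energy


for sign, mu_over_t in ((+1, 3.0), (+1, -2.0), (-1, -0.5)):
    omega, energy = omega_and_energy(mu_over_t, sign)
    # The last two numbers should coincide if Omega = -(2/3) E, i.e. PV = (2/3) E.
    print(sign, mu_over_t, omega, -2.0 / 3.0 * energy)
```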
3.3. Degenerate Fermi gas

The analysis of low-temperature properties of a Fermi gas is very simple in the limit T → 0. Indeed, in this limit, the Fermi-Dirac distribution (2.115) is just a step function:



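$$\langle N(\varepsilon)\rangle = \begin{cases} 1, & \text{for } \varepsilon < \mu,\\ 0, & \text{for } \mu < \varepsilon,\end{cases} \qquad (3.53)$$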



- see the bold line in Fig. 2a. Since ε = p²/2m is isotropic in the momentum space, this means that at T = 0, in that space the particles fully occupy all possible quantum states within a sphere (frequently called either the Fermi sphere or the Fermi sea) with some radius p_F (Fig. 2b), while all states above the sea surface are empty. Such a degenerate Fermi gas is a striking manifestation of the Pauli principle: though at thermodynamic equilibrium at T = 0 all particles try to lower their energies as much as possible, only g of them may occupy each quantum state within the Fermi sphere. As a result, the sphere's volume is proportional to the particle number N, or rather to their density n = N/V.





Fig. 3.2. Representation of the 
Fermi sea: (a) on the energy 
axis and (b) in the momentum 
space. 



Indeed, radius p_F may be readily related to the number of particles N using Eq. (40), whose integral in this case is just the Fermi sphere volume:



$$N = \frac{gV}{(2\pi\hbar)^3}\int_0^{p_F}4\pi p^2\,dp = \frac{gV}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\,p_F^3. \qquad (3.54)$$



Now we can use Eq. (3) to express via N the chemical potential μ (which in this limit, T → 0, bears the special name of the Fermi energy ε_F) 19:



$$\varepsilon_F \equiv \mu\big|_{T=0} = \frac{p_F^2}{2m} = \frac{\hbar^2}{2m}\left(\frac{6\pi^2 N}{gV}\right)^{2/3} \approx 7.595\,T_0, \qquad (3.55a)$$



where T_0 is the quantum temperature scale defined by Eq. (36). This formula quantifies the low-temperature trend of function μ(T), clearly visible in Fig. 1, and in particular explains the ratio ε_F/T_0 mentioned in Sec. 2. Note also a useful and simple relation,



$$\varepsilon_F = \frac{3}{2}\,\frac{N}{g_3(\varepsilon_F)}, \qquad (3.55b)$$

which may be obtained immediately from Eqs. (43) and (54).



19 Note that in the electronic engineering literature, μ is usually called the Fermi level, at any temperature.
The total energy of the gas may be (equally easily) calculated from Eq. (52):

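$$E = \frac{gV}{(2\pi\hbar)^3}\int_0^{p_F}\frac{p^2}{2m}\,4\pi p^2\,dp = \frac{gV}{(2\pi\hbar)^3}\,\frac{4\pi}{2m}\,\frac{p_F^5}{5} = \frac{3}{5}\,\varepsilon_F N, \qquad (3.56)$$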

showing that the average energy, ⟨ε⟩ ≡ E/N, of a particle inside the Fermi sea is equal to 3/5 = 60% of that (ε_F) of the most energetic occupied states, on the Fermi surface. Since, according to the formulas of Chapter 1, at zero temperature H = G = Nμ, and F = E, the only thermodynamic variable still to be calculated is pressure P. For that, we could use any of the thermodynamic relations P = (H - E)/V or P = -(∂F/∂V)_T, but it is even easier to use our recent result (48). Together with Eq. (56), it yields



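$$P = \frac{2}{3}\,\frac{E}{V} = \frac{2}{5}\,\varepsilon_F\,\frac{N}{V} = \left(\frac{36\pi^4}{125}\right)^{1/3}P_0 \approx 3.038\,P_0, \quad \text{where } P_0 \equiv nT_0 = \frac{\hbar^2}{mg^{2/3}}\,n^{5/3}. \qquad (3.57)$$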

From here, it is easy to calculate the bulk modulus (reciprocal compressibility), 20 



$$K \equiv -V\left(\frac{\partial P}{\partial V}\right)_T = \frac{2}{3}\,\varepsilon_F\,\frac{N}{V}, \qquad (3.58)$$



which is simpler to measure experimentally. 
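The step from Eq. (57) to Eq. (58) may be spelled out explicitly. The short worked derivation below assumes fixed N and T = 0, and uses only Eqs. (55a) and (57):

```latex
\varepsilon_F \propto \left(\frac{N}{V}\right)^{2/3}
\quad\Rightarrow\quad
P = \frac{2}{5}\,\varepsilon_F\,\frac{N}{V} \propto V^{-5/3},
\qquad
K = -V\left(\frac{\partial P}{\partial V}\right)_T = \frac{5}{3}\,P
  = \frac{5}{3}\cdot\frac{2}{5}\,\varepsilon_F\,\frac{N}{V}
  = \frac{2}{3}\,\varepsilon_F\,\frac{N}{V}.
```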

Perhaps the most important example 21 of a degenerate Fermi gas is provided by the conduction electrons in metals - the electrons that belong to the outer shells of isolated atoms but become common in solid metals and can move through the crystal lattice almost freely. Though electrons (which are fermions with spin s = 1/2 and hence the spin degeneracy g = 2s + 1 = 2) are negatively charged, the Coulomb interaction of conduction electrons with each other is substantially compensated by the positively charged ions of the atomic lattice, so that they follow the simple formulas derived above reasonably well. This is especially true for alkali metals (forming Group 1 of the periodic table of elements), whose experimentally measured Fermi surfaces are spherical within 1%, and even within 0.1% for Na. Table 1 lists, in particular, the experimental values of the bulk modulus for such metals, together with the values given by Eq. (58) using ε_F calculated from Eq. (55) with the experimental density of conduction electrons. Evidently, the agreement is pretty good, taking into account that the simple theory described above completely ignores such factors as the Coulomb and exchange interactions of the electrons. This agreement implies that, surprisingly, the rigidity of solids (or at least metals) is predominantly due to the kinetic energy of conduction electrons, complemented with the Pauli principle, rather than any electrostatic interactions - though, to be fair, these interactions are the crucial factor defining the equilibrium value of n. Numerical calculations using more accurate approximations (e.g., the density functional theory 22) that agree with experiment to within a few percent confirm this conclusion. 23



20 See, e.g., CM Sec. 8.3. 

21 Recently, degenerate gases (with ε_F ~ 5T) have been formed of weakly interacting Fermi atoms as well - see, e.g., K. Aikawa et al., Phys. Rev. Lett. 112, 010404 (2014) and references therein.

22 See, e.g., QM Sec. 8.4. 

23 Note also a huge difference between the very high bulk modulus of metals (K ~ 10^11 Pa) and its very low values in usual gases (for them, at ambient conditions, K ~ 10^5 Pa). About 4 orders of magnitude of this difference is due to that in the particle density N/V, but the balance is due to the electron gas' degeneracy. Indeed, in an ideal classical gas, K = P = NT/V, so that factor (2/3)ε_F in Eq. (58), of the order of a few eV in metals, should be compared with factor T ≈ 25 meV in the atomic gases at room temperature.
Table 3.1. Experimental and theoretical parameters of electron's Fermi sea in some alkali metals 24

| Metal | ε_F (eV), Eq. (55) | K (GPa), Eq. (58) | K (GPa), experiment | γ (mcal/mole·K²), Eq. (69) | γ (mcal/mole·K²), experiment |
| Na | 3.24 | 9.23 | 6.42 | 0.26 | 0.35 |
| K | 2.12 | 3.19 | 2.81 | 0.40 | 0.47 |
| Rb | 1.85 | 2.30 | 1.92 | 0.46 | 0.58 |
| Cs | 1.59 | 1.54 | 1.43 | 0.53 | 0.77 |
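The theoretical ε_F and K columns of Table 1 can be recomputed, at least approximately, with a short Python sketch of Eqs. (55a) and (58). The assumptions are:

- The free-electron densities below are assumed inputs: standard tabulated values for these metals, not given in this text.
- The spin degeneracy g = 2 is used, as stated above.
- The function names are hypothetical.

```python
import math

# CODATA values of physical constants (SI units)
HBAR = 1.054571817e-34   # J s
M_E = 9.1093837015e-31   # kg
EV = 1.602176634e-19     # J

# Conduction-electron densities n (m^-3): assumed standard free-electron values, not from the text
DENSITIES = {"Na": 2.65e28, "K": 1.40e28, "Rb": 1.15e28, "Cs": 0.91e28}


def fermi_energy(n, g=2):
    """Eq. (3.55a): eps_F = (hbar^2 / 2m) (6 pi^2 n / g)^(2/3), in joules."""
    return HBAR ** 2 / (2.0 * M_E) * (6.0 * math.pi ** 2 * n / g) ** (2.0 / 3.0)


def bulk_modulus(n, g=2):
    """Eq. (3.58): K = (2/3) eps_F n, in pascals."""
    return 2.0 / 3.0 * fermi_energy(n, g) * n


if __name__ == "__main__":
    for metal, n in DENSITIES.items():
        print(f"{metal}: eps_F = {fermi_energy(n) / EV:.2f} eV, "
              f"K = {bulk_modulus(n) / 1e9:.2f} GPa")
```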



Now looking at the values of ε_F listed in the table, note that room temperatures (T_K ≈ 300 K) correspond to T ≈ 25 meV. As a result, virtually all experiments with metals, at least in their solid or liquid form, are performed in the limit T ≪ ε_F. According to Eq. (39), at such temperatures the occupancy step described by the Fermi-Dirac distribution has a finite but relatively small width ~T - see the dashed line in Fig. 2a. Calculations in this case are much facilitated by the so-called Sommerfeld expansion formula 25 for integrals like (40) and (52):



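$$I(T) \equiv \int_0^\infty \varphi(\varepsilon)\langle N(\varepsilon)\rangle\,d\varepsilon \approx \int_0^\mu \varphi(\varepsilon)\,d\varepsilon + \frac{\pi^2}{6}\,T^2\,\frac{d\varphi(\mu)}{d\mu}, \quad \text{at } T \ll \mu, \qquad (3.59)$$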



where φ(ε) is an arbitrary function that is sufficiently smooth at ε = μ and integrable at ε = 0. In order to prove this formula, let us introduce another function



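$$f(\varepsilon) \equiv \int_0^\varepsilon \varphi(\varepsilon')\,d\varepsilon', \quad \text{so that } \varphi(\varepsilon) = \frac{df(\varepsilon)}{d\varepsilon}, \qquad (3.60)$$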



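and work out the integral I(T) by parts (the boundary term vanishes, because f(0) = 0, while ⟨N(ε)⟩ → 0 at ε → ∞):

$$I(T) = \int_0^\infty \frac{df(\varepsilon)}{d\varepsilon}\langle N(\varepsilon)\rangle\,d\varepsilon = \int_{\varepsilon=0}^{\varepsilon=\infty}\langle N(\varepsilon)\rangle\,df = \Big[\langle N(\varepsilon)\rangle f(\varepsilon)\Big]_{\varepsilon=0}^{\varepsilon=\infty} - \int_{\varepsilon=0}^{\varepsilon=\infty} f(\varepsilon)\,d\langle N(\varepsilon)\rangle = \int_0^\infty f(\varepsilon)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon. \qquad (3.61)$$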



As evident from Eq. (39) and/or Fig. 2a, at T ≪ μ, function (-∂⟨N(ε)⟩/∂ε) approaches zero for all energies, besides a narrow peak, of unit area, at ε ≈ μ. Hence, if we expand function f(ε) in the Taylor series near this point, just a few leading terms of the expansion should give us a good approximation:

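$$I(T) \approx \int_0^\infty\left[f(\mu) + \frac{df(\mu)}{d\mu}(\varepsilon-\mu) + \frac{1}{2}\,\frac{d^2f(\mu)}{d\mu^2}(\varepsilon-\mu)^2\right]\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon$$
$$= \int_0^\mu\varphi(\varepsilon')\,d\varepsilon'\int_0^\infty\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon + \varphi(\mu)\int_0^\infty(\varepsilon-\mu)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon + \frac{1}{2}\,\frac{d\varphi(\mu)}{d\mu}\int_0^\infty(\varepsilon-\mu)^2\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon. \qquad (3.62)$$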



24 Data from N. Ashcroft and N. Mermin, Solid State Physics, W. B. Saunders, 1976.

25 Named after A. Sommerfeld, who was the first (in 1927) to apply the then-emerging quantum mechanics to degenerate Fermi gases, in particular to electrons in metals, and may be credited for most of the results discussed in this section.
In the last form of this relation, the first integral over ε equals ⟨N(ε = 0)⟩ - ⟨N(ε = ∞)⟩ = 1, the second one vanishes (because the function under it is antisymmetric about point ε = μ), and only the last one needs to be dealt with explicitly, by working it out by parts and then using a table integral: 26



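$$\int_0^\infty(\varepsilon-\mu)^2\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon \approx T^2\int_{-\infty}^{+\infty}\frac{\xi^2 e^\xi}{(e^\xi+1)^2}\,d\xi = 4T^2\int_0^\infty\frac{\xi\,d\xi}{e^\xi+1} = 4T^2\,\frac{\pi^2}{12} = \frac{\pi^2}{3}\,T^2. \qquad (3.63)$$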



Being plugged into Eq. (62), this result proves the Sommerfeld formula (59). 
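The Sommerfeld formula (59) is easy to test numerically. This Python sketch rests on three assumptions, all chosen only as an example:

- φ(ε) = ε^{1/2}, the 3D density-of-states shape.
- μ = 1 and T = 0.05, in arbitrary energy units.
- The function names are hypothetical.

It compares a direct numerical evaluation of I(T) with the two-term expansion. The remaining discrepancy should be of a higher order in T, much smaller than the T² correction itself.

```python
import math


def fermi_integral(phi, mu, t, n=20000):
    """Direct numerical value of I(T) = int_0^inf phi(eps) <N(eps)> d eps (Eq. 3.59, left side).

    <N(eps)> is the Fermi-Dirac occupancy. The substitution eps = u**2 keeps the
    integrand smooth for phi(eps) ~ eps^{1/2}; composite Simpson rule, n even.
    """
    u_max = math.sqrt(mu + 40.0 * t)  # occupancy ~ exp(-40) beyond this point
    h = u_max / n

    def f(u):
        eps = u * u
        return phi(eps) * 2.0 * u / (math.exp((eps - mu) / t) + 1.0)

    s = f(0.0) + f(u_max)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    return s * h / 3.0


def sommerfeld(integral_to_mu, dphi_dmu, t):
    """Right-hand side of Eq. (3.59): int_0^mu phi d eps + (pi^2/6) T^2 d phi/d mu."""
    return integral_to_mu + math.pi ** 2 / 6.0 * t ** 2 * dphi_dmu


mu, t = 1.0, 0.05
numeric = fermi_integral(math.sqrt, mu, t)
zero_t = 2.0 / 3.0 * mu ** 1.5                       # int_0^mu eps^{1/2} d eps
approx = sommerfeld(zero_t, 0.5 / math.sqrt(mu), t)  # d(mu^{1/2})/d mu = 1/(2 mu^{1/2})
print(numeric, approx, numeric - zero_t)
```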

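The last preparatory step we need is to take into account a possible small difference (as we will see below, also proportional to T²) between the temperature-dependent chemical potential μ(T) and the Fermi energy defined as ε_F ≡ μ(0), in the largest (first) term in the right-hand part of Eq. (62), to write

$$I(T) \approx \int_0^{\varepsilon_F}\varphi(\varepsilon)\,d\varepsilon + (\mu-\varepsilon_F)\,\varphi(\mu) + \frac{\pi^2}{6}\,T^2\,\frac{d\varphi(\mu)}{d\mu} = I(0) + (\mu-\varepsilon_F)\,\varphi(\mu) + \frac{\pi^2}{6}\,T^2\,\frac{d\varphi(\mu)}{d\mu}. \qquad (3.64)$$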



Now, applying this formula to Eq. (42) and the last form of Eq. (52), we get the following results (which are valid for any dispersion law ε(p) and even any dimensionality of the gas):



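$$N(T) = N(0) + (\mu-\varepsilon_F)\,g(\mu) + \frac{\pi^2}{6}\,T^2\,\frac{dg(\mu)}{d\mu}, \qquad (3.65)$$

$$E(T) = E(0) + (\mu-\varepsilon_F)\,\mu g(\mu) + \frac{\pi^2}{6}\,T^2\,\frac{d}{d\mu}\big[\mu g(\mu)\big]. \qquad (3.66)$$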



However, the number of particles does not change with temperature, N(T) = N(0), so that Eq. (65) gives an equation for finding the temperature-induced change of μ:



$$\mu - \varepsilon_F = -\frac{\pi^2}{6}\,T^2\,\frac{1}{g(\mu)}\,\frac{dg(\mu)}{d\mu}. \qquad (3.67)$$



Note that the change is quadratic in T and, for the 3D gas (where g(ε) ∝ ε^{1/2} grows with energy), negative, in agreement with the numerical results shown in Fig. 1a. Plugging this expression (which is only valid when the magnitude of the change is much smaller than ε_F) into Eq. (66), we finally get the finite-temperature correction to energy:



$$E(T) - E(0) = \frac{\pi^2}{6}\,g(\mu)\,T^2, \qquad (3.68)$$



where within the accuracy of our approximation, μ may be replaced with ε_F. (Due to the universal relation (48), Eq. (68) also gives the temperature correction to pressure.) Now we may use Eq. (68) to calculate the heat capacity of the degenerate Fermi gas:



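$$C_V \equiv \left(\frac{\partial E}{\partial T}\right)_V = \gamma T, \qquad \gamma = \frac{\pi^2}{3}\,g(\varepsilon_F). \qquad (3.69)$$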



According to Eq. (55b), in the particular case of a 3D gas with the isotropic and parabolic 
dispersion law (3), Eq. (69) reduces to 




26 See, e.g., MA Eqs. (6.8c) and (2.12b), with n = 1. 
$$\gamma = \frac{\pi^2}{2}\,\frac{N}{\varepsilon_F}, \quad \text{i.e. } C_V = \frac{\pi^2}{2}\,N\,\frac{T}{\varepsilon_F} \ll N. \qquad (3.70)$$

This important result deserves a discussion. First, note that within the range of validity of the Sommerfeld approximation (T ≪ ε_F), the specific heat of the degenerate gas is much smaller than that of the classical gas, even without internal degrees of freedom, C_V = (3/2)N - see Eq. (19). The reason for such a small heat capacity is that particles deep inside the Fermi sea cannot pick up thermal excitations with available energies of the order of T ≪ ε_F, because all states around them are already occupied. The only particles (or rather quantum states) that may be excited with such small energies are those at the very Fermi surface, more exactly within a surface layer of thickness Δε ~ T ≪ ε_F, and Eq. (69) presents a very vivid expression of this fact.

The second important feature of Eqs. (69)-(70) is the linear dependence of the heat capacity on temperature, which decreases with a reduction of T much slower than that of crystal vibrations - see Eq. (2.99) and its discussion. This means that in metals the specific heat at temperatures T ≪ T_D is dominated by the conduction electrons. Indeed, experiments confirm not only the linear dependence (70) of the specific heat, 27 but also the values of the proportionality coefficient γ ≡ C_V/T for cases when ε_F can be calculated independently, for example for alkali metals - see the right two columns of Table 1. More typically, Eq. (69) is used for the experimental measurement of the density of states on the Fermi surface, g(ε_F) - the factor which participates in many theoretical results, in particular in transport properties of degenerate Fermi gases (see Chapter 6 below).
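The γ column labeled Eq. (69) in Table 1 can be recomputed from Eq. (70). This minimal Python sketch makes two assumptions of its own, not statements of the text:

- Each atom contributes one conduction electron, so that N per mole equals Avogadro's number.
- The thermochemical calorie is used, 1 cal = 4.184 J.

It takes ε_F from the Eq. (55) column of the table. The function name is hypothetical.

```python
import math

# CODATA constants (SI)
K_B = 1.380649e-23      # J/K
N_A = 6.02214076e23     # 1/mol
EV = 1.602176634e-19    # J
CAL = 4.184             # J per thermochemical calorie (assumed unit convention)


def gamma_mcal(eps_f_ev):
    """Eq. (3.70) per mole: gamma = (pi^2/2) N_A k_B^2 / eps_F, in mcal/(mole K^2).

    Assumes one conduction electron per atom; the extra k_B converts T to kelvins.
    """
    gamma_si = math.pi ** 2 / 2.0 * N_A * K_B ** 2 / (eps_f_ev * EV)  # J/(mol K^2)
    return gamma_si / CAL * 1e3


for metal, eps_f in (("Na", 3.24), ("K", 2.12), ("Rb", 1.85), ("Cs", 1.59)):
    print(f"{metal}: gamma = {gamma_mcal(eps_f):.2f} mcal/(mole K^2)")
```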



3.4. Bose-Einstein condensation 

Now let us explore what happens at cooling of an ideal gas of bosons. Figure 3a shows on a more appropriate, log-log scale, the same plot as Fig. 1b, i.e. the numerical solution of Eq. (47) with the appropriate (negative) sign in the denominator. One can see that the chemical potential μ indeed tends to zero at some finite "critical temperature" T_c. This temperature may be found by taking μ = 0 in Eq. (47), which is then reduced to a table integral: 28



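$$T_c = T_0\left[\frac{1}{\sqrt{2}\,\pi^2}\int_0^\infty\frac{\xi^{1/2}\,d\xi}{e^\xi - 1}\right]^{-2/3} = T_0\left[\frac{\Gamma(3/2)\,\zeta(3/2)}{\sqrt{2}\,\pi^2}\right]^{-2/3} \approx 3.313\,T_0, \qquad (3.71)$$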



the result explaining the T_c/T_0 ratio mentioned in Sec. 2.

Hence we must have a good look at the temperature interval 0 < T < T_c, which may look rather mysterious. Indeed, within this range, chemical potential μ cannot be either negative or zero, because then Eq. (41) would give a value of N fewer than the number of particles we actually have. On the other hand, μ cannot be positive either, because integral (41) would diverge at ε → μ due to the divergence of ⟨N(ε)⟩ - see, e.g., Fig. 2.15. The only possible resolution of the paradox, suggested by A. Einstein, is as follows: at T < T_c, the chemical potential of each particle still equals exactly zero, but a certain number



27 Solids, with their low thermal expansion coefficients, present a virtually fixed-volume confinement for the electron gas, so that the specific heat measured at ambient conditions may be legitimately compared with the calculated C_V.

28 See, e.g., MA Eqs. (6.8b) and (6.6c) with s = 3/2. 



Chapter 3 



Page 16 of 30 



Essential Graduate Physics 



SM: Statistical Mechanics 



(N_0 of N) of them are in the ground state (with ε ≡ p²/2m = 0), forming the so-called Bose-Einstein condensate, very frequently referred to as BEC.



Fig. 3.3. The Bose-Einstein condensation: (a) chemical potential of the gas and (b) its pressure, as functions of temperature. The dashed line corresponds to the classical gas.

Since the condensate particles do not contribute to Eq. (41) (because of the factor ε^{1/2} = 0), their number N_0 may be calculated by using Eq. (44), with μ = 0, to find the number (N - N_0) of particles still remaining in the gas, i.e. having energy ε > 0:



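$$N - N_0 = \frac{gV(mT)^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty\frac{\xi^{1/2}\,d\xi}{e^\xi - 1}. \qquad (3.72)$$

This result is even simpler than it may look. Indeed, let us write it for the case T = T_c, when N_0 = 0: 29

$$N = \frac{gV(mT_c)^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty\frac{\xi^{1/2}\,d\xi}{e^\xi - 1}. \qquad (3.73)$$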



Since the dimensionless integrals in both equations are identical, we may just divide one equation by the other, getting an extremely simple and elegant result:



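$$\frac{N - N_0}{N} = \left(\frac{T}{T_c}\right)^{3/2}, \quad \text{so that } N_0 = N\left[1 - \left(\frac{T}{T_c}\right)^{3/2}\right], \quad \text{at } T \le T_c. \qquad (3.74)$$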



Figure 4 shows result (74) together with one of the first sets of experimental data for the Bose-Einstein condensation of dilute gases of neutral atoms. Taking into account the finite number of particles in the experiment, and the fact that the atoms have been held together by confinement in a potential well



29 This is, of course, just another form of Eq. (71). 






with not quite vertical walls, i.e. were not completely free to move inside it, the agreement is 
surprisingly good. 



Fig. 3.4. Total number N of trapped 87Rb atoms (inset) and their ground-state fraction N_0/N, as functions of the ratio T/T_c, as measured by J. Ensher et al., Phys. Rev. Lett. 77, 4984 (1996). In this experiment, T_c was as low as 0.28×10^-6 K. The solid line shows the simple theoretical dependence N_0(T)/N, given by Eq. (74), while other lines correspond to more complex theories taking into account the trapping potential and the finite number N of trapped atoms. © AIP.



Now let us explore what happens at the critical temperature and below it with other gas parameters. Equation (52) with the appropriate (lower) sign shows that approaching this point from higher temperatures, gas energy and hence its pressure do not vanish (Fig. 3b). Indeed, at T = T_c (where μ = 0), that equation yields 30



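$$E(T_c) = \frac{gVm^{3/2}T_c^{5/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty\frac{\xi^{3/2}\,d\xi}{e^\xi - 1} = \frac{3}{2}\,\frac{\zeta(5/2)}{\zeta(3/2)}\,NT_c \approx 0.7703\,NT_c, \qquad (3.75)$$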



so that using the universal relation (48), we get a pressure value, 



$$P(T_c) = \frac{2}{3}\,\frac{E(T_c)}{V} = \frac{\zeta(5/2)}{\zeta(3/2)}\,\frac{N}{V}\,T_c \approx 0.5135\,\frac{N}{V}\,T_c \approx 1.70\,P_0, \qquad (3.76)$$



which is somewhat lower than, but comparable to, P(0) for the fermions - cf. Eq. (57). Now we can use the same Eq. (52), also with μ = 0, to calculate the energy of the gas at T < T_c,



$$E(T) = \frac{gVm^{3/2}T^{5/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty\frac{\xi^{3/2}\,d\xi}{e^\xi - 1}. \qquad (3.77)$$



Comparing this relation with the first form of Eq. (75), which features the same integral, we 
immediately get one more simple temperature dependence: 



$$E(T) = E(T_c)\left(\frac{T}{T_c}\right)^{5/2}, \quad \text{at } T \le T_c. \qquad (3.78)$$



From the universal relation (48), we immediately see that pressure follows the same dependence:



30 For the involved dimensionless integral see, e.g., MA Eqs. (6.8b) and (6.6c) with s = 5/2. 
$$P(T) = P(T_c)\left(\frac{T}{T_c}\right)^{5/2}, \quad \text{at } T \le T_c. \qquad (3.79)$$



This temperature dependence of pressure is shown with the blue line in Fig. 3b. The plot shows that for all temperatures (both below and above T_c) the pressure of bosonic gas is below that of the classical gas of the same density. Note also that since, according to Eqs. (57) and (76), P(T_c) ∝ P_0 ∝ V^{-5/3}, while, according to Eqs. (37) and (71), T_c ∝ T_0 ∝ V^{-2/3}, pressure (79) does not depend on volume at all! The physics of this result (that is valid at T < T_c only) is that as we decrease the volume at fixed total number of particles, more and more of them go to the condensate, decreasing the number (N - N_0) of particles in the gas phase, but not changing its pressure. Such behavior is very typical for phase transitions - see, in particular, the next chapter.

The last thermodynamic variable of major interest is the heat capacity, because it may be readily 
measured in many systems. For temperatures T< T c , it may be easily calculated from Eq. (78): 



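$$C_V(T) \equiv \left(\frac{\partial E}{\partial T}\right)_V = \frac{5}{2}\,\frac{E(T_c)}{T_c}\left(\frac{T}{T_c}\right)^{3/2}, \qquad (3.80)$$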



so that below T c , the capacity increases, at the critical temperature reaching the value, 



$$C_V(T_c) = \frac{5}{2}\,\frac{E(T_c)}{T_c} \approx 1.926\,N, \qquad (3.81)$$



which is approximately 28% above that (3N/2) of the classical gas - in both cases ignoring the possible contributions from the internal degrees of freedom. The analysis for T > T_c is a little bit more cumbersome, because differentiating E over temperature - say, using Eq. (52) - one should also take into account the temperature dependence of μ that follows from Eq. (40) - see also Fig. 1b. However, the most important feature of the result may be predicted without the calculation (which is left for the reader's exercise). Since at T ≫ T_c the heat capacity has to approach the classical value, it must decrease at T > T_c, thus forming a sharp maximum (a "cusp") at the critical point T = T_c - see Fig. 5.
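The numerical constants of the Bose-Einstein condensation can be recomputed with a short Python sketch. Its ingredients are:

- It assumes the standard table integral ∫_0^∞ ξ^{s-1}dξ/(e^ξ - 1) = Γ(s)ζ(s), presumably the content of the MA formulas cited in footnotes 28 and 30.
- It evaluates the Riemann zeta function by a direct sum with an Euler-Maclaurin tail. This method is chosen only to keep the sketch self-contained.
- It then applies Eqs. (71), (74)-(76), (78), (80), and (81).

All names are hypothetical.

```python
import math


def zeta(s, k_max=1000):
    """Riemann zeta function for real s > 1: direct sum plus Euler-Maclaurin tail."""
    partial = sum(k ** -s for k in range(1, k_max))
    # Tail sum_{k >= k_max} k^-s ~ integral + f(k_max)/2 - f'(k_max)/12
    tail = k_max ** (1 - s) / (s - 1) + 0.5 * k_max ** -s + s * k_max ** (-s - 1) / 12.0
    return partial + tail


Z32, Z52 = zeta(1.5), zeta(2.5)

# Eq. (3.71): T_c / T_0
TC_OVER_T0 = (math.gamma(1.5) * Z32 / (math.sqrt(2.0) * math.pi ** 2)) ** (-2.0 / 3.0)
# Eq. (3.75): E(T_c) / (N T_c)
E_TC = 1.5 * Z52 / Z32
# Eq. (3.76): P(T_c) / (n T_c), and P(T_c) / P_0 with P_0 = n T_0
P_TC = Z52 / Z32
P_TC_OVER_P0 = P_TC * TC_OVER_T0
# Eq. (3.81): C_V(T_c) / N
CV_TC = 2.5 * E_TC


def below_tc(t):
    """(N_0/N, E/(N T_c), C_V/N) at reduced temperature t = T/T_c <= 1, Eqs. (3.74), (3.78), (3.80)."""
    return 1.0 - t ** 1.5, E_TC * t ** 2.5, CV_TC * t ** 1.5


print(f"T_c/T_0 = {TC_OVER_T0:.4f}, E(T_c)/(N T_c) = {E_TC:.4f}")
print(f"P(T_c)/P_0 = {P_TC_OVER_P0:.3f}, C_V(T_c)/N = {CV_TC:.4f}")
for t in (0.25, 0.5, 0.75, 1.0):
    print(t, below_tc(t))
```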
Fig. 3.5. Temperature dependences of the heat capacity of an ideal Bose-Einstein gas, calculated from Eqs. (52) and (40) for T > T_c, and from Eq. (80) for T < T_c.



Such a cusp is a good indication of the Bose-Einstein condensation in virtually any experimental system, especially because inter-particle interactions (unaccounted for in our simple discussion) typically make this feature even more substantial, turning it into a weak (logarithmic) singularity.
Historically, such a singularity (called the λ-point because of the characteristic shape of the C_V(T) dependence) was the first noticed, though not immediately understood, sign of the Bose-Einstein condensation, observed in 1931 by W. Keesom and K. Clusius in liquid 4He at T = T_c ≈ 2.17 K. Other milestones of the Bose-Einstein condensation studies include:

- the experimental discovery of superconductivity in metals, by H. Kamerlingh-Onnes in 1911; 

- the development of the Bose-Einstein statistics implying the condensation, in 1924-1925; 

- the discovery of superfluidity in liquid 4He by P. Kapitza and (independently) by J. Allen and D. Misener in 1937, and its explanation as a result of the Bose-Einstein condensation by F. and H. Londons and L. Tisza, with further elaborations by L. Landau (all in 1938);

- the explanation of superconductivity as the result of formation of Cooper pairs of electrons, 
with an integer total spin, with the simultaneous Bose-Einstein condensation of such effective bosons, 
by J. Bardeen, L. Cooper, and J. Schrieffer in 1957; 

- the discovery of superfluidity of two different phases of 3He, due to the similar Bose-Einstein condensation of pairs of its fermion atoms, by D. Lee, D. Osheroff, and R. Richardson in 1972;

- the first observation of the Bose-Einstein condensation in dilute gases (87Rb by E. Cornell, C. Wieman et al. and 23Na by W. Ketterle et al.) in 1995.

The importance of the last achievement (and of the continuing intensive work in this direction) 
stems from the fact that in contrast to other Bose-Einstein condensates, in dilute gases (with the typical 
density n as low as ~ 10 1 cm" 3 ) the particles interact very weakly, and hence many experimental results 
are very close to the simple theory described above and its straightforward elaborations - see, e.g., Fig. 
4. On the other hand, the importance of prior implementations of the Bose-Einstein condensates, which 
involve more complex and challenging physics, should not be underestimated - as it sometimes is. 

The most important feature of any Bose-Einstein condensate is that all N_0 condensed particles are in the same quantum state, and hence are described by exactly the same wavefunction. This wavefunction is substantially less "feeble" than that of a single particle, in the following sense. In the second quantization language, 31 the well-known Heisenberg uncertainty relation may be rewritten for the creation/annihilation operators; in particular, for bosons,

$$[\hat a, \hat a^\dagger] = 1. \qquad (3.82)$$

Since â and â† are quantum-mechanical operators of the complex amplitude a = A exp{iφ} and its complex conjugate a* = A exp{-iφ}, where A and φ are the real amplitude and phase of the wavefunction, Eq. (82) yields the following approximate uncertainty relation (strict in the limit δφ ≪ 1) between the number of particles N = AA* and phase φ:

$$\delta N\,\delta\varphi \ge \frac{1}{2}. \qquad (3.83)$$

This means that a condensate of N ≫ 1 bosons may be in a state with both phase and amplitude of the wavefunction behaving virtually as c-numbers, with negligible relative uncertainties: δN ≪ N, δφ ≪ 1. Moreover, such states are much less susceptible to perturbations by experimental instruments. For example, the supercurrent I_s carried along a superconducting wire by a coherent Bose-Einstein condensate of Cooper pairs may be as high as hundreds of amperes. As a result, the "strange" behaviors predicted by quantum mechanics are not averaged out as in the usual particle ensembles (see, e.g., the discussion of the density matrix in Sec. 2.1), but may be directly revealed in macroscopic, measurable behaviors of the condensate.

31 See, e.g., QM Sec. 8.3.

For example, density j_s of the supercurrent may be described by the same formula as the usual probability current density of a single particle, 32 multiplied by the Cooper pair density n and the electric charge q = -2e of a single pair:



$$\mathbf{j}_s = qn\,\frac{\hbar}{m}\left(\nabla\varphi - \frac{q}{\hbar}\,\mathbf{A}\right), \qquad (3.84)$$



where A is the vector-potential of the (electro)magnetic field. If a superconducting wire is not extremely thin, current flow does not penetrate its interior, 33 so that j_s may be taken to be zero. As a result, the integral of Eq. (84), taken along a contour inside a closed wire loop, yields

$$\frac{q}{\hbar}\oint_C \mathbf{A}\cdot d\mathbf{r} = \Delta\varphi = 2\pi m, \qquad (3.85)$$

where m is an integer. But, according to electrodynamics, the integral participating in this equation is nothing more than flux Φ of the magnetic field B piercing the wire loop area A. Thus we immediately arrive at the famous magnetic flux quantization effect

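$$\Phi \equiv \int_A B_n\,d^2r = m\Phi_0, \qquad \Phi_0 \equiv \frac{2\pi\hbar}{|q|} = \frac{\pi\hbar}{e} \approx 2.07\times 10^{-15}\ \text{Wb}, \qquad (3.86)$$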

which was theoretically predicted in 1950 and experimentally observed in 1961. Most fantastically, this 
effect holds true even in very large loops, sustained by the Bose-Einstein condensate of Cooper pairs, 
"coherent over miles of dirty lead wire", citing J. Bardeen's famous expression. 
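Here is a quick numerical check of the flux quantum in Eq. (86), in Python. The field and loop area in the example call are arbitrary illustrative values, and the function name is hypothetical.

```python
H = 6.62607015e-34          # Planck constant, J s (exact in SI)
E_CHARGE = 1.602176634e-19  # elementary charge, C (exact in SI)

# Eq. (3.86) with |q| = 2e for a Cooper pair: Phi_0 = 2 pi hbar / |q| = h / (2e)
PHI_0 = H / (2.0 * E_CHARGE)  # Wb


def flux_quanta(b_field, area):
    """Ratio Phi / Phi_0 for a uniform normal field b_field (T) through a loop of given area (m^2)."""
    return b_field * area / PHI_0


print(f"Phi_0 = {PHI_0:.4e} Wb")
print(f"1 uT through a 1 cm^2 loop: Phi/Phi_0 ~ {flux_quanta(1e-6, 1e-4):.0f}")
```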

Other prominent examples of such macroscopic quantum effects in Bose-Einstein condensates 
include not only the superfluidity and superconductivity as such, but also the Josephson effect, 
quantized Abrikosov vortices, etc. Some of these effects are briefly discussed in the EM and QM parts 
of this lecture series. 34 



3.5. Gases of weakly interacting particles 

Now let us discuss the weak particle interaction effects on macroscopic properties of their gas. 
(Unfortunately, I will have time to do that only for a brief discussion of these effects in classical gases of 
indistinguishable particles. 35 ) 



32 See, e.g., QM Eq. (3.28). 

33 This is the Meissner-Ochsenfeld (or just "Meissner") effect which may be also readily explained using Eq. (84), 
combined with the Maxwell equations - see, e.g., EM Sec. 6.3. 

34 See QM Sec. 2.3, and EM Secs. 6.3 and 6.4.

35 A concise discussion of weak interactions in quantum gases may be found, for example, in Chapter 10 of K. Huang, Statistical Mechanics, 2nd ed., Wiley, 2003.
In most cases of interest, particle interaction may be described by a certain potential energy U, so that the total energy is



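$$E = \sum_{k=1}^{N}\frac{p_k^2}{2m} + U(\mathbf{r}_1, \ldots, \mathbf{r}_k, \ldots, \mathbf{r}_N), \qquad (3.87)$$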



where r_k is the position of the k-th particle's center. Let us see how far the statistical physics would allow us to proceed for an arbitrary potential U. For N ≫ 1, at the calculation of the Gibbs statistical sum (2.59), we may perform the usual transfer from the summation over all quantum states of the system to integration over the 6N-dimensional space, with the correct Boltzmann counting:



$$Z = \sum_m e^{-E_m/T} \to \frac{1}{N!}\,\frac{g^N V^N}{(2\pi\hbar)^{3N}}\int\exp\left\{-\sum_{k=1}^{N}\frac{p_k^2}{2mT}\right\}d^3p_1\ldots d^3p_N \times \frac{1}{V^N}\int\exp\left\{-\frac{U(\mathbf{r}_1,\ldots,\mathbf{r}_N)}{T}\right\}d^3r_1\ldots d^3r_N. \qquad (3.88)$$



But according to Eq. (14), the first operand in the last product is the statistical sum of an ideal gas (with the same g, N, V, and T), so that we may use Eq. (2.63) to write



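$$F = F_{\text{ideal}} - T\ln\left[\frac{1}{V^N}\int d^3r_1\ldots d^3r_N\,e^{-U/T}\right] = F_{\text{ideal}} - T\ln\left[1 + \frac{1}{V^N}\int d^3r_1\ldots d^3r_N\left(e^{-U/T} - 1\right)\right], \qquad (3.89)$$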



where F_ideal is the free energy of the ideal gas (i.e. the same gas but with U = 0), given by Eq. (16).

I believe that Eq. (89) is a very convincing demonstration of the enormous power of the statistical physics. Instead of trying to solve an impossibly complex problem of classical dynamics of N ≫ 1 (think of N ~ 10^23) interacting particles, and calculating appropriate ensemble averages later on, the Gibbs approach reduces finding the free energy (and then, from thermodynamic relations, all other thermodynamic variables) to the calculation of just one integral in the right-hand part of Eq. (89). Still, this integral is 3N-dimensional and may be worked out analytically only if particle interaction is weak in some sense. Indeed, the last form of Eq. (89) makes it especially evident that if U → 0 everywhere, the term in parentheses under the integral vanishes, and so does the integral itself, and hence the addition to F_ideal.

Now let us see what this integral would yield for the simplest, short-range interactions, in which potential U is substantial only when the mutual distance r_kk' ≡ r_k - r_k' between particle centers is smaller than a certain value 2r_0, where r_0 may be interpreted as the particle size scale. If the gas is sufficiently dilute, so that the particle radius r_0 is much smaller than the average distance r_A between the particles, the integral in Eq. (89) is of the order of (2r_0)^{3N}, i.e. much smaller than r_A^{3N} ~ V^N. Then we may expand the logarithm in Eq. (89) into the Taylor series with respect to the small second term in the square brackets, and keep just the first term of the series:



$$F \approx F_{\text{ideal}} - \frac{T}{V^N}\int d^3r_1\ldots d^3r_N\left(e^{-U/T} - 1\right). \qquad (3.90)$$



Even more importantly, if the gas density is so low that the chances for 3 or more particles to come close to each other and interact (collide) are very small, pair collisions are the most important. In this case, we may recast the integral in Eq. (90) as a sum of N(N - 1)/2 ≈ N²/2 similar terms describing such pair interactions, each of the type



$$V^{N-2}\int\left(e^{-U(\mathbf{r}_{kk'})/T} - 1\right)d^3r_k\,d^3r_{k'}. \qquad (3.91)$$



It is convenient to think about r_kk' as the radius-vector of particle k in the reference frame with the origin placed at particle k' - see Fig. 6a.



Fig. 3.6. The definition of interparticle distance vectors at their (a) pair and (b) triple interactions.



Then it is clear that in Eq. (91), we may first calculate the integral over r_k' while keeping the distance vector r_kk', and hence U(r_kk'), constant, getting one more factor V. Moreover, since all particle pairs are similar, in the remaining integral over r_kk' we may drop the radius-vector index, so that Eq. (90) becomes



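$$F = F_{\text{ideal}} - T\,\frac{N^2}{2V}\int\left(e^{-U(\mathbf{r})/T} - 1\right)d^3r \equiv F_{\text{ideal}} + T\,\frac{N^2}{V}\,B(T), \qquad (3.92)$$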



where B(T), called the second virial coefficient, 36 has an especially simple form for spherically-symmetric interactions:



$$B(T) \equiv \frac{1}{2}\int\left(1 - e^{-U(\mathbf{r})/T}\right)d^3r \to \frac{1}{2}\int_0^\infty 4\pi r^2\,dr\left(1 - e^{-U(r)/T}\right). \qquad (3.93)$$



From Eq. (92), and the second of the thermodynamic relations (1.35), we can already tell something important about the equation of state:



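$$P = -\left(\frac{\partial F}{\partial V}\right)_{T,N} = P_{\text{ideal}} + \frac{N^2T}{V^2}\,B(T) = T\left[\frac{N}{V} + B(T)\,\frac{N^2}{V^2}\right]. \qquad (3.94)$$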



We see that at a fixed gas density n = NIV, the pair interaction creates additional pressure, proportional 

2 2 

to (NIV) = n and a function of temperature, B(T)T. 



36 The term "virial", from Latin viris (meaning "force"), was introduced to molecular physics by R. Clausius. The
motivation for the adjective "second" for $B(T)$ is evident from the last form of Eq. (94), with the "first virial
coefficient", standing before the $N/V$ ratio and sometimes denoted $A(T)$, equal to 1 - see also Eq. (100) below.
Let us calculate $B(T)$ for a couple of simple models of particle interactions. The solid line in Fig. 7
shows (schematically) a typical form of the interaction potential between electrically neutral molecules
with zero spontaneous electric dipole moment.




Fig. 3.7. Pair interaction of particles. 
Solid line: a typical interaction potential; 
dashed line: its hardball model (95); 
dash-dotted line: the improved model 
(97) - schematically. The inset illustrates 
the idea of the hardball model. 



At large distances, the interaction of particles that do not have their own permanent electric dipole
moment $\mathbf{p}$ is dominated by the attraction (the so-called London dispersion force) between correlated
components of the spontaneously induced dipole moments, giving $U(r) \propto -r^{-6}$ at $r \rightarrow \infty$. 37 At closer
distances the potential is always repulsive ($U > 0$) and grows very fast at $r \rightarrow 0$, but its quantitative
form is very specific for each particular molecule. 38 The crudest description of such repulsion is given
by the so-called hardball model:

$$U(r) = \begin{cases} +\infty, & \text{for } 0 < r < 2r_0, \\ 0, & \text{for } 2r_0 < r < \infty, \end{cases} \tag{3.95}$$

- see the dashed line and inset in Fig. 7. According to Eq. (93), in this model the second virial coefficient
is temperature-independent:

$$B(T) = b \equiv \frac{1}{2}\int_0^{2r_0} 4\pi r^2\, dr = \frac{2\pi}{3}(2r_0)^3, \tag{3.96}$$

(and is 4 times larger than the hardball volume $V_0 = (4\pi/3)r_0^3$), so that the equation of state (94) still
gives a linear dependence of pressure on temperature.



37 Indeed, the independent fluctuation-induced components $\mathbf{p}(t)$ and $\mathbf{p}'(t)$ of dipole moments of two particles have
random mutual orientation, so that the time average of their interaction energy, proportional to $r^{-3}$, vanishes.
However, the electric field $\mathbf{E}$ of each dipole $\mathbf{p}$, proportional to $r^{-3}$, induces a correlated component of $\mathbf{p}'$, also
proportional to $r^{-3}$, giving a potential energy of their interaction, proportional to $\mathbf{p}'\cdot\mathbf{E} \propto r^{-6}$, with a non-vanishing
time average. A detailed theory of this effect, closely related to the Casimir effect in quantum mechanics (see,
e.g., QM Sec. 9.1) may be found, e.g., in Secs. 80-82 of E. Lifshitz and L. Pitaevskii, Statistical Mechanics, pt. 2,
Pergamon, 1980.

38 Note that the particular form of the first term in the approximation $U(r) = a/r^{12} - b/r^6$, frequently met in
undergraduate textbooks, is just a trick to make the calculation of the equilibrium distance between the particles
by differentiation simpler, and lacks physical justification.
A correction to this result may be obtained by the following approximate account of the long-range
attraction (see the dash-dotted line in Fig. 7): 39

$$U(r) = \begin{cases} +\infty, & \text{for } 0 < r < 2r_0, \\ \text{negative, with } |U| \ll T, & \text{for } 2r_0 < r < \infty, \end{cases} \tag{3.97}$$

which is sometimes called the hard core model. Then Eq. (93) yields:

$$B(T) = b + \frac{1}{2}\int_{2r_0}^{\infty} 4\pi r^2\, dr\, \frac{U(r)}{T} = b - \frac{a}{T}, \quad \text{with } a \equiv 2\pi\int_{2r_0}^{\infty} r^2\, dr\, |U(r)|. \tag{3.98}$$
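As a quick cross-check of Eqs. (96) and (98), the integral (93) can be evaluated numerically. The Python sketch below is only an illustration under stated assumptions. It uses units with $r_0 = 1/2$ and $T = 1$ and a plain midpoint rule. It also uses a hypothetical square-well tail, $U = -\varepsilon$ for $2r_0 < r < 3r_0$ with $\varepsilon = 0.01\,T$, as a stand-in for the weak attraction of the hard-core model (97).

```python
import math


def second_virial(U, T, r_max, n=150_000):
    """B(T) from Eq. (3.93): 2*pi * integral_0^r_max of r^2 (1 - exp(-U(r)/T)) dr.

    Midpoint rule. U(r) may return math.inf inside the hard core, since
    math.exp(-math.inf) == 0.0. The integrand is assumed to vanish beyond r_max.
    """
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += r * r * (1.0 - math.exp(-U(r) / T))
    return 2.0 * math.pi * total * h


r0, T = 0.5, 1.0
eps = 0.01 * T           # |U| << T, as the hard-core model (3.97) requires
r_well = 3.0 * r0        # assumed outer edge of the hypothetical attractive well


def U_hardball(r):       # Eq. (3.95)
    return math.inf if r < 2 * r0 else 0.0


def U_hard_core(r):      # hypothetical square-well version of Eq. (3.97)
    if r < 2 * r0:
        return math.inf
    return -eps if r < r_well else 0.0


b = (2 * math.pi / 3) * (2 * r0) ** 3                      # Eq. (3.96)
a = 2 * math.pi * eps * (r_well**3 - (2 * r0) ** 3) / 3    # a of Eq. (3.98) for this well
V0 = (4 * math.pi / 3) * r0**3                             # hardball volume

B_hardball = second_virial(U_hardball, T, r_well)
B_hard_core = second_virial(U_hard_core, T, r_well)
print(B_hardball / b, b / V0)      # ratio close to 1, and b/V0 = 4
print(B_hard_core, b - a / T)      # agree up to terms of order (eps/T)**2
```

The hardball result reproduces $b = (2\pi/3)(2r_0)^3 = 4V_0$. The square-well value differs from $b - a/T$ only by terms of order $(\varepsilon/T)^2$, as expected from the approximation $1 - e^{-U/T} \approx U/T$.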

In this model, the equation of state (94) acquires a temperature-independent term:

$$P = T\left[\frac{N}{V} + \left(b - \frac{a}{T}\right)\left(\frac{N}{V}\right)^2\right] = T\frac{N}{V} + (bT - a)\left(\frac{N}{V}\right)^2. \tag{3.99}$$

Still, the correction to the ideal-gas pressure is proportional to $(N/V)^2$, and has to be relatively
small for Eq. (99) to be valid, so that the right-hand side of Eq. (99) may be considered as the sum of
two leading terms in the general expansion of $P$ into the Taylor series in low density $n = N/V$ of the gas:

$$P = T\left[\frac{N}{V} + B(T)\left(\frac{N}{V}\right)^2 + C(T)\left(\frac{N}{V}\right)^3 + \ldots\right], \tag{3.100}$$



where $C(T)$ is called the third virial coefficient. It is natural to ask how we can calculate $C(T)$ and the
higher virial coefficients.

Generally, this may be done just by a careful analysis of Eq. (90), 40 but I would like to use this
occasion to demonstrate a different, very interesting approach, called the cluster expansion method, 41
which allows one to streamline such calculations. Let us apply to our system, with energy (87), the grand
canonical distribution. (Just as in Sec. 2, we may argue that if the average number $\langle N\rangle$ of particles in a
member of a grand canonical ensemble, with fixed $\mu$ and $T$, is much larger than 1, the relative
fluctuations of that number are small, so that all its thermodynamic properties should be similar to those
when $N$ is exactly fixed - as it is assumed when applying the Gibbs distribution valid for the canonical
ensemble.) For our case, the grand canonical distribution, Eq. (2.109), may be recast as

$$\Omega = -T\ln\sum_{N=0}^{\infty} Z_N, \quad Z_N \equiv e^{\mu N/T}\sum_m e^{-E_{m,N}/T}. \tag{3.101}$$



39 The strong inequality between U and T in this model is necessary not only to make calculations simpler. A 
deeper reason is that if $(-U_{\min})$ becomes comparable with, or larger than $T$, particles may become trapped in the
potential well formed by this potential, forming a different phase - a liquid or a solid. In such phases, the 
probability to find more than two particles interacting simultaneously is high, so that approximation (92), on 
which all our further results are based, becomes invalid. 

40 L. Boltzmann used this approach to calculate the 3rd and 4th virial coefficients for the hardball model - as much
as can be done analytically.

41 This method was developed in 1937-38 by J. Mayer and his collaborators for a classical gas, and was 
generalized to quantum systems in 1938 by B. Kahn and G. Uhlenbeck. 
(Notice that here, as always in the grand canonical distribution, $N$ means a particular rather than average
number of particles.) Now, let us try to forget for a second that in real systems of interest the number of
particles is extremely large, and start to calculate, one by one, the first terms $Z_N$.

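In the term with $N = 0$, both contributions to $E_{m,N}$ vanish, and so does $\mu N/T$, so that $Z_0 = 1$. In
the next term, with $N = 1$, the interaction term vanishes, so that $E_{m,1}$ is reduced to the kinetic energy of one
particle, giving

$$Z_1 = e^{\mu/T}\sum_k \exp\left\{-\frac{p_k^2}{2mT}\right\}. \tag{3.102}$$

Making the usual transition from summation to integration, we may write

$$Z_1 = ZI_1, \quad \text{where } Z \equiv e^{\mu/T}\frac{gV}{(2\pi\hbar)^3}\int \exp\left\{-\frac{p^2}{2mT}\right\} d^3p, \quad \text{and } I_1 = 1. \tag{3.103}$$

This is the same simple (Gaussian) integral as in Eq. (6), giving

$$Z = e^{\mu/T}gV\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}. \tag{3.104}$$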
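The Gaussian integral behind Eq. (104) is easy to confirm numerically. The sketch below is an illustrative Python check, not part of the original derivation. It assumes units with $\hbar = 1$ and arbitrary values $m = 2$, $T = 0.7$, and drops the common factor $e^{\mu/T}gV$. It evaluates $(2\pi\hbar)^{-3}\int \exp\{-p^2/2mT\}\,d^3p$ in spherical coordinates and compares it with $(mT/2\pi\hbar^2)^{3/2}$.

```python
import math


def momentum_integral(m, T, hbar=1.0, n=100_000):
    """(2*pi*hbar)^-3 times the integral of exp(-p^2/2mT) over all 3D momenta.

    Evaluated as 4*pi * integral_0^p_max p^2 exp(-p^2/2mT) dp with the midpoint
    rule; the Gaussian tail beyond p_max = 12*sqrt(mT) is negligible.
    """
    p_max = 12.0 * math.sqrt(m * T)
    h = p_max / n
    s = 0.0
    for i in range(n):
        p = (i + 0.5) * h
        s += p * p * math.exp(-p * p / (2.0 * m * T))
    return 4.0 * math.pi * s * h / (2.0 * math.pi * hbar) ** 3


m, T, hbar = 2.0, 0.7, 1.0      # illustrative values, units with hbar = 1
numeric = momentum_integral(m, T, hbar)
closed_form = (m * T / (2.0 * math.pi * hbar**2)) ** 1.5   # Eq. (3.104) without e^{mu/T} g V
print(numeric, closed_form)
```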
Now let us explore the next term, with $N = 2$, which describes, in particular, pair interactions $U =
U(r)$, with $\mathbf{r} \equiv \mathbf{r}_k - \mathbf{r}_{k'}$. Due to the particle indistinguishability, this term needs the "correct Boltzmann
counting" factor $1/2!$ - cf. Eqs. (12) and (88):

$$Z_2 = \frac{e^{2\mu/T}}{2!}\sum_{k,k'}\exp\left\{-\frac{p_k^2}{2mT} - \frac{p_{k'}^2}{2mT} - \frac{U(r)}{T}\right\}. \tag{3.105}$$



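Since $U$ is coordinate-dependent, here the transfer from summation to integration should be done more
carefully than in the first term - cf. Eqs. (24) and (88):

$$Z_2 = e^{2\mu/T}\frac{g^2}{2!(2\pi\hbar)^6}\int \exp\left\{-\frac{p_k^2}{2mT}\right\} d^3p_k \int \exp\left\{-\frac{p_{k'}^2}{2mT}\right\} d^3p_{k'} \times \int e^{-U(r)/T}\, d^3r_k\, d^3r_{k'}. \tag{3.106}$$

Comparing this expression with the definition of parameter $Z$ in Eq. (103), we get

$$Z_2 = \frac{Z^2}{2!}I_2, \quad \text{where } I_2 \equiv \frac{1}{V^2}\int e^{-U(r)/T}\, d^3r_k\, d^3r_{k'} = \frac{1}{V}\int e^{-U(r)/T}\, d^3r. \tag{3.107}$$

Acting absolutely similarly, for the third term of the grand canonical sum we may get

$$Z_3 = \frac{Z^3}{3!}I_3, \quad \text{where } I_3 \equiv \frac{1}{V^2}\int e^{-U(\mathbf{r}', \mathbf{r}'')/T}\, d^3r'\, d^3r'', \tag{3.108}$$

where $\mathbf{r}'$ and $\mathbf{r}''$ are the vectors characterizing the mutual positions of 3 particles - see Fig. 6b.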

These results may be extended by induction to an arbitrary $N$. Plugging the expression for $Z_N$ into
Eq. (101) and recalling that $\Omega = -PV$, we get the equation of state of the gas in the form

$$P = \frac{T}{V}\ln\left(1 + ZI_1 + \frac{Z^2}{2!}I_2 + \frac{Z^3}{3!}I_3 + \ldots\right). \tag{3.109}$$



As a sanity check, at $U = 0$, all integrals $I_N$ are obviously equal to 1, the expression under the
logarithm is just the Taylor expansion of $e^Z$, giving $P = TZ/V$, and $\Omega = -PV = -TZ$. In this case,
according to the last of Eqs. (1.62), the average number of particles in the system is
$\langle N\rangle = -(\partial\Omega/\partial\mu)_{T,V} = Z$, where I have used the fact that since $Z \propto \exp\{\mu/T\}$, $\partial Z/\partial\mu = Z/T$. Thus, we have happily
recovered the equation of state of the ideal gas. 42

Returning to the general case of nonvanishing interactions, let us assume that the logarithm in
Eq. (109) may also be presented as a Taylor series in $Z$:

$$P = \frac{T}{V}\sum_{l=1}^{\infty}\frac{J_l}{l!}Z^l. \tag{3.110}$$

(The lower limit of the sum reflects the fact that according to Eq. (109), at $Z = 0$, $P = (T/V)\ln 1 = 0$.)
According to Eq. (1.60), this expansion corresponds to the grand potential

$$\Omega = -PV = -T\sum_{l=1}^{\infty}\frac{J_l}{l!}Z^l. \tag{3.111}$$

Again using the last of Eqs. (1.62), we get

$$\langle N\rangle = \sum_{l=1}^{\infty}\frac{J_l}{(l-1)!}Z^l. \tag{3.112}$$



This equation, for given $\langle N\rangle$, may be used to find $Z$ and hence for the calculation of the equation
of state from Eq. (110). The only remaining conceptual action item is to express coefficients $J_l$ via the
integrals $I_N$ participating in expansion (109). This may be done using the well-known Taylor expansion
of the logarithm function, 43

$$\ln(1 + \xi) = \sum_{l=1}^{\infty}(-1)^{l+1}\frac{\xi^l}{l}. \tag{3.113}$$

Using it together with Eq. (109), we get a Taylor series in $Z$, starting as

$$P = \frac{T}{V}\left\{Z + \frac{Z^2}{2!}(I_2 - 1) + \frac{Z^3}{3!}\left[(I_3 - 1) - 3(I_2 - 1)\right] + \ldots\right\}. \tag{3.114}$$

Comparing this expression with Eq. (110), we see that



42 Actually, the fact that in that case $Z = \langle N\rangle$ could have been noted earlier by comparing Eq. (104) with Eq. (39).

43 Looking at Eq. (109), one may think that since $\xi = Z + Z^2I_2/2 + \ldots$ is of the order of at least $Z \sim \langle N\rangle \gg 1$, the
expansion (113), which converges only if $|\xi| < 1$, is illegitimate. However, the expansion is justified by its result
(114), in which the $n$-th term is of the order of $\langle N\rangle^n(V_0/V)^{n-1}/n!$, so that the series does converge if the gas density
is sufficiently low: $\langle N\rangle/V \ll 1/V_0$, i.e. $r_A \gg r_0$. This is the very beauty of the cluster expansion, whose few first
terms present a good approximation even for a gas with $\langle N\rangle \gg 1$ particles.
$$\begin{aligned} J_1 &= 1, \qquad J_2 = I_2 - 1 = \frac{1}{V}\int \left(e^{-U(r)/T} - 1\right) d^3r, \\ J_3 &= (I_3 - 1) - 3(I_2 - 1) = \frac{1}{V^2}\int \left(e^{-U(\mathbf{r}',\mathbf{r}'')/T} - e^{-U(\mathbf{r}')/T} - e^{-U(\mathbf{r}'')/T} - e^{-U(\mathbf{r}''')/T} + 2\right) d^3r'\, d^3r'', \end{aligned} \tag{3.115}$$

where $\mathbf{r}''' \equiv \mathbf{r}' - \mathbf{r}''$ - see Fig. 6b. The expression for $J_2$, describing the pair interactions of particles, is
(within a numerical factor) equal to the second virial coefficient $B(T)$ - see Eq. (93). As a reminder, the
subtraction of 1 from integral $I_2$ in the second of Eqs. (115) makes the contribution of each elementary
3D volume $d^3r$ into integral $J_2$ nonvanishing only if at this $\mathbf{r}$ two particles interact ($U \neq 0$). Very
similarly, in the last of Eqs. (115), the subtraction of three pair-interaction terms from $(I_3 - 1)$ makes the
contribution from elementary 6D volume $d^3r'\,d^3r''$ into integral $J_3$ finite only if at that mutual location of
particles all three of them interact simultaneously.
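The algebra leading from Eq. (113) to Eqs. (114) and (115) can be verified mechanically. The Python sketch below is illustrative only. It uses exact rational arithmetic from the standard `fractions` module and the standard power-series logarithm recursion. For arbitrarily chosen test values of $I_2$ and $I_3$, it computes the Taylor coefficients of $\ln(1 + ZI_1 + Z^2I_2/2! + Z^3I_3/3!)$ and compares them with $J_l/l!$.

```python
from fractions import Fraction
from math import factorial


def log_series(a, order):
    """Taylor coefficients c[1..order] of ln(a(Z)) for a power series a with a[0] == 1.

    Uses n*c_n = n*a_n - sum_{k=1}^{n-1} k*c_k*a_{n-k}, which follows from a*c' = a'.
    """
    assert a[0] == 1
    a = a + [Fraction(0)] * (order + 1 - len(a))
    c = [Fraction(0)] * (order + 1)
    for n in range(1, order + 1):
        c[n] = (n * a[n] - sum(k * c[k] * a[n - k] for k in range(1, n))) / n
    return c


# Arbitrary test values of the configuration integrals (I_1 = 1 by Eq. (3.103)).
I1, I2, I3 = Fraction(1), Fraction(3, 2), Fraction(7, 3)
a = [Fraction(1), I1, I2 / factorial(2), I3 / factorial(3)]   # argument of ln in Eq. (3.109)
c = log_series(a, 3)

J = {1: Fraction(1), 2: I2 - 1, 3: (I3 - 1) - 3 * (I2 - 1)}   # Eqs. (3.115)
for l in (1, 2, 3):
    print(l, c[l], J[l] / factorial(l))   # the two columns should coincide
```

With $I_2 = I_3 = 1$ (no interaction), the same routine returns the coefficients of $\ln e^Z = Z$, i.e. 1, 0, 0, in line with the sanity check after Eq. (109).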

In order to illustrate the cluster expansion method at work, let us eliminate factor $Z$ from the
system of equations (110) and (112), keeping (for the sake of simplicity) the terms up to $O(Z^3)$ only, as
has been done in Eq. (114):

$$\frac{PV}{T} = J_1Z + \frac{J_2}{2}Z^2 + \frac{J_3}{6}Z^3, \tag{3.116}$$

$$\langle N\rangle = J_1Z + J_2Z^2 + \frac{J_3}{2}Z^3. \tag{3.117}$$

Dividing these two expressions, we get a result,

$$\frac{PV}{\langle N\rangle T} = \frac{1 + (J_2/2)Z + (J_3/6)Z^2}{1 + J_2Z + (J_3/2)Z^2} \approx 1 - \frac{J_2}{2}Z + \left(\frac{J_2^2}{2} - \frac{J_3}{3}\right)Z^2, \tag{3.118}$$

which is accurate to terms $O(Z^2)$. In this approximation, we may use Eq. (117), solved for $Z$ with
the same accuracy:

$$Z \approx \langle N\rangle - J_2\langle N\rangle^2. \tag{3.119}$$

Plugging this expression into Eq. (118), we get expansion (100) with

$$B(T) = -\frac{V}{2}J_2, \qquad C(T) = V^2\left(J_2^2 - \frac{J_3}{3}\right). \tag{3.120}$$



The first of these relations, combined with the first two of Eqs. (115), yields, for the 2nd virial
coefficient, the same Eq. (93) that was obtained from the Gibbs distribution, while the second one
allows us to calculate the 3rd virial coefficient $C(T)$. (Let me leave the calculation of $J_3$ and $C(T)$, for the
hardball model, for the reader's exercise.) Evidently, a more accurate expansion of Eqs. (110), (112),
and (114) may be used to calculate an arbitrary virial coefficient, though starting from the 5th
coefficient, such calculations may be completed only numerically even in the simplest hardball model.
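The elimination of $Z$ in Eqs. (116)-(120) can also be checked numerically. In the minimal Python sketch below, which is only an illustration, the values of $J_2$ and $J_3$ are arbitrary assumptions, $J_1 = 1$, and $V = 1$, so that $\langle N\rangle$ plays the role of the density. The script solves Eq. (117) for $Z$ by Newton iterations and compares the resulting exact ratio (118) with $1 + B\langle N\rangle/V + C\langle N\rangle^2/V^2$, with $B$ and $C$ taken from Eq. (120). The mismatch should shrink as $\langle N\rangle^3$.

```python
def pv_over_nt(N, J2, J3):
    """PV/(<N>T) from the truncated Eqs. (3.116)-(3.117), with J1 = 1, V = 1, Z eliminated numerically."""
    Z = N                                        # ideal-gas first guess, Z = <N>
    for _ in range(50):                          # Newton iterations on Eq. (3.117)
        f = Z + J2 * Z**2 + 0.5 * J3 * Z**3 - N
        df = 1.0 + 2.0 * J2 * Z + 1.5 * J3 * Z**2
        step = f / df
        Z -= step
        if abs(step) <= 1e-16 * abs(Z):
            break
    pv_over_t = Z + 0.5 * J2 * Z**2 + J3 * Z**3 / 6.0   # Eq. (3.116)
    return pv_over_t / N


def virial_form(N, J2, J3):
    """1 + B<N>/V + C<N>^2/V^2 with B and C from Eq. (3.120), V = 1."""
    B = -0.5 * J2
    C = J2**2 - J3 / 3.0
    return 1.0 + B * N + C * N**2


J2, J3 = -0.3, 0.05          # arbitrary, hypothetical test values
err = [abs(pv_over_nt(N, J2, J3) - virial_form(N, J2, J3)) for N in (0.02, 0.01)]
print(err, err[0] / err[1])  # the ratio should be close to 2**3 = 8
```

A ratio close to 8 indicates that, with $B$ and $C$ from Eq. (120), the remaining discrepancy is of order $\langle N\rangle^3$. That is beyond the accuracy of the two retained virial terms.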



Chapter 3 



Page 28 of 30 



Essential Graduate Physics 



SM: Statistical Mechanics 



3.6. Exercise problems 

3.1. Use the Maxwell distribution for an alternative (statistical) calculation of the mechanical
work performed (per cycle) by the Maxwell-Demon heat engine discussed in Sec. 2.3. 

Hint: You may assume the simplest geometry of the engine - see Fig. 2.4. 



3.2. Use the Maxwell distribution to find the damping coefficient
$\eta_P \equiv -\partial P/\partial u$, where $P$ is pressure exerted by an ideal classical gas on a
piston moving with very low velocity $u$, in the simplest geometry shown
in Fig. on the right, assuming that collisions of gas particles with the
piston are elastic.



3.3. Prove that Eq. (3.22) of the lecture notes,

$$\Delta S = N_1\ln\frac{V_1 + V_2}{V_1} + N_2\ln\frac{V_1 + V_2}{V_2},$$

derived for the change of entropy at mixing of two ideal classical gases of completely distinguishable
particles (that had equal densities and temperatures $T$ before mixing) is also valid if particles in each
of the initial volumes are identical to each other, but different from those in the counterpart sub-volume.
Assume that masses of all the particles are equal.



3.4. Calculate the basic thermodynamic characteristics, including all relevant thermodynamic
potentials, specific heat, and the surface tension $\sigma \equiv (\partial F/\partial A)_{T,N}$ (where $A$ is the system area), for an
ideal 2D electron gas with given areal density $n \equiv N/A$ at:

(i) $T = 0$, and

(ii) low temperatures (to the first nonvanishing order in $T/\varepsilon_F \ll 1$).



3.5. Find the free carrier density in a semiconductor with bandgap $\Delta \gg T$, within the isotropic,
parabolic model of excitations in its conduction and valence bands.

Hint: In semiconductor physics, the names of conduction and valence bands are given to such
two adjacent allowed energy bands 44 that at $T = 0$, all states of the valence band are fully occupied by
electrons, while the conduction band is completely empty - see Fig. on the right.
Within the model mentioned in the assignment (which gives a good approximation
for semiconductors of the A3B5 group, e.g., GaAs) the energy of an electron-like
excitation in the conduction band follows the isotropic, parabolic law (3), but with
the origin at the band edge $\varepsilon_C$, and a mass $m_e$ usually smaller than the free electron
mass. Similarly, the parabolic dispersion law of a single "no-electron" excitation
(called the hole) in the valence band is also offset to the edge of that band, $\varepsilon_V$, but




44 A discussion of the band theory may be found, e.g., in QM Secs. 2.7 and 3.4.
corresponds to a negative effective mass $(-m_h)$, usually with $m_h > m_e$ (see Fig. on the right):

$$\varepsilon = \begin{cases} \varepsilon_C + p^2/2m_e, & \text{for } \varepsilon > \varepsilon_C, \\ \varepsilon_V - p^2/2m_h, & \text{for } \varepsilon < \varepsilon_V, \end{cases} \qquad \text{with } \varepsilon_C - \varepsilon_V = \Delta.$$

The excitations of both types follow the Fermi-Dirac statistics, and (within this simple model) do not
interact directly.



3.6. Calculate the effective latent heat per particle, $\Lambda \equiv -(\partial Q/\partial N_0)_{N,V}$, of the Bose-Einstein
condensate evaporation, as a function of temperature $T$. Here $Q$ is the heat absorbed by the (condensate
+ gas) system as a whole, while $N_0$ is the number of particles in the condensate alone.



3.7. Prove that the specific heat $C_V$ of the Bose gas is a continuous function of temperature at the
critical point $T = T_c$.

3.8. In Chapter 1, several thermodynamic equations involving entropy have been discussed,
including the first of Eqs. (1.39):

$$S = -\left(\frac{\partial G}{\partial T}\right)_P.$$

If we combine this expression with the fundamental relation (1.56), $G = \mu N$, it looks like for the
Bose-Einstein condensate, whose chemical potential $\mu$ vanishes at temperatures below the critical value
$T_c$, the entropy vanishes. On the other hand, Eq. (4.35) with $X = V$ reads

$$C_V = T\left(\frac{\partial S}{\partial T}\right)_V.$$

If $C_V$ is known as a function of temperature, the last equation may be integrated to calculate $S$:

$$S = \int \frac{C_V(T)}{T}\, dT\,\Big|_{V = \text{const}} + \text{const}.$$

For the Bose-Einstein condensate, we have calculated the specific heat to be proportional to $T^{3/2}$, so that
the integration gives nonvanishing entropy $S \propto T^{3/2}$. Explain this paradox.



3.9. Calculate the chemical potential of an ideal 2D gas of spin-0 Bose particles as a function of
its density n (per unit area), and find out whether such a gas can condense at low temperatures. 



3.10. Use Eqs. (115) and (120) to calculate the third virial coefficient $C(T)$ for the hardball
model of particle interactions. 
Chapter 4. Phase Transitions

This chapter is a brief discussion of coexistence between different states ("phases") of collections of the 
same particles, and the laws of transitions between these phases. Due to the complexity of these 
phenomena, which involve interaction of the particles, quantitative results have been obtained only for a 
few very simple models, typically giving only a very approximate description of real systems. 

4.1. First-order phase transitions 

From everyday experience, say with ice, water, and water vapor, we know that one chemical 
substance (i.e. a set of many similar particles) may exist in several stable states - phases. A typical 
substance may have: 

(i) a dense solid phase, in which interatomic forces keep all atoms in fixed relative positions, 
with just small thermal fluctuations about them; 

(ii) a liquid phase, of comparable density, in which the relative distances between atoms or 
molecules are almost constant, but the particles are free to move about each other, and 

(iii) the gas phase, typically of a much lower density, in which the molecules are virtually free to 
move all around the containing volume. 1 

Experience also tells us that under certain conditions, two phases may be in thermal and chemical
equilibrium - say, ice floating on water, with temperature at the freezing point. Actually, in Sec. 3.4 we
already discussed a qualitative theory of one such equilibrium, the coexistence of the Bose-Einstein
condensate with the uncondensed "vapor" of similar particles. However, this is a rather rare case when
the phase coexistence is due to the quantum nature of particles (bosons) that may not interact directly.
Much more frequently, the formation of different phases, and transitions between them, is an essentially
classical effect due to particle interactions.

Phase transitions are sometimes classified by their order. 2 I will start my discussion with the
first-order phase transitions that feature non-vanishing latent heat $\Lambda$ - the amount of heat that is
necessary to give to one phase in order to turn it into another phase, even if temperature and pressure are
kept constant. 3 Let us discuss the simplest and most popular phenomenological model of the first-order
phase transition, suggested in 1873 by J. van der Waals.

In the last chapter, we derived Eq. (3.99) for the classical gas of weakly interacting
particles, which takes into account (albeit approximately) both interaction components necessary for a


1 The plasma phase, in which atoms are partly or completely ionized, is frequently mentioned in the same breath
as the three phases listed above, but one has to remember that in contrast to them, a typical electroneutral plasma
consists of particles of two different sorts - ions and electrons.

2 Such classification schemes, started by P. Ehrenfest, have been repeatedly modified, and only the "first-order
phase transition" is still a generally accepted term.

3 For example, for water the latent heat of vaporization at ambient pressure is as high as $\sim 2.2\times 10^6$ J/kg, i.e. $\sim 0.4$
eV per molecule, making this liquid indispensable for many practical purposes - including fire fighting. (The
latent heat of water's ice melting is an order of magnitude lower.)
realistic discussion of gas condensation - the long-range attraction of the particles and their short-range
repulsion. Let us rewrite that result as follows:

$$P + a\frac{N^2}{V^2} = \frac{NT}{V}\left(1 + \frac{Nb}{V}\right). \tag{4.1}$$



As we saw in Sec. 3.5, the physical meaning of constant $b$ is the effective volume of space taken
by a particle pair collision. Equation (1) is quantitatively valid only if the second term in the parentheses
is small, $Nb \ll V$, i.e. if the total volume excluded from particles' free motion because of their collisions
is much smaller than the whole volume $V$ of the system. In order to describe the condensed phase (which
I will call "liquid"), 4 we need to generalize this relation to the case $Nb \sim V$. Since the effective volume
left for particles' motion is $V - Nb$, it is very natural to make the following replacement, $V \rightarrow V - Nb$, in
the ideal gas' equation of state. If we still keep the term $aN^2/V^2$, which describes the long-range
attraction of particles, we get the van der Waals equation

$$P + a\frac{N^2}{V^2} = \frac{NT}{V - Nb}. \tag{4.2}$$



The advantage of this simple model is that in the rare gas limit, $Nb \ll V$, it reduces back to Eq. (1). (To
check this, it is sufficient to Taylor-expand the right-hand side of Eq. (2) in small parameter $Nb/V \ll 1$,
and retain only two leading terms corresponding to two first virial coefficients.) Let us explore
properties of this model.
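Carrying out the expansion suggested in the parentheses takes one line. Writing $NT/(V - Nb) = (NT/V)(1 - Nb/V)^{-1}$ and expanding in the small ratio $Nb/V$ gives

$$P = \frac{NT}{V}\left(1 - \frac{Nb}{V}\right)^{-1} - a\frac{N^2}{V^2} = T\left[\frac{N}{V} + \left(b - \frac{a}{T}\right)\frac{N^2}{V^2} + b^2\frac{N^3}{V^3} + \ldots\right].$$

The first two terms reproduce Eq. (1), i.e. Eq. (3.99) with the second virial coefficient $B(T) = b - a/T$ of Eq. (3.98). The higher terms come from the heuristic replacement $V \rightarrow V - Nb$ rather than from the derivation of Sec. 3.5.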

It is frequently convenient to discuss any equation of state in terms of its isotherms, i.e. $P(V)$
curves plotted at constant $T$. As Eq. (2) shows, in the van der Waals model such a plot depends on 4
parameters ($a$, $b$, $N$, and $T$). However, for its analysis it is convenient to introduce dimensionless
variables: pressure $p \equiv P/P_c$, volume $v \equiv V/V_c$, and temperature $t \equiv T/T_c$, normalized to their so-called
critical values,

$$P_c \equiv \frac{1}{27}\frac{a}{b^2}, \qquad V_c \equiv 3Nb, \qquad T_c \equiv \frac{8}{27}\frac{a}{b}. \tag{4.3}$$

In these notations, Eq. (2) acquires the following form,

$$p + \frac{3}{v^2} = \frac{8t}{3v - 1}, \tag{4.4}$$



so that the normalized isotherms $p(v)$ depend on only one parameter, the normalized temperature $t$ - see
Fig. 1. The most important property of these plots is that the isotherms have qualitatively different
shapes in two temperature regions. 5 At $t > 1$, i.e. $T > T_c$, pressure increases monotonically at gas
compression (just like in an ideal gas, to which this system tends at $T \gg T_c$), i.e. $(\partial P/\partial V)_T < 0$ at all
points of the isotherm. However, below the critical temperature $T_c$, all isotherms feature segments with
$(\partial P/\partial V)_T > 0$. It is easy to understand that, at least in a constant pressure experiment (see, for example,



4 Due to the phenomenological character of the van der Waals model, one cannot say whether the condensed 
phase it predicts corresponds to a liquid or a solid. However, in most real substances at ambient conditions, gas 
coexists with liquid, hence the name I will use. 

5 The special choice of numerical coefficients in Eq. (3) is motivated by placing the border between the two regions
exactly at $t = 1$, i.e. at temperature $T_c$, with the critical point coordinates equal to $P_c$ and $V_c$.
Fig. 1.5), 6 these segments describe a mechanically unstable equilibrium. Indeed, if due to a random
fluctuation, the volume deviated upward from the equilibrium value, the pressure would also increase,
forcing the environment (say, the heavy piston in Fig. 1.5) to allow a further expansion of the system,
leading to even higher pressure, etc. A similar deviation of volume downward would lead to a similar
avalanche-like decrease of the volume. Such avalanche instability would develop further and further
until the system has reached one of the stable branches with a negative slope $(\partial P/\partial V)_T$. In the range
where the single-phase equilibrium state is unstable, the system as a whole may be stable only if it
consists of the two phases (one with a smaller, and another with a higher density $n = N/V$) that are
described by the two stable branches.
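A short numerical sketch can confirm three claims of this passage. The first is that Eq. (2) maps onto the one-parameter form (4) with the critical values (3). The second is that the isotherm slope changes character at $t = 1$. The third is that $dp/dv$ and $d^2p/dv^2$ both vanish at the critical point. The Python below is illustrative only: the values of $a$, $b$, $N$ are arbitrary assumptions, and derivatives are estimated by central finite differences.

```python
def p_reduced(v, t):
    """Reduced van der Waals isotherm, Eq. (4.4): p = 8t/(3v - 1) - 3/v**2, for v > 1/3."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v**2


def P_vdw(V, T, a, b, N):
    """Van der Waals equation (4.2), solved for the pressure P."""
    return N * T / (V - N * b) - a * N**2 / V**2


def slope(v, t, h=1e-6):
    """Central-difference estimate of dp/dv along the isotherm t."""
    return (p_reduced(v + h, t) - p_reduced(v - h, t)) / (2 * h)


def curvature(v, t, h=1e-4):
    """Central-difference estimate of d^2p/dv^2 along the isotherm t."""
    return (p_reduced(v + h, t) - 2 * p_reduced(v, t) + p_reduced(v - h, t)) / h**2


# Hypothetical model parameters, used only to test the scaling (4.3).
a, b, N = 2.5, 0.4, 3.0
Pc, Vc, Tc = a / (27 * b**2), 3 * N * b, 8 * a / (27 * b)
v_test, t_test = 1.7, 0.93
mismatch = P_vdw(v_test * Vc, t_test * Tc, a, b, N) / Pc - p_reduced(v_test, t_test)

grid = [0.4 + 0.01 * i for i in range(500)]      # v from 0.4 to about 5.4
print(abs(mismatch) < 1e-12)                     # Eqs. (4.2)-(4.4) agree
print(all(slope(v, 1.1) < 0 for v in grid))      # t > 1: monotonic isotherm
print(any(slope(v, 0.9) > 0 for v in grid))      # t < 1: an unstable segment exists
print(p_reduced(1.0, 1.0), slope(1.0, 1.0), curvature(1.0, 1.0))  # critical point
```

At $t = 1$ the identity $4v^3 - (3v - 1)^2 = (v - 1)^2(4v - 1)$ shows why the slope can only touch zero at $v = 1$. This is consistent with the printed critical-point values: $p = 1$, with both derivatives close to zero.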



[Figure 4.1: isotherms $p(v)$ for $t$ = 1.2, 1.1, 1.0, 0.9, and 0.8; horizontal axis $v = V/V_c$]

Fig. 4.1. The van der Waals equation
plotted on the $[p, v]$ plane for several values
of reduced temperature $t \equiv T/T_c$. Shading
shows the single-phase instability range in
which $(\partial P/\partial V)_T > 0$. (The reader is invited
to contemplate the physical sense and
possibility of experimental observation of
the negative values of pressure, predicted
by the model.)



In order to understand the basic properties of this two-phase system, let us recall the general
conditions of equilibrium of two thermodynamic systems, which have been discussed in Chapter 1:

$$T_1 = T_2 \quad \text{(thermal equilibrium)}, \tag{4.5}$$

$$\mu_1 = \mu_2 \quad \text{("chemical" equilibrium)}, \tag{4.6}$$

the latter condition meaning that the average energy of a single ("probe") particle in both systems is the
same. To those, we should add the evident condition of mechanical equilibrium,

$$P_1 = P_2 \quad \text{(mechanical equilibrium)}, \tag{4.7}$$

that immediately follows from the balance of normal forces exerted on an inter-phase boundary.



If we discuss isotherms, Eq. (5) is fulfilled automatically, while Eq. (7) means that the effective 
isotherm P(V) describing a two-phase system should be a horizontal line (Fig. 2): 7 



6 Actually, this assumption is not crucial for our analysis of mechanical stability, because if a fluctuation takes 
place in a small part of the total volume V, its other parts play the role of pressure-fixing environment. 

7 Frequently, especially for water vapor diluted in air, $P_0(T)$ is called the saturated vapor pressure, while the
temperature at which $P_0(T)$ equals the actual partial pressure of the vapor is called the dew point; it is frequently used for an
implicit characterization of the concentration $n = N/V$ of water vapor in air.
$$P = P_0(T). \tag{4.8}$$

[Figure 4.2: an isotherm with the horizontal segment $P = P_0(T)$ labeled "liquid and gas in equilibrium", and a region labeled "stable gaseous phase"]

Fig. 4.2. Phase equilibrium
at $T < T_c$ (schematically).



Along this line, internal properties of each phase do not change; only the particle distribution does:
it evolves gradually from all particles being in the liquid phase at point 1 to all particles being in the gas
phase at point 2. 8 In particular, according to Eq. (6), the chemical potentials $\mu$ of the phases should be
equal at each point of the horizontal line (8). This fact enables us to find the line's position: it has to
connect points 1 and 2 in which the chemical potentials of the phases are equal to each other.



Let us recast this condition as

$$\int_1^2 d\mu = 0, \quad \text{i.e.} \quad \int_1^2 dG = 0, \tag{4.9}$$

where the integral may be taken along the single-phase isotherm. (For this mathematical calculation, the
mechanical instability of some states on this curve is not important.) Along that curve, $N = \text{const}$ and
$T = \text{const}$, so that according to Eq. (1.53c), $dG = -S\,dT + V\,dP + \mu\,dN$, for a slow (reversible) change,
$dG = V\,dP$. Hence Eq. (9) yields

$$\int_1^2 V\,dP = 0. \tag{4.10}$$



From Fig. 2, it is easy to see that geometrically this equality means that the shaded areas $A_d$ and $A_u$
should be equal, and hence Eq. (10) may be rewritten in the form of the so-called Maxwell's rule

$$\int_1^2 \left[P - P_0(T)\right] dV = 0. \tag{4.11}$$



8 An important question is: why does the phase-equilibrium line $P = P_0(T)$ stretch all the way from point 1 to
point 2 (Fig. 2)? Indeed, branches 1-1' and 2-2' of the single-phase isotherm have negative derivative $(\partial P/\partial V)_T$ and
hence are mechanically stable to small perturbations. The answer is that these branches are actually metastable,
i.e. have larger Gibbs energy per particle (i.e. $\mu$) than the counterpart phase and are hence unstable to larger
perturbations - such as foreign microparticles (say, dust), confining wall protrusions, etc. In very controlled
conditions, these single-phase "superheated" or "supercooled" states can survive virtually all the way to the
zero-derivative points 1' and 2', leading to sudden jumps of the system into the counterpart phase. (For fixed-pressure
conditions, such jumps are shown by dashed lines in Fig. 2.) However, at more realistic conditions, perturbations
result in the two-phase coexistence extending all the way between (or very close to) points 1 and 2.
This relation is more convenient for calculations than Eq. (10) if the equation of state may be
explicitly solved for $P$ - as it is the case for the van der Waals equation (2). Such calculation (left for the
reader's exercise) shows that for that model, the temperature dependence of the saturated gas pressure at
low $T$ is exponential,

$$P_0(T) \approx 27P_c\exp\left\{-\frac{U_0}{T}\right\}, \quad \text{with } U_0 \equiv \frac{a}{b} = \frac{27}{8}T_c, \quad \text{for } T \ll T_c, \tag{4.12}$$

corresponding very well to the physical picture of the rate of particle activation from the potential well
of depth $U_0$. 9
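Maxwell's rule (11) is also easy to apply numerically. The Python sketch below is an illustration, not the analytical exercise mentioned above. It works in the reduced units of Eq. (4) and uses plain bisection, one simple choice among many root finders. For a given $t < 1$, it first locates the spinodal points where $dp/dv = 0$. It then adjusts the trial pressure $p_0$ until the signed area $\int_1^2 [p(v) - p_0]\,dv$ between the outermost roots vanishes, using the antiderivative $(8t/3)\ln(3v - 1) + 3/v$ of Eq. (4). The code assumes that the saturated pressure is positive and not extremely small, so it is meant for moderate values of $t$.

```python
import math


def p_vdw(v, t):
    """Reduced van der Waals pressure, Eq. (4.4): p = 8t/(3v - 1) - 3/v**2."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v**2


def bisect(f, lo, hi, tol=1e-13, max_iter=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def maxwell_construction(t):
    """Return (p0, v_liquid, v_gas) from Maxwell's rule (4.11) in reduced units, for 0 < t < 1."""
    # Spinodal points, where dp/dv = 0, solve 4*t*v**3 = (3v - 1)**2.
    g = lambda v: 4.0 * t * v**3 - (3.0 * v - 1.0) ** 2
    v_min = bisect(g, 1.0 / 3.0 + 1e-12, 1.0)    # local minimum of p(v)
    hi = 2.0
    while g(hi) < 0:
        hi *= 2.0
    v_max = bisect(g, 1.0, hi)                    # local maximum of p(v)

    def ends(p0):
        """Outermost roots of p(v) = p0: the liquid (point 1) and gas (point 2) ends."""
        f = lambda v: p_vdw(v, t) - p0
        v_l = bisect(f, 1.0 / 3.0 + 1e-12, v_min)
        top = 2.0 * v_max
        while f(top) > 0:
            top *= 2.0
        return v_l, bisect(f, v_max, top)

    def area(p0):
        """Integral of [p(v) - p0] dv from v_l to v_g, via the antiderivative of p(v)."""
        v_l, v_g = ends(p0)
        F = lambda v: (8.0 * t / 3.0) * math.log(3.0 * v - 1.0) + 3.0 / v
        return F(v_g) - F(v_l) - p0 * (v_g - v_l)

    # The area decreases monotonically with p0; bracket p0 by the spinodal pressures.
    p_lo = max(p_vdw(v_min, t), 0.0) + 1e-10
    p_hi = p_vdw(v_max, t) - 1e-10
    p0 = bisect(area, p_lo, p_hi)
    return (p0, *ends(p0))


p0, v_l, v_g = maxwell_construction(0.9)
print(p0, v_l, v_g)
```

For $t = 0.9$ this gives $p_0 \approx 0.647$, with the liquid and gas ends of the horizontal segment near $v \approx 0.60$ and $v \approx 2.35$.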

The signature parameter of the first-order phase transition, the latent heat of evaporation

$$\Lambda \equiv \int_1^2 dQ, \tag{4.13}$$

may be found by a similar integration along the single-phase isotherm. Indeed, using Eq. (1.19),
$dQ = T\,dS$, we get

$$\Lambda = \int_1^2 T\,dS = T(S_2 - S_1). \tag{4.14}$$



Instead of calculating entropy from the equation of state (as was done for the ideal gas in Sec. 1.4), it is
easier to express the right-hand side of Eq. (14) directly via that equation. For that, let us take the full
derivative of Eq. (6) over temperature, considering each value of $G = N\mu$ as a function of $P$ and $T$, and
taking into account that according to Eq. (7), $P_1 = P_2 = P_0(T)$:

$$\left(\frac{\partial G_1}{\partial T}\right)_P + \left(\frac{\partial G_1}{\partial P}\right)_T\frac{dP_0}{dT} = \left(\frac{\partial G_2}{\partial T}\right)_P + \left(\frac{\partial G_2}{\partial P}\right)_T\frac{dP_0}{dT}. \tag{4.15}$$



According to the first of Eqs. (1.39), the partial derivative $(\partial G/\partial T)_P$ is just minus entropy, while
according to the second of those equations, $(\partial G/\partial P)_T$ is the volume. Thus Eq. (15) becomes

$$-S_1 + V_1\frac{dP_0}{dT} = -S_2 + V_2\frac{dP_0}{dT}. \tag{4.16}$$



Solving this equation for $(S_2 - S_1)$, and plugging the result into Eq. (14), we get the Clapeyron-Clausius
formula

$$\Lambda = T(V_2 - V_1)\frac{dP_0}{dT}. \tag{4.17}$$



For the van der Waals model, this formula may be readily used for the analytical calculation of $\Lambda$ in
two limits: $T \ll T_c$ and $(T_c - T) \ll T_c$ - the exercise left for the reader. In the latter limit,
$\Lambda \propto (T_c - T)^{1/2}$, naturally vanishing at the critical temperature.
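As an order-of-magnitude illustration of Eq. (17), one may plug in the latent heat of water vaporization quoted in footnote 3 (about $2.2\times 10^6$ J/kg). The Python sketch below is only a rough estimate, and its assumptions are the following. Everything is counted per kilogram. The vapor at the normal boiling point (373.15 K, 101 325 Pa) is treated as an ideal gas. The liquid volume is an assumed round value of $1.04\times 10^{-3}$ m³/kg. As a side check, the script also converts the same latent heat to electron-volts per molecule.

```python
# Physical constants (exact SI values).
R = 8.314462618          # J/(mol K), molar gas constant
N_A = 6.02214076e23      # 1/mol, Avogadro constant
eV = 1.602176634e-19     # J per electron-volt

M_water = 0.01801528     # kg/mol, molar mass of H2O
Lambda = 2.2e6           # J/kg, latent heat of vaporization (footnote 3)
T = 373.15               # K, normal boiling point of water
P = 101_325.0            # Pa, standard atmosphere
V_liquid = 1.04e-3       # m^3/kg, assumed specific volume of liquid water near 100 C

V_gas = R * T / (M_water * P)                  # m^3/kg, ideal-gas estimate for the vapor
dP0_dT = Lambda / (T * (V_gas - V_liquid))     # Eq. (4.17) solved for dP0/dT
Lambda_per_molecule_eV = Lambda * M_water / N_A / eV

print(f"V_gas ~ {V_gas:.3f} m^3/kg, dP0/dT ~ {dP0_dT:.0f} Pa/K, "
      f"Lambda ~ {Lambda_per_molecule_eV:.2f} eV per molecule")
```

The resulting slope, about 3.5 kPa/K, is close to the measured slope of water's vapor-pressure curve near 100 °C (roughly 3.6 kPa/K). The small difference reflects the rounded latent heat and the ideal-gas treatment of the vapor. The per-molecule value reproduces the $\sim 0.4$ eV quoted in footnote 3.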

Finally, some important properties of the van der Waals' model may be revealed more easily by 
looking at the set of its isochores P = P(T) for V = const, rather than at the isotherms. Indeed, as Eq. (2) 
shows, all single-phase isochores are straight lines. However, if we interrupt these lines at the points 



9 It is fascinating how well this Arrhenius exponent is hidden in the polynomial van der Waals equation (2)!
when the single phase becomes metastable, and complement them with the (very nonlinear!)
dependence $P_0(T)$, we get the pattern (called the phase diagram) shown schematically in Fig. 3a.



[Figure 4.3: panels (a) and (b)]

Fig. 4.3. (a) The van der Waals model's isochores, the saturated gas pressure diagram, and
the critical point, and (b) the phase diagram of a typical 3-phase system (schematically).



In this plot, one more meaning of the critical point $\{P_c, T_c\}$ becomes very clear. At fixed pressure $P < P_c$, the liquid and gaseous phases are clearly separated by the transition line $P_0(T)$, so if we achieve the transition just by changing temperature, and hence volume (shown with the red line in Fig. 3), we will pass through the phase coexistence stage. However, if we perform the same final transition by changing both the pressure and temperature, going around above the critical point (the blue line in Fig. 3), no definite point of transition may be observed: the substance stays in a single phase, and it is a subjective judgment of the observer in which region that phase should be called the liquid, and in which region the gas. For water, the critical point corresponds to 647 K (374°C) and $P_c \approx$ 22.1 MPa (i.e. ~220 bars), so that a lecture demonstration of its critical behavior would require substantial safety precautions. This is why such demonstrations are typically carried out with other fluids such as diethyl ether, 10 with much lower $T_c$ (194°C) and $P_c$ (3.6 MPa). Though the ether is colorless and clear in both the gas and liquid phases, their separation (due to gravity) is visible (due to a difference in the optical refraction coefficient) at $P < P_c$, but not above $P_c$. 11

10 (CH3-CH2)-O-(CH2-CH3), historically the first popular general anesthetic.

11 It is interesting that very close to the critical point the substance suddenly becomes opaque; in the case of ether, whitish. The qualitative explanation of this effect, called the critical opalescence, is simple: at this point the difference of Gibbs energies per particle (i.e. chemical potentials) of the two phases becomes so small that the unavoidable thermal fluctuations lead to spontaneous appearance and disappearance of relatively large (a-few-μm-scale) single-phase regions in all the volume. Since the optical refraction coefficients of the phases are slightly different, the large concentration of the region boundaries leads to strong light scattering.

Thus, in the van der Waals model, two phases may coexist, though only at certain conditions ($P < P_c$). Now the natural question is whether the coexistence of more than two phases of the same substance is possible. For example, can water ice, liquid water, and water vapor (steam) be in thermodynamic equilibrium? The answer is essentially given by Eq. (6). From thermodynamics, we know that for a uniform system (with $G = \mu N$), pressure and temperature completely define the chemical potential. Hence, dealing with two phases, we have to satisfy just one chemical equilibrium condition (6) for two common parameters $P$ and $T$. Evidently, this leaves us with one extra degree of freedom, so that the two-phase equilibrium is possible within a certain range of $P$, provided that $T$ is changed accordingly (or vice versa), i.e. along the line $P = P_0(T)$; see Fig. 3a. Now, if we want three phases to be in equilibrium, we need to satisfy two equations for these variables:

$$\mu_1(P,T) = \mu_2(P,T) = \mu_3(P,T). \tag{4.18}$$

Typically, the functions $\mu(P, T)$ are monotonic, so that Eqs. (18) have just one solution, the so-called triple point $\{P_t, T_t\}$. Of course, the triple point $\{P_t, T_t\}$ of equilibrium between three phases should not be confused with the critical points $\{P_c, T_c\}$ of transitions between two phase pairs. Fig. 3b shows, very schematically, their relation for a typical three-phase system solid-liquid-gas. For example, ice, liquid water, and water vapor are in equilibrium at the triple point corresponding to 0.612 kPa and (by definition, exactly) 273.16 K. 12 The particular importance of this temperature point is that, by international agreement, it has been accepted as the reference point of the Celsius temperature scale. 13 More generally, triple points of pure substances (such as H2, N2, O2, Ar, Hg, and H2O) are broadly used for thermometer calibration, defining the so-called international temperature scales, including the currently accepted scale ITS-90.
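To see why Eqs. (18) typically single out one point, one may linearize each chemical potential near the expected triple point, $\mu_k(P,T) \approx u_k - s_kT + v_kP$, where $s_k$ and $v_k$ are the entropy and volume per particle, consistent with Eqs. (1.39). The two equalities then become two linear equations for $P$ and $T$, with a unique solution whenever their determinant is nonzero. The Python sketch below solves them by Cramer's rule; the linearization and all the numbers in it are hypothetical assumptions for illustration, not data for any real substance.

```python
def mu_linear(phase, P, T):
    """Hypothetical linearized chemical potential mu = u - s*T + v*P (k_B = 1).

    phase is a tuple (u, s, v): energy offset, entropy and volume per particle.
    """
    u, s, v = phase
    return u - s * T + v * P


def triple_point(ph1, ph2, ph3):
    """Solve mu1 = mu2 = mu3 (Eqs. (4.18)) for {P_t, T_t} by Cramer's rule."""
    # mu_a - mu_b = 0 rewritten as  a_P * P + a_T * T = rhs
    def row(pa, pb):
        return pa[2] - pb[2], -(pa[1] - pb[1]), -(pa[0] - pb[0])

    a11, a12, b1 = row(ph1, ph2)
    a21, a22, b2 = row(ph2, ph3)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("degenerate phases: no unique triple point")
    P = (b1 * a22 - a12 * b2) / det
    T = (a11 * b2 - b1 * a21) / det
    return P, T


solid, liquid, gas = (0.0, 1.0, 1.0), (1.0, 2.0, 1.1), (2.5, 4.0, 3.0)  # hypothetical (u, s, v)
P_t, T_t = triple_point(solid, liquid, gas)
print(P_t, T_t)  # approximately 0.294 and 1.029
```

Note that the real gas volume per particle depends strongly on pressure, so a linearization of this kind is only meaningful in a small neighborhood of the triple point.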

This result may be readily generalized to multi-component systems consisting of particles of several (say, $L$) sorts. 14 If such a system is in a single phase, i.e. macroscopically uniform, its chemical potential may be defined by the natural generalization of Eq. (1.53c):

$$dG = -S\,dT + V\,dP + \sum_l \mu^{(l)}\,dN^{(l)}. \tag{4.19}$$

Typically, a single phase is not a pure substance, but has certain concentrations of other components, so that $\mu^{(l)}$ may depend not only on $P$ and $T$, but also on the concentrations $c^{(l)} \equiv N^{(l)}/N$ of particles of each sort. If the total number $N$ of particles is fixed, the number of independent concentrations is $(L - 1)$. For the chemical equilibrium of $R$ phases, all $R$ values of $\mu_r^{(l)}$ ($r = 1, 2, \ldots, R$) have to be equal for particles of each sort: $\mu_1^{(l)} = \mu_2^{(l)} = \ldots = \mu_R^{(l)}$, with each $\mu_r^{(l)}$ depending on $(L - 1)$ concentrations $c_r^{(l)}$, and also on $P$ and $T$. This requirement gives $L(R - 1)$ equations for $(L - 1)R$ concentrations $c_r^{(l)}$, plus two common arguments $P$ and $T$, i.e. for $[(L - 1)R + 2]$ independent variables. This means that the number of phases has to satisfy the limitation

$$L(R - 1) \leq (L - 1)R + 2, \quad \text{i.e.} \quad R \leq L + 2, \tag{4.20}$$



where the equality sign may be reached in just one point in the whole parameter space. This is the Gibbs phase rule. As a sanity check, for a single-component system, $L = 1$, the rule yields $R \leq 3$, exactly the result we have already discussed.
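The counting behind Eq. (20) is easy to mechanize. The short Python sketch below (the function names are illustrative) compares the number $L(R-1)$ of equilibrium equations with the number $(L-1)R + 2$ of unknowns, and finds the largest $R$ for which the former does not exceed the latter; the difference between the two counts is the number of remaining degrees of freedom.

```python
def n_equations(L, R):
    """Chemical-equilibrium conditions: (R - 1) equalities for each of the L sorts."""
    return L * (R - 1)


def n_unknowns(L, R):
    """(L - 1) independent concentrations per phase, plus the common P and T."""
    return (L - 1) * R + 2


def max_coexisting_phases(L):
    """Largest R with n_equations <= n_unknowns, i.e. the Gibbs phase rule R <= L + 2."""
    R = 1
    while n_equations(L, R + 1) <= n_unknowns(L, R + 1):
        R += 1
    return R


for L in range(1, 5):
    # sorts, max number of phases, degrees of freedom left for a two-phase equilibrium
    print(L, max_coexisting_phases(L), n_unknowns(L, 2) - n_equations(L, 2))
```

For $L = 1$ this reproduces the single extra degree of freedom of a two-phase equilibrium and the unique triple point ($R = 3$, zero degrees of freedom).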



4.2. Continuous phase transitions 

As Fig. 2 shows, if we fix pressure $P$ in a system with a first-order phase transition, and start changing its temperature, crossing the transition point, defined by the equation $P_0(T) = P$, requires the insertion (or extraction) of a finite latent heat $\Lambda$. Relations (14) and (17) show that the latent heat is directly related to the finite difference between entropies and volumes of the two phases (at the same pressure). As we know from Chapter 1, both $S$ and $V$ may be presented as first derivatives of appropriate thermodynamic potentials. This is why such transitions, involving a jump of the potentials' first derivatives, are called first-order phase transitions.

12 Please note that $P_t$ for water is several orders of magnitude lower than $P_c$ of the water-vapor transition, so that Fig. 3b is indeed very much not to scale!

13 More precisely, the Celsius scale is defined so that this triple point corresponds to exactly +0.01°C; hence the absolute zero of temperature is at exactly -273.15°C.

14 Perhaps the most practically important example is the air/water system. For its detailed discussion, based on Eq. (19), the reader may be referred, e.g., to Sec. 3.9 in F. Schwabl, Statistical Mechanics, Springer (2000). Other important applications include metallic alloys, i.e. solid solutions of metal elements.

On the other hand, there are phase transitions that have zero latent heat ($\Lambda = 0$) and no first-derivative jumps at the transition temperature $T_c$, so that the transition point is marked instead, for example, by a jump of a second derivative of a thermodynamic potential, such as $\partial^2 F/\partial T^2$, which determines the heat capacity (see Eq. (36) below). In the initial classification by P. Ehrenfest, this was an example of a second-order phase transition. However, most features of such phase transitions are also pertinent to some systems in which the second derivatives of potentials are continuous as well. For this reason, I will use a more recent terminology (suggested by M. Fisher), in which all phase transitions with $\Lambda = 0$ are called continuous.

Most continuous phase transitions result from particle interactions. Here are some examples: 

(i) At temperatures above ~120°C, the crystal lattice of barium titanate (BaTiO3) is cubic, with a Ba ion in the center of each Ti-cornered cube (or vice versa); see Fig. 4a. However, as temperature is lowered below that critical value, the sublattice of Ba ions starts shifting toward one of the 6 faces of the TiO3 sublattice cube, leading to a small deformation of both lattices, which become tetragonal. This is a typical example of a structural transition, in this particular case combined with a ferroelectric transition, because (due to the positive electric charge of Ba ions) below the critical temperature the BaTiO3 crystal acquires a spontaneous electric polarization.

Fig. 4.4. Cubic lattices of (a) BaTiO3 and (b) CuZn.



(ii) A different kind of phase transition happens, for example, in $\mathrm{Cu}_x\mathrm{Zn}_{1-x}$ alloys (brasses). Their crystal lattice is always cubic, but above a certain critical temperature $T_c$ (which depends on $x$) any of its nodes is occupied by either a copper or a zinc atom, at random. At $T < T_c$, a trend towards atom alternation arises, and at low temperatures, the atoms are fully ordered, as shown in Fig. 4b for the stoichiometric case $x = 0.5$. This is a good example of an order-disorder transition.

(iii) At ferromagnetic transitions (happening, e.g., in Fe at 1,043 K) and antiferromagnetic transitions (e.g., in MnO at 116 K), lowering of temperature below the critical value 15 does not change atom positions substantially, but results in a partial ordering of atomic spins, eventually leading to their full ordering (Fig. 5).



15 For ferromagnets, this point is usually referred to as the Curie temperature, and for antiferromagnets, as the Néel temperature.



(iv) Finally, the Bose-Einstein condensation of atoms in liquid helium and of electrons in superconducting metals and metal oxides may also be considered as continuous phase transitions. At first glance, this contradicts the nonvanishing latent heat given by the BEC theory outlined in Sec. 3.4. However, that theory shows that $\Lambda \to 0$ at $T \to 0$, where also $P(T) \to 0$; see Eq. (3.79). Hence, at zero pressure, the Bose-Einstein condensation of an ideal gas may be considered a continuous phase transition. For a gas, this is just not a very interesting limit, because of the vanishing gas density. On the contrary, the Bose-Einstein condensation of strongly interacting particles in liquids or solids is not affected by pressure, at least on the ambient pressure scale, so that taking $P = 0$ is quite a legitimate assumption. 16




Fig. 4.5. Classical images of completely ordered phases: (a) a ferromagnet, and (b) an antiferromagnet.



Besides these standard examples, some other threshold phenomena, such as the formation of a coherent optical field in a laser, and even the self-excitation of oscillators with negative damping (see, e.g., CM Sec. 4.4), may be treated, under certain conditions, as continuous phase transitions. 17

The general feature of all these transitions is the gradual formation, at $T < T_c$, of a certain ordering, which may be characterized by some order parameter $\eta \neq 0$. The simplest example of such an order parameter is the magnetization at ferromagnetic transitions, and this is why continuous phase transitions are usually discussed on certain models of ferromagnetism. (I will follow this tradition, while mentioning in passing other important cases that require a substantial modification of the theory.) Most such models are defined on an infinite 3D cubic lattice (see, e.g., Fig. 5), with evident generalizations to lower dimensions. For example, the Heisenberg model of a ferromagnet is defined by the following Hamiltonian:



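$$\hat H = -J\sum_{\{j,j'\}} \hat{\boldsymbol\sigma}_j \cdot \hat{\boldsymbol\sigma}_{j'} - h\sum_j \hat{\boldsymbol\sigma}_j \cdot \mathbf{n}_{\mathcal{B}}, \quad \text{with } h = \mu_B \mathcal{B}, \tag{4.21}$$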



16 As follows from the discussion of Eqs. (1.1)-(1.3), for ferroelectric transitions between phases with different electric polarization, the role of pressure is played by the external electric field $\mathcal{E}$, while for the ferromagnetic transitions between phases with different magnetization, by the external magnetic field $\mathcal{H}$. As we will see very soon, such fields give such a phase transition a nonvanishing latent heat, making it a first-order transition.

17 Unfortunately, I will have no time for these interesting (and practically important) generalizations, and have to refer the interested reader to the famous monograph by R. Stratonovich, Topics in the Theory of Random Noise, in 2 vols., Gordon and Breach, 1963 and 1967, and/or the influential review by H. Haken, Festkörperprobleme 10, 351 (1970).
where $\hat{\boldsymbol\sigma}_j$ is the Pauli matrix operator 18 acting on the $j$-th spin, $\mathbf{n}_\mathcal{B}$ is the direction of the magnetic field $\mathcal{B}$, and the constant $\mu_B$ is the Bohr magneton,

$$\mu_B = \frac{e\hbar}{2m_e} \approx 0.927\times 10^{-23}\ \text{J/T}, \tag{4.22}$$

with $(-e)$ and $m_e$ being the electron's charge and mass. The figure brackets $\{j, j'\}$ in Eq. (21) denote the summation over the pairs of adjacent sites, so that the magnitude of the constant $J$ may be interpreted as the maximum coupling energy per "bond" between two adjacent particles. At $J > 0$, the coupling tries to keep spins aligned (thus minimizing the coupling energy), i.e. to implement the ferromagnetic ordering. 19 The second term in Eq. (21) describes the effect of the external magnetic field $\mathcal{B}$, which tries to turn all spins, with their magnetic moments, along its direction.

However, even the Heisenberg model, while being approximate, is still too complex for analysis. This is why most theoretical results have been obtained for its classical twin, the Ising model: 20

$$E_m = -J\sum_{\{j,j'\}} s_j s_{j'} - h\sum_j s_j. \tag{4.23}$$

Here $E_m$ are eigenvalues of energy in the magnetic field, the constant $h$ mimics an external magnetic field, and $s_j$ are classical scalar variables that may take only two values, $s_j = \pm 1$. (Despite its classical character, the variable $s_j$, modeling the real spin of an electron, is usually called "spin" for brevity, and I will follow this tradition.) Index $m$ numbers all possible combinations of variables $s_j$; there are $2^N$ of them in a system of $N$ Ising "spins". Somewhat shockingly, even for this toy model, no analytical 3D solutions have been found, and the solution of its 2D version by L. Onsager in 1944 (see Sec. 5 below) is still considered one of the top intellectual achievements of statistical physics. Still, Eq. (23) is very useful for the introduction of basic notions of continuous phase transitions, and methods of their analysis, and I will focus my brief discussion on this model. 21
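For a tiny lattice, the $2^N$ energies $E_m$ of Eq. (23) can simply be enumerated. The Python sketch below does this for an $L \times L$ square lattice ($d = 2$) with periodic boundary conditions, an assumption made so that every site has the same number of neighbors; it also confirms that at $h = 0$ and $J > 0$ the minimum energy is $-JNd$, reached by just the two fully aligned configurations. Brute force like this scales as $2^N$, so it is only a check, not a way to study the thermodynamic limit.

```python
import itertools


def ising_energy(spins, L, J=1.0, h=0.0):
    """Energy E_m of Eq. (4.23) for one configuration on an L x L periodic square lattice.

    spins is a sequence of N = L*L values +1/-1; site (x, y) is stored at index x*L + y.
    Each bond {j, j'} is counted once by pairing every site with its right and lower
    neighbours (this requires L >= 3, otherwise periodic wrapping double-counts bonds).
    """
    bond_sum = 0
    for x in range(L):
        for y in range(L):
            s = spins[x * L + y]
            bond_sum += s * spins[((x + 1) % L) * L + y]
            bond_sum += s * spins[x * L + (y + 1) % L]
    return -J * bond_sum - h * sum(spins)


def all_energies(L, J=1.0, h=0.0):
    """List E_m for all 2**N spin configurations m."""
    return [ising_energy(conf, L, J, h)
            for conf in itertools.product((-1, 1), repeat=L * L)]


energies = all_energies(3)  # N = 9, so 2**9 = 512 configurations
print(len(energies), min(energies), energies.count(min(energies)))  # 512 -18.0 2
```

A small positive $h$ lifts the degeneracy, leaving the all-up configuration as the unique ground state.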

Evidently, if $T = 0$ and $h = 0$, the lowest value of internal energy,

$$E_{\min} = -JNd, \tag{4.24}$$

where $d$ is the lattice dimensionality, is achieved in the "ferromagnetic" phase in which all spins $s_j$ are equal to either $+1$ or $-1$ simultaneously, so that the lattice average $\langle s_j\rangle = \pm 1$. On the other hand, at $J = 0$ and $h = 0$, the spins are independent, and in the absence of external field their signs are completely random, with the 50% probability to have either of the values $\pm 1$, so that $\langle s_j\rangle = 0$. Hence in the case of arbitrary parameters we may use the average






18 See, e.g., QM Sec. 4.4.

19 At $J < 0$, the first term of Eq. (21) gives a reasonable model of an antiferromagnet, but in this case the external magnetic field effects are more subtle, so I will not have time to discuss them.

20 Named after E. Ising, who explored the 1D version of the model in detail in 1925, though a similar model was discussed earlier (in 1920) by W. Lenz.

21 For a more detailed discussion of phase transition theory (including other popular models of the ferromagnetic 
phase transition, e.g., the Potts model), see, e.g., either H. Stanley, Introduction to Phase Transitions and Critical 
Phenomena, Oxford U. Press, 1971; or A. Patashinskii and V. Pokrovskii, Fluctuation Theory of Phase 
Transitions, Pergamon, 1979; or B. McCoy, Advanced Statistical Mechanics, Oxford U. Press, 2010. For a much 
more concise text, I can recommend J. Yeomans, Statistical Mechanics of Phase Transitions, Clarendon, 1992. 






$$\eta \equiv \langle s_j \rangle \tag{4.25}$$

as a good measure of spin ordering, i.e. as the order parameter. Since in a real ferromagnet each spin carries a magnetic moment, the order parameter $\eta$ corresponds to the substance magnetization, which at $\eta h > 0$ is directed along the applied magnetic field. 22

Due to the difficulty of calculating the order parameter for arbitrary temperatures, most theoretical discussions of continuous phase transitions are focused on its temperature dependence just below $T_c$. Both experiment and theory show that (in the absence of external field) this dependence is close to a certain power,

$$\eta \propto \tau^\beta, \quad \text{for } \tau > 0, \tag{4.26}$$

of the small deviation from the critical temperature, which is conveniently normalized as

$$\tau \equiv \frac{T_c - T}{T_c}. \tag{4.27}$$
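In practice, an exponent such as $\beta$ in Eq. (26) is extracted from data as the slope of $\ln\eta$ versus $\ln\tau$. The Python sketch below implements that least-squares slope and applies it to synthetic data generated with a hypothetical amplitude and exponent; for real data, the fit range must stay within the region where Eq. (26) holds, which this sketch does not attempt to determine.

```python
import math


def power_law_exponent(taus, values):
    """Least-squares slope p of ln(values) vs ln(taus), i.e. the exponent in values ∝ taus**p."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic "measurements" (hypothetical): eta = 0.8 * tau**0.33 for tau from 1e-1 down to 1e-4
taus = [10 ** (-k / 4) for k in range(4, 17)]
etas = [0.8 * t ** 0.33 for t in taus]
print(power_law_exponent(taus, etas))  # recovers beta = 0.33 up to rounding
```

Applied to a quantity that diverges at the critical point, such as the susceptibility discussed next, the same slope comes out negative, equal to minus the corresponding critical exponent.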

Remarkably, most other key variables follow a similar temperature behavior, with the same critical exponent for both signs of $\tau$. In particular, the heat capacity at fixed magnetic field behaves as 23

$$C_h \propto \frac{1}{|\tau|^\alpha}. \tag{4.28}$$

Similarly, the (normalized) low-field susceptibility 24

$$\chi \equiv \left.\frac{\partial\eta}{\partial h}\right|_{h=0} \propto \frac{1}{|\tau|^\gamma}. \tag{4.29}$$



Two more important critical exponents, $\zeta$ and $\nu$, describe the temperature behavior of the correlation function $\langle s_j s_{j'}\rangle$, whose dependence on the distance $r_{jj'}$ between two spins may be well fitted by the following law,

$$\langle s_j s_{j'}\rangle \propto \frac{1}{r_{jj'}^{\,d-2+\zeta}} \exp\left\{-\frac{r_{jj'}}{r_c}\right\}, \tag{4.30}$$

with the correlation radius

$$r_c \propto \frac{1}{|\tau|^\nu}. \tag{4.31}$$

Finally, three more critical exponents, usually denoted $\varepsilon$, $\delta$, and $\mu$, describe the external field dependences of, respectively, $C_h$, $\eta$, and $r_c$ at $\tau = 0$. For example, $\delta$ is defined as



22 See, e.g., EM Secs. 5.4-5.5.

23 The form of all temperature functions is selected so that all critical exponents are non-negative.

24 This variable models the real physical magnetic susceptibility $\chi_m$ of magnetic materials; see, e.g., EM Eq. (5.111).
$$\eta \propto h^{1/\delta}. \tag{4.32}$$

(Other field exponents are used less frequently, and for their discussion I have to refer the interested 
reader to the special literature listed above.) 

The second column of Table 1 shows experimental values of the critical exponents for various 3D physical systems featuring continuous phase transitions. One can see that their values vary from system to system, leaving no hope for a universal theory that would describe them all. However, certain combinations of the exponents are much more reproducible; see the bottom lines of the table.



Table 4.1. Major critical exponents of continuous phase transitions

| Exponents and combinations | Experimental range (3D) (a) | Mean-field theory | 2D Ising model | 3D Ising model | 3D Heisenberg model (d) |
|---|---|---|---|---|---|
| α | 0-0.14 | 0 (b) | (c) | 0.12 | -0.14 |
| β | 0.32-0.39 | 1/2 | 1/8 | 0.31 | 0.3 |
| γ | 1.3-1.4 | 1 | 7/4 | 1.25 | 1.4 |
| δ | 4-5 | 3 | 15 | 5 | |
| ν | 0.6-0.7 | 1/2 | 1 | 0.64 | 0.7 |
| ζ | 0.05 | 0 | 1/4 | 0.05 | 0.04 |
| (α + 2β + γ)/2 | 1.00 ± 0.005 | 1 | 1 | 1 | 1 |
| δ - γ/β | 0.93 ± 0.08 | 1 | 1 | 1 | ? |
| (2 - ζ)ν/γ | 1.02 ± 0.05 | 1 | 1 | 1 | 1 |
| (2 - α)/νd | (e) | 4/d | 1 | 1 | 1 |

(a) Experimental data are from the monograph by A. Patashinskii and V. Pokrovskii, cited above.
(b) Discontinuity at τ = 0; see below.
(c) Instead of following Eq. (28), in this case C_h diverges as ln|τ|.
(d) With the order parameter η defined as ⟨σ_j·n_ℬ⟩.
(e) I could not find any data on this.



Historically the first (and perhaps the most fundamental) of these universal relations was derived in 1963 by J. Essam and M. Fisher:

$$\alpha + 2\beta + \gamma = 2. \tag{4.33}$$

It may be proved, for example, by finding the temperature dependence of such a magnetic field value, $h_\tau$, which changes the order parameter by an amount similar to that already existing at $h = 0$ due to a finite temperature deviation $\tau > 0$. First, we may compare Eqs. (26) and (29), to get

$$h_\tau \propto \tau^{\beta+\gamma}. \tag{4.34}$$
By the physical sense of $h_\tau$, we may expect that such a field has to affect the system's free energy 25 $F(T, h)$ by an amount comparable to the effect of a bare temperature change $\tau$. Ensemble-averaging the last term of Eq. (23) and using the definition (25) of the order parameter $\eta$, we see that the change of $F$ (per particle) due to the field equals $-h_\tau\eta$ and, according to Eq. (26), scales as $h_\tau\tau^\beta \propto \tau^{2\beta+\gamma}$.

In order to estimate the thermal effect on $F$, let us first derive one more useful general thermodynamic formula. 26 Dividing Eq. (1.19) by $dT$, we may present the heat capacity of a system as

$$C_X = T\left(\frac{\partial S}{\partial T}\right)_X, \tag{4.35}$$

where $X$ is the variable maintained constant at the temperature variation. For example, in the standard "P-V" thermodynamics, we may use the first of Eqs. (1.35) to recast Eq. (35) for $X = V$ as

$$C_V = T\left(\frac{\partial S}{\partial T}\right)_V = -T\left(\frac{\partial^2 F}{\partial T^2}\right)_V, \tag{4.36}$$

while for $X = P$ it may be combined with Eq. (1.39) to get

$$C_P = T\left(\frac{\partial S}{\partial T}\right)_P = -T\left(\frac{\partial^2 G}{\partial T^2}\right)_P. \tag{4.37}$$



As was just discussed, in the Ising model the role of pressure $P$ is played by the external magnetic field $h$, and that of $G$ by $F$, so that the last form of Eq. (37) means that the thermal part of $F$ may be found by double integration of $(-C_h/T)$ over temperature. In the context of our current discussion, this means that near $T_c$, the free energy scales as the double integral of $C_h \propto \tau^{-\alpha}$ over $\tau$. In the limit $\tau \ll 1$, the factor $T$ may be treated as a constant; as a result, the change of $F$ due to $\tau > 0$ alone scales as $\tau^{2-\alpha}$. Requiring this change to be proportional to the same power of $\tau$ as the field-induced part of energy, we get the Essam-Fisher relation (33).

Using similar reasoning, it is straightforward to derive a few other universal relations of critical 
exponents, including the Widom relation, 



25 There is some duality of terminology (and notation) in the literature on this topic. Indeed, in the Ising model (as in the Heisenberg model), the magnetic field effects are usually accounted for at the microscopic level, by the inclusion of the corresponding term into each particular value of energy $E_m$. Then, as was discussed in Sec. 1.4, the system's equilibrium (at fixed external field $h$, and also $T$ and $N$) corresponds to the minimum of the Helmholtz free energy $F$. From this point of view, these problems do not feature either pressure or volume, hence we may take $PV = \text{const}$, so that both thermodynamic potentials effectively coincide: $G = F + PV = F + \text{const}$. On the other hand, it is fair to say that the role of the magnetic field in these problems is very similar to that of pressure (or rather of $-P$) in the "usual" thermodynamics. Due to this analogy, and taking into account that the equilibrium of a system at fixed $P$ corresponds to the minimum of the Gibbs free energy $G$, in some publications this name is used for the minimized potential. Still, on the microscopic level, there is a difference in the descriptions of field and pressure; see the footnote at the end of Sec. 2.4. For this reason, I will follow the traditional, first point of view in most of my narrative, but will use the replacements $F \to G$ and $h \to -P$ to use thermodynamic formulas (1.39) and (37) when convenient.

26 Admittedly, it belongs to Chapter 1, but I was reluctant to derive it there to avoid a narrative interruption.
$$\delta - \frac{\gamma}{\beta} = 1, \tag{4.38}$$

very similar relations for the other field exponents $\varepsilon$ and $\mu$ (which I do not have time to discuss), and the Fisher relation

$$\nu(2 - \zeta) = \gamma. \tag{4.39}$$

A slightly more complex reasoning, involving the so-called scaling hypothesis, yields the dimensionality-dependent Josephson relation

$$\nu d = 2 - \alpha. \tag{4.40}$$

Table 1 shows that at least three of these relations are in a very reasonable agreement with 
experiment, so that we will use them as a testbed for various theoretical approaches to continuous phase 
transitions. 
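The relations (33) and (38)-(40) can be checked exactly for the rational exponents listed in Table 1. The Python sketch below does this with the standard `fractions` module; for the 2D Ising model it enters $\alpha = 0$ for the logarithmic divergence of $C_h$, consistent with the entry 1 in the bottom row of the table.

```python
from fractions import Fraction as Fr


def scaling_relations(alpha, beta, gamma, delta, nu, zeta, d):
    """Check the exact scaling relations of Sec. 4.2 for a set of critical exponents."""
    return {
        "Essam-Fisher (4.33)": alpha + 2 * beta + gamma == 2,
        "Widom (4.38)": delta - gamma / beta == 1,
        "Fisher (4.39)": nu * (2 - zeta) == gamma,
        "Josephson (4.40)": nu * d == 2 - alpha,
    }


# Exponents from Table 4.1; alpha = 0 stands for the logarithmic divergence of the 2D Ising model.
mean_field = dict(alpha=Fr(0), beta=Fr(1, 2), gamma=Fr(1), delta=Fr(3), nu=Fr(1, 2), zeta=Fr(0))
ising_2d = dict(alpha=Fr(0), beta=Fr(1, 8), gamma=Fr(7, 4), delta=Fr(15), nu=Fr(1), zeta=Fr(1, 4))

print(scaling_relations(**ising_2d, d=2))    # all True
print(scaling_relations(**mean_field, d=3))  # only the Josephson relation fails
print(scaling_relations(**mean_field, d=4))  # all True
```

Exact rational arithmetic is used so that each relation is tested as a true equality rather than up to rounding.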



4.3. Landau's mean-field theory

The most general approach to analysis of the continuous phase transitions, formally not based on any particular model (though in fact implying the Ising model (23) or one of its siblings), is the mean-field theory developed in 1937 by L. Landau, on the basis of prior ideas by P.-E. Weiss, to be discussed in the next section. The main approximation of this phenomenological approach is to present the free energy change $\Delta F$ at the phase transition as an explicit function of the order parameter $\eta$ (25). Generally this function may be complicated and model-specific, but near $T_c$, $\eta$ has to tend to zero, so that the change of the relevant thermodynamic potential, the free energy,

$$\Delta F = F(T) - F(T_c), \tag{4.41}$$

may be expanded into the Taylor series in $\eta$, and only a few, most important first terms of that expansion retained. In order to keep the symmetry between the two possible signs of the order parameter in the absence of external field, at $h = 0$ this expansion should not include odd powers of $\eta$:

$$\left.\frac{\Delta F}{V}\right|_{h=0} = A(T)\eta^2 + \frac{1}{2}B(T)\eta^4 + \ldots \tag{4.42}$$

As we will see imminently, these two terms are sufficient to describe finite (non-vanishing but limited) stationary values of the order parameter; this is why Landau's theory ignores the higher terms of the Taylor expansion, which are much smaller at $\eta \to 0$.

Now let us discuss the temperature dependences of the coefficients $A$ and $B$. The equilibrium of the system should correspond to the minimum of $F$. Equation (42) shows that, first of all, the coefficient $B(T)$ has to be positive for any sign of $\tau$, to ensure the equilibrium at a finite value of $\eta^2$. Thus, it is reasonable to ignore the temperature dependence of $B$ near the critical temperature altogether and use the approximation

$$B(T) = b > 0. \tag{4.43}$$

On the other hand, as Fig. 6 shows, the coefficient $A(T)$ has to change sign at $T = T_c$, being positive at $T > T_c$ and negative at $T < T_c$, to ensure the transition from $\eta = 0$ at $T > T_c$ to a certain non-vanishing value at $T < T_c$. Since $A$ should be a smooth function of temperature, we may approximate it by the leading term in its Taylor expansion in $\tau$:

$$A(T) = -a\tau, \quad \text{with } a > 0, \tag{4.44}$$



so that Eq. (42) becomes

$$\left.\frac{\Delta F}{V}\right|_{h=0} = -a\tau\eta^2 + \frac{b}{2}\eta^4. \tag{4.45}$$

Fig. 4.6. Free energy (42) as a function of (a) $\eta$ and (b) $\eta^2$ in Landau's mean-field theory, for two different signs of the coefficient $A(\tau)$.



The main strength of Landau's theory is the possibility of its straightforward extension to the effects of the external field and of spatial variations of the order parameter. First, averaging of the field term of Eq. (23) over all sites of the system, with the account of Eq. (25), gives an energy addition of $-h\eta$ per particle, i.e. $-nh\eta$ per unit volume, where $n$ is the particle density. Second, since (according to Eq. (31) with $\nu > 0$, see Table 1) the correlation radius diverges at $\tau \to 0$, spatial variations of the order parameter should be slow, $|\nabla\eta| \to 0$. Hence, the effects of the gradient on $\Delta F$ may be approximated by the first nonvanishing term of its expansion into the Taylor series in $(\nabla\eta)^2$. As a result, Eq. (45) may be generalized as

$$\frac{\Delta F}{V} = -a\tau\eta^2 + \frac{b}{2}\eta^4 - nh\eta + c(\nabla\eta)^2, \tag{4.46}$$



where $c$ is a factor independent of $\eta$. In order to avoid the unphysical effect of spontaneous formation of spatial variations of the order parameter, that factor has to be positive at all temperatures, and hence may be taken for a constant in a small vicinity of $T_c$, the only region where Eq. (46) may be expected to provide quantitatively correct results.

Relation (46) is the full version of the free energy in Landau's theory. 27 Now let us find out what critical exponents are predicted by this phenomenological approach. First of all, we may find equilibrium values of the order parameter from the condition of $F$ having a minimum, $\partial F/\partial\eta = 0$. At $h = 0$, it is easier to use the equivalent equation $\partial F/\partial(\eta^2) = 0$, where $F$ is given by Eq. (45); see Fig. 6b. This immediately yields



$$\eta = \begin{cases} \pm\left(\dfrac{a\tau}{b}\right)^{1/2}, & \text{for } \tau > 0, \\ 0, & \text{for } \tau < 0. \end{cases} \tag{4.47}$$

27 Historically, the last term belongs to the later (1950) extension of the theory by V. Ginzburg and L. Landau; see below.
Comparing this result with Eq. (26), we see that in the Landau theory, $\beta = 1/2$. Next, plugging result (47) back into Eq. (45), for the equilibrium (minimal) value of the free energy, we get

$$\frac{\Delta F}{V} = \begin{cases} -\dfrac{a^2\tau^2}{2b}, & \text{for } \tau > 0, \\ 0, & \text{for } \tau < 0. \end{cases} \tag{4.48}$$




From here and Eq. (36), the specific heat,

$$\frac{C_h}{V} = \begin{cases} \dfrac{a^2}{bT_c}, & \text{for } \tau > 0, \\ 0, & \text{for } \tau < 0, \end{cases} \tag{4.49}$$

has, at the critical point, a discontinuity rather than a singularity, i.e. the critical exponent $\alpha = 0$.
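The jump (49) can be reproduced by differentiating the equilibrium free energy (48) numerically, as Eq. (36) prescribes. The Python sketch below does so with a central second difference, for hypothetical values $a = 2$, $b = 1$, $T_c = 1$ (per unit volume, $k_B = 1$). Just below $T_c$ it gives $a^2T/(bT_c^2)$, which tends to $a^2/bT_c$ as $T \to T_c$; just above $T_c$ it gives zero.

```python
def landau_free_energy_density(T, a=2.0, b=1.0, Tc=1.0):
    """Equilibrium Delta F / V from Eq. (4.48), with tau = (Tc - T) / Tc."""
    tau = (Tc - T) / Tc
    return -a**2 * tau**2 / (2.0 * b) if tau > 0 else 0.0


def heat_capacity_density(T, dT=1e-4, **params):
    """C_h / V = -T * d^2(Delta F / V)/dT^2, cf. Eq. (4.36), via a central second difference."""
    f = landau_free_energy_density
    d2f = (f(T + dT, **params) - 2.0 * f(T, **params) + f(T - dT, **params)) / dT**2
    return -T * d2f


print(heat_capacity_density(0.99), heat_capacity_density(1.01))  # ~3.96 below Tc, zero above Tc
```

The step `dT` must be smaller than the distance to $T_c$; otherwise the finite difference straddles the discontinuity and smears it.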

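In the presence of a uniform field, the equilibrium order parameter should be found from the condition $\partial f/\partial\eta = 0$, where $f \equiv \Delta F/V$ is given by Eq. (46) with $\nabla\eta = 0$, giving

$$\frac{\partial f}{\partial\eta} \equiv -2a\tau\eta + 2b\eta^3 - nh = 0. \tag{4.50}$$

In the limit of small order parameter, $\eta \to 0$ (i.e. at $\tau < 0$), the term with $\eta^3$ is negligible, and Eq. (50) gives

$$\eta = -\frac{nh}{2a\tau}, \tag{4.51}$$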

so that according to Eq. (29), $\gamma = 1$. On the other hand, at $\tau = 0$ (or at relatively high fields at other temperatures), the cubic term in Eq. (50) is much larger than the linear one, and this equation yields

$$\eta = \left(\frac{nh}{2b}\right)^{1/3}, \tag{4.52}$$

so that comparison with Eq. (32) yields $\delta = 3$.
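The three limits (47), (51), and (52) can all be read off from a direct numerical solution of the cubic equation (50). In the Python sketch below (parameter values hypothetical, all set to 1), the equilibrium $\eta$ for $h > 0$ is taken as the unique positive root of Eq. (50), which is where the free energy (46) has its global minimum in that case, and it is found by bisection.

```python
import math


def landau_eta(tau, h, a=1.0, b=1.0, n=1.0, tol=1e-14):
    """Equilibrium order parameter for h > 0: the positive root of Eq. (4.50),
    2*b*eta**3 - 2*a*tau*eta - n*h = 0, found by bisection."""
    if h <= 0:
        raise ValueError("this sketch assumes h > 0")

    def g(eta):
        return 2 * b * eta**3 - 2 * a * tau * eta - n * h

    lo, hi = 0.0, 1.0          # g(0) = -n*h < 0
    while g(hi) <= 0:          # expand the bracket until g changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


print(landau_eta(0.01, 1e-9))    # close to (a*tau/b)**0.5 = 0.1, Eq. (4.47)
print(landau_eta(-0.01, 1e-8))   # close to -n*h/(2*a*tau) = 5e-7, Eq. (4.51)
print(landau_eta(0.0, 2e-3))     # close to (n*h/(2*b))**(1/3) = 0.1, Eq. (4.52)

h1, h2 = 1e-6, 1e-4
delta_est = math.log(h2 / h1) / math.log(landau_eta(0.0, h2) / landau_eta(0.0, h1))
print(delta_est)                 # close to delta = 3
```

Bisection is chosen here only for robustness: for $h > 0$ the cubic has exactly one positive root, so the bracket $[0, \text{hi}]$ always contains the physical solution.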

Finally, according to Eq. (30), the last term in Eq. (46) scales as $c\eta^2/r_c^2$. (If $r_c \to \infty$, the effects of the pre-exponential factor in that equation are negligible.) As a result, the gradient term contribution is comparable 28 with the two leading terms in $\Delta F$ (which, according to Eq. (47), are of the same order), if

$$r_c \approx \left(\frac{c}{a|\tau|}\right)^{1/2}, \tag{4.53}$$

so that according to definition (31) of the critical exponent $\nu$, it is equal to $1/2$.

28 According to Eq. (30), the correlation radius may be interpreted as the length scale at which the order parameter $\eta$ relaxes to its equilibrium value, if it is deflected from it at some point. Since the law of such spatial change may be obtained by a variational differentiation of $F$, for the actual relaxation law, all major terms of (46) have to be comparable.

The third column in Table 1 summarizes the critical exponents and their combinations in Landau's theory. It shows that these values are somewhat out of the experimental ranges, and while some of their universal relations are correct, some are not; for example, the Josephson relation would be only correct at $d = 4$ (not the most realistic spatial dimensionality :-). The main reason for this disappointing result is that, describing the spin interaction with the field, the Landau mean-field theory neglects spin randomness, i.e. fluctuations. Though a quantitative theory of thermodynamic fluctuations will not be discussed until the next chapter, we can readily perform their crude estimate. Looking at Eq. (46), we see that its first term is a quadratic function of the effective "half-degree of freedom", $\eta$. Hence, in accordance with the equipartition theorem (2.28), we may expect that the average energy of its thermal fluctuations, within a $d$-dimensional volume with linear size $\sim r_c$, should be of the order of $T/2$ (close to the critical temperature, $T_c/2$ is a good approximation):

$$a\tau\langle(\delta\eta)^2\rangle r_c^d \sim \frac{T_c}{2}. \tag{4.54}$$

In order for the theory to be valid, the variance $\langle(\delta\eta)^2\rangle$ has to be negligible in comparison with the average $\eta^2 \sim a\tau/b$. Plugging in the $\tau$-dependences of the operands of this relation, and the values of the critical exponents in the Landau theory, for $\tau > 0$ we get the so-called Levanyuk-Ginzburg criterion of its validity:

$$\frac{T_c}{2a\tau}\left(\frac{a\tau}{c}\right)^{d/2} \ll \frac{a\tau}{b}. \tag{4.55}$$

We see that for any realistic dimensionality, $d < 4$, at $\tau \to 0$ the order parameter fluctuations grow faster than its average value, and hence the theory becomes invalid.
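The $\tau$-dependence hidden in Eq. (55) is easy to exhibit numerically: the ratio of its left-hand side to its right-hand side scales as $\tau^{d/2-2}$. The Python sketch below evaluates that ratio, with all material constants set to hypothetical unit values, and extracts its local power-law exponent; the ratio diverges at $\tau \to 0$ for $d = 3$, stays constant at $d = 4$, and vanishes for $d > 4$.

```python
import math


def lg_ratio(tau, d, a=1.0, b=1.0, c=1.0, Tc=1.0):
    """Left-hand side of Eq. (4.55) divided by its right-hand side.

    The mean-field theory is self-consistent only where this ratio is << 1.
    The material constants a, b, c, Tc are hypothetical placeholders.
    """
    fluctuation = (Tc / (2 * a * tau)) * (a * tau / c) ** (d / 2)  # <(delta eta)^2> from Eqs. (4.53)-(4.54)
    mean_square = a * tau / b                                       # eta^2 from Eq. (4.47)
    return fluctuation / mean_square


def lg_exponent(d, tau1=1e-3, tau2=1e-6):
    """Power-law exponent of lg_ratio between two values of tau (expected: d/2 - 2)."""
    return math.log(lg_ratio(tau2, d) / lg_ratio(tau1, d)) / math.log(tau2 / tau1)


for d in (3, 4, 5):
    print(d, lg_exponent(d), lg_ratio(1e-6, d))
```

Only the exponent is universal here; the prefactor, and hence the width of the temperature interval where the criterion fails, depends on the material constants, which is the point made in the next paragraph about long-range interactions.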

Thus the Landau mean-field theory is not a perfect approach to finding critical indices at continuous phase transitions in Ising-type systems with their next-neighbor interactions between the particles. Despite that fact, this theory is very much valued for the following reason. Any long-range interactions between particles increase the correlation radius $r_c$, and hence suppress the order parameter fluctuations. For example, at laser self-excitation, the emerging coherent optical field couples all photon-emitting particles in the electromagnetic "cavity" (resonator). As another example, in superconductors the role of the correlation radius is played by the Cooper-pair size $\xi_0$, which is typically of the order of $10^{-6}$ m, i.e. much larger than the average distance between the pairs ($\sim 10^{-8}$ m). As a result, the mean-field theory remains valid at all temperatures except an extremely small temperature interval near $T_c$, which for bulk superconductors is of the order of $10^{-6}$ K.

Another strength of Landau's classical mean-field theory is that it may be readily generalized for the description of Bose-Einstein condensates, i.e. quantum fluids. Of those generalizations, the most famous is the Ginzburg-Landau theory of superconductivity developed in 1950, i.e. even before the "microscopic" explanation of this phenomenon by Bardeen, Cooper and Schrieffer in 1956-57. In the Ginzburg-Landau theory, the real order parameter $\eta$ is replaced with the modulus of a complex function $\psi$, physically the wavefunction of the coherent Bose-Einstein condensate of Cooper pairs. Since each pair carries electric charge $q = -2e$, 29 and has zero spin, it interacts with magnetic field in a way different from that described by the Heisenberg or Ising models. Namely, as was already discussed in Sec. 3.4, the del operator $\nabla$ in Eq. (46) has to be complemented by the term $-i(q/\hbar)\mathbf{A}$, where $\mathbf{A}$ is the vector-potential of the total magnetic field $\mathbf{B} = \nabla\times\mathbf{A}$, including not only the external magnetic field $\mathbf{H}$,



29 In the phenomenological Ginzburg-Landau theory, charge q remains unspecified, though the wording in their 
original paper clearly shows that the authors correctly anticipated that this charge might turn out to be different 
from the single electron charge. 
but also the field induced by the supercurrent itself. Taking into account the well-known formula for the magnetic field energy in the external field, 30 Eq. (46) is now replaced with



(4.56) 



where $m$ is a phenomenological coefficient rather than the actual particle mass. The variational minimization of the resulting $\Delta F$ over the variables $\psi$ and $\mathbf{B}$ (which is left as an exercise for the reader 31) yields two differential equations:



(4.57) 



(4.58) 



The first of these Ginzburg-Landau equations should be no big surprise for the reader, because according to the Maxwell equations, in magnetostatics the left-hand part of Eq. (57) has to be equal to the electric current density, while the right-hand part is the usual quantum-mechanical probability current density multiplied by $q$, i.e. the electric current (or rather supercurrent) density $\mathbf{j}_s$ of the Cooper pair condensate. (Indeed, after plugging $\psi = n^{1/2}\exp\{i\varphi\}$ into that expression, we come back to Eq. (3.84) which, as we already know, explains such macroscopic quantum phenomena as magnetic flux quantization and the Meissner-Ochsenfeld effect.)

However, Eq. (58) is new - for this course. Since the last term in its right-hand part is the standard wave-mechanics expression for the kinetic energy of a particle in the presence of magnetic field, 32 if this term dominates that part of the equation, Eq. (58) is reduced to the stationary Schrodinger equation, $E\psi = \hat H\psi$, for the ground state of confinement-free Cooper pairs, with energy $E = a\tau$. However, in contrast to the usual (single-particle) Schrodinger equation, in which $|\psi|$ is determined by the normalization condition, the Cooper pair condensate density $n = |\psi|^2$ is determined by the thermodynamic balance of the condensate with the ensemble of "normal" (unpaired) electrons that play the role of the uncondensed part of the Bose gas, discussed in Sec. 3.4. In Eq. (58), such balance is enforced by the first term, $b|\psi|^2\psi$, of the right-hand part. 33 As we have already seen, in the absence of magnetic field and spatial gradients, such a term yields $|\psi|^2 \propto (T_c - T)$ - see Eq. (47).






30 See, e.g., EM Eq. (5.129). 

31 As a useful elementary sanity check, the minimization of $\Delta F$ in the absence of a superconductor, i.e. without the first 3 terms in the right-hand part of Eq. (56), immediately gives the correct result $\mathbf{B} = \mu_0\mathbf{H}$.

32 See, e.g., QM Sec. 3.1.

33 From the mathematics standpoint, such a term, nonlinear in $|\psi|$, makes Eq. (58) a member of the family of "nonlinear Schrodinger equations". Another important member of this family is the Gross-Pitaevskii equation,

$$a\tau\psi = b|\psi|^2\psi - \frac{\hbar^2}{2m}\nabla^2\psi + U(\mathbf{r})\psi,$$

which gives a very reasonable (albeit phenomenological and hence approximate) description of Bose-Einstein condensates of neutral atoms at $T \ll T_c$. The differences between the Ginzburg-Landau and Gross-Pitaevskii equations reflect, first, the zero charge $q$ of the neutral atoms and, second, the fact that the atoms forming the
It is easy to see that as either the external magnetic field or the current density in a superconductor is increased, so is the last term in Eq. (58). This increase has to be matched by a corresponding decrease of $|\psi|^2$, i.e. of the condensate density $n$, until it is completely suppressed. This explains the well-documented effect of superconductivity suppression by magnetic field and supercurrent. Moreover, together with the flux quantization discussed in Sec. 3.4, it explains the existence of the so-called Abrikosov vortices - thin tubes of magnetic field, each carrying one quantum $\Phi_0$ of magnetic flux - see Eq. (3.86). At the core part of the vortex, $|\psi|$ is suppressed (down to zero at its central line) by the persistent supercurrent, which circulates around the core and screens the rest of the superconductor from the magnetic field carried by the vortex. The penetration of such vortices into the so-called type-II superconductors 34 enables them to sustain vanishing electric resistance up to very high magnetic fields of the order of 20 T, and to be used in very compact magnets - including those used for beam bending in particle accelerators.

Moreover, generalizing Eq. (58) to the time-dependent case, just as it is done with the usual Schrodinger equation ($E \to i\hbar\partial/\partial t$), one can describe other fascinating macroscopic quantum phenomena such as the Josephson effects, including the generation of oscillations with frequency $\omega_J = (q/\hbar)V$ by tunnel junctions between two superconductors, biased by dc voltage $V$. Unfortunately, time/space restrictions do not allow me to discuss these effects in any detail here, and I have to refer the reader to special literature. 35 Let me only note that at $T \approx T_c$, and in not extremely pure superconductors (in which the so-called non-local transport phenomena may be important), the Ginzburg-Landau equations are exact, and may be derived (and their parameters $T_c$, $a$, $b$, $q$, and $m$ determined) from the "microscopic" theory of superconductivity based on the initial work by Bardeen, Cooper and Schrieffer. 36 Most importantly, such derivation proves that $q = -2e$ - the electric charge of a single Cooper pair.

4.4. Ising model: The Weiss molecular-field theory 

The Landau mean-field theory is phenomenological in the sense that even within the range of its validity, it tells us nothing about the value of the critical temperature $T_c$ and other parameters (in Eq. (46), $a$, $b$, and $c$), so that they have to be found from a particular "microscopic" model of the system under analysis. In this course, we will have time to discuss only the Ising model (23) for various dimensionalities $d$.

The most simplistic way to map the model on a mean-field theory is to assume that all spins are exactly equal, $s_j = \eta$, with an additional condition $\eta^2 \le 1$, forgetting for a minute that in the genuine Ising model, $s_j$ may equal only +1 or -1. Plugging this relation into Eq. (23), we get 37



condensates may be readily placed in external potentials $U(\mathbf{r}) \neq$ const (e.g., those trapping the atoms), while in superconductors such potential profiles are much harder to create due to the screening of electric field by metals - see, e.g., EM Sec. 2.1.

34 Such penetration had been discovered experimentally by L. Shubnikov in the mid-1930s, but its quantitative explanation had to wait until A. Abrikosov's work (based on the Ginzburg-Landau equations) published in 1957.

35 See, e.g., M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, 1996. A short discussion of the Josephson effects may be found in QM Sec. 2.3 and EM Sec. 6.4.

36 See, e.g., Sec. 45 in E. Lifshitz and L. Pitaevskii, Statistical Physics, Part 2, Pergamon, 1980. 

37 Since in this naive approach we neglect the thermal fluctuations of spin, i.e. their disorder, this assumption implies $S = 0$, so that $F \equiv E - TS = E$, and we may use either notation for the system's energy.
$$F = -NJd\,\eta^2 - Nh\eta. \tag{4.59}$$

This energy is plotted in Fig. 7a as a function of $\eta$, for several values of $h$. The plots show that at $h = 0$, the system may be in either of two stable states, with $\eta = \pm1$, corresponding to two different directions of spins (magnetization), with equal energy. 38 (Formally, the state with $\eta = 0$ is also stationary, because at this point $\partial F/\partial\eta = 0$, but it is unstable, because for the ferromagnetic interaction, $J > 0$, the second derivative $\partial^2F/\partial\eta^2 = -2NJd$ is negative.)
Fig. 4.7. Field dependence of (a) the free energy profile and (b) order parameter (i.e. magnetization) in the crudest mean-field approach to the Ising model.



As the external field is increased, it tilts the potential profile, and finally at a critical field,

$$h_c = 2Jd, \tag{4.60}$$

the state with $\eta = -1$ becomes unstable, leading to the system's jump into the only remaining state with opposite magnetization, $\eta = +1$. Application of a similar external field of the opposite polarity leads to a similar switching back to $\eta = -1$, so that the full field dependence of $\eta$ follows the hysteretic pattern shown in Fig. 7b. Such a pattern is the most visible experimental feature of actual ferromagnetic materials, with the coercive magnetic field $H_c$ (modeled with $h_c$) of the order of 10 A/m, and the saturated magnetization (modeled with $\eta = \pm1$) corresponding to much higher fields $B$ - of the order of a few tesla. The most important property of these materials, also called permanent magnets, is their stability, i.e. the ability to retain the history-determined direction of magnetization in the absence of external field, for a very long time. In particular, this property is the basis of all magnetic systems for data recording, including the ubiquitous hard disk drives with their incredible information density - currently approaching 1 Terabit per square inch. 39
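The hysteresis loop of Fig. 7b can be reproduced by a short simulation of the crude model (59). In this hedged sketch (the helper names `is_stable` and `sweep_field` are hypothetical, and $J = 1$, $d = 3$ are arbitrary choices), the system is assumed to stay in its current state $\eta = \pm1$ until that state stops being a local minimum of $F(\eta)$ on the interval $[-1, +1]$, and then to jump into the other one.

```python
def is_stable(eta, h, J=1.0, d=3):
    """Is the edge state eta = -1 or +1 a local minimum of Eq. (4.59) on [-1, 1]?"""
    slope = -2 * J * d * eta - h          # dF/d(eta) divided by N
    # eta = -1 survives while F grows toward larger eta; eta = +1 while F grows toward smaller eta.
    return slope > 0 if eta < 0 else slope < 0


def sweep_field(h_values, eta=-1.0, J=1.0, d=3):
    """Follow the state along a sequence of field values, recording eta at each step."""
    path = []
    for h in h_values:
        if not is_stable(eta, h, J, d):
            eta = -eta                    # jump into the only remaining minimum
        path.append(eta)
    return path


h_up = [0.5 * k for k in range(-20, 21)]  # h from -10 to +10
h_down = h_up[::-1]
up = sweep_field(h_up, eta=-1.0)
down = sweep_field(h_down, eta=+1.0)
h_jump_up = next(h for h, e in zip(h_up, up) if e > 0)
h_jump_down = next(h for h, e in zip(h_down, down) if e < 0)
print(h_jump_up, h_jump_down)             # compare with +/- h_c = +/- 2Jd, Eq. (4.60)
```

The upward and downward sweeps switch at different fields, which is exactly the history dependence (memory) exploited in magnetic recording.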

So, this simplest mean-field theory gives a crude description of the ferromagnetic ordering, but 
grossly overestimates the stability of these states with respect to thermal fluctuations. Indeed, in this 



38 The fact that stable states always correspond to $\eta = \pm1$ partly justifies the treatment of $\eta$ as a continuous variable in this crude approximation.

39 For me, it was always surprising how little physics students knew about this fascinating field of modern engineering, which involves so much interesting physics and fantastic electromechanical technology. For getting acquainted with it, I may recommend, for example, the monograph by C. Mee and E. Daniel, Magnetic Recording Technology, 2nd ed., McGraw-Hill, 1996.
theory, there is no thermally-induced randomness at all, until $T$ becomes comparable with the height of the energy barrier separating two stable states,

$$\Delta F = F(\eta = 0) - F(\eta = \pm1) = NJd, \tag{4.61}$$

which is proportional to the number of particles. At $N \to \infty$, this value diverges, and in this sense the critical temperature is infinite, while numerical experiments and more refined theories of the Ising model show that actually the ferromagnetic phase is suppressed at temperatures above $T_c \sim Jd$ - see below. 40

The mean-field approach may be dramatically improved by even an approximate account for thermally-induced randomness. In this approach, suggested in 1907 by P.-E. Weiss under the name of molecular-field theory, 41 random deviations of individual spin values from the lattice average,

$$\tilde s_j \equiv s_j - \eta, \qquad \eta \equiv \langle s_j\rangle, \tag{4.62}$$

are allowed, but considered small, $|\tilde s_j| \ll \eta$. This assumption allows us, after plugging the expression $s_j = \eta + \tilde s_j$ into the first term of the right-hand part of Eq. (23),

$$E_m = -J\sum_{\{j,j'\}}(\eta + \tilde s_j)(\eta + \tilde s_{j'}) - h\sum_j s_j, \tag{4.63}$$

to ignore the term proportional to $\tilde s_j\tilde s_{j'}$. Making replacement (62) in the terms proportional to $\tilde s_j$, we get

$$E_m \approx NJd\,\eta^2 - h_{\rm ef}\sum_j s_j, \tag{4.64}$$

where $h_{\rm ef}$ is defined as the sum

$$h_{\rm ef} \equiv h + 2Jd\,\eta. \tag{4.65}$$

The physical interpretation of $h_{\rm ef}$ is the effective external field, which (besides the real external field $h$) takes into account the effect that would be exerted on spin $s_j$ by its $2d$ next neighbors, if they all had unperturbed (but possibly fractional) spins $s_{j'} = \eta$. Such an addition to the external field,

$$h_{\rm mol} \equiv h_{\rm ef} - h = 2Jd\,\eta, \tag{4.66}$$

is called the molecular field - giving its name to the theory.

From the point of view of statistical physics, at fixed parameters of the system (including the order parameter $\eta$), the first term in the right-hand part of Eq. (64) is merely a constant energy offset, and $h_{\rm ef}$ is just another constant, so that

$$E_m = {\rm const} + \sum_j E_j, \qquad E_j = -h_{\rm ef}s_j = \begin{cases} -h_{\rm ef}, & \text{for } s_j = +1,\\ +h_{\rm ef}, & \text{for } s_j = -1. \end{cases} \tag{4.67}$$



40 Actually, the thermal stability of many real ferromagnets, with longer-range interaction between spins, is higher 
than that predicted by the Ising model. 

41 In some texts, it is also labeled a mean-field theory. This terminology may lead to confusion, because the molecular-field theory is on a completely different level of phenomenology than, say, Landau's mean-field theory. For example, the Weiss theory may be used for the calculation of the parameters $a$, $b$, and $T_c$ participating in Eq. (46), the starting point of Landau's theory, for the Ising model.
Such separability of energies means that in the Weiss approximation the spin fluctuations are independent, and their statistics may be examined individually, using the energy spectrum $E_j$. But this is exactly the two-level system which was the subject of three exercise problems in Chapter 2. Actually, its statistics is so simple that it is easier to redo this fundamental problem from scratch, rather than to use the results of those exercises (which would require changing notation). Indeed, according to the Gibbs distribution (2.58)-(2.59), the equilibrium probabilities of states $s_j = \pm1$ may be found as



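$$W_\pm = \frac{1}{Z}\exp\left\{\pm\frac{h_{\rm ef}}{T}\right\}, \qquad Z = \exp\left\{+\frac{h_{\rm ef}}{T}\right\} + \exp\left\{-\frac{h_{\rm ef}}{T}\right\} = 2\cosh\frac{h_{\rm ef}}{T}. \tag{4.68}$$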



From here, we may readily calculate $F = -T\ln Z$ and other thermodynamic variables, but let us immediately use Eq. (68) to calculate the statistical average of $s_j$, i.e. the order parameter:



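$$\eta \equiv \langle s_j\rangle = (+1)W_+ + (-1)W_- = \frac{e^{+h_{\rm ef}/T} - e^{-h_{\rm ef}/T}}{2\cosh(h_{\rm ef}/T)} = \tanh\frac{h_{\rm ef}}{T}. \tag{4.69}$$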
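As a quick numerical sanity check of Eqs. (68) and (69), the following sketch (the function name `two_level_stats` and the sample values of $h_{\rm ef}$ and $T$ are arbitrary illustrations) computes the Gibbs probabilities of the two spin states directly and compares the resulting average spin with $\tanh(h_{\rm ef}/T)$.

```python
import math


def two_level_stats(h_ef, T):
    """Gibbs probabilities W+, W- and statistical sum Z for energies -h_ef (s=+1) and +h_ef (s=-1)."""
    boltz_plus = math.exp(+h_ef / T)    # exp(-E/T) with E = -h_ef
    boltz_minus = math.exp(-h_ef / T)   # exp(-E/T) with E = +h_ef
    Z = boltz_plus + boltz_minus
    return boltz_plus / Z, boltz_minus / Z, Z


h_ef, T = 0.7, 1.3
w_plus, w_minus, Z = two_level_stats(h_ef, T)
eta = (+1) * w_plus + (-1) * w_minus    # statistical average of s_j, Eq. (4.69)
print(eta, math.tanh(h_ef / T), Z, 2 * math.cosh(h_ef / T))
```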



Now comes the main trick of Weiss's approach: plugging this result back into Eq. (65), we may write the condition of self-consistency of the molecular-field theory:

$$h_{\rm ef} - h = 2Jd\tanh\frac{h_{\rm ef}}{T}. \tag{4.70}$$



This is a transcendental equation that evades an explicit analytical solution, but its properties may be readily understood by plotting both its sides as functions of their argument, so that the stationary state(s) of the system correspond to the intersection point(s) of these plots.

First of all, let us explore the field-free case ($h = 0$), when $h_{\rm ef} = h_{\rm mol} = 2dJ\eta$, so that Eq. (70) is reduced to

$$\eta = \tanh\left(\frac{2Jd}{T}\eta\right), \tag{4.71}$$

giving one of the patterns sketched in Fig. 8, depending on the dimensionless parameter $2Jd/T$.




Fig. 4.8. Ferromagnetic phase transition in Weiss' molecular-field theory: two sides of Eq. (71) plotted as functions of $\eta$ for 3 temperatures: above $T_c$ (red), below $T_c$ (blue) and equal to $T_c$ (green).



If this parameter is small, the right-hand part of Eq. (71) grows slowly with $\eta$ (red line in Fig. 8), and there is only one intersection point with the left-hand part plot, at $\eta = 0$. This means that the spin system features no spontaneous magnetization - the so-called paramagnetic phase. However, if parameter $2Jd/T$ exceeds 1, i.e. $T$ is decreased below the following critical value,

$$T_c = 2Jd, \tag{4.72}$$



the right-hand part grows, at small $\eta$, faster than the left-hand part, so that their plots intersect at 3 points: $\eta = 0$ and $\eta = \pm\eta_0$. It is almost evident that the former stationary point is unstable while the two latter points are stable. 42 Thus, below $T_c$ the system is in the ferromagnetic phase, with one of two possible directions of spontaneous magnetization, so that the critical (Curie) temperature, given by Eq. (72), marks the transition between the paramagnetic and ferromagnetic phases. (Since the stable minimum value of energy $G$ is a continuous function of temperature at $T = T_c$, this is a continuous phase transition.)
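The graphical solution of Eq. (71) is easy to mimic numerically. The sketch below is an illustration, not a method from the text: it measures temperature in units where $2Jd = T_c = 1$, and finds the self-consistent order parameter by simple fixed-point iteration of Eq. (70), $\eta \to \tanh[(2Jd\,\eta + h)/T]$. The function name `weiss_eta` and the tolerance settings are arbitrary choices. Starting from $\eta = +1$ picks the positive stable root below $T_c$; starting from $\eta = -1$ gives the mirror root $-\eta_0$.

```python
import math


def weiss_eta(T, h=0.0, Tc=1.0, eta0=1.0, tol=1e-12, max_iter=100_000):
    """Iterate eta -> tanh((Tc*eta + h)/T), i.e. Eqs. (4.65) and (4.69) with 2Jd = Tc."""
    eta = eta0
    for _ in range(max_iter):
        new_eta = math.tanh((Tc * eta + h) / T)
        if abs(new_eta - eta) < tol:
            return new_eta
        eta = new_eta
    return eta


for T in (0.25, 0.5, 0.9, 0.99, 1.1, 2.0):
    print(f"T/Tc = {T:4.2f}   eta_0 = {weiss_eta(T):.4f}")
```

Fixed-point iteration is used here only because it is the simplest option; it converges slowly close to $T_c$, where the slope of the right-hand part of Eq. (71) at the root approaches 1, so a bracketing root finder would be a more robust choice near the transition.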

Now let us repeat the same graphics to examine how each of these phases responds to an external magnetic field $h \neq 0$. According to Eq. (70), the effect of $h$ is just a shift of the straight-line plot of its left-hand part - see Fig. 9.



Fig. 4.9. External field effect on: (a) a paramagnet ($T > T_c$), and (b) a ferromagnet ($T < T_c$).



In the paramagnetic case (Fig. 9a) the resulting dependence $h_{\rm ef}(h)$ is evidently continuous, but the coupling effect ($J > 0$) makes it steeper than it would be without spin interaction. This effect may be characterized by the low-field susceptibility defined by Eq. (29). To calculate it, let us notice that for small $h$, and hence small $h_{\rm ef}$, the function tanh in Eq. (70) is approximately equal to its argument, so that Eq. (70) becomes

$$h_{\rm ef} - h = \frac{2Jd}{T}h_{\rm ef}. \tag{4.73}$$

Solving this equation for $h_{\rm ef}$, and then using Eq. (72), we get

$$h_{\rm ef} = \frac{h}{1 - 2Jd/T} = \frac{h}{1 - T_c/T}. \tag{4.74}$$

Recalling Eq. (66), we can rewrite this result for the order parameter,

$$\eta = \frac{h_{\rm ef} - h}{2Jd} = \frac{h}{T - T_c}, \tag{4.75}$$

meaning that the low-field susceptibility



42 This fact may be readily verified by using Eqs. (64) and (68) to calculate $F$. Now the condition $\partial F/\partial\eta|_{h=0} = 0$ returns us to Eq. (71), and calculating the second derivative, for $T < T_c$ we get $\partial^2F/\partial\eta^2 > 0$ at $\eta = \pm\eta_0$ (indicating two stable minima of $F$), and $\partial^2F/\partial\eta^2 < 0$ at $\eta = 0$ (the unstable maximum of $F$).
$$\chi \equiv \left.\frac{\partial\eta}{\partial h}\right|_{h=0} = \frac{1}{T - T_c}, \qquad \text{for } T > T_c. \tag{4.76}$$



This is the famous Curie-Weiss law, which shows that the susceptibility diverges as the temperature approaches the Curie temperature $T_c$.
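The Curie-Weiss law (76) may be checked against a direct numerical solution of the self-consistency equation (70) at a small but finite field. In this sketch (again with $2Jd = T_c = 1$ as an arbitrary unit, and with hypothetical helper names), the susceptibility is estimated by a symmetric finite difference. Setting `Tc=0` describes noninteracting spins, for which the same code returns $\chi = 1/T$.

```python
import math


def solve_eta(T, h, Tc=1.0, tol=1e-15, max_iter=100_000):
    """Fixed-point solution of eta = tanh((Tc*eta + h)/T), started from eta = 0."""
    eta = 0.0
    for _ in range(max_iter):
        new_eta = math.tanh((Tc * eta + h) / T)
        if abs(new_eta - eta) < tol:
            return new_eta
        eta = new_eta
    return eta


def susceptibility(T, Tc=1.0, dh=1e-4):
    """Low-field susceptibility d(eta)/dh at h = 0, by a central finite difference."""
    return (solve_eta(T, +dh, Tc) - solve_eta(T, -dh, Tc)) / (2 * dh)


for T in (1.2, 1.5, 2.0, 4.0):
    print(T, susceptibility(T), 1 / (T - 1.0))   # numerical vs Eq. (4.76)
```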

In the ferromagnetic case, the graphic solution (Fig. 9b) of Eq. (70) gives a qualitatively different result. Depending on the sign of the spontaneous magnetization, a field increase leads either to a further saturation of $h_{\rm mol}$ (with the order parameter $\eta$ gradually approaching 1), or, if the initial $\eta$ was negative, to a jump to positive $\eta$ at some critical (coercive) field $h_c$. In contrast with the crude mean-field approximation (59), at $T > 0$ the coercive field is smaller than that given by Eq. (60), and the magnetization saturation is gradual, in good (semi-qualitative) agreement with experiment.
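The reduction of the coercive field at $T > 0$ can be quantified. Eq. (70) may be read as $h = h_{\rm ef} - T_c\tanh(h_{\rm ef}/T)$, and the negative-$\eta$ branch of Fig. 9b disappears at the local maximum of this function, where $\cosh^2(h_{\rm ef}/T) = T_c/T$. This gives $h_c = T_c\tanh x_0 - Tx_0$, with $x_0 = {\rm arccosh}\,(T_c/T)^{1/2}$. This is a short derivation offered here as an illustration, not a formula from the text; it tends to Eq. (60), $h_c = 2Jd$, at $T \to 0$ and vanishes at $T = T_c$. The sketch below (hypothetical function names, units with $2Jd = T_c = 1$) evaluates it and cross-checks it by iterating Eq. (70) from $\eta = -1$ slightly below and slightly above $h_c$.

```python
import math


def coercive_field(T, Tc=1.0):
    """Coercive field of the Weiss model in units with 2Jd = Tc (Eq. (4.60) gives Tc at T = 0)."""
    if T >= Tc:
        return 0.0                            # no hysteresis in the paramagnetic phase
    x0 = math.acosh(math.sqrt(Tc / T))        # |h_ef|/T where dh/dh_ef = 0 on the negative branch
    return Tc * math.tanh(x0) - T * x0


def final_eta(h, T, Tc=1.0, eta=-1.0, n_iter=100_000):
    """Iterate eta -> tanh((Tc*eta + h)/T) starting from a negatively magnetized state."""
    for _ in range(n_iter):
        eta = math.tanh((Tc * eta + h) / T)
    return eta


for T in (0.01, 0.2, 0.5, 0.8, 1.0):
    print(T, round(coercive_field(T), 4))

hc = coercive_field(0.5)
print(final_eta(hc - 0.02, 0.5), final_eta(hc + 0.02, 0.5))   # negative branch survives, then is lost
```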

To summarize, Weiss's molecular-field theory gives a more realistic description of the ferromagnetic and paramagnetic phases in the Ising model, and a very simple prediction (72) of the temperature of the phase transition between them, for an arbitrary dimensionality $d$ of the cubic lattice. It also allows finding all other parameters of the mean-field theory for that model - an easy exercise left for the reader.



4.5. Ising model: Exact and numerical results

In order to evaluate the main prediction (72) of the Weiss theory, let us now discuss the exact (analytical) and quasi-exact (numerical) results obtained for the Ising model, going from the lowest dimensionality $d = 0$ to its higher values.

Zero dimensionality means that a spin has no nearest neighbors at all, so that the first term of Eq. (23) vanishes. Hence Eq. (64), with $h_{\rm ef} = h$, is exact, and so is its solution (69). Now we can repeat the calculations that have led us to Eq. (76), with $J = 0$, i.e. $T_c = 0$, and reduce this result to the so-called Curie law:

$$\chi = \frac{1}{T}. \tag{4.77}$$

It shows that $T_c = 0$, i.e. the system is paramagnetic at any temperature. One may say that for this case the Weiss molecular-field theory is exact - or in some sense trivial, because it provides an exact, fully quantum-mechanical treatment of spin-1/2 particles at negligible interaction. Experimentally, the Curie law is approximately valid for many so-called paramagnetic materials, i.e. 3D systems with a weak interaction between particle spins.

The case $d = 1$ is more complex, but has an exact analytical solution. Probably the simplest way to obtain it is to use the so-called transfer matrix approach. 43 For this, first of all, we may argue that the properties of a 1D system of $N \gg 1$ spins (say, put at equal distances on a straight line) should not change noticeably if we bend that line gently into a closed loop (Fig. 10), i.e. assume that spins $s_1$ and $s_N$ form one more pair of next neighbors, giving one more contribution, $-Js_Ns_1$, to energy (23):

$$E_m = -(Js_1s_2 + Js_2s_3 + \ldots + Js_Ns_1) - (hs_1 + hs_2 + \ldots + hs_N). \tag{4.78}$$

43 It was developed in 1941 by H. Kramers and G. Wannier. Note that the approach is very close to the one used in 1D quantum mechanics - see, e.g., QM Sec. 2.5.
Let us regroup terms of this sum in the following way:

$$E_m = -\left[\left(\frac{h}{2}s_1 + Js_1s_2 + \frac{h}{2}s_2\right) + \left(\frac{h}{2}s_2 + Js_2s_3 + \frac{h}{2}s_3\right) + \ldots + \left(\frac{h}{2}s_N + Js_Ns_1 + \frac{h}{2}s_1\right)\right], \tag{4.79}$$



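so that the group in each pair of parentheses depends only on the state of two adjacent spins. The corresponding statistical sum,

$$Z = \sum_{\substack{s_j = \pm1\\ j = 1,2,\ldots,N}} \exp\left\{\frac{hs_1}{2T} + \frac{Js_1s_2}{T} + \frac{hs_2}{2T}\right\}\exp\left\{\frac{hs_2}{2T} + \frac{Js_2s_3}{T} + \frac{hs_3}{2T}\right\}\cdots\exp\left\{\frac{hs_N}{2T} + \frac{Js_Ns_1}{T} + \frac{hs_1}{2T}\right\}, \tag{4.80}$$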



has $2^N$ terms, each corresponding to a certain combination of signs of spins. Each factor of the product under the sum may take 4 values for 4 different combinations of its two arguments:

$$\exp\left\{\frac{hs_j}{2T} + \frac{Js_js_{j+1}}{T} + \frac{hs_{j+1}}{2T}\right\} = \begin{cases} \exp\{(J + h)/T\}, & \text{for } s_j = s_{j+1} = +1,\\ \exp\{(J - h)/T\}, & \text{for } s_j = s_{j+1} = -1,\\ \exp\{-J/T\}, & \text{for } s_j = -s_{j+1} = \pm1. \end{cases} \tag{4.81}$$




Fig. 4.10. 1D Ising system on a circular loop.



These values do not depend on index $j$, 44 and may be presented as elements of the so-called transfer matrix

$$\mathsf{M} = \begin{pmatrix} \exp\{(J + h)/T\} & \exp\{-J/T\}\\ \exp\{-J/T\} & \exp\{(J - h)/T\} \end{pmatrix}, \tag{4.82}$$



and the whole statistical sum may be recast as a product:

$$Z = \sum_{\substack{s_j = \pm1\\ j = 1,2,\ldots,N}} M_{s_1s_2}M_{s_2s_3}\cdots M_{s_{N-1}s_N}M_{s_Ns_1}. \tag{4.83}$$

According to the basic rule of matrix multiplication, this sum is just



44 This is of course a result of the "translational" (or rather rotational) symmetry of the system, i.e. its invariance to the index replacement $j \to j + 1$ in all terms of energy $E_m$ (except index $N$, which should be replaced with 1).
$$Z = {\rm Tr}\,(\mathsf{M}^N). \tag{4.84}$$

Matrix algebra tells us that this trace may be presented just as

$$Z = \lambda_+^N + \lambda_-^N, \tag{4.85}$$

where $\lambda_\pm$ are the eigenvalues of the transfer matrix $\mathsf{M}$, i.e. the roots of its characteristic equation,

$$\begin{vmatrix} \exp\{(J + h)/T\} - \lambda & \exp\{-J/T\}\\ \exp\{-J/T\} & \exp\{(J - h)/T\} - \lambda \end{vmatrix} = 0. \tag{4.86}$$

A straightforward calculation yields



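$$\lambda_\pm = \exp\left\{\frac{J}{T}\right\}\left[\cosh\frac{h}{T} \pm \left(\sinh^2\frac{h}{T} + \exp\left\{-\frac{4J}{T}\right\}\right)^{1/2}\right]. \tag{4.87}$$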
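Eqs. (80)-(87) are easy to verify for a small ring, where the $2^N$ terms of the statistical sum can still be enumerated directly. The following sketch (hypothetical helper names; $N = 10$, $J = 1$, $h = 0.3$, $T = 1.5$ are arbitrary test values) computes $Z$ in three ways: by brute force from Eq. (78), as ${\rm Tr}(\mathsf{M}^N)$ from Eqs. (82) and (84), and as $\lambda_+^N + \lambda_-^N$ with the eigenvalues (87). Plain Python lists are used for the 2×2 matrices to keep the example dependency-free.

```python
import itertools
import math


def brute_force_z(N, J, h, T):
    """Sum exp(-E_m/T) over all 2**N spin configurations of the ring, Eq. (4.78)."""
    z = 0.0
    for s in itertools.product((+1, -1), repeat=N):
        energy = -J * sum(s[j] * s[(j + 1) % N] for j in range(N)) - h * sum(s)
        z += math.exp(-energy / T)
    return z


def transfer_matrix(J, h, T):
    """Transfer matrix (4.82); row/column 0 stands for s = +1, and 1 for s = -1."""
    return [[math.exp((J + h) / T), math.exp(-J / T)],
            [math.exp(-J / T), math.exp((J - h) / T)]]


def trace_of_power(M, N):
    """Tr(M**N), Eq. (4.84), by N successive 2x2 matrix multiplications."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(N):
        P = [[sum(P[i][k] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return P[0][0] + P[1][1]


def eigenvalues(J, h, T):
    """lambda_+ and lambda_- of the transfer matrix, Eq. (4.87)."""
    root = math.sqrt(math.sinh(h / T) ** 2 + math.exp(-4 * J / T))
    prefactor = math.exp(J / T)
    return prefactor * (math.cosh(h / T) + root), prefactor * (math.cosh(h / T) - root)


N, J, h, T = 10, 1.0, 0.3, 1.5
lp, lm = eigenvalues(J, h, T)
z_brute = brute_force_z(N, J, h, T)
z_trace = trace_of_power(transfer_matrix(J, h, T), N)
z_eigen = lp ** N + lm ** N                 # Eq. (4.85)
print(z_brute, z_trace, z_eigen, (lm / lp) ** N)
```

The last printed number, $(\lambda_-/\lambda_+)^N$, shows how quickly the second term of Eq. (85) becomes negligible as $N$ grows.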



Now the last simplification comes from the condition $N \gg 1$ - which we needed anyway, to make the loop model equivalent to an infinite 1D system. In this limit, even a small difference between the eigenvalues, $\lambda_+ > \lambda_-$, makes the second term in Eq. (85) negligible, so that we finally get

$$Z = \lambda_+^N = \exp\left\{\frac{NJ}{T}\right\}\left[\cosh\frac{h}{T} + \left(\sinh^2\frac{h}{T} + \exp\left\{-\frac{4J}{T}\right\}\right)^{1/2}\right]^N. \tag{4.88}$$



From here, we can find the free energy per particle,

$$\frac{F}{N} = \frac{T}{N}\ln\frac{1}{Z} = -J - T\ln\left[\cosh\frac{h}{T} + \left(\sinh^2\frac{h}{T} + \exp\left\{-\frac{4J}{T}\right\}\right)^{1/2}\right], \tag{4.89}$$



and hence can calculate all variables of interest from thermodynamic relations. In particular, the equilibrium value of the order parameter may be found from the last of Eqs. (1.39), with the replacements discussed above: $G \to F$, $P \to -h$, and hence $V = (\partial G/\partial P)_T \to -(\partial F/\partial h)_T = N\eta$. For low fields ($h \ll T\exp\{-2J/T\}$), this formula yields

$$\eta = \frac{h}{T}\exp\left\{\frac{2J}{T}\right\}. \tag{4.90}$$

This result describes linear magnetization with the following low-field susceptibility,

$$\chi \equiv \left.\frac{\partial\eta}{\partial h}\right|_{h=0} = \frac{1}{T}\exp\left\{\frac{2J}{T}\right\}, \tag{4.91}$$



and means that the 1D Ising model does not exhibit a phase transition, i.e., $T_c = 0$. However, its susceptibility grows, at $T \to 0$, much faster than the Curie law (77). This gives us a hint that at low temperatures the system is "virtually ferromagnetic", with the ferromagnetic order disrupted only by some rare violations. (In physics, such violations are called low-temperature excitations.) This perception may be confirmed by the following approximate calculation.

It is almost evident that the lowest-energy excitation of a 1D ferromagnet at $h = 0$ is the reversal of signs of all spins in one of its parts (Fig. 11). Indeed, since such an excitation (called the Bloch wall) involves the change of sign of just one product $s_js_{j'}$, according to Eq. (78), its energy $E_w$ (defined as the difference between values of $E_m$ with and without the excitation) equals $2J$, regardless of the wall position. Since in a ferromagnet, parameter $J$ is positive, $E_w > 0$. If the system tried to minimize its potential energy, having any wall in the system would be energy-disadvantageous. However, thermodynamics tells us that at finite $T$, the system's equilibrium corresponds to the minimum of free energy rather than just energy. 45 Hence, we have to calculate the Bloch wall's contribution $F_w$ to the free energy. Since in a linear chain of $N \gg 1$ spins, the wall can take $(N - 1) \approx N$ positions with the same energy $E_w$, we may claim that the entropy $S_w$ associated with an excitation of this type is $\ln N$, and, according to definition (1.33) of the free energy,

$$F_w \equiv E_w - TS_w \approx 2J - T\ln N. \tag{4.92}$$



Fig. 4.11. A Bloch wall in a 1D Ising system.



This result tells us that in the limit $N \to \infty$, and at $T \neq 0$, walls are always free-energy-beneficial, thus explaining the absence of the perfect ferromagnetic order in the 1D Ising system. Note, however, that since the logarithm grows extremely slowly at large values of its argument, one may argue that a large but finite 1D system would still feature a quasi-critical temperature

$$\text{“}T_c\text{”} = \frac{2J}{\ln N}, \tag{4.93}$$

below which it would feature a virtually complete ferromagnetic order. (The exponentially large susceptibility (91) is a manifestation of this fact.)
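The low-field result (90), and the steep growth of the susceptibility (91) at $T \to 0$, can be cross-checked by differentiating the free energy (89) numerically. In the sketch below (hypothetical names; $J = 1$ and the chosen temperatures are arbitrary), the order parameter is obtained as $\eta = -(1/N)\partial F/\partial h$ by a central finite difference, at a field chosen deep inside the linear regime. The sketch also evaluates the closed form $\eta = \sinh(h/T)/[\sinh^2(h/T) + \exp\{-4J/T\}]^{1/2}$, which follows from differentiating Eq. (89) analytically (a step not spelled out in the text). This closed form shows that the linear law (90) requires $\sinh(h/T) \ll \exp\{-2J/T\}$.

```python
import math


def free_energy_per_spin(h, J, T):
    """F/N of the infinite 1D Ising chain, Eq. (4.89)."""
    root = math.sqrt(math.sinh(h / T) ** 2 + math.exp(-4 * J / T))
    return -J - T * math.log(math.cosh(h / T) + root)


def eta_from_free_energy(h, J, T, dh=1e-7):
    """eta = -(1/N) dF/dh, estimated by a central finite difference."""
    return -(free_energy_per_spin(h + dh, J, T) - free_energy_per_spin(h - dh, J, T)) / (2 * dh)


def eta_closed_form(h, J, T):
    """Analytical derivative of Eq. (4.89)."""
    s = math.sinh(h / T)
    return s / math.sqrt(s * s + math.exp(-4 * J / T))


J = 1.0
for T in (2.0, 1.0, 0.5):
    h = 1e-3 * T * math.exp(-2 * J / T)      # deep inside the linear regime
    print(T, eta_from_free_energy(h, J, T) / h, math.exp(2 * J / T) / T)   # chi vs Eq. (4.91)
```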

Now let us apply a similar approach to estimate $T_c$ of a 2D Ising model. Here the Bloch wall is a line of a certain length $L$ - see Fig. 12. (For this example, counting from the left to the right, $L = 2 + 1 + 4 + 2 + 3 = 12$ lattice periods.)



Fig. 4.12. A Bloch wall in a 2D Ising system. 



Evidently, the additional energy associated with such a wall is $E_w = 2JL$, while the wall's entropy may be estimated approximately using the following reasoning. Let the wall be formed by the path of a "Manhattan pedestrian" traveling through the lattice between its nodes. At each junction, the pedestrian may select any of 3 of the 4 directions (all except the one that leads backward), so that there are approximately $3^{L-1}$ options for a walk starting from a certain point, i.e. approximately $M \sim 2(N^{1/2} - 1)\times3^L \sim 2N^{1/2}\times3^L$ different walks starting from two sides of a square-shaped lattice (of linear size $N^{1/2}$). Again calculating $S_w$ as $\ln M$, we get

$$F_w = E_w - TS_w \approx 2JL - T\ln(2N^{1/2}\times3^L) = L(2J - T\ln3) - T\ln(2N^{1/2}). \tag{4.94}$$

45 If the reader is still uncomfortable with this core result of thermodynamics, he or she is strongly encouraged to revisit Eq. (1.43) and its discussion.

Since $L$ scales as $N^{1/2}$ or higher, at $N \to \infty$ the last term is negligible, and we see that the sign of $\partial F_w/\partial L$ depends on whether the temperature is higher or lower than the following critical value:

$$T_c = \frac{2J}{\ln3} \approx 1.82\,J. \tag{4.95}$$

At $T < T_c$, the free energy minimum corresponds to $L \to 0$, i.e. Bloch walls are free-energy-disadvantageous, and the system is in the ferromagnetic phase.

So, for $d = 2$ the estimates predict a finite critical temperature of the same order as Weiss's theory ($T_c = 4J$). The major approximation in the calculation leading to Eq. (95) is disregarding possible self-crossings of the "Manhattan walk". An accurate counting of such self-crossings is rather difficult. It was carried out in 1944 by L. Onsager; since then his calculations have been redone in several easier ways, but even they are rather cumbersome, and I will not have time to discuss them in detail. 46 The final result, however, is surprisingly simple:

$$\tanh\frac{J}{T_c} = \sqrt2 - 1, \qquad \text{giving } T_c \approx 2.269\,J, \tag{4.96}$$

i.e. showing that the simple estimate (95) is only ~20% off the mark.
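The comparison between the three values of $T_c$ for the square lattice is simple arithmetic. It is written out below as a sketch, with $J = 1$ as the energy unit, to make the "~20%" figure explicit.

```python
import math

J = 1.0
tc_wall = 2 * J / math.log(3)                    # Bloch-wall estimate, Eq. (4.95)
tc_onsager = J / math.atanh(math.sqrt(2) - 1)    # exact 2D result, Eq. (4.96)
tc_weiss = 2 * J * 2                             # Weiss theory, Eq. (4.72) with d = 2

print(f"Weiss: {tc_weiss:.3f} J, wall estimate: {tc_wall:.3f} J, Onsager: {tc_onsager:.3f} J")
print(f"relative deficit of Eq. (4.95): {1 - tc_wall / tc_onsager:.1%}")
```

Since ${\rm artanh}(\sqrt2 - 1) = \frac12\ln(1 + \sqrt2)$, Onsager's result may be equivalently written as $T_c = 2J/\ln(1 + \sqrt2)$, which makes its similarity to the form of Eq. (95) evident.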

The Onsager solution, as well as all alternative solutions of the problem that were found later, are so "artificial" (2D-specific) that they do not give a clear clue to their generalization to other (higher) dimensions. As a result, the 3D Ising problem is still unsolved analytically. Nevertheless, we do know $T_c$ for that case with an extremely high precision - at least to the 6th decimal place. This has been achieved by numerical methods, which deserve a thorough discussion because they are applicable to other problems as well. Conceptually, this task is rather simple: just compute, to the desired precision, the statistical sum of system (23):

$$Z = \sum_{\substack{s_j = \pm1\\ j = 1,2,\ldots,N}} \exp\left\{\frac{J}{T}\sum_{\{j,j'\}} s_js_{j'} + \frac{h}{T}\sum_j s_j\right\}. \tag{4.97}$$

As soon as this has been done for a sufficient number of values of the dimensionless parameters $J/T$ and $h/T$, everything else is easy; in particular, we can compute the dimensionless function

$$F/T = -\ln Z, \tag{4.98}$$

and then find the ratio $J/T_c$ as the smallest value of parameter $J/T$ at which $F/T$ (as a function of ratio $h/T$) has a minimum at zero field. However, for any system of a reasonable size $N$, the "exact" computation of the statistical sum (97) is impossible, because it contains too many terms for any supercomputer to



46 For that, the reader is referred to either Sec. 151 in the textbook by Landau and Lifshitz or Chapter 15 in the text by Huang, both cited above.



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handle. For example, let us take a relatively small 3D lattice with N = 10x10x10 = 10 spins, which still 
feature substantial boundary effects even using the periodic boundary conditions (similar to the Born- 
Karman conditions in the wave theory), so that its phase transition is smeared about T c by ~ 1%. Still, 
even for that crude model, Z would include 2 1 ' 000 = (2 10 ) 100 » (10 3 ) 100 = 10 300 terms. Let us suppose we 

1 8 

are using a prospective exaflops-scale computer performing 10 floating-point operations per second, 
i.e. -TO 2 such operations per year. With those resources, the computation of just one statistical sum 



would require 



10 



(300-26) 



10 274 years. To call such number "astronomic" would be a strong 



understatement. (As a reminder, the age of our Universe is believed to be close to 1.3x10 
very humble number in comparison.) 
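The exponential growth of the number of terms in Eq. (97) is easy to see in a brute-force evaluation. The Python sketch below is an illustration added here, not part of the original method; the function names, the use of a 2D square lattice instead of a 3D one, and the convention that each nearest-neighbor bond is counted once are all assumptions. It sums over all $2^N$ configurations of a tiny $L\times L$ periodic lattice, an approach that is already impractical at L = 6 ($N = 36$, about $7\times10^{10}$ terms).

```python
import itertools
import math


def brute_force_Z(L, J_over_T, h_over_T):
    """Statistical sum (97) for an L x L Ising lattice with periodic boundary
    conditions, evaluated by summing over all 2**(L*L) spin configurations.
    Each nearest-neighbour bond is counted once (right and down neighbours)."""
    Z = 0.0
    for config in itertools.product((-1, 1), repeat=L * L):
        def s(i, j):
            return config[(i % L) * L + (j % L)]
        bond_sum = sum(s(i, j) * (s(i + 1, j) + s(i, j + 1))
                       for i in range(L) for j in range(L))
        field_sum = sum(config)
        Z += math.exp(J_over_T * bond_sum + h_over_T * field_sum)
    return Z


def free_energy_over_T(L, J_over_T, h_over_T):
    """Dimensionless free energy F/T = -ln Z, Eq. (98)."""
    return -math.log(brute_force_Z(L, J_over_T, h_over_T))


for L in (2, 3, 4):
    # lattice size, number of terms in the sum, and F/T at J/T = 0.5, h = 0
    print(L, 2 ** (L * L), free_energy_over_T(L, 0.5, 0.0))
```

Each extra spin doubles the run time, which is exactly the obstacle described above.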

This situation may be improved dramatically by noticing that any statistical sum,

$$Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (4.99)$$
is dominated by terms with lower values of $E_m$. In order to find those lowest-energy states, we may use the following powerful approach (belonging to a broad class of Monte-Carlo techniques), which essentially mimics one (randomly selected) path of the system's evolution in time. One could argue that for that we would need to know the exact laws of evolution of statistical systems, 47 which may differ from one system to another, even if their energy spectra $E_m$ are the same. This is true, but since the equilibrium value of Z should be independent of these details, it may be evaluated using any kinetic model, provided that it satisfies certain general rules. In order to reveal these rules, let us start from a system with just two states, $E_m$ and $E_{m'} = E_m + \Delta$ - see Fig. 13.




[Figure: two energy levels, $E_m$ and $E_{m'} = E_m + \Delta$, with transitions between them.]
Fig. 4.13. Deriving the detailed balance equation.



In the absence of quantum coherence between the states (see Sec. 2.1), the equations for the time evolution of the corresponding probabilities $W_m$ and $W_{m'}$ should depend only on the probabilities themselves (plus certain constant coefficients). Moreover, since the equations of quantum mechanics are linear, the equations of probability evolution should also be linear. Hence, it is natural to expect them to have the following form,



$$\frac{dW_m}{dt} = W_{m'}\Gamma_\downarrow - W_m\Gamma_\uparrow, \qquad \frac{dW_{m'}}{dt} = W_m\Gamma_\uparrow - W_{m'}\Gamma_\downarrow, \qquad (4.100)$$



where the constant coefficients $\Gamma_\uparrow$ and $\Gamma_\downarrow$ have the physical sense of rates of the corresponding transitions - see Fig. 13. According to the master equations (100), the rates have a simple meaning: for example, $\Gamma_\uparrow dt$ is the probability of the system's transition into state m' during an infinitesimal time interval dt, provided that at the beginning of that interval it was in state m with full certainty: $W_m = 1$, $W_{m'} = 0$. 48



47 Discussion of such laws is the task of physical kinetics, which will be briefly reviewed in Chapter 6.

48 The calculation of these rates for several particular cases is described in QM Secs. 6.6, 6.7, and 7.6.
Since for the system with just two energy levels the time derivatives of the probabilities are equal and opposite, Eqs. (100) describe an (irreversible) redistribution of the probabilities while keeping their sum $W = W_m + W_{m'}$ constant. At $t \to \infty$, $d/dt \to 0$, and the probabilities settle to their stationary values related as



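$$\frac{W_{m'}}{W_m} = \frac{\Gamma_\uparrow}{\Gamma_\downarrow}. \qquad (4.101)$$

Now let us require that these stationary values obey the Gibbs distribution (2.58); then

$$\frac{W_{m'}}{W_m} = \exp\left\{\frac{E_m - E_{m'}}{T}\right\} = \exp\left\{-\frac{\Delta}{T}\right\}. \qquad (4.102)$$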

Comparing these two expressions, we see that the rates have to satisfy the following detailed balance relation:

$$\frac{\Gamma_\uparrow}{\Gamma_\downarrow} = \exp\left\{-\frac{\Delta}{T}\right\}. \qquad (4.103)$$



By the way, this relation may serve as an important sanity check: the rates calculated using any 
reasonable model of a quantum system have to satisfy it. 49 
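The two-state master equations (100) can also be integrated numerically, to watch the probabilities relax to the stationary ratio (101). In this Python sketch (an illustration, not from the original text), the rates are chosen to obey the detailed balance relation (103), with $\Gamma_\downarrow = 1$ and $\Delta/T = 1.5$ picked arbitrarily; the explicit Euler time stepping is also an assumption. The final ratio $W_{m'}/W_m$ reproduces $\exp\{-\Delta/T\}$, while the sum of the probabilities stays constant.

```python
import math


def relax_two_level(gamma_up, gamma_down, W_m=1.0, W_mp=0.0, dt=1e-3, steps=20_000):
    """Integrate the master equations (100) by the explicit Euler method.
    W_m and W_mp are the probabilities of states m and m' = m + Delta."""
    for _ in range(steps):
        flow = W_m * gamma_up - W_mp * gamma_down      # net probability flow m -> m'
        W_m, W_mp = W_m - flow * dt, W_mp + flow * dt  # dW_m/dt = -dW_m'/dt
    return W_m, W_mp


Delta_over_T = 1.5
gamma_down = 1.0
gamma_up = gamma_down * math.exp(-Delta_over_T)        # detailed balance, Eq. (103)
W_m, W_mp = relax_two_level(gamma_up, gamma_down)
print(W_mp / W_m, math.exp(-Delta_over_T))             # nearly identical numbers
```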

Now comes the final argument: since the rates of transition between two particular states should not depend on other states and their occupation, Eq. (103) has to be valid for each pair of states of any multi-state system. The detailed balance yields only one equation for the two rates $\Gamma_\uparrow$ and $\Gamma_\downarrow$; if our only goal is the calculation of Z, the choice of the other equation is not too important. Perhaps the simplest choice is

$$\Gamma(\Delta) \propto \gamma(\Delta) \equiv \begin{cases} 1, & \text{if } \Delta < 0, \\ \exp\{-\Delta/T\}, & \text{otherwise}, \end{cases} \qquad (4.104)$$

where Δ is the energy change resulting from the transition. This model, which evidently satisfies the detailed balance relation (103), is the most popular one because of its simplicity, despite the fact that the function γ(Δ) has an unphysical cusp at Δ = 0. The simplicity of Eq. (104) enables the following Metropolis algorithm (Fig. 14). The calculation starts from setting a certain initial state of the system. At relatively high temperatures, the state may be generated randomly; for example, for the Ising system, the initial state of each spin $s_j$ may be selected independently, with 50% probability for each of its two values. At low temperatures, starting the calculations from the lowest-energy state (in particular, for the Ising model, from the ferromagnetic state $s_j = \operatorname{sgn}(h) = \text{const}$) may give the fastest convergence of the sum (97).
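The rate choice (104) is easy to code and check. The Python sketch below (illustrative; the function names are hypothetical) verifies that γ(Δ) obeys the detailed balance relation (103). For comparison, it also includes the so-called heat-bath (Glauber) rate $1/(1 + e^{\Delta/T})$, a standard alternative that satisfies the same relation but has no cusp at Δ = 0, which illustrates why the choice of the second equation for the rates is not too important.

```python
import math


def metropolis_gamma(Delta, T):
    """Metropolis rate of Eq. (104): 1 for Delta < 0, exp(-Delta/T) otherwise."""
    return 1.0 if Delta < 0 else math.exp(-Delta / T)


def heat_bath_gamma(Delta, T):
    """Heat-bath (Glauber) rate: smooth at Delta = 0, same detailed balance."""
    return 1.0 / (1.0 + math.exp(Delta / T))


def detailed_balance_error(rate, Delta, T):
    """|Gamma_up/Gamma_down - exp(-Delta/T)| for two levels split by Delta."""
    return abs(rate(Delta, T) / rate(-Delta, T) - math.exp(-Delta / T))


for Delta in (0.1, 1.0, 5.0):
    print(Delta, detailed_balance_error(metropolis_gamma, Delta, 2.0),
          detailed_balance_error(heat_bath_gamma, Delta, 2.0))
```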

Now one spin is flipped at random, the corresponding change of energy (Δ) is calculated, 50 and plugged into Eq. (104) to calculate γ(Δ). Next, a pseudo-random number generator is used to generate a random number ξ with the probability density uniformly distributed on segment [0, 1]. (Such functions, typically called RND, are available in virtually any numerical library.) If the resulting ξ is less than γ(Δ), the transition is accepted, while if ξ > γ(Δ), it is rejected. In view of Eq. (104), this means that any transitions down the energy spectrum (Δ < 0) are always accepted, while those up the energy profile (Δ > 0) are accepted with the probability proportional to exp{-Δ/T}. The latter feature is necessary to avoid the system's trapping in local minima of its multidimensional energy profile $E_m(s_1, s_2, \ldots, s_N)$. Now the statistical sum may be calculated approximately as a partial sum over the states passed by the system. (It is better to discard the contributions from the first few steps to avoid an error due to the initial state choice.)

49 See, e.g., QM Eq. (7.196) for a quantum system bilinearly coupled to an environment in thermal equilibrium. By the way, that formula (as well as results for all realistic physical systems) does not feature the unphysical cusp of function Γ(Δ) at Δ = 0, assumed by the popular model (104).

50 Note that the flip changes signs of only (2d + 1) terms in sum (23), i.e. does not require re-calculation of all (2d + 1)N terms of the sum, so that the computation of Δ takes just a few add-multiply operations even at N ≫ 1.



[Flowchart: set up an initial state → flip a random spin; calculate Δ; calculate γ(Δ) → generate random ξ (0 ≤ ξ ≤ 1) → if γ < ξ, reject the spin flip; if γ > ξ, accept the spin flip → repeat.]
Fig. 4.14. Crude scheme of the Monte Carlo algorithm for the Ising model simulation.
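The scheme of Fig. 14 takes only a few dozen lines of code. The following Python sketch is one possible implementation for the 2D Ising model on an L × L square lattice with periodic boundary conditions; the function name, lattice size, numbers of sweeps, random seed, and the choice of recording the average spin ⟨s_j⟩ along the path (a typical quantity extracted from such runs, rather than a partial statistical sum) are illustrative assumptions. The energy change of a single flip, $\Delta = 2s_j(J\sum' s_{j'} + h)$, with the sum over the 4 nearest neighbors only, is computed locally, as footnote 50 describes.

```python
import math
import random


def metropolis_ising_2d(L, T, J=1.0, h=0.0, sweeps=2000, discard=500, seed=1):
    """Metropolis Monte Carlo (Fig. 14) for an L x L Ising model with periodic
    boundary conditions. Returns <s>, the average spin per site, accumulated
    after discarding the first `discard` sweeps (one sweep = L*L flip attempts)."""
    rng = random.Random(seed)
    s = [[1] * L for _ in range(L)]               # start from the ferromagnetic state
    total, kept = 0.0, 0
    for sweep in range(sweeps):
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)         # pick a random spin
            neighbours = (s[(i + 1) % L][j] + s[(i - 1) % L][j]
                          + s[i][(j + 1) % L] + s[i][(j - 1) % L])
            Delta = 2.0 * s[i][j] * (J * neighbours + h)      # energy change of the flip
            xi = rng.random()                                  # uniform on [0, 1)
            if Delta < 0 or xi < math.exp(-Delta / T):        # xi < gamma(Delta), Eq. (104)
                s[i][j] = -s[i][j]                             # accept the flip
        if sweep >= discard:
            total += sum(map(sum, s)) / (L * L)
            kept += 1
    return total / kept


print(metropolis_ising_2d(8, T=1.0))   # deep in the ordered phase
print(metropolis_ising_2d(8, T=5.0))   # well above T_c
```

With these settings, the average spin stays close to 1 at T = J and fluctuates around 0 at T = 5J, on either side of the 2D value $T_c \approx 2.269J$ listed in Table 2; near $T_c$ itself, much longer runs and larger lattices would be needed.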



This algorithm is extremely efficient. Even with the modest computers available in the 1980s, it allowed the simulation of a 3D Ising system of $(128)^3$ spins, which gave the following result: $J/T_c \approx 0.221650 \pm 0.000005$. For all practical purposes, this result is exact (so that perhaps the largest benefit of a possible analytical solution for the infinite 3D Ising system would be a virtually certain Nobel Prize for its author :-). Table 2 summarizes values of $T_c$ for the Ising model. Very visible is the fast improvement of the prediction accuracy of the molecular-field theory, which is asymptotically correct at $d \to \infty$.



Table 2. Critical temperature $T_c$ (in units of J) of the Ising model of a ferromagnet (J > 0) for several values of dimensionality d

d | Molecular-field theory, Eq. (72) | Exact value | Exact value's source
0 | 0 | 0 | Gibbs distribution
1 | 2 | 0 | Transfer matrix theory
2 | 4 | 2.269... | Onsager's solution
3 | 6 | 4.513... | Numerical simulation
Finally, I need to mention the renormalization-group ("RG") approach, 51 despite its low efficiency for the Ising problem. The basic idea of this approach stems from the scaling law (30)-(31): at $T = T_c$ the correlation radius $r_c$ diverges. Hence, the critical temperature may be found from the requirement for the system to be spatially self-similar. Namely, let us form larger and larger groups ("blocks") of adjacent spins, and require that all properties of the resulting system of the blocks approach those of the initial system, as T approaches $T_c$.

Let us see how this idea works for the simplest nontrivial (1D) case, which is described by the statistical sum (80). Assuming N to be even (which does not matter at $N \to \infty$), and adding an inconsequential constant C to each exponent (for a purpose that will be clear later on), we may rewrite this expression as



$$Z = \sum_{s_j = \pm 1} \prod_{j = 1, 2, \ldots, N} \exp\left\{ \frac{h}{2T} s_j + \frac{J}{T} s_j s_{j+1} + \frac{h}{2T} s_{j+1} + C \right\}. \qquad (4.105)$$



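Let us group each two adjacent exponents to recast this expression as a product over only even numbers j:

$$Z = \sum_{s_j = \pm 1} \prod_{j = 2, 4, \ldots, N} \exp\left\{ \frac{h}{2T} s_{j-1} + s_j\left[\frac{J}{T}\left(s_{j-1} + s_{j+1}\right) + \frac{h}{T}\right] + \frac{h}{2T} s_{j+1} + 2C \right\}, \qquad (4.106)$$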



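and carry out the summation over two possible states of the internal spins $s_j$ explicitly:

$$Z = \sum_{\substack{s_j = \pm 1 \\ (\text{for odd } j)}} \prod_{j = 2, 4, \ldots, N} \left[ \exp\left\{ \frac{h}{2T} s_{j-1} + \frac{J}{T}\left(s_{j-1} + s_{j+1}\right) + \frac{h}{T} + \frac{h}{2T} s_{j+1} + 2C \right\} + \exp\left\{ \frac{h}{2T} s_{j-1} - \frac{J}{T}\left(s_{j-1} + s_{j+1}\right) - \frac{h}{T} + \frac{h}{2T} s_{j+1} + 2C \right\} \right]$$

$$\equiv \sum_{\substack{s_j = \pm 1 \\ (\text{for odd } j)}} \prod_{j = 2, 4, \ldots, N} 2\cosh\left\{ \frac{J}{T}\left(s_{j-1} + s_{j+1}\right) + \frac{h}{T} \right\} \exp\left\{ \frac{h}{2T}\left(s_{j-1} + s_{j+1}\right) + 2C \right\}. \qquad (4.107)$$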



Now let us require this statistical sum (and hence all statistical properties of the system of 2-spin 
blocks) to be identical to that of the Ising system of N/2 spins, numbered by odd j: 



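$$Z = \sum_{\substack{s_j = \pm 1 \\ (\text{for odd } j)}} \prod_{j = 2, 4, \ldots, N} \exp\left\{ \frac{h'}{2T}\left(s_{j-1} + s_{j+1}\right) + \frac{J'}{T} s_{j-1} s_{j+1} + C' \right\}, \qquad (4.108)$$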



with some different parameters h', J', and C', for all 4 possible values of $s_{j-1} = \pm 1$ and $s_{j+1} = \pm 1$. Since the right-hand side of Eq. (107) depends only on the sum $(s_{j-1} + s_{j+1})$, this requirement yields only 3 (rather than 4) independent equations for finding h', J', and C'. Of them, the equations for h' and J' depend only on h and J (but not on C), 52 and may be presented in an especially simple form,



51 Developed first in quantum field theory in the 1950s, it was adapted to statistics by L. Kadanoff in 1966, with a spectacular solution of the so-called Kondo problem by K. Wilson in 1972, later awarded with a Nobel Prize.

52 This might be expected, because physically C is just a certain constant addition to the system's energy. However, the introduction of that constant is mathematically necessary, because Eqs. (107) and (108) may be reconciled only if C' ≠ C.
$$x' = \frac{x(1 + y)^2}{(x + y)(1 + xy)}, \qquad y' = \frac{y(x + y)}{1 + xy}, \qquad (4.109)$$

using notation

$$x \equiv \exp\left\{-\frac{4J}{T}\right\}, \qquad y \equiv \exp\left\{-\frac{2h}{T}\right\}. \qquad (4.110)$$



Now the grouping procedure may be repeated, with the same result (109)-(110). Hence these equations may be considered as recurrent relations describing repeated doubling of the spin block size. Figure 15 shows (schematically) the trajectories of this dynamic system on the phase plane [x, y]. (A trajectory is defined by the following property: for each of its points {x, y}, the point {x', y'} defined by the "mapping" Eq. (109) is also on the same trajectory.) For ferromagnetic coupling (J > 0) and h > 0, we may limit the analysis to the unit square $0 \le x, y \le 1$. If this flow diagram had a stable fixed point with $x' = x = x_\infty \neq 0$ (i.e. $T/J < \infty$) and $y' = y = 1$ (i.e. h = 0), then the first of Eqs. (110) would immediately give us the critical temperature of the phase transition in the field-free system:



$$T_c = \frac{4J}{\ln(1/x_\infty)}. \qquad (4.111)$$



However, Fig. 15 shows that the only fixed point of the 1D system is x = y = 0, which (at finite coupling J) should be interpreted as $T_c = 0$. This is of course in agreement with the exact result of the transfer-matrix analysis, but does not give any additional information.



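[Figure: RG flow trajectories in the unit square of the [x, y] plane, with x = exp{-4J/T} on the horizontal axis (T = 0 at x = 0) and y = exp{-2h/T} on the vertical axis.]
Fig. 4.15. The RG flow diagram of the 1D Ising system (schematically).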
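The RG recursion (109)-(110) can be explored numerically. The Python sketch below (hypothetical function names; an illustration rather than part of the original analysis) first checks Eq. (109) against the explicit two-spin sum (107), by confirming that the ratio of one factor of (107), taken with C = 0, to one factor of (108), taken without C', is the same for all four values of $(s_{j-1}, s_{j+1})$; that common ratio is then $e^{C'}$. It next iterates the map at h = 0 (y = 1), where it reduces to $x' = 4x/(1+x)^2$.

```python
import math


def rg_step(x, y):
    """One doubling of the spin-block size, Eq. (109)."""
    x_new = x * (1 + y) ** 2 / ((x + y) * (1 + x * y))
    y_new = y * (x + y) / (1 + x * y)
    return x_new, y_new


def decimation_mismatch(J_over_T, h_over_T):
    """Check Eq. (109) against the explicit sum (107) with C = 0.
    Returns max/min of the ratio (107)/(108) over the four values of
    (s_{j-1}, s_{j+1}); 1.0 means that a common factor exp{C'} exists."""
    K, H = J_over_T, h_over_T
    x1, y1 = rg_step(math.exp(-4 * K), math.exp(-2 * H))   # notation (110)
    K1, H1 = -math.log(x1) / 4, -math.log(y1) / 2           # J'/T and h'/T

    def block_weight(a, b):        # one factor of Eq. (107), C = 0
        return 2 * math.cosh(K * (a + b) + H) * math.exp(H / 2 * (a + b))

    def new_weight(a, b):          # one factor of Eq. (108), without C'
        return math.exp(H1 / 2 * (a + b) + K1 * a * b)

    ratios = [block_weight(a, b) / new_weight(a, b) for a in (-1, 1) for b in (-1, 1)]
    return max(ratios) / min(ratios)


# field-free flow (y = 1), starting from J/T = 1
x, y = math.exp(-4.0), 1.0
for step in range(8):
    print(step, x, -4 / math.log(x) if x < 1 else math.inf)   # effective T/J of the blocks
    x, y = rg_step(x, y)
```

The printed effective T/J grows at every step: the field-free flow runs away from x = 0 (T = 0), and no fixed point with $0 < x_\infty < 1$ shows up, so that Eq. (111) gives no finite $T_c$.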



Unfortunately, for higher dimensionalities the renormalization-group approach rapidly becomes rather cumbersome, and requires certain approximations whose accuracy cannot be easily controlled. For the 2D Ising system, such approximations lead to the prediction $T_c/J \approx 2.55$, i.e. to a substantial difference from the exact (Onsager's) result.



4.6. Exercise problems 

4.1. Compare the third virial coefficient C(T) for the hard-core model of particle interactions that follows from the van der Waals equation with the exact result (whose calculation was the subject of Exercise 3.9).

4.2. Calculate the internal energy for the van der Waals model, and discuss the result.

4.3. Derive as many analytical results as you can for the temperature dependence of the phase-equilibrium pressure $P_0(T)$ and the latent heat Λ(T) within the van der Waals model. In particular, explore the low-temperature limit ($T \ll T_c$), and the close vicinity of the critical point $T_c$.

4.4. Use the Clapeyron-Clausius formula (4.17) to calculate the latent heat Λ of the Bose-Einstein condensate, and compare the result with that obtained in Exercise 3.6.

4.5. In Sec. 4, we have discussed Weiss' molecular-field approach to the Ising model, in which the lattice average $\langle s_j\rangle$ plays the role of the order parameter η. Use the results of that analysis to find coefficients a and b in the corresponding Landau expansion of the free energy. List the values of critical exponents α and β within this approach.

4.6. For a two-site Ising system with energy values

$$E_m = -J s_1 s_2 - h(s_1 + s_2),$$

in thermal equilibrium, find the low-field susceptibility $\chi \equiv \partial\langle s\rangle/\partial h|_{h=0}$. Explore the low-temperature and high-temperature limits of the result, and give physical interpretations of its asymptotic behaviors.

4.7. Use Eq. (88) to calculate the average energy, free energy, entropy and heat capacity (all per lattice site), as functions of temperature T and field h, for the 1D Ising model. Sketch the temperature dependence of the heat capacity for various values of ratio h/J, and give a physical interpretation of the result.
Chapter 5. Fluctuations

This chapter discusses fluctuations of statistical variables, mostly at thermodynamic equilibrium. In particular, I will describe the intimate connection between fluctuations and dissipation (damping) in a dynamic system weakly coupled to a multi-particle environment, which culminates in the Einstein relation between the diffusion coefficient and mobility, the Nyquist formula, and their quantum-mechanical generalization, the fluctuation-dissipation theorem. An alternative approach to the same problem, based on the Smoluchowski and Fokker-Planck equations, is also discussed in brief.



5.1. Characterization of fluctuations 



In the beginning of Chapter 2, we have discussed the notion of averaging, $\langle f\rangle$, of a variable f over a statistical ensemble - see Eqs. (2.7) and (2.10). Now, the variable's fluctuation may be defined simply as its deviation from the average:



$$\tilde f \equiv f - \langle f\rangle; \qquad (5.1)$$



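this deviation is, evidently, also a random variable. The most important property of any fluctuation is that its average (over the same statistical ensemble) equals zero:

$$\langle\tilde f\rangle = \langle f - \langle f\rangle\rangle = \langle f\rangle - \langle\langle f\rangle\rangle = \langle f\rangle - \langle f\rangle = 0. \qquad (5.2)$$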

As a result, such an average cannot characterize the fluctuations' intensity, whose simplest characteristic is the variance (also called "dispersion"):



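$$\langle\tilde f^2\rangle \equiv \langle(f - \langle f\rangle)^2\rangle. \qquad (5.3)$$

The following simple property of the variance is frequently convenient for its calculation:

$$\langle\tilde f^2\rangle = \langle(f - \langle f\rangle)^2\rangle = \langle f^2 - 2f\langle f\rangle + \langle f\rangle^2\rangle = \langle f^2\rangle - 2\langle f\rangle^2 + \langle f\rangle^2, \qquad (5.4a)$$

so that, finally:

$$\langle\tilde f^2\rangle = \langle f^2\rangle - \langle f\rangle^2. \qquad (5.4b)$$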



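As the simplest example of its application, consider a variable which can take only two values, ±1, with equal probabilities $W_j = 1/2$. For such a variable,

$$\langle f\rangle = \sum_j W_j f_j = \frac{1}{2}(+1) + \frac{1}{2}(-1) = 0, \quad \text{but} \quad \langle f^2\rangle = \sum_j W_j f_j^2 = \frac{1}{2}(+1)^2 + \frac{1}{2}(-1)^2 = 1, \qquad (5.5)$$

so that $\langle\tilde f^2\rangle = \langle f^2\rangle - \langle f\rangle^2 = 1$.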



The square root of the variance,

$$\delta f \equiv \langle\tilde f^2\rangle^{1/2}, \qquad (5.6)$$

is called the root-mean-square (r.m.s.) fluctuation. An advantage of this measure is that it has the same dimensionality as the variable itself, so that the ratio $\delta f/\langle f\rangle$ is dimensionless, and may be used to characterize the relative intensity of fluctuations. In particular, as has been mentioned in Chapter 1, all results of thermodynamics are valid only if the fluctuations of thermodynamic variables (internal energy E, entropy S, etc.) are relatively small. 1 Let us make the simplest estimate of the relative intensity of fluctuations by considering a system of N independent, similar particles, and an extensive variable
fluctuations by considering a system of TV independent, similar particles, and an extensive variable 

$$\mathcal F \equiv \sum_{j=1}^{N} f_j, \qquad (5.7)$$

where $f_j$ depends on the state of just one (j-th) particle. The statistical average of $\mathcal F$ is evidently

$$\langle\mathcal F\rangle = \sum_{j=1}^{N}\langle f_j\rangle = N\langle f\rangle, \qquad (5.8)$$



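while the variance is

$$\langle\tilde{\mathcal F}^2\rangle = \left\langle\sum_{j=1}^{N}\tilde f_j\sum_{j'=1}^{N}\tilde f_{j'}\right\rangle = \left\langle\sum_{j,j'=1}^{N}\tilde f_j\tilde f_{j'}\right\rangle = \sum_{j,j'=1}^{N}\langle\tilde f_j\tilde f_{j'}\rangle. \qquad (5.9)$$

Now we may use the fact that for two independent variables

$$\langle\tilde f_j\tilde f_{j'}\rangle = 0, \quad \text{for } j' \neq j; \qquad (5.10)$$

actually, this equation may be considered as the mathematical definition of independence. Hence, in the sum (9), only the terms with j' = j survive, and

$$\langle\tilde{\mathcal F}^2\rangle = \sum_{j,j'=1}^{N}\langle\tilde f_j^2\rangle\delta_{j,j'} = N\langle\tilde f^2\rangle. \qquad (5.11)$$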



Comparing Eqs. (8) and (11), we see that the relative intensity of fluctuations of variable $\mathcal F$,

$$\frac{\delta\mathcal F}{\langle\mathcal F\rangle} = \frac{1}{N^{1/2}}\frac{\delta f}{\langle f\rangle}, \qquad (5.12)$$

tends to zero as the system size grows ($N \to \infty$). It is this fact that justifies the thermodynamic approach to typical physical systems, with the number N of particles of the order of the Avogadro number $N_A \sim 10^{24}$. Nevertheless, in many situations even small fluctuations of thermodynamic variables are important, and in this chapter we will calculate their basic properties, starting from the variance.
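Estimate (12) is easy to check with a quick simulation. In this Python sketch (an illustration; the Bernoulli variables $f_j \in \{0, 1\}$ with probability p = 0.3, the number of samples, and the random seed are assumptions), each sample is a sum of N independent $f_j$, for which $\langle f\rangle = p$ and $\delta f = [p(1-p)]^{1/2}$, so that Eq. (12) predicts $\delta\mathcal F/\langle\mathcal F\rangle = [(1-p)/pN]^{1/2}$.

```python
import math
import random
import statistics


def relative_fluctuation(N, p=0.3, samples=2000, seed=0):
    """Sample F = f_1 + ... + f_N with independent f_j in {0, 1}
    (f_j = 1 with probability p), and return delta_F / <F>."""
    rng = random.Random(seed)
    totals = [sum(1 for _ in range(N) if rng.random() < p) for _ in range(samples)]
    return statistics.pstdev(totals) / statistics.fmean(totals)


def predicted(N, p=0.3):
    """Eq. (12): N**(-1/2) * delta_f/<f>, with <f> = p, delta_f = (p(1-p))**(1/2)."""
    return math.sqrt(p * (1 - p)) / p / math.sqrt(N)


for N in (10, 100, 1000):
    print(N, relative_fluctuation(N), predicted(N))
```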

1 Let me remind the reader that up to this point, the averaging signs ⟨...⟩ were dropped in most formulas, for the sake of notation simplicity. In this chapter I have to restore these signs to avoid confusion. The only exception will be temperature, whose average, following (bad :-) tradition, will still be called T everywhere besides the last part of Sec. 3, where temperature fluctuations are discussed explicitly.

It will be pleasant for the reader to notice that for some simple (but important) cases, such a calculation has already been done in our course. For example, for any generalized coordinate $q_j$ and generalized momentum $p_j$ that give quadratic contributions to the system's Hamiltonian (2.46), we have derived the equipartition theorem (2.48), valid in the classical limit. Since the average values of these variables, in thermodynamic equilibrium, equal zero, Eq. (6) immediately yields their r.m.s. fluctuations:



$$\delta p_j = (mT)^{1/2}, \qquad \delta q_j = \left(\frac{T}{m\omega^2}\right)^{1/2}. \qquad (5.13)$$



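The generalization of these classical relations to the quantum-mechanical case ($T \sim \hbar\omega$) for a 1D harmonic oscillator is provided by Eqs. (2.78) and (2.81):

$$\delta p_j = \left(\frac{\hbar m\omega}{2}\coth\frac{\hbar\omega}{2T}\right)^{1/2}, \qquad \delta q_j = \left(\frac{\hbar}{2m\omega}\coth\frac{\hbar\omega}{2T}\right)^{1/2}. \qquad (5.14)$$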
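Eq. (14) interpolates between the zero-point (ground-state) fluctuations and the classical result (13). The Python sketch below (illustrative; the units ħ = m = ω = 1, with temperature measured in units of ħω, and the function names are assumptions) evaluates $\delta q_j$ both ways.

```python
import math


def delta_q_quantum(T, m=1.0, omega=1.0, hbar=1.0):
    """r.m.s. coordinate fluctuation of a 1D oscillator, Eq. (14)."""
    coth = 1.0 / math.tanh(hbar * omega / (2 * T))
    return math.sqrt(hbar / (2 * m * omega) * coth)


def delta_q_classical(T, m=1.0, omega=1.0):
    """Classical equipartition result, Eq. (13)."""
    return math.sqrt(T / (m * omega ** 2))


for T in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(T, delta_q_quantum(T), delta_q_classical(T))
```

At $T \ll \hbar\omega$ the quantum result saturates at the zero-point value $(\hbar/2m\omega)^{1/2}$, while at $T \gg \hbar\omega$ it approaches Eq. (13).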



However, the intensity of fluctuations in other systems requires special calculations. Moreover, 
only a few cases allow for general, model-independent results. Let us review some of them. 



5.2. Energy and the number of particles 

First of all, note that fluctuations of macroscopic variables depend on particular conditions. 2 For example, in a mechanically- and thermally-insulated system, e.g., a member of a microcanonical ensemble, there are no fluctuations of internal energy: $\delta E = 0$.

However, if a system is in thermal contact with its environment, for example, is a member of a canonical ensemble (Fig. 2.6), the Gibbs distribution (2.58)-(2.59) is valid. We already know that application of this distribution to energy itself,



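$$\langle E\rangle = \sum_m W_m E_m, \qquad W_m = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\}, \qquad Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (5.15)$$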



yields Eq. (2.61b), which may be rewritten in the form

$$\langle E\rangle = \frac{1}{Z}\frac{\partial Z}{\partial(-\beta)}, \qquad \text{with } \beta \equiv \frac{1}{T}, \qquad (5.16)$$

more convenient for our current purposes. Now let us carry out a similar calculation for variable $E^2$:

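$$\langle E^2\rangle = \sum_m W_m E_m^2 = \frac{1}{Z}\sum_m E_m^2\exp\{-\beta E_m\}. \qquad (5.17)$$

It is straightforward to check, by double differentiation, that this expression may be rewritten as

$$\langle E^2\rangle = \frac{1}{Z}\frac{\partial^2}{\partial(-\beta)^2}\sum_m\exp\{-\beta E_m\} = \frac{1}{Z}\frac{\partial^2 Z}{\partial(-\beta)^2}. \qquad (5.18)$$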

Now it is straightforward to use Eq. (4) to calculate the energy fluctuation variance: 



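$$\langle\tilde E^2\rangle = \langle E^2\rangle - \langle E\rangle^2 = \frac{1}{Z}\frac{\partial^2 Z}{\partial(-\beta)^2} - \frac{1}{Z^2}\left(\frac{\partial Z}{\partial(-\beta)}\right)^2 = \frac{\partial}{\partial(-\beta)}\left(\frac{1}{Z}\frac{\partial Z}{\partial(-\beta)}\right) = \frac{\partial\langle E\rangle}{\partial(-\beta)}. \qquad (5.19)$$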



2 Unfortunately, even in some popular textbooks, a few formulas pertaining to fluctuations are either incorrect or given without specifying the conditions of their applicability, so that the reader's caution is advised.
Since Eq. (15) is valid only if the system's volume V is fixed, it is customary to rewrite this extremely simple and important result as follows:



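$$\langle\tilde E^2\rangle = \frac{\partial\langle E\rangle}{\partial(-1/T)} = T^2\left(\frac{\partial\langle E\rangle}{\partial T}\right)_V = C_V T^2. \qquad (5.20)$$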



This is a remarkably simple, fundamental result. As a sanity check, for a system of N similar, independent particles, $\langle E\rangle$ and hence $C_V$ are proportional to N, so that $\delta E \propto N^{1/2}$ and $\delta E/\langle E\rangle \propto N^{-1/2}$, in agreement with Eq. (12). Let me emphasize that the classically-looking Eq. (20) is based on the general Gibbs distribution, and hence is valid for any system, either classical or quantum.
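Eq. (20) can be checked numerically for any energy spectrum. The Python sketch below (illustrative; the two-level and truncated oscillator-ladder spectra, the temperatures, and the finite-difference step are assumptions) compares the variance computed directly from the Gibbs distribution, via Eq. (4b), with $C_V T^2$, where $C_V = \partial\langle E\rangle/\partial T$ is obtained by a central finite difference at fixed energy levels (i.e. fixed V).

```python
import math


def gibbs_moments(levels, T):
    """<E> and <E^2> from the Gibbs distribution (15) for a list of energy levels."""
    weights = [math.exp(-E / T) for E in levels]
    Z = sum(weights)
    E1 = sum(w * E for w, E in zip(weights, levels)) / Z
    E2 = sum(w * E * E for w, E in zip(weights, levels)) / Z
    return E1, E2


def energy_variance_two_ways(levels, T, dT=1e-5):
    """Return (<E^2> - <E>^2, C_V * T^2), cf. Eqs. (4b) and (20)."""
    E1, E2 = gibbs_moments(levels, T)
    direct = E2 - E1 ** 2
    # C_V = d<E>/dT at fixed energy levels (i.e. fixed V), by a central difference
    C_V = (gibbs_moments(levels, T + dT)[0] - gibbs_moments(levels, T - dT)[0]) / (2 * dT)
    return direct, C_V * T ** 2


print(energy_variance_two_ways([0.0, 1.0], T=0.5))                      # two-level system
print(energy_variance_two_ways([float(n) for n in range(200)], T=2.0))  # truncated oscillator ladder
```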

We will discuss the corollaries of this result in the next section; now let me carry out a very similar calculation for a system whose number N of particles is not fixed, because they may go to, and come from, the environment at will. If the chemical potential μ of the environment and its temperature T are fixed, we are dealing with the grand canonical ensemble (Fig. 2.13), and may use the grand canonical distribution (2.106)-(2.107):



$$Z_G = \sum_{N,m}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \qquad (5.21)$$



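Acting exactly as we did above for energy, we get

$$\langle N\rangle = \frac{1}{Z_G}\sum_{m,N}N\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\} = \frac{T}{Z_G}\frac{\partial Z_G}{\partial\mu}, \qquad (5.22)$$

$$\langle N^2\rangle = \frac{1}{Z_G}\sum_{m,N}N^2\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\} = \frac{T^2}{Z_G}\frac{\partial^2 Z_G}{\partial\mu^2}, \qquad (5.23)$$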



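so that the particle number variance is

$$\langle\tilde N^2\rangle = \langle N^2\rangle - \langle N\rangle^2 = \frac{T^2}{Z_G}\frac{\partial^2 Z_G}{\partial\mu^2} - \frac{T^2}{Z_G^2}\left(\frac{\partial Z_G}{\partial\mu}\right)^2 = T\frac{\partial}{\partial\mu}\left(\frac{T}{Z_G}\frac{\partial Z_G}{\partial\mu}\right) = T\frac{\partial\langle N\rangle}{\partial\mu}, \qquad (5.24)$$

in full analogy with Eq. (19).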

For example, for the ideal classical gas we had Eq. (3.32). As was already emphasized in Sec. 3.2, though that result has been obtained from the canonical ensemble, in which the number of particles N is fixed, at $N \gg 1$ the fluctuations of N in the grand canonical ensemble should be relatively small, so that the same relation should be valid for the average $\langle N\rangle$ in that ensemble. Solving that relation for $\langle N\rangle$, we get



$$\langle N\rangle = \text{const}\times\exp\left\{\frac{\mu}{T}\right\}, \qquad (5.25)$$



where "const" means a factor that is constant at the differentiation of $\langle N\rangle$ over μ, required by Eq. (24). Performing the differentiation and then using Eq. (25) again,



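$$\frac{\partial\langle N\rangle}{\partial\mu} = \text{const}\times\frac{\partial}{\partial\mu}\exp\left\{\frac{\mu}{T}\right\} = \text{const}\times\frac{1}{T}\exp\left\{\frac{\mu}{T}\right\} = \frac{\langle N\rangle}{T}, \qquad (5.26)$$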
we get from Eq. (24) a surprisingly simple result:



$$\langle\tilde N^2\rangle = \langle N\rangle, \quad \text{i.e.} \quad \delta N = \langle N\rangle^{1/2}. \qquad (5.27)$$



This relation is so simple and important that I will now show how it may be derived in a different way, in order to prove that this result is valid for systems with an arbitrary (say, small) N, and also to get more detailed information about the statistics of fluctuations of that number. Let us consider an ideal classical gas of $N_0$ particles in a volume $V_0$, and calculate the probability $W_N$ to have exactly $N \le N_0$ of these particles in a part $V < V_0$ of this volume - see Fig. 1.



[Figure: a part V of the total volume $V_0$; N of the $N_0$ particles are inside V.]
Fig. 5.1. Deriving the binomial and Poissonian distributions.



For one particle such probability is of course $W = V/V_0 \le 1$, while the probability of one particle being in the remaining part of the volume is $W' = 1 - W = 1 - V/V_0$. If all particles were distinguishable, the probability of having $N \le N_0$ specific particles in volume V, and $(N_0 - N)$ specific particles in volume $(V_0 - V)$, would be $W^N W'^{(N_0 - N)}$. However, if we do not distinguish the particles, we should multiply the probability by the number of possible particle combinations keeping numbers N and $N_0$ constant, i.e. by the binomial coefficient $N_0!/N!(N_0 - N)!$. 3 As the result, the required probability is



$$W_N = W^N W'^{(N_0 - N)}\frac{N_0!}{N!(N_0 - N)!} = \left(\frac{\langle N\rangle}{N_0}\right)^N\left(1 - \frac{\langle N\rangle}{N_0}\right)^{N_0 - N}\frac{N_0!}{N!(N_0 - N)!}, \qquad (5.28)$$

where in the second instance I have used the evident expression $\langle N\rangle = WN_0 = (V/V_0)N_0$ for the average number of particles in volume V. Relation (28) is the so-called binomial probability distribution, valid for any $\langle N\rangle$ and $N_0$.

If we are interested in keeping $\langle N\rangle$ arbitrary, but do not care how large the additional volume $(V_0 - V)$ is, we can simplify the binomial distribution by assuming that the external part, and hence $N_0$, are very large:



$$N_0 \gg N, \qquad (5.29)$$



where N means all values of interest, including $\langle N\rangle$. In this limit we can neglect N in comparison with $N_0$ in the second exponent of Eq. (28), and also approximate the fraction $N_0!/(N_0 - N)!$, i.e. the product of N terms, $(N_0 - N + 1)(N_0 - N + 2)\ldots(N_0 - 1)N_0$, as just $N_0^N$. As a result, we get



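$$W_N \approx \left(\frac{\langle N\rangle}{N_0}\right)^N\left(1 - \frac{\langle N\rangle}{N_0}\right)^{N_0}\frac{N_0^N}{N!} = \frac{\langle N\rangle^N}{N!}(1 - W)^{N_0} = \frac{\langle N\rangle^N}{N!}\left[(1 - W)^{1/W}\right]^{\langle N\rangle}, \qquad (5.30)$$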



3 See, e.g., MA Eq. (2.2). 
In the limit (29), $W \to 0$, and the factor inside the square brackets tends to 1/e, the reciprocal of the natural logarithm base. 4 Thus, we finally get an expression independent of $N_0$:

$$W_N = \frac{\langle N\rangle^N}{N!}e^{-\langle N\rangle}. \qquad (5.31)$$



This is the much celebrated Poisson distribution, which describes a very broad family of random 
phenomena. Figure 2 shows this distribution for several values of (N) - which, in contrast to N, are not 
necessarily integer. 




Fig. 5.2. The Poisson distribution for 
several values of (N). In contrast to 
that average, argument N may take 
only integer values, so that lines are 
only guides for the eye. 



At very small $\langle N\rangle$, the function $W_N$ is close to an exponential one, $W_N \propto \langle N\rangle^N$, while in the opposite limit, $\langle N\rangle \gg 1$, it rapidly approaches the Gaussian (alternatively called "normal") distribution

$$W_N = \frac{1}{(2\pi)^{1/2}\delta N}\exp\left\{-\frac{(N - \langle N\rangle)^2}{2(\delta N)^2}\right\}, \qquad \text{with } \delta N = \langle N\rangle^{1/2}. \qquad (5.32)$$



(Note that the Gaussian distribution is also valid if both N and $N_0$ are large, regardless of the relation (29) between them - see Fig. 3.)



[Diagram: Binomial distribution, Eq. (28) → (for $N \ll N_0$) → Poisson distribution, Eq. (31) → (for $1 \ll N$) → Gaussian distribution, Eq. (32); also directly Binomial → (for $1 \ll N, N_0$) → Gaussian.]
Fig. 5.3. Hierarchy of three major probability distributions.



4 Indeed, this is the most popular definition of this major mathematical constant - see, e.g., MA Eq. (1.2a) with n replaced with -1/W.
The key property of the Poisson (and hence of the Gaussian) distribution is that it has the same variance as given by Eq. (27):

$$\langle\tilde N^2\rangle \equiv \langle(N - \langle N\rangle)^2\rangle = \langle N\rangle. \qquad (5.33)$$

(This is not true for the general binomial distribution.) For our current purposes, this means that for the ideal classical gas, Eq. (27) is valid for any number of particles.
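The hierarchy of Fig. 3 and the variance property (33) may be checked numerically. In the Python sketch below (illustrative; all parameter values are arbitrary assumptions), the distributions (28), (31), and (32) are evaluated through logarithms (`math.lgamma`) to avoid overflow of the factorials. The binomial variance comes out as $\langle N\rangle(1 - \langle N\rangle/N_0)$ rather than $\langle N\rangle$, which is why Eq. (27) requires the limit (29) in this approach.

```python
import math


def binomial_pmf(N, N0, mean):
    """Binomial distribution (28), with W = mean / N0."""
    W = mean / N0
    log_w = (math.lgamma(N0 + 1) - math.lgamma(N + 1) - math.lgamma(N0 - N + 1)
             + N * math.log(W) + (N0 - N) * math.log(1 - W))
    return math.exp(log_w)


def poisson_pmf(N, mean):
    """Poisson distribution (31)."""
    return math.exp(N * math.log(mean) - mean - math.lgamma(N + 1))


def gaussian_pmf(N, mean):
    """Gaussian distribution (32), with delta_N = mean**(1/2)."""
    dN = math.sqrt(mean)
    return math.exp(-(N - mean) ** 2 / (2 * dN ** 2)) / (math.sqrt(2 * math.pi) * dN)


def variance(pmf, N_max):
    """<N~^2> = <N^2> - <N>^2, Eq. (4b), summed over N = 0..N_max."""
    probs = [pmf(N) for N in range(N_max + 1)]
    m1 = sum(N * p for N, p in enumerate(probs))
    m2 = sum(N * N * p for N, p in enumerate(probs))
    return m2 - m1 ** 2


print(variance(lambda N: binomial_pmf(N, 10, 3.0), 10))    # ~2.1 = 3*(1 - 3/10)
print(variance(lambda N: poisson_pmf(N, 3.0), 60))          # ~3.0, Eq. (33)
# binomial -> Poisson at N0 >> N, and Poisson -> Gaussian at <N> >> 1:
print(max(abs(binomial_pmf(N, 10_000, 3.0) - poisson_pmf(N, 3.0)) for N in range(21)))
print(max(abs(poisson_pmf(N, 100.0) - gaussian_pmf(N, 100.0)) for N in range(201)))
```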



5.3. Volume and temperature 

What are the r.m.s. fluctuations of other thermodynamic variables, like V, T, etc.? Again, the answer depends on conditions. For example, if the volume V occupied by a gas is externally fixed (say, by rigid walls), it evidently does not fluctuate at all: $\delta V = 0$. On the other hand, the volume may fluctuate in the situation when the average pressure is fixed - see, e.g., Fig. 1.5. A formal calculation of these fluctuations, using the approach applied in the last section, is hampered by the fact that it is physically impracticable to fix its conjugate variable, P, i.e. suppress its fluctuations. For example, the force $\mathcal F(t)$ exerted by an ideal classical gas on a vessel's wall (whose measure the pressure is) is the result of individual, independent hits of the wall by particles (Fig. 4), with time scale $\tau_c \sim r_B/(T/m)^{1/2} \sim 10^{-16}$ s, so that its frequency spectrum extends to very high frequencies, virtually impossible to control.






Fig. 5.4. Force exerted by gas 
particles on container's wall, as a 
function of time (schematically). 



However, we can use the following trick, very typical for the theory of fluctuations. It is almost evident that the r.m.s. fluctuations of volume are independent of the shape of the container. Let us consider the particular situation similar to that shown in Fig. 1.5, with a container of cylindrical shape, with the base area A. 5 Then the coordinate of the piston is just q = V/A, while the average force exerted by the gas on the piston is $\langle\mathcal F\rangle = \langle P\rangle A$ - see Fig. 5. Now if the piston is sufficiently massive, its free oscillation frequency ω near the equilibrium position is small enough to satisfy the following three conditions.

First, besides balancing the average force $\langle\mathcal F\rangle$, and thus sustaining the average pressure $\langle P\rangle = \langle\mathcal F\rangle/A$ of the gas, the interaction between the heavy piston and the light molecules of the gas is weak because of the relatively short duration of the wall hits (Fig. 4). Because of that, the full energy of the system may be presented as a sum of those of the gas and the piston, with a quadratic contribution to the piston's potential energy from small deviations from equilibrium:

$$U_P = \frac{\kappa}{2}\tilde q^2, \qquad \tilde q \equiv q - \langle q\rangle = \frac{\tilde V}{A}, \qquad (5.34)$$



where κ is the effective spring constant arising from the gas' compressibility.

5 As a reminder, in geometry the term "cylinder" does not necessarily mean the "circular cylinder"; the shape of base A may be arbitrary; it just should not change with height.

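[Figure: gas in a cylinder of base area A under a massive piston, with force $\mathcal F = PA$ exerted by the gas on the piston.]
Fig. 5.5. Deriving Eq. (37).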



Second, at $\omega \to 0$, that spring constant may be calculated just as for very slow variations of volume, with the gas remaining in quasi-equilibrium at all times:



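$$\kappa = -\frac{\partial\langle\mathcal F\rangle}{\partial q} = -A^2\frac{\partial\langle P\rangle}{\partial\langle V\rangle}. \qquad (5.35)$$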



This partial derivative 6 should be taken at whatever the given thermal conditions are, e.g., with S = const for adiabatic conditions (i.e., a thermally insulated gas), or with T = const for isothermal conditions (gas in good thermal contact with a heat bath), etc. With that constant denoted as X, Eqs. (34)-(35) give



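$$\langle U_P\rangle = -\frac{A^2}{2}\left(\frac{\partial\langle P\rangle}{\partial\langle V\rangle}\right)_X\langle\tilde q^2\rangle = -\frac{1}{2}\left(\frac{\partial\langle P\rangle}{\partial\langle V\rangle}\right)_X\langle\tilde V^2\rangle. \qquad (5.36)$$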



Finally, making ω sufficiently small (namely, $\hbar\omega \ll T$) by a sufficiently large piston mass, we can apply, to the piston's fluctuations, the classical equipartition theorem: $\langle U_P\rangle = T/2$, giving

$$\langle\tilde V^2\rangle_X = -T\left(\frac{\partial\langle V\rangle}{\partial\langle P\rangle}\right)_X. \qquad (5.37)$$



Since this result is valid for any A and ω, it is (more or less :-) clear that it should not depend on the system's geometry and piston mass, provided that the mass is large in comparison with the effective mass of a single system component (say, a gas molecule) - the condition that is naturally fulfilled in most experiments. 7 For the particular case of an ideal classical gas of N particles, with the equation of state $\langle P\rangle = NT/\langle V\rangle$, Eq. (37), with constant X = T, yields



6 As was already discussed in Sec. 4.1 in the context of the van der Waals equation, for the mechanical stability of a gas (or liquid), the derivative ∂P/∂V has to be negative, so that κ is positive.

7 One may meet statements that a similar formula,

$$\langle\tilde P^2\rangle_X = -T\left(\frac{\partial\langle P\rangle}{\partial\langle V\rangle}\right)_X,$$

is valid for pressure fluctuations. However, such a statement does not take into account the different nature of pressure (Fig. 4), with its very broad frequency spectrum.
$$ \frac{\delta V}{\langle V\rangle} = \frac{1}{N^{1/2}} , \qquad (5.38) $$



in evident agreement with the general Eq. (12) for a system of N independent parts. 
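A quick numerical sketch of Eq. (37) for the ideal classical gas: the helper below (a hypothetical name, not from the text) evaluates ⟨Ṽ²⟩_X = -T(∂⟨V⟩/∂⟨P⟩)_X by a central finite difference along a given curve X = const. It assumes temperature in energy units and arbitrary illustrative values of N, T and ⟨P⟩. Besides the isothermal case of Eq. (38), it also tries the adiabatic curve ⟨P⟩⟨V⟩^γ = const with an assumed γ = 5/3 (monatomic gas), to show that the result depends on which parameter X is kept fixed.

```python
import math


def volume_variance(V_of_P, P, T, h=1e-6):
    """<V~^2>_X = -T (d<V>/d<P>)_X, Eq. (37).

    V_of_P gives <V> as a function of <P> along the curve X = const;
    the derivative is taken by a central difference with relative step h.
    """
    dV_dP = (V_of_P(P * (1 + h)) - V_of_P(P * (1 - h))) / (2 * P * h)
    return -T * dV_dP


N, T, P = 1.0e6, 1.0, 2.0   # illustrative values; T in energy units
V0 = N * T / P              # mean volume from <P> = N T / <V>
gamma = 5.0 / 3.0           # assumed adiabatic index (monatomic ideal gas)


def isothermal(p):
    # X = T: <V> = N T / <P>
    return N * T / p


def adiabatic(p):
    # X = S: <P><V>^gamma = const, passing through the point (P, V0)
    return V0 * (P / p) ** (1.0 / gamma)


rel_T = math.sqrt(volume_variance(isothermal, P, T)) / V0
rel_S = math.sqrt(volume_variance(adiabatic, P, T)) / V0
print(f"isothermal: {rel_T:.6e}  vs  N^(-1/2) = {N ** -0.5:.6e}")
print(f"adiabatic:  {rel_S:.6e}  vs  (gamma N)^(-1/2) = {(gamma * N) ** -0.5:.6e}")
```

The isothermal line reproduces δV/⟨V⟩ = N^{-1/2}; the adiabatic fluctuations come out smaller by the factor γ^{1/2}, because the gas is stiffer at S = const.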



Now let us proceed to fluctuations of temperature, for simplicity focusing on the case V = const. Let us again assume that the system we are considering is weakly coupled to a heat bath of temperature T₀, in the sense that the time τ of temperature equilibration between the two is much larger than the internal temperature relaxation (thermalization) time. Then we may assume that T changes in the whole system virtually simultaneously, and consider it a function of time alone:

$$ T = \langle T\rangle + \tilde T(t) . \qquad (5.39) $$

Moreover, due to the (relatively) large τ, we may use the stationary relation between small fluctuations of temperature and of the internal energy of the system:



$$ \tilde T(t) = \frac{\tilde E(t)}{C_V} , \quad \text{so that} \quad \delta T = \frac{\delta E}{C_V} . \qquad (5.40) $$



With those assumptions, Eq. (20) immediately yields the famous expression for the so-called thermodynamic fluctuations of temperature:

$$ \delta T = \frac{\langle T\rangle}{C_V^{1/2}} . \qquad (5.41) $$




The most straightforward application of this result is to the analysis of the so-called bolometers, broadband detectors of electromagnetic radiation in the microwave and infrared frequency bands. In such a detector (Fig. 6), the incoming radiation is focused on a small sensor (e.g., a small piece of a Ge crystal, or a superconductor thin film at temperature T ~ T_c, etc.) that is well isolated thermally from the environment. As a result, the absorption of even a small radiation power 𝒫 leads to a noticeable change ΔT of the sensor's average temperature ⟨T⟩, and hence of its electric resistance R, which is picked up by low-noise external electronics.⁸



Fig. 5.6. Conceptual scheme of a bolometer. (Figure labels: incoming power 𝒫; sensor at ⟨T⟩ + T̃(t) = T₀ + ΔT + T̃(t), with resistance R(T); output to electronics.)



If the power does not change in time too fast, ΔT is a certain function of 𝒫, turning into 0 at 𝒫 = 0. Hence, if ΔT is much lower than the environment temperature T₀, we may keep only the main, linear term of its Taylor expansion in 𝒫:



8 Besides low internal electric noise, the sensor should have a sufficiently large temperature responsivity dR/dT, making the noise contribution by the pickup electronics insignificant (see below).
$$ \Delta T \equiv \langle T\rangle - T_0 = \frac{\mathscr{P}}{\mathscr{G}} , \qquad (5.42) $$

where the coefficient 𝒢 ≡ ∂𝒫/∂T is called the thermal conductance of the unavoidable thermal coupling between the sensor and the heat bath (see Fig. 6). The power may be detected if the electric signal from the sensor, which results from the change ΔT, is not drowned in spontaneous fluctuations. In practical systems, these fluctuations are contributed by several sources, including electronic amplifiers, the sensor itself, etc. However, in modern systems these "technical" contributions to noise are successfully suppressed, and the dominant noise source is the fundamental fluctuation of the sensor temperature, described by Eq. (41). In this case the so-called noise-equivalent power ("NEP"), defined as the level of 𝒫 that produces a signal equal to the r.m.s. value of noise, may be calculated by equating Eqs. (41) (with ⟨T⟩ ≈ T₀) and (42):

$$ \text{NEP} = \frac{T_0\,\mathscr{G}}{C_V^{1/2}} . \qquad (5.43) $$

This expression shows that in order to decrease NEP, i.e. improve the device sensitivity, both the environment temperature T₀ and the thermal conductance 𝒢 should be reduced. In modern receivers of radiation, their typical values (in SI units) are of the order of 0.1 K and 10⁻¹⁰ W/K, respectively.

On the other hand, Eq. (43) implies that in order to increase the bolometer sensitivity, i.e. reduce NEP, the heat capacity C_V of the sensor, and hence its mass, should be increased. This conclusion is valid only to a certain extent, because for technical reasons (parameter drift and the so-called 1/f noise of the sensor and external electronics), the incoming power has to be modulated with as high a frequency ω as possible (in most cases, the cyclic frequency ν = ω/2π of the modulation is between 10 and 1,000 Hz), so that the electrical signal may be picked up from the sensor at that frequency. As a result, C_V may be increased only until the thermal time constant of the sensor,

$$ \tau = \frac{C_V}{\mathscr{G}} , \qquad (5.44) $$

becomes close to 1/ω, because at ωτ ≫ 1 the useful signal drops faster than noise. As a result, the lowest (i.e. the best) value of NEP,

$$ \frac{(\text{NEP})_{\min}}{\nu^{1/2}} = \alpha\, T_0\,\mathscr{G}^{1/2} , \qquad \alpha \sim 1 , \qquad (5.45) $$

is reached at ωτ ≈ 1. (The exact values of the optimal product ωτ, and of the numerical constant α ~ 1 in Eq. (45), depend on the exact law of power modulation in time and on the output signal processing procedure.) With the parameters cited above, this estimate yields (NEP)_min/ν^{1/2} ~ 3×10⁻¹⁷ W/Hz^{1/2}, a very low power indeed.

However, surprisingly enough, the power modulation allows bolometric (and other broadband) receivers to register radiation with power much lower than this NEP! Indeed, picking up the sensor signal at the modulation frequency ω, we can use the following electronics stages to filter out all the noise except its components within a very narrow band, of width Δν ≪ ν, around the modulation frequency (Fig. 7). This is the idea of a microwave radiometer,⁹ currently used in all sensitive broadband receivers.



Fig. 5.7. Basic idea of the Dicke radiometer. (Figure: noise density versus frequency; labels: input power modulation frequency; pick-up to output.)



In order to analyze this opportunity, we need to develop theoretical tools for a quantitative description of the spectral distribution of fluctuations. Another motivation for such a description is the need to analyze variables dominated by fast (high-frequency) components, such as pressure; please have one more look at Fig. 4. Finally, during the analysis we will run into the fundamental relation between fluctuations and dissipation, which is one of the main results of statistical physics as a whole.






5.4. Fluctuations as functions of time 

There are two mathematically equivalent approaches to the description of time-dependent fluctuations, called the time-domain and frequency-domain pictures, with their relative convenience depending on the particular problem to be solved.

In the time domain, we cannot characterize a random fluctuation f̃(t) of a classical variable by its statistical average, because it equals zero; see Eq. (2). Of course, the variance (3) does not vanish, but if fluctuations are stationary, it does not depend on time either. Because of that, let us consider the following average:¹⁰

$$ \langle\tilde f(t)\,\tilde f(t')\rangle . \qquad (5.46) $$

Generally, this is a function of two arguments. However, in systems that are stationary (whose macroscopic parameters, and hence the variable expectation values, do not change with time), averages like (46) may depend only on the difference,

$$ \tau \equiv t' - t , \qquad (5.47) $$

between the two observation times. In this case, average (46) is called the correlation function of the variable f:

$$ K_f(\tau) \equiv \langle\tilde f(t)\,\tilde f(t+\tau)\rangle . \qquad (5.48) $$



9 It was pioneered in the 1950s by R. Dicke, so that the device is frequently called the Dicke radiometer. 

10 Clearly, this is a temporal analog of the spatial correlation function discussed in Sec. 4.2 - see Eq. (4.30). 
This name¹¹ catches the idea of this notion very well: K_f(τ) tells us about the average mutual relation between the fluctuations at two times separated by the interval τ. Let us list the basic properties of this function.

First of all, K_f(τ) has to be an even function of the time delay τ. Indeed, we may write

$$ K_f(-\tau) = \langle\tilde f(t)\,\tilde f(t-\tau)\rangle = \langle\tilde f(t-\tau)\,\tilde f(t)\rangle = \langle\tilde f(t')\,\tilde f(t'+\tau)\rangle , \qquad (5.49) $$

with t' ≡ t - τ. For stationary processes, this average cannot depend on the common shift t' of the two observation times, so that averages (48) and (49) have to be equal:



$$ K_f(-\tau) = K_f(\tau) . \qquad (5.50) $$

Second, at τ → 0 the correlation function tends to the variance:

$$ K_f(0) = \langle\tilde f(t)\,\tilde f(t)\rangle = \langle\tilde f^2\rangle . \qquad (5.51) $$

In the opposite limit, when τ is much larger than some characteristic correlation time τ_c of the system,¹² the correlation function tends to zero, because fluctuations separated by such a large time interval are virtually independent (uncorrelated). As a result, the correlation function typically looks like one of the plots sketched in Fig. 8. Note that on a time scale much longer than τ_c, any physically realistic correlation function may be well approximated with a delta function of τ.¹³
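The properties (50) and (51), and the decay of K_f(τ) at τ ≫ τ_c, are easy to observe on synthetic data. The sketch below assumes, purely for illustration, a stationary Gaussian (Ornstein-Uhlenbeck-type) sequence whose exact correlation function is σ² exp(-|τ|/τ_c), and estimates K_f(τ) by time averaging, which presumes ergodicity. The function names and parameter values are hypothetical.

```python
import math
import random


def ou_sequence(n, tau_c, sigma=1.0, dt=1.0, seed=1):
    """Stationary Gaussian sequence with K(tau) = sigma^2 * exp(-|tau|/tau_c)."""
    rng = random.Random(seed)
    a = math.exp(-dt / tau_c)             # one-step correlation coefficient
    b = sigma * math.sqrt(1.0 - a * a)    # keeps the variance equal to sigma^2
    x = rng.gauss(0.0, sigma)             # start in the stationary state
    seq = []
    for _ in range(n):
        seq.append(x)
        x = a * x + b * rng.gauss(0.0, 1.0)
    return seq


def correlation(f, lag):
    """Time-average estimate of K_f(lag * dt) = <f(t) f(t + lag * dt)>.

    Using abs(lag) makes the estimate even in lag by construction,
    mirroring Eq. (50).
    """
    lag = abs(lag)
    m = len(f) - lag
    return sum(f[i] * f[i + lag] for i in range(m)) / m


f = ou_sequence(200_000, tau_c=10.0)
K0 = correlation(f, 0)       # Eq. (51): close to sigma^2 = 1
K1 = correlation(f, 10)      # tau = tau_c: close to exp(-1), about 0.37
Kfar = correlation(f, 200)   # tau = 20 tau_c: virtually uncorrelated, close to 0
print(K0, K1, Kfar)
```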



Fig. 5.8. Correlation function of fluctuations: two typical examples. (Vertical axis: K_f(τ).)

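In the reciprocal, frequency domain, the process f̃(t) is represented as a Fourier integral,

$$ \tilde f(t) = \int_{-\infty}^{+\infty} f_\omega\, e^{-i\omega t}\, d\omega , \qquad (5.52) $$

with the reciprocal transform being

$$ f_\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \tilde f(t)\, e^{i\omega t}\, dt . \qquad (5.53) $$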



11 Another term, the autocorrelation function, is sometimes used for the average (48) to distinguish it from the mutual correlation function, ⟨f̃(t) g̃(t + τ)⟩, of two stationary processes.

12 The correlation time τ_c is the direct temporal analog of the correlation radius r_c which was discussed in Sec. 4.2.

13 For example, for a process which is a sum of independent very short pulses, e.g., the gas pressure force exerted on the container wall (Fig. 4), such an approximation is legitimate on time scales longer than the single pulse duration, e.g., the time of a particle's impact on the wall.
If the initial function f̃(t) is random (as it is in the case of fluctuations), with zero average, its Fourier transform f_ω is a random function (now of frequency) as well, also with a vanishing statistical average:

$$ \langle f_\omega\rangle = \left\langle\frac{1}{2\pi}\int_{-\infty}^{+\infty}\tilde f(t)\, e^{i\omega t}\, dt\right\rangle = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\langle\tilde f(t)\rangle\, e^{i\omega t}\, dt = 0 . \qquad (5.54) $$



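The simplest nonvanishing average may be formed similarly to Eq. (46), but with due respect to the complex-variable character of the Fourier images:

$$ \langle f_\omega^*\, f_{\omega'}\rangle = \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}dt'\int_{-\infty}^{+\infty}dt\,\langle\tilde f(t)\,\tilde f(t')\rangle\, e^{i(\omega' t' - \omega t)} . \qquad (5.55) $$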



It turns out that for a stationary process, averages (46) and (55) are directly related. Indeed, since the integration over t' in Eq. (55) is in infinite limits, we may replace it with integration over τ ≡ t' - t (at fixed t), also in infinite limits. Replacing t' by t + τ in the expressions under the integral, we see that the average is just the correlation function K_f(τ), while the time exponent is equal to exp{i(ω' - ω)t}exp{iω'τ}. As a result, changing the order of integration, we get

$$ \langle f_\omega^*\, f_{\omega'}\rangle = \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}dt\int_{-\infty}^{+\infty}d\tau\, K_f(\tau)\, e^{i(\omega'-\omega)t}\, e^{i\omega'\tau} = \frac{1}{2\pi}\int_{-\infty}^{+\infty}K_f(\tau)\, e^{i\omega'\tau}\, d\tau\;\cdot\;\frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{i(\omega'-\omega)t}\, dt . \qquad (5.56) $$



But the last integral is just 2πδ(ω - ω'),¹⁴ so that we finally get



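$$ \langle f_\omega^*\, f_{\omega'}\rangle = S_f(\omega)\,\delta(\omega-\omega') , \qquad (5.57) $$

where the real function of frequency,

$$ S_f(\omega) \equiv \frac{1}{2\pi}\int_{-\infty}^{+\infty}K_f(\tau)\, e^{i\omega\tau}\, d\tau = \frac{1}{\pi}\int_0^{\infty}K_f(\tau)\cos\omega\tau\, d\tau , \qquad (5.58) $$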



is called the spectral density of fluctuations at frequency ω. According to Eq. (58), the spectral density is a Fourier image of the correlation function, and hence the reciprocal Fourier transform is:¹⁵,¹⁶

$$ K_f(\tau) = \int_{-\infty}^{+\infty}S_f(\omega)\, e^{-i\omega\tau}\, d\omega = 2\int_0^{\infty}S_f(\omega)\cos\omega\tau\, d\omega . \qquad (5.59) $$



In particular, for the variance, Eq. (59) yields

$$ \langle\tilde f^2\rangle = K_f(0) = \int_{-\infty}^{+\infty}S_f(\omega)\, d\omega = 2\int_0^{\infty}S_f(\omega)\, d\omega . \qquad (5.60) $$
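To see Eqs. (58) and (60) at work, the following sketch assumes the exponential correlation function K_f(τ) = σ² exp(-|τ|/τ_c) (an illustrative choice, not taken from the text), evaluates its spectral density from the cosine form of Eq. (58) by the trapezoidal rule, and compares it with the closed-form Lorentzian S_f(ω) = σ²τ_c/[π(1 + ω²τ_c²)]. It then checks that integrating the Lorentzian as in Eq. (60) returns the variance K_f(0).

```python
import math

SIGMA2, TAU_C = 1.0, 1.0   # assumed variance and correlation time (arbitrary units)


def trapezoid(g, a, b, n):
    """Composite trapezoidal rule for int_a^b g(x) dx with n intervals."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h


def K_f(tau):
    """Assumed model correlation function; even in tau, as Eq. (50) requires."""
    return SIGMA2 * math.exp(-abs(tau) / TAU_C)


def S_numeric(omega, tau_max=40.0 * TAU_C, n=40_000):
    """Second form of Eq. (58), truncated at tau_max where K_f is negligible."""
    return trapezoid(lambda t: K_f(t) * math.cos(omega * t), 0.0, tau_max, n) / math.pi


def S_exact(omega):
    """Closed-form Fourier image of the exponential K_f: a Lorentzian."""
    return SIGMA2 * TAU_C / (math.pi * (1.0 + (omega * TAU_C) ** 2))


for w in (0.0, 1.0, 3.0):
    print(f"omega = {w}: numeric {S_numeric(w):.8f}, Lorentzian {S_exact(w):.8f}")

# Eq. (60): 2 * int_0^inf S_f d(omega) should return the variance K_f(0) = SIGMA2.
# The cutoff at 1000/TAU_C drops a tail of about 2*SIGMA2/(pi*1000), i.e. ~6e-4.
variance = 2.0 * trapezoid(S_exact, 0.0, 1000.0 / TAU_C, 100_000)
print(f"2 * int S_f = {variance:.5f}  (K_f(0) = {K_f(0.0)})")
```

The slowly decaying 1/ω² tail of the Lorentzian is why the frequency cutoff has to be much larger than 1/τ_c before Eq. (60) is recovered to good accuracy.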



14 See, e.g., MA Eq. (14.4a). 

15 The second form of Eq. (59) uses the fact that, according to Eq. (58), S_f(ω) is an even function of frequency, just as K_f(τ) is an even function of time.

16 Despite the fact that Eqs. (58) and (59) look like little more than straightforward corollaries of the Fourier transform, they bear the special name of the Wiener-Khinchin theorem, after the mathematicians N. Wiener and A. Khinchin who proved that these relations are valid even for functions f(t) which are not square-integrable, so that from the point of view of rigorous mathematics, their Fourier transforms are not well defined.
This relation shows that the term "spectral density" describes the physical sense of the function S_f(ω) very well. Indeed, if a random signal f̃(t) were passed through a frequency filter with a small bandwidth Δν ≪ ν of positive cyclic frequencies, the integral in Eq. (60) would have to be limited to the interval Δω = 2πΔν, i.e. the variance of the output signal would become¹⁷

$$ \langle\tilde f^2\rangle_{\Delta\nu} = 2S_f(\omega)\,\Delta\omega = 4\pi S_f(\omega)\,\Delta\nu . \qquad (5.61) $$

To complete this introductory section, let me note an important particular case. If the spectral density of some process is nearly constant within the frequency range of interest, S_f(ω) = const = S_f(0),¹⁸ Eq. (59) shows that its correlation function may be well approximated by a delta function:

$$ K_f(\tau) = S_f(0)\int_{-\infty}^{+\infty}e^{-i\omega\tau}\, d\omega = 2\pi S_f(0)\,\delta(\tau) . \qquad (5.62) $$

From this relation stems another popular name of white noise, the delta-correlated process. We have already seen that this is a very reasonable approximation, for example, for the gas pressure force fluctuations (Fig. 4). Of course, for the spectral density of a realistic, limited physical variable, the approximation of constant spectral density cannot be true for all frequencies (otherwise, for example, integral (60) would diverge, giving an unphysical, infinite value of the variance), and is valid only at frequencies much lower than 1/τ_c.



5.5. Fluctuations and dissipation 

Now we are mathematically equipped to address one of the most important topics of statistical physics, the relation between fluctuations and dissipation. This relation is especially simple for the following hierarchical situation: a relatively "heavy", slowly moving system interacting with an environment consisting of rapidly moving, "light" components. A popular theoretical term for such a system is the Brownian particle, named after the botanist R. Brown who in 1827 first noticed, under a microscope, the random motion of pollen grains caused by their random hits by fluid molecules. However, the family of such systems is much broader than that of mechanical particles.¹⁹

One more important assumption of this theory is that the system's motion does not violate the thermal equilibrium of the environment, an assumption well fulfilled in many cases. (Think, for example, about a usual mechanical pendulum whose motion does not overheat the air around it.) In this case, the statistical averaging over the thermally equilibrium environment may be performed for any (slow) motion of the system of interest, considering the motion fixed.²⁰ I will denote such a "primary" averaging by angular brackets ⟨...⟩. At a later stage we may carry out another, "secondary" averaging,



17 A popular alternative definition of the spectral density is S_f(ν) ≡ 4πS_f(ω), making average (61) equal to S_f(ν)Δν.

18 Such a process is frequently called white noise, because it consists of all frequency components with equal amplitudes, reminiscent of white light, which consists of many monochromatic components.

19 Just for one example, such a description may be valid for the complex amplitude of an electromagnetic field mode weakly interacting with matter. To emphasize this generality, I will use the letter q rather than x for the "particle's" coordinate.

20 For a usual (ergodic) environment, the primary averaging may be interpreted as that over relatively short time intervals, τ_c ≪ Δt ≪ τ, where τ_c is the correlation time of the environment, while τ is the characteristic time scale of motion of our "heavy" system of interest.
over an ensemble of many similar systems of interest, coupled to similar environments. If we do, it will be denoted by double angle brackets ⟨⟨...⟩⟩.

Let me start from a simple classical system, a 1D harmonic oscillator whose equation of evolution may be presented as

$$ m\ddot q + \kappa q = f_{\rm det}(t) + f_{\rm env}(t) \equiv f_{\rm det}(t) + \langle f\rangle + \tilde f(t) , \qquad (5.63) $$

where q is the (generalized) coordinate of the oscillator, f_det(t) is the deterministic (generalized) external force, while both components of the environmental force f_env(t) represent the impact of the environment on the oscillator's motion. Again, from the point of view of the fast-moving environmental components, the oscillator's motion is slow. The average of the force exerted by the environment on such a slowly moving object may depend not only on q, but also on the velocity q̇. For most systems, the Taylor expansion of the force in small velocity has a finite leading, linear term, so that we may take



$$ \langle f\rangle = -\eta\dot q , \qquad (5.64) $$

so that Eq. (63) may be rewritten as

$$ m\ddot q + \eta\dot q + \kappa q = f_{\rm det}(t) + \tilde f(t) . \qquad (5.65) $$



This way of describing the effects of the environment on an otherwise Hamiltonian system is called the Langevin equation.²¹ Due to the linearity of the differential equation (65), its general solution may be presented as a sum of two parts: the deterministic motion of the linear oscillator due to the external force f_det(t), and random fluctuations due to the random force exerted by the environment. The former effects are well known from classical dynamics,²² so let us focus on the latter part by taking f_det(t) = 0. The remaining term in the right-hand part describes the fluctuating part of the environmental force; in contrast to the average component (64), its intensity (read: its spectral density at the relevant frequencies ω ~ ω₀ ≡ (κ/m)^{1/2}) does not vanish at q(t) = 0, and hence may be evaluated ignoring the system's motion.

Plugging into Eq. (65) the representation of both variables in a form similar to Eq. (52), for their Fourier images we get the following relation:

$$ -m\omega^2 q_\omega - i\omega\eta\, q_\omega + \kappa q_\omega = f_\omega , \qquad (5.66) $$

which immediately gives us q_ω:

$$ q_\omega = \frac{f_\omega}{(\kappa - m\omega^2) - i\eta\omega} . \qquad (5.67) $$



21 After P. Langevin whose 1908 work was the first systematic development of A. Einstein's ideas on Brownian 
motion (see below) using this formalism. A detailed discussion of this approach, with numerical examples of its 
application, may be found, e.g., in the monograph by W. Coffey, Yu. Kalmykov, and J. Waldron, The Langevin 
Equation, World Scientific, 1996. 

22 See, e.g., CM Sec. 4.1. In this and the next sections I assume that the variable f(t) is classical, with the discussion of the quantum case postponed until Sec. 6.
Now multiplying Eq. (67) by its complex conjugate, averaging both parts of the resulting equation, and using for each of them Eq. (57),²³ we get the following relation between the spectral densities of the oscillations and the force:

$$ S_q(\omega) = \frac{1}{(\kappa - m\omega^2)^2 + (\eta\omega)^2}\, S_f(\omega) . \qquad (5.68) $$

As the reader should know well from classical dynamics, at small damping (η ≪ mω₀) the first factor in the right-hand part of Eq. (68) describes the resonance, i.e. has a sharp peak near the oscillator's eigenfrequency ω₀, and may be presented in that vicinity as

$$ \frac{1}{(\kappa - m\omega^2)^2 + (\eta\omega)^2} \approx \frac{1}{4m\kappa(\xi^2 + \delta^2)} , \quad \text{at } |\xi| \ll \omega_0 , \quad \text{with } \xi \equiv \omega - \omega_0 , \;\; \delta \equiv \frac{\eta}{2m} . \qquad (5.69) $$

In contrast, the spectral density S_f(ω) of fluctuations of a typical environment changes slowly near that frequency, so that for the purpose of integration over frequencies near ω₀ we may replace S_f(ω) with S_f(ω₀). As a result, the variance of the environment-imposed random oscillations may be calculated as

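$$ \langle\langle\tilde q^2\rangle\rangle = 2\int_0^{\infty}S_q(\omega)\, d\omega \approx 2S_f(\omega_0)\,\frac{1}{4m\kappa}\int_{-\infty}^{+\infty}\frac{d\xi}{\xi^2 + \delta^2} . \qquad (5.70) $$

The last expression includes a well-known table integral,²⁴ equal to π/δ = 2πm/η, so that finally

$$ \langle\langle\tilde q^2\rangle\rangle = 2S_f(\omega_0)\,\frac{1}{4m\kappa}\,\frac{2\pi m}{\eta} = \frac{\pi}{\kappa\eta}\, S_f(\omega_0) . \qquad (5.71) $$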

But on the other hand, the weak interaction with the environment should keep the oscillator in thermodynamic equilibrium at the same temperature T. Since our analysis has been based on the classical Langevin equation (65), we may only use it in the classical limit ħω₀ ≪ T, in which we may use the equipartition theorem (2.48). In our current notation, it yields

$$ \frac{\kappa}{2}\langle\langle\tilde q^2\rangle\rangle = \frac{T}{2} . \qquad (5.72) $$



Comparing Eqs. (71) and (72), we see that the spectral density of the random force exerted by the environment is fundamentally related to the damping it provides:

$$ S_f(\omega_0) = \frac{\eta}{\pi}\, T . \qquad (5.73a) $$
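The chain of Eqs. (68)-(73a) can be tested by direct numerical integration of the Langevin equation (65) with f_det = 0. The sketch below assumes a delta-correlated random force whose intensity obeys Eq. (73a), so that, by Eq. (62), K_f(τ) = 2πS_f(0)δ(τ) = 2ηTδ(τ); it uses a simple semi-implicit Euler-Maruyama scheme and illustrative parameters (temperature in energy units, hypothetical function name), and checks that the stationary ⟨q̃²⟩ approaches the equipartition value T/κ of Eq. (72). For white noise this balance does not rely on the small-damping approximation (69), so a moderate damping is used to shorten the relaxation time.

```python
import math
import random


def langevin_oscillator(m=1.0, eta=1.0, kappa=1.0, T=1.0,
                        dt=0.01, n_steps=500_000, n_burn=10_000, seed=2):
    """Integrate Eq. (65) with f_det = 0 and white noise of intensity (73a).

    The random force is modeled as independent Gaussian impulses with
    <(impulse)^2> = 2 * eta * T * dt, i.e. K_f(tau) = 2 eta T delta(tau).
    Returns the time averages of q^2 and (dq/dt)^2 after a burn-in period.
    """
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * eta * T * dt) / m    # r.m.s. velocity change per step
    q = v = 0.0
    sum_q2 = sum_v2 = 0.0
    for step in range(n_steps):
        # semi-implicit Euler: update v first, then q with the new v
        v += -(eta * v + kappa * q) / m * dt + kick * rng.gauss(0.0, 1.0)
        q += v * dt
        if step >= n_burn:
            sum_q2 += q * q
            sum_v2 += v * v
    n = n_steps - n_burn
    return sum_q2 / n, sum_v2 / n


q2, v2 = langevin_oscillator()
print(f"<q^2> = {q2:.3f}  (equipartition: T/kappa = 1)")
print(f"<v^2> = {v2:.3f}  (equipartition: T/m = 1)")
```

Changing η while keeping the noise intensity tied to it by Eq. (73a) leaves ⟨q̃²⟩ near T/κ: stronger damping is compensated by stronger noise, which is the content of the fluctuation-dissipation relation.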

Now we may argue (rather convincingly :-) that since this relation does not depend on the oscillator's parameters m and κ, and hence on its eigenfrequency ω₀ = (κ/m)^{1/2}, it should be valid at any (but sufficiently low, ωτ_c ≪ 1) frequency. Using Eq. (58) with ω → 0, it may be rewritten as the so-called Green-Kubo (or just "Kubo") formula for the effective low-frequency viscosity:



23 At this stage we restrict our analysis to random, stationary processes q(t), so that Eq. (57) is valid for this variable as well, if the averaging is understood in the ⟨⟨...⟩⟩ sense.

24 See, e.g. MA Eq. (6.5a). 
$$ \eta = \frac{1}{T}\int_0^{\infty}K_f(\tau)\, d\tau \equiv \frac{1}{T}\int_0^{\infty}\langle\tilde f(t)\,\tilde f(t+\tau)\rangle\, d\tau . \qquad (5.73b) $$



Relation (73) reveals an intimate, fundamental connection between fluctuations and dissipation provided by a thermally equilibrium environment. Verbally, "there is no dissipation without fluctuations", and vice versa.²⁵ Historically, this fact was first recognized in 1905 by A. Einstein,²⁶ in the following form. Let us apply our result (73) to the particular case of a free 1D Brownian particle, by taking κ = 0. In this case both equations (71) and (72) give infinities. In order to understand the reason for that divergence, let us go back to the Langevin equation (65) with not only κ = 0, but also, just for the sake of simplicity, m → 0. (The latter approximation, frequently called the overdamping limit, is quite appropriate for the motion of a small particle in a viscous fluid, when m ≪ ηΔt even for the smallest time intervals Δt between the successive observations of the particle's position.) In this approximation, Eq. (65) is reduced to a simple equation,

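$$ \eta\dot q = f_{\rm det}(t) + \tilde f(t) , \qquad (5.74) $$

with a ready solution for the particle displacement during a finite time interval t:

$$ \Delta q(t) \equiv q(t) - q(0) = \langle\langle\Delta q(t)\rangle\rangle + \tilde q(t) , \quad \langle\langle\Delta q(t)\rangle\rangle = \frac{1}{\eta}\int_0^t f_{\rm det}(t')\, dt' , \quad \tilde q(t) = \frac{1}{\eta}\int_0^t \tilde f(t')\, dt' . \qquad (5.75) $$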



Evidently, in the statistical average of the displacement the fluctuation effects vanish, but this does not mean that the particle does not deviate from the deterministic trajectory ⟨⟨q(t)⟩⟩, just that it has equal probabilities to be shifted in either of the two possible directions from that trajectory. To see that, let us calculate the variance of the displacement:

$$ \langle\langle\tilde q^2(t)\rangle\rangle = \frac{1}{\eta^2}\int_0^t dt'\int_0^t dt''\,\langle\tilde f(t')\,\tilde f(t'')\rangle = \frac{1}{\eta^2}\int_0^t dt'\int_0^t dt''\, K_f(t'-t'') . \qquad (5.76) $$



As we already know, at times t ≫ τ_c (this correlation time, for typical molecular impacts, is of the order of a picosecond), the correlation function may be well approximated by the delta function; see Eq. (62). In this approximation, with S_f(0) expressed by Eq. (73), Eq. (76) yields

$$ \langle\langle\tilde q^2(t)\rangle\rangle = \frac{2\pi}{\eta^2}\, S_f(0)\int_0^t dt'\int_0^t dt''\,\delta(t'-t'') = \frac{2\pi}{\eta^2}\,\frac{\eta T}{\pi}\int_0^t dt' = 2Dt , \qquad (5.77) $$

with

$$ D = \frac{T}{\eta} . \qquad (5.78) $$



25 This means that the phenomenological description of dissipation by bare viscosity in classical mechanics (see, 
e.g., CM Sec. 4.1) is only valid approximately, when the energy scale of the process is much larger than T. 

26 It was published in one of the three papers of Einstein's celebrated 1905 "triad". As a reminder, another paper started (special) relativity, and one more was the quantum description of the photoelectric effect, essentially the prediction of light quanta (photons), which started quantum mechanics. (Not too bad for one year!)
The final form of Eq. (77) describes the well-known law of diffusion ("random walk") of a 1D system, with the r.m.s. deviation from the point of origin growing as (2Dt)^{1/2}. The coefficient D in this relation is called the coefficient of diffusion, and Eq. (78) describes the extremely simple Einstein relation between that coefficient and the particle's damping. Often this relation is rewritten in SI units of temperature as D = μ_m k_B T_K, where μ_m ≡ 1/η is the mobility of the particle. The physical sense of μ_m becomes clear from rewriting the expression for the deterministic viscous motion ⟨⟨q(t)⟩⟩ (the particle's "drift") in the form:

$$ \frac{d\langle\langle q(t)\rangle\rangle}{dt} = \frac{1}{\eta}\, f_{\rm det}(t) \equiv \mu_m f_{\rm det}(t) , \qquad (5.79) $$

so that the mobility is just the velocity given to the particle by a unit force.²⁷
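The delta-function approximation used in Eq. (77) can be checked by evaluating the double integral (76) for a random force with a finite correlation time. The sketch below assumes, for illustration only, K_f(τ) = (ηT/τ_c)exp(-|τ|/τ_c), normalized so that S_f(0) = ηT/π as required by Eq. (73a). It reduces Eq. (76) to the single integral (2/η²)∫₀ᵗ(t - s)K_f(s)ds and compares the result with the diffusion law 2Dt, D = T/η, at t ≫ τ_c, and with the ballistic short-time behavior K_f(0)t²/η² at t ≪ τ_c. Parameter values and names are hypothetical.

```python
import math

T, ETA, TAU_C = 1.0, 2.0, 0.01    # illustrative values; T in energy units
K0 = ETA * T / TAU_C              # gives S_f(0) = K0 * TAU_C / pi = ETA * T / pi
D = T / ETA                       # Einstein relation, Eq. (78)


def K_f(s):
    """Assumed exponential correlation function of the random force."""
    return K0 * math.exp(-abs(s) / TAU_C)


def msd(t, n=100_000):
    """<<q~^2(t)>> from Eq. (76), using
    int_0^t dt' int_0^t dt'' K_f(t' - t'') = 2 int_0^t (t - s) K_f(s) ds,
    evaluated by the trapezoidal rule."""
    h = t / n
    total = 0.5 * t * K_f(0.0)    # the s = t endpoint contributes (t - t) K_f(t) = 0
    for i in range(1, n):
        s = i * h
        total += (t - s) * K_f(s)
    return 2.0 * total * h / ETA ** 2


t_long, t_short = 1000 * TAU_C, 0.01 * TAU_C
print("t >> tau_c:", msd(t_long) / (2 * D * t_long))                 # close to 1: diffusion
print("t << tau_c:", msd(t_short) / (K0 * t_short ** 2 / ETA ** 2))  # close to 1: ballistic
```

At t ≫ τ_c the small residual deviation from 2Dt is of the order of τ_c/t, which is the error of replacing K_f by the delta function in Eq. (77).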

Another famous example of the application of Eq. (73) is to the thermal (or "Johnson", or "Johnson-Nyquist", or just "Nyquist") noise in resistive electron devices. Let us consider a two-terminal "probe" circuit, playing the role of the harmonic oscillator in our analysis above, connected to a resistor R (Fig. 9), playing the role of the noisy environment. (The noise is generated by the thermal motion of numerous electrons, randomly moving inside the resistor.) For this system, one convenient choice of conjugate variables (the generalized coordinate and generalized force) is, respectively, the electric charge Q ≡ ∫I(t)dt that has passed through the "probe" circuit by time t, and the voltage 𝒱 across its terminals, with the polarity shown in Fig. 9. (Indeed, the product 𝒱dQ is the elementary work d𝒲 done by the environment on the probe circuit.)




Fig. 5.9. Resistor R of temperature T as a noisy 
environment of a two-terminal probe circuit. 



Making the corresponding replacements, q → Q and f → 𝒱, in Eq. (64), we see that it becomes

$$ \langle\mathscr{V}\rangle = -\eta\dot Q = -\eta I . \qquad (5.80) $$

Comparing this relation with Ohm's law, R(-I) = ⟨𝒱⟩,²⁸ we see that in this case the coefficient η has the physical sense of the usual Ohmic resistance R,²⁹ so that Eq. (73) becomes

$$ S_{\mathscr{V}}(\omega) = \frac{R}{\pi}\, T . \qquad (5.81a) $$



27 In solid-state physics and electronics, mobility is more frequently defined as μ_m ≡ |v_drift/ℰ| = e|v_drift/f_det| (where ℰ is the applied electric field), and is traditionally measured in cm²/(V·s). In these units, the electron mobility in the silicon wafers used for integrated circuit fabrication (i.e. the solid most important for engineering practice) at room temperature is close to 10³.

28 The minus sign is due to the fact that in our notation, the current through the resistor equals (-I); see Fig. 9.

29 Due to this fact, Eq. (64) is often called the Ohmic model of the environment response, even if the physical 
nature of variables q and f is completely different from the electric charge and voltage. 






Using Eq. (61), and transferring to the SI units of temperature (T = k_B T_K), we can bring this famous Nyquist formula³⁰ to its most popular form

$$ \langle\tilde{\mathscr{V}}^2\rangle_{\Delta\nu} = 4k_B T_K R\,\Delta\nu . \qquad (5.81b) $$



Note that according to Eq. (65), this result is only valid at a negligible speed of change of the generalized coordinate q (in this case, negligible current I), i.e. Eq. (81) expresses the voltage fluctuations as they would be measured by an ideal voltmeter, with an input resistance much higher than R.
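For a feeling of the magnitudes involved, the sketch below evaluates Eqs. (81b) and (81c) for an assumed example (a 1 kΩ resistor at 300 K, observed within a 1 MHz bandwidth), using the exact SI value of k_B. The helper names are illustrative.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)


def nyquist_voltage_rms(R, T_K, bandwidth):
    """r.m.s. voltage noise from Eq. (81b): <V~^2> = 4 k_B T_K R (delta nu)."""
    return math.sqrt(4.0 * K_B * T_K * R * bandwidth)


def nyquist_current_rms(R, T_K, bandwidth):
    """r.m.s. current noise from Eq. (81c): <I~^2> = 4 k_B T_K (delta nu) / R."""
    return math.sqrt(4.0 * K_B * T_K * bandwidth / R)


R, T_K, dnu = 1.0e3, 300.0, 1.0e6          # ohms, kelvins, hertz (illustrative)
v_rms = nyquist_voltage_rms(R, T_K, dnu)   # about 4.07e-6 V
i_rms = nyquist_current_rms(R, T_K, dnu)   # about 4.07e-9 A
print(f"V_rms = {v_rms:.3e} V, I_rms = {i_rms:.3e} A")
```

A few microvolts of noise per megahertz of bandwidth at room temperature is why low-noise amplifiers are often cooled, since the noise power scales as T_K.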

On the other hand, applying a different choice of the generalized coordinate and force, q → Φ, f → I (where Φ ≡ ∫𝒱(t)dt is the generalized magnetic flux, so that d𝒲 = IdΦ), we get η → 1/R, and Eq. (73) yields the thermal fluctuations of the current through the resistor (as measured by an ideal ammeter, i.e. at 𝒱 → 0):

$$ S_I(\omega) = \frac{1}{\pi R}\, T , \quad \text{i.e.} \quad \langle\tilde I^2\rangle_{\Delta\nu} = \frac{4k_B T_K\,\Delta\nu}{R} . \qquad (5.81c) $$



Note that Eqs. (81) are valid for noise in thermal equilibrium only. In electric circuits, which may be readily driven out of equilibrium by an applied voltage ⟨𝒱⟩, other types of noise are frequently important, notably the shot noise, which arises in short conductors, e.g., tunnel junctions, at applied voltages |⟨𝒱⟩| ≫ T/|q|, due to the discreteness of charge carriers.³¹ A straightforward analysis using a simple model, described in the assignment of Exercise Problem 5, shows that this noise may be characterized by current fluctuations with the low-frequency spectral density

$$ S_I(0) = \frac{|q\langle I\rangle|}{2\pi} , \qquad (5.82) $$



where q is the electric charge of a single current carrier. This is the Schottky formula, valid for any relation between ⟨I⟩ and ⟨𝒱⟩. Comparison of Eqs. (81c) and (82) for a device that obeys Ohm's law shows that the shot noise has the same intensity as the thermal noise with the effective temperature

$$ T_{\rm ef} = \frac{|q\langle\mathscr{V}\rangle|}{2} \gg T . \qquad (5.83) $$



This relation may be interpreted as a result of charge carrier overheating by the applied electric field, 
and explains why the Schottky formula (82) is only valid in conductors much shorter than the energy 



30 Named after H. Nyquist who derived this formula in 1928 (independently of the prior work by A. Einstein, M. Smoluchowski, and P. Langevin) to describe the noise which had just been discovered experimentally by his Bell Labs colleague J. B. Johnson. The derivation of Eq. (73), and hence of Eq. (81), in these notes is essentially a twist of the derivation used by Nyquist.

31 Another practically important type of fluctuations in electronic devices is the low-frequency 1/f noise which 
was already mentioned in Sec. 3 above. I will briefly discuss it in Sec. 8. 
relaxation length l_e of the charge carriers.³² Another mechanism of shot noise suppression, which becomes noticeable if the system's transparency is high, is the Fermi-Dirac statistics of electrons.³³
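The comparison behind Eq. (83) can be reproduced in SI units. The sketch assumes an Ohmic device with R = 1 kΩ biased at ⟨𝒱⟩ = 1 mV, with electrons as carriers (|q| = e). It evaluates the shot-noise current variance 2|q⟨I⟩|Δν that follows from Eqs. (82) and (61), finds the temperature at which the Nyquist current noise (81c) would be equally large, and compares it with |q⟨𝒱⟩|/2k_B. Whether the Schottky formula actually applies to such a device depends on the conditions discussed above (a conductor short compared with l_e); the helper names are hypothetical.

```python
K_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)
E = 1.602176634e-19     # elementary charge, C (exact SI value)


def shot_noise_current_variance(I, q, bandwidth):
    """<I~^2> within delta nu from Eqs. (82) and (61): 4 pi S_I(0) dnu = 2 |q I| dnu."""
    return 2.0 * abs(q * I) * bandwidth


def effective_temperature_K(I, R, q, bandwidth):
    """Temperature (K) at which the Nyquist noise 4 k_B T dnu / R, Eq. (81c),
    equals the shot noise of the same device."""
    return shot_noise_current_variance(I, q, bandwidth) * R / (4.0 * K_B * bandwidth)


R, V, dnu = 1.0e3, 1.0e-3, 1.0e6      # ohms, volts, hertz (illustrative)
I = V / R                              # Ohm's law
T_ef = effective_temperature_K(I, R, E, dnu)
print(f"T_ef = {T_ef:.3f} K;  |qV|/(2 k_B) = {E * V / (2 * K_B):.3f} K")
```

Both numbers come out at about 5.8 K, independent of R and Δν, in agreement with Eq. (83): even a millivolt bias makes the shot noise of a short conductor dominate over the thermal noise at liquid-helium temperatures.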

Returning to the bolometric Dicke radiometer (see Figs. 6-7 and their discussion), we may now 
use the Langevin equation formalism to finalize its analysis. For this system, the Langevin equation is 
just the usual equation of heat balance: 



$$ C_V\frac{dT}{dt} + \mathscr{G}(T - T_0) = \mathscr{P}_{\rm det}(t) + \tilde{\mathscr{P}}(t) , \qquad (5.84) $$



where 𝒫_det ≡ ⟨𝒫⟩ describes the (deterministic) power of absorbed radiation, and 𝒫̃ represents the effective source of temperature fluctuations. Now we can use Eq. (84) to carry out a calculation of the spectral density S_T(ω) of temperature fluctuations absolutely similar to how this was done with Eq. (65), assuming that the frequency spectrum of the fluctuation source is much broader than the intrinsic bandwidth 1/τ = 𝒢/C_V of the bolometer, so that its spectral density at frequencies ωτ ~ 1 may be well approximated by its low-frequency value S_𝒫(0):



$$ S_T(\omega) = \left|\frac{1}{-i\omega C_V + \mathscr{G}}\right|^2 S_{\mathscr{P}}(0) . \qquad (5.85) $$



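Then, requiring the variance of temperature fluctuations,

$$ (\delta T)^2 = \langle\tilde T^2\rangle = 2\int_0^{\infty}S_T(\omega)\, d\omega = 2S_{\mathscr{P}}(0)\int_0^{\infty}\left|\frac{1}{-i\omega C_V + \mathscr{G}}\right|^2 d\omega = 2S_{\mathscr{P}}(0)\,\frac{1}{C_V^2}\int_0^{\infty}\frac{d\omega}{\omega^2 + (\mathscr{G}/C_V)^2} = \frac{\pi}{\mathscr{G}C_V}\, S_{\mathscr{P}}(0) , \qquad (5.86) $$

to coincide with our earlier "thermodynamic fluctuation" result (41), we get

$$ S_{\mathscr{P}}(0) = \frac{\mathscr{G}}{\pi}\, T_0^2 . \qquad (5.87) $$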



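The r.m.s. value of the "power noise" 𝒫̃ within the bandwidth Δν ≪ 1/τ (Fig. 7) becomes equal to the deterministic signal power 𝒫_det (or, more exactly, the main harmonic of its modulation law) at

$$ \mathscr{P} = \mathscr{P}_{\min} \equiv \langle\tilde{\mathscr{P}}^2\rangle_{\Delta\nu}^{1/2} = \left(2S_{\mathscr{P}}(0)\,\Delta\omega\right)^{1/2} = 2(\mathscr{G}\Delta\nu)^{1/2}\, T_0 . \qquad (5.88) $$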



This result shows that the minimum detectable power may be lower than our earlier estimate (45) by a substantial factor of the order of (Δν/ν)^{1/2}, where the reduction of the output bandwidth is limited only by the signal accumulation time Δt ~ 1/Δν, while the increase of ν is limited by the speed of the (typically, mechanical) devices performing the power modulation. In practical systems this factor may improve the sensitivity by a couple of orders of magnitude, enabling the observation of extremely weak radiation. Perhaps the most spectacular examples are the recent measurements of the CMB radiation (discussed in Sec. 2.6), which corresponds to the blackbody temperature T_K ≈ 2.725 K, with accuracy δT_K ~ 10⁻⁶ K, using microwave



32 See, e.g., Y. Naveh et al., Phys. Rev. B 58, 15371 (1998). In practically used metals, l_e is of the order of 30 nm even at liquid helium temperatures (and even shorter at ambient conditions), so that the usual "macroscopic" resistors do not exhibit the shot noise.

33 For a review of this effect see, e.g., Ya. Blanter and M. Büttiker, Phys. Repts. 336, 1 (2000).
receivers with the physical temperature of all their components much higher than $\delta T_b$. The observed weak ($\sim 10^{-5}$ K) anisotropy of the CMB radiation is a major experimental basis of all modern cosmology.
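Eqs. (85)-(88) can be checked numerically in a few lines. The sketch below uses illustrative dimensionless units, with temperature in energy units as in the text. The thermal conductance $\mathscr G$ is called `G`, and the function names are hypothetical.

The script does three things:

* it integrates $S_T(\omega)$ after the substitution $\omega = (\mathscr G/C_V)\tan\theta$;
* it confirms that $S_P(0)$ taken from Eq. (87) reproduces the variance $T_0^2/C_V$ implied by Eqs. (86)-(87);
* it evaluates the r.m.s. power noise (88).

```python
import math

def S_T(omega, S_P0, G, C_V):
    """Eq. (5.85): spectral density of the bolometer's temperature fluctuations."""
    return S_P0 / abs(-1j * omega * C_V + G) ** 2

def temperature_variance(S_P0, G, C_V, n=10_000):
    """Eq. (5.86): <T~^2> = 2 * integral of S_T(omega) over 0 < omega < infinity.

    The substitution omega = (G/C_V) tan(theta) maps the semi-infinite range
    onto 0 < theta < pi/2, where a plain midpoint rule is sufficient.
    """
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        omega = (G / C_V) * math.tan(theta)
        d_omega = (G / C_V) / math.cos(theta) ** 2 * h
        total += S_T(omega, S_P0, G, C_V) * d_omega
    return 2.0 * total

def power_noise_rms(G, T0, delta_nu):
    """Eq. (5.88): (2 S_P(0) delta_omega)^(1/2), with S_P(0) from Eq. (5.87)."""
    S_P0 = G * T0 ** 2 / math.pi
    return math.sqrt(2.0 * S_P0 * 2.0 * math.pi * delta_nu)

# Illustrative values; each printed pair should agree up to rounding
G, C_V, T0 = 2.0, 0.5, 1.5
S_P0 = G * T0 ** 2 / math.pi
print(temperature_variance(S_P0, G, C_V), T0 ** 2 / C_V)
print(power_noise_rms(G, T0, 0.01), 2 * math.sqrt(G * 0.01) * T0)
```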

Let me also note that Eq. (73) may be readily generalized to the case when the environment's response is different from the Ohmic model (64). This generalization is virtually evident from Eq. (66). Indeed, the second term in its left-hand part is just the Fourier component of the average response of the environment:

$$\langle\tilde F_\omega\rangle = i\omega\eta q_\omega. \tag{5.89}$$

Let the environment's response be still linear, but have an arbitrary dispersion,

$$\langle\tilde F_\omega\rangle = \chi(\omega)q_\omega, \tag{5.90}$$

where the function $\chi(\omega)$, called the generalized susceptibility of the environment, may be complex, i.e. have both the real and imaginary parts:

$$\chi(\omega) = \chi'(\omega) + i\chi''(\omega). \tag{5.91}$$

Then Eq. (73) remains valid 34 with the replacement $\eta \to \chi''(\omega)/\omega$:

$$S_F(\omega) = \frac{\chi''(\omega)}{\pi\omega}T. \tag{5.92}$$
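Eq. (92) can be illustrated with a hypothetical dispersive environment, $\chi(\omega) = i\omega\eta/(1 - i\omega\tau_c)$. This is Ohmic damping whose response lags by a relaxation time $\tau_c$; the model is an assumption made only for this sketch, not something taken from the text.

For this model $\chi''(\omega)/\omega = \eta/(1 + \omega^2\tau_c^2)$. The force noise is therefore white, with the Ohmic value $\eta T/\pi$ of Eq. (73), only at $\omega\tau_c \ll 1$.

```python
import math

def debye_chi(omega, eta, tau_c):
    # Hypothetical environment: Ohmic drag eta whose response lags by tau_c.
    # For tau_c -> 0 this reduces to the Ohmic response chi = i*omega*eta, Eq. (5.89).
    return 1j * omega * eta / (1.0 - 1j * omega * tau_c)

def S_F_classical(chi, omega, T):
    # Eq. (5.92): S_F(omega) = chi''(omega) * T / (pi * omega)
    return chi(omega).imag * T / (math.pi * omega)

eta, tau_c, T = 0.7, 0.2, 1.3
chi = lambda w: debye_chi(w, eta, tau_c)
for omega in (0.01, 1.0, 10.0):
    # for this model chi''/omega = eta / (1 + omega^2 tau_c^2)
    print(omega, S_F_classical(chi, omega, T),
          eta * T / (math.pi * (1 + (omega * tau_c) ** 2)))
```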

Now let us discuss what generalization of Eq. (92) is necessary to make that fundamental result suitable for arbitrary temperatures, $T \sim \hbar\omega$. The calculations we have performed started from the apparently classical equation of motion, Eq. (63). However, quantum mechanics shows 35 that a similar equation is valid for the corresponding Heisenberg-picture operators, so that, repeating all the arguments leading to the Langevin equation (65), we may write its quantum-mechanical version



$$m\ddot{\hat q} + \eta\dot{\hat q} + \kappa\hat q = \hat F_{det} + \hat{\tilde F}. \tag{5.93}$$

This is the so-called Heisenberg-Langevin (or "quantum Langevin") equation - in this particular case, for a harmonic oscillator.

The further operations, however, require certain caution, because the right-hand part of the equation is now an operator, and has some nontrivial properties. For example, the "values" of the Heisenberg operator representing the same variable $\tilde F(t)$ at different times do not necessarily commute:

$$\left[\hat{\tilde F}(t), \hat{\tilde F}(t')\right] \neq 0, \quad \text{if } t' \neq t. \tag{5.94}$$

As a result, the function defined by Eq. (46) may not be an even function of the time delay $\tau \equiv t' - t$ even for a stationary process, making it inadequate for the representation of the real correlation function - which


34 Reviewing the calculations leading to Eq. (73), we may see that the possible real part $\chi'(\omega)$ of the susceptibility just adds up to $(\kappa - m\omega^2)$ in the denominator of Eq. (67), resulting in a change of the oscillator's eigenfrequency. This renormalization is insignificant if the oscillator-to-environment coupling is weak, i.e. the susceptibility $\chi(\omega)$ is small, as had been assumed at the derivation of Eq. (69) and hence Eq. (73).

35 See, e.g., QM Sec. 4.6.
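has to obey Eq. (51). This technical difficulty may be circumvented by the introduction of the following symmetrized correlation function

$$K_F(\tau) \equiv \frac{1}{2}\left\langle \hat{\tilde F}(t)\hat{\tilde F}(t+\tau) + \hat{\tilde F}(t+\tau)\hat{\tilde F}(t)\right\rangle \equiv \frac{1}{2}\left\langle\left\{\hat{\tilde F}(t), \hat{\tilde F}(t+\tau)\right\}\right\rangle \tag{5.95}$$

(where $\{\ldots,\ldots\}$ denotes the anticommutator of the two operators), and, similarly, the symmetrical spectral density $S_F(\omega)$, defined by the relation

$$S_F(\omega)\,\delta(\omega - \omega') \equiv \frac{1}{2}\left\langle \hat F_\omega\hat F^\dagger_{\omega'} + \hat F^\dagger_{\omega'}\hat F_\omega\right\rangle \equiv \frac{1}{2}\left\langle\left\{\hat F_\omega, \hat F^\dagger_{\omega'}\right\}\right\rangle, \tag{5.96}$$

with $K_F(\tau)$ and $S_F(\omega)$ still related by the Fourier transform (59). 36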

Now we may repeat all the analysis that was carried out for the classical case, and get Eq. (71) again, but this expression has to be compared not with the equipartition theorem (72), but with its quantum-mechanical generalization (2.78), which, in our current notation, reads

$$\langle\tilde q^2\rangle = \frac{\hbar\omega_0}{2\kappa}\coth\frac{\hbar\omega_0}{2T}. \tag{5.97}$$

As a result, we get the following quantum-mechanical generalization of Eq. (92):

$$S_F(\omega) = \frac{\hbar\chi''(\omega)}{2\pi}\coth\frac{\hbar\omega}{2T}. \tag{5.98}$$



This is the much-celebrated fluctuation-dissipation theorem, frequently referred to just as FDT. 37 

As natural as it seems, this generalization poses a very interesting conceptual dilemma. Let, for the sake of clarity, the temperature be relatively low, $T \ll \hbar\omega$; then Eq. (98) gives a temperature-independent result

$$S_F(\omega) = \frac{\hbar\chi''(\omega)}{2\pi}, \tag{5.99}$$



which is frequently called the quantum noise. According to the quantum Langevin equation (93), nothing but these fluctuations of the force exerted by the environment, with the spectral density proportional to the imaginary part of the susceptibility (i.e. damping), are the source of the ground-state "fluctuations" of the coordinate and momentum of a quantum harmonic oscillator, with the r.m.s. values

$$\delta q \equiv \langle\tilde q^2\rangle^{1/2} = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}, \qquad \delta p \equiv \langle\tilde p^2\rangle^{1/2} = m\omega_0\,\delta q = \left(\frac{\hbar m\omega_0}{2}\right)^{1/2}, \qquad \text{i.e.}\quad \delta q\,\delta p = \frac{\hbar}{2}, \tag{5.100}$$

and the average energy $\hbar\omega_0/2$. On the other hand, basic quantum mechanics tells us that exactly these formulas describe the ground state of a dissipation-free oscillator, not coupled to any environment, and are a direct corollary of the Heisenberg uncertainty relation



36 Please note that here (and to the end of this section) the brackets $\langle\ldots\rangle$ mean quantum-statistical averaging (2.12). As was discussed in Sec. 2.1, for a classical-mixture state of the environment, this does not create any difference in either the mathematical treatment of the averages or their physical interpretation.

37 It was first derived in 1951 by H. Callen and T. Welton (in a somewhat different way).
$$\delta q\,\delta p \geq \frac{\hbar}{2}. \tag{5.101}$$

(The Gaussian wavepackets, pertinent to a harmonic oscillator's ground state, turn the inequality sign in Eq. (101) into a pure equality.) So, what is the genuine source of Eqs. (100)?

The resolution of this paradox is that either interpretation of Eqs. (100) is legitimate, with their 
relative convenience depending on the particular application. (One can say that since the right-hand part 
of the quantum Langevin equation (93) is a quantum-mechanical operator, rather than a classical force, 
it "carries the uncertainty relation within itself".) However, this opportunistic resolution leaves the
following question open: is the quantum noise (99) of the environment observable directly, without any 
probe oscillator subjected to it? An experimental resolution of this dilemma is not quite simple, because 
usual scientific instruments have their own zero-point fluctuations, which may be readily confused with 
those of the system under study. Fortunately, this difficulty may be overcome, for example, using unique 
frequency-mixing ("down-conversion") properties of Josephson junctions. 38 Special low-temperature 
experiments using such down-conversion 39 have confirmed that noise (99) is real and measurable. This 
has been one of the most convincing direct demonstrations of the reality of the zero-point energy $\hbar\omega/2$. 40
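Eqs. (97)-(100) can be illustrated numerically. The sketch below evaluates the FDT (98) in its classical limit, where it should approach Eq. (92), and in its quantum limit (99). It also recovers the ground-state product $\delta q\,\delta p = \hbar/2$ from Eq. (97) at $T = 0$.

The units ($\hbar = 1$, temperature in energy units), the value of $\chi''$, and the function names are illustrative assumptions.

```python
import math

HBAR = 1.0  # illustrative units: hbar = 1, temperature in energy units

def S_F_quantum(chi_im, omega, T):
    """Eq. (5.98); at T = 0 it reduces to the quantum noise, Eq. (5.99)."""
    if T == 0:
        return HBAR * chi_im / (2 * math.pi)
    return HBAR * chi_im / (2 * math.pi) / math.tanh(HBAR * omega / (2 * T))

def q_variance(kappa, omega0, T):
    """Eq. (5.97): <q~^2> = (hbar omega0 / 2 kappa) coth(hbar omega0 / 2T)."""
    if T == 0:
        return HBAR * omega0 / (2 * kappa)
    return HBAR * omega0 / (2 * kappa) / math.tanh(HBAR * omega0 / (2 * T))

chi_im, omega = 0.5, 1.0
for T in (1000.0, 1.0, 0.01, 0.0):
    # second column: FDT (5.98); third column: classical formula (5.92)
    print(T, S_F_quantum(chi_im, omega, T), chi_im * T / (math.pi * omega))

m, omega0 = 2.0, 3.0
dq = math.sqrt(q_variance(m * omega0 ** 2, omega0, 0.0))  # Eq. (5.100)
dp = m * omega0 * dq
print(dq * dp)  # should equal hbar/2 up to rounding
```

At high temperature the two columns agree, while at low temperature the FDT saturates at the temperature-independent value (99) and the classical formula goes to zero.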

Finally, let me mention briefly an alternative derivation 41 of the fluctuation-dissipation theorem from the general quantum mechanics of open systems. This derivation is substantially longer, but gives an interesting by-product,

$$\left\langle\left[\hat{\tilde F}(t), \hat{\tilde F}(t+\tau)\right]\right\rangle = i\hbar G(\tau), \tag{5.102}$$

where $G(\tau)$ is the temporal Green's function of the environment (as "seen" by the system subjected to the generalized force $\tilde F$), defined by the equation

$$\langle\tilde F(t)\rangle = \int_0^\infty G(\tau)\,q(t-\tau)\,d\tau = \int_{-\infty}^t G(t-t')\,q(t')\,dt'. \tag{5.103}$$

Plugging the Fourier transforms of all three functions participating in Eq. (103) into that relation, it is straightforward to check 42 that the Green's function is just the Fourier image of the complex susceptibility $\chi(\omega)$, defined by Eq. (90):

$$\int_0^\infty G(\tau)\,e^{i\omega\tau}\,d\tau = \chi(\omega); \tag{5.104}$$

here 0 is used as the lower limit instead of $(-\infty)$ just to emphasize that, due to the causality principle, the Green's function has to be equal to zero for $\tau < 0$.

38 K. Likharev and V. Semenov, JETP Lett. 15, 442 (1972).

39 R. Koch et al., Phys. Rev. B 26, 74 (1982).

40 Another one is the Casimir effect - see, e.g., QM Sec. 9.1.

41 See, e.g., QM Sec. 7.4.

42 See, e.g., CM Sec. 4.1, part (ii).

In order to reveal the real beauty of Eq. (102), we may use the Wiener-Khinchin theorem (59) to rewrite the fluctuation-dissipation theorem (98) in a form similar to Eq. (102):
$$\left\langle\left\{\hat{\tilde F}(t), \hat{\tilde F}(t+\tau)\right\}\right\rangle = 2K_F(\tau), \tag{5.105}$$

where the correlation function $K_F(\tau)$ is most simply described by its Fourier transform, equal to $\pi S_F(\omega)$:

$$\int_0^\infty K_F(\tau)\cos\omega\tau\,d\tau = \frac{\hbar\chi''(\omega)}{2}\coth\frac{\hbar\omega}{2T}. \tag{5.106}$$

The comparison of Eqs. (102) and (104), on one hand, and Eqs. (105)-(106), on the other hand, shows that both the commutation and anticommutation properties of the Heisenberg-Langevin force operator at different moments of time are determined by the same generalized susceptibility $\chi(\omega)$, but the average anticommutator also depends on temperature, while the average commutator does not. 43
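Relation (104) between the Green's function and the susceptibility can be checked numerically for any causal $G(\tau)$. The sketch below assumes a hypothetical relaxational environment, $G(\tau) = g\exp\{-\tau/\tau_c\}$ for $\tau > 0$, for which the integral gives $\chi(\omega) = g\tau_c/(1 - i\omega\tau_c)$.

It also confirms that $\chi''(\omega) > 0$ at $\omega > 0$. This is what the FDT (98) needs in order to give a non-negative spectral density.

```python
import cmath
import math

def chi_from_green(G, omega, tau_max, n=100_000):
    """Eq. (5.104): chi(omega) = integral of G(tau) exp(i omega tau) over tau > 0.

    The integral starts at tau = 0 because causality forces G(tau) = 0 for tau < 0;
    tau_max must be long enough for G to have decayed.
    """
    h = tau_max / n
    total = 0j
    for k in range(n):
        tau = (k + 0.5) * h          # midpoint rule
        total += G(tau) * cmath.exp(1j * omega * tau)
    return total * h

# Hypothetical relaxational Green's function G(tau) = g exp(-tau/tau_c), tau > 0,
# whose transform is g tau_c / (1 - i omega tau_c).
g, tau_c = 1.5, 0.4
G = lambda tau: g * math.exp(-tau / tau_c)
for omega in (0.0, 1.0, 5.0):
    print(omega, chi_from_green(G, omega, 40 * tau_c),
          g * tau_c / (1 - 1j * omega * tau_c))
```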



5.6. The Kramers problem and the Smoluchowski equation 

Returning to the classical case, it is evident that the Langevin equation (65) provides the means 
not only for the analysis of stationary fluctuations, but also for the description of an arbitrary time 
evolution of (classical) dynamic systems coupled to their environment - which, again, provides both 
dissipation and fluctuations. However, this approach suffers from two major handicaps. 

First, this equation does enable us to find the statistical average of variable q, and the variance of 
its fluctuations (i.e., in the common mathematical terminology, the first and second moments of the 
probability distribution) as functions of time, but not the distribution w(q, t) as such. This may not look 
like a big problem, because in most cases (in particular, in linear systems such as the harmonic 
oscillator) the distribution is Gaussian - see, e.g., Eq. (2.77). 

The second, more painful, drawback of the Langevin approach is that it is instrumental only for 
the already mentioned "linear" systems - i.e., the systems whose dynamics is described by linear 
differential equations, such as Eq. (65). However, as we know from classical dynamics, many important problems (for example, the Kepler problem of planetary motion 44) are reduced to 1D motion in substantially anharmonic potentials $U_{ef}(q)$, leading to nonlinear equations of motion. If the energy of interaction between the system and its random environment is bilinear - i.e. is a product of variables belonging to these sub-systems (as is very frequently the case) - we may repeat all the arguments of the last section to derive the following generalized version of the Langevin equation

$$m\ddot q + \eta\dot q + \frac{\partial U(q,t)}{\partial q} = \tilde F(t), \tag{5.107}$$

valid for an arbitrary, possibly time-dependent potential U(q, t). 45 Unfortunately, the solution of this 
equation may be very hard. Indeed, the Fourier analysis carried out in the last section was essentially 
based on the linear superposition principle that is invalid for nonlinear equations. 



43 Only explicitly so, because the complex susceptibility of the environment may depend on temperature as well. 

44 See, e.g., CM Secs. 3.4-3.6.

45 The generalization of Eq. (107) to higher spatial dimensionality is also straightforward, with the scalar variable $q$ replaced by the vector $\mathbf q$, and the scalar derivative $\partial U/\partial q$ replaced with the vector $\nabla U$.
If the fluctuation intensity is low, $|\delta q| \ll \langle q\rangle$, where $\langle q\rangle(t)$ is the deterministic solution of Eq. (107) in the absence of fluctuations, this equation may be linearized 46 with respect to small fluctuations $\tilde q \equiv q - \langle q\rangle$ to get a linear equation,

$$m\ddot{\tilde q} + \eta\dot{\tilde q} + \tilde\kappa(t)\tilde q = \tilde F(t), \quad\text{with}\quad \tilde\kappa(t) \equiv \frac{\partial^2}{\partial q^2}U\big(\langle q\rangle(t), t\big). \tag{5.108}$$

This equation differs from Eq. (65) only by the time dependence of the effective spring constant $\tilde\kappa(t)$, and may be solved by the Fourier expansion of both the fluctuations and the function $\tilde\kappa(t)$. Such calculations are somewhat more cumbersome than those performed above, but may be doable (especially if the unperturbed motion $\langle q\rangle(t)$ is periodic), and sometimes give useful analytical results. 47

However, some important problems cannot be solved by the linearization. Perhaps the most apparent example is the so-called Kramers problem 48 of finding the lifetime of a metastable state of a 1D classical system in a potential well separated from the continuum motion region by a potential barrier (Fig. 10).




In the absence of fluctuations, the system, placed close to the well's bottom ($q = q_1$), would stay there forever. Fluctuations result not only in a finite spread of the probability density $w(q,t)$ around that point, but also in the gradual decrease of the total probability

$$W(t) = \int_{\text{well's bottom}} w(q,t)\,dq \tag{5.109}$$

to find the system in the well, because of the growing probability of escape from the well, over the potential barrier, due to thermal activation. If the barrier height,

$$U_0 \equiv U(q_2) - U(q_1), \tag{5.110}$$

is much larger than temperature $T$, 49 the Boltzmann distribution $w \propto \exp\{-U(q)/T\}$ should be approximately valid in most of the well, so that the probability for the system to overcome the barrier should scale as $\exp\{-U_0/T\}$. From these handwaving arguments, one may reasonably expect that if the probability $W(t)$ that the system is still in the well by time $t$ obeys the usual "decay law"


46 See, e.g., CM Secs. 3.2, 4.2, and beyond.

47 See, e.g., Chapters 5 and 6 in W. Coffey et al., The Langevin Equation, World Scientific, 1996.

48 After H. Kramers who, besides solving this important problem in 1940, has made significant contributions to many other areas of physics, including the famous Kramers-Kronig dispersion relations - see, e.g., EM Sec. 7.4.

49 If $U_0$ is comparable with $T$, the system's behavior also depends substantially on the initial probability distribution, i.e., does not follow the universal law (111).
$$\dot W = -\frac{W}{\tau}, \tag{5.111}$$

then the lifetime $\tau$ has to obey the general Arrhenius law, $\tau = \tau_A\exp\{U_0/T\}$. However, that relation needs to be proved, and the pre-exponential coefficient $\tau_A$ (frequently called the attempt time) needs to be calculated. This cannot be done by the linearization of Eq. (107), because the linearization is equivalent to a quadratic approximation of the potential $U(q)$, which evidently cannot describe the potential well and the potential barrier simultaneously - see Fig. 10.

This and other essentially nonlinear problems may be addressed using an alternative approach to 
fluctuation analysis, dealing directly with the time evolution of the probability density w(q,t). Due to the 
shortage of time, I will review this approach a bit superficially, using mostly handwaving arguments, 
and refer the interested reader to special literature 50 for strict mathematical proofs. Let us start from the 
effect of diffusion of a free 1D particle in the high damping limit, described by the Langevin equation (74), and assume that at all times the probability distribution stays Gaussian:

$$w(q,t) = \frac{1}{(2\pi)^{1/2}\delta q(t)}\exp\left\{-\frac{(q - q_0)^2}{2\delta q^2(t)}\right\}, \tag{5.112}$$

where $q_0$ is the initial position of the particle, and $\delta q(t)$ is the time-dependent distribution width, which grows in time in accordance with Eq. (77):

$$\delta q(t) = (2Dt)^{1/2}. \tag{5.113}$$

It is straightforward to check, by substitution, that this solution satisfies the following simple partial differential equation, 51

$$\frac{\partial w}{\partial t} = D\frac{\partial^2 w}{\partial q^2}, \tag{5.114}$$

with the delta-functional initial condition

$$w(q,0) = \delta(q - q_0). \tag{5.115}$$
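The statement that the Gaussian (112)-(113) solves Eq. (114) can also be checked numerically. The sketch below evaluates the residual $\partial w/\partial t - D\,\partial^2 w/\partial q^2$ by central finite differences at a few arbitrary points. The values of $D$, $t$, and $q_0$ are illustrative, and the residual should be limited only by the finite-difference error.

```python
import math

def gaussian_packet(q, t, q0, D):
    """Eqs. (5.112)-(5.113): Gaussian of width (2 D t)^(1/2) centered at q0."""
    dq = math.sqrt(2.0 * D * t)
    return math.exp(-(q - q0) ** 2 / (2.0 * dq ** 2)) / (math.sqrt(2.0 * math.pi) * dq)

def diffusion_residual(q, t, q0, D, h=1e-3):
    """Finite-difference estimate of dw/dt - D d^2w/dq^2, Eq. (5.114)."""
    w = lambda qq, tt: gaussian_packet(qq, tt, q0, D)
    dw_dt = (w(q, t + h) - w(q, t - h)) / (2.0 * h)
    d2w_dq2 = (w(q + h, t) - 2.0 * w(q, t) + w(q - h, t)) / h ** 2
    return dw_dt - D * d2w_dq2

for q in (-1.0, 0.3, 2.0):
    print(q, diffusion_residual(q, t=1.0, q0=0.0, D=0.5))
```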

The simple and important equation of diffusion (114) also describes several other physical phenomena (e.g., the heat spread along a uniform 1D system), and may be naturally generalized to the 3D motion:

$$\frac{\partial w}{\partial t} = D\nabla^2 w. \tag{5.116}$$

Now let us compare this equation with the probability conservation law, 52

$$\frac{\partial w}{\partial t} + \nabla\cdot\mathbf j_w = 0, \tag{5.117a}$$



50 See, e.g., either R. Stratonovich, Topics in the Theory of Random Noise, vol. 1, Gordon and Breach, 1963, or Chapter 1 in the monograph by W. Coffey et al., cited above.

51 By the way, the goal of the traditional coefficient 2 in Eq. (77) is exactly to have the fundamental Eq. (114) free of numerical coefficients.

52 Both forms of Eq. (117) are similar to the mass conservation law in classical dynamics (see, e.g., CM Sec. 8.2), and to the electric charge conservation law in electrodynamics (see, e.g., EM Sec. 4.1).
where the vector $\mathbf j_w$ has the physical sense of the probability current density. (The validity of this relation is evident from its integral form,

$$\frac{d}{dt}\int_V w\,d^3r + \oint_S \mathbf j_w\cdot d^2\mathbf r = 0, \tag{5.117b}$$

that results from the integration of Eq. (117a) over an arbitrary time-independent volume $V$ limited by surface $S$, and applying the divergence theorem 53 to the second term.) The continuity relation (117a) coincides with Eq. (116), with $D$ given by Eq. (78), only if we take

$$\mathbf j_w = -D\nabla w = -\frac{T}{\eta}\nabla w. \tag{5.118}$$



The first form of this relation allows a simple interpretation: the probability flow is proportional to the spatial gradient of the probability density (i.e., in application to many ($N$) similar and independent particles, just to the gradient of their concentration $n = Nw$), with the sign corresponding to the flow from the higher to lower concentration. This flow is the very essence of the effect of diffusion.

The fundamental Eq. (117) has to be satisfied also for a force-driven particle at negligible diffusion ($D \to 0$); in this case

$$\mathbf j_w = w\mathbf v, \tag{5.119}$$

where $\mathbf v$ is the deterministic velocity of the particle. In the high-damping limit we are considering right now, $\mathbf v$ is just the drift velocity:

$$\mathbf v = \frac{1}{\eta}\mathbf F_{det} = -\frac{1}{\eta}\nabla U(\mathbf r), \tag{5.120}$$

where $\mathbf F_{det}$ is the deterministic force described by the potential energy $U(\mathbf r)$. Now, as we have descriptions of $\mathbf j_w$ due to both drift and diffusion separately, we may rationally assume that in the general case when both effects are present, the corresponding components of the probability current just add up, so that

$$\mathbf j_w = \frac{1}{\eta}\left[w(-\nabla U) - T\nabla w\right], \tag{5.121}$$

and Eq. (117a) takes the form

$$\eta\frac{\partial w}{\partial t} = \nabla\cdot(w\nabla U) + T\nabla^2 w. \tag{5.122}$$

This is the Smoluchowski equation, 54 which is closely related to the Boltzmann equation in multi-particle kinetics - to be discussed in the next chapter.

As a sanity check, let us see what the Smoluchowski equation gives in the stationary limit, $\partial w/\partial t \to 0$ (which evidently may be achieved only if the deterministic potential $U$ is time-independent).



53 See, e.g., MA Eq. (12.2).

54 Named after M. Smoluchowski, who developed this formalism in 1906, apparently independently from the slightly earlier Einstein's work, and in much more detail. This equation has important applications in many fields of science, including such surprising topics as the statistics of spikes in neural networks.
Then Eq. (117a) yields $\mathbf j_w = \text{const}$, where the constant describes the motion of the system as a whole. If such motion is absent, $\mathbf j_w = 0$, then according to Eq. (121),

$$w\nabla U + T\nabla w = 0, \quad\text{i.e.}\quad \frac{\nabla w}{w} = -\frac{\nabla U}{T}. \tag{5.123}$$

Since the left-hand part of the last form of this relation is just $\nabla(\ln w)$, Eq. (123) may be immediately integrated, giving

$$\ln w = -\frac{U}{T} + \ln C, \quad\text{i.e.}\quad w(\mathbf r) = C\exp\left\{-\frac{U(\mathbf r)}{T}\right\}. \tag{5.124}$$

Multiplied by the number $N$ of similar, independent systems, with the spatial density $n(\mathbf r) = Nw(\mathbf r)$, this is just the Boltzmann distribution (3.26).
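To see the Boltzmann distribution (124) actually emerge as the stationary state of Eq. (122), one may integrate the 1D Smoluchowski equation numerically. The sketch below is a minimal explicit finite-volume scheme. It assumes reflecting walls, $T = \eta = 1$, and a hypothetical harmonic test potential.

The current between cells is written in the equivalent form (125b), derived just below this point. The fixed time step has been checked for stability only for this smooth test potential.

```python
import math

def smoluchowski_relax(U, q_min, q_max, w0, T=1.0, eta=1.0, n=60, t_end=15.0):
    """Explicit finite-volume sketch of the 1D Smoluchowski equation (5.122).

    The current between neighboring cells is discretized in the form (5.125b),
    I_w = -(T/eta) exp(-U/T) d/dq [w exp(U/T)], and the walls at q_min and q_max
    carry no current, so the total probability is conserved.
    """
    dx = (q_max - q_min) / n
    q = [q_min + (i + 0.5) * dx for i in range(n)]                 # cell centers
    e_plus = [math.exp(U(x) / T) for x in q]                        # exp(+U/T) at centers
    e_face = [math.exp(-U(q_min + (i + 1) * dx) / T) for i in range(n - 1)]  # interior faces
    w = [w0(x) for x in q]
    norm = sum(w) * dx
    w = [v / norm for v in w]
    D = T / eta                                                     # D = T/eta, as in Eq. (5.78)
    dt = 0.2 * dx * dx / D                                          # small explicit time step
    for _ in range(int(t_end / dt)):
        I = [0.0] * (n + 1)                                         # I[0] = I[n] = 0: no flux through walls
        for i in range(n - 1):
            I[i + 1] = -D * e_face[i] * (w[i + 1] * e_plus[i + 1] - w[i] * e_plus[i]) / dx
        w = [w[i] - dt * (I[i + 1] - I[i]) / dx for i in range(n)]
    return q, w

U = lambda x: 0.5 * x * x            # hypothetical harmonic test potential, kappa = 1
q, w = smoluchowski_relax(U, -4.0, 4.0, w0=lambda x: math.exp(-(x - 2.0) ** 2 / 0.5))
dx = q[1] - q[0]
Z = sum(math.exp(-U(x)) for x in q) * dx
print(max(abs(wi - math.exp(-U(x)) / Z) for x, wi in zip(q, w)))
```

With the current written in the form (125b), the discrete zero-current state is exactly $w_i \propto \exp\{-U(q_i)/T\}$ on the grid. Any remaining difference therefore reflects incomplete relaxation from the displaced initial packet, not discretization error.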

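Now, as a less trivial example of the Smoluchowski equation's applications, let us use it to solve the 1D Kramers problem (Fig. 10) in the corresponding high-damping limit, $m \ll \eta\tau_A$. It is straightforward to check that the 1D version of Eq. (121),

$$I_w = -\frac{1}{\eta}\left(w\frac{\partial U}{\partial q} + T\frac{\partial w}{\partial q}\right), \tag{5.125a}$$

is equivalent to

$$I_w = -\frac{T}{\eta}\exp\left\{-\frac{U(q)}{T}\right\}\frac{\partial}{\partial q}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right] \tag{5.125b}$$

(where $I_w$ is the probability current at a certain location $q$, rather than its density), so that we can write

$$I_w\exp\left\{\frac{U(q)}{T}\right\} = -\frac{T}{\eta}\frac{\partial}{\partial q}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right]. \tag{5.126}$$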



As was discussed above, the notion of the metastable state's lifetime is well defined only for sufficiently low temperatures

$$T \ll U_0, \tag{5.127}$$

when the lifetime is relatively long, $\tau \gg \tau_A$, where $\tau_A$ has to be of the order of the time of the system's relaxation inside the well. Since the first term of the continuity equation (117b) is of the order of $W/\tau$, in this limit that term, and hence the gradient of $I_w$, are negligibly small, so the probability current does not depend on $q$ in the potential barrier region. Let us integrate both sides of Eq. (126) over that region, using that fact:

$$I_w\int_{q'}^{q''}\exp\left\{\frac{U(q)}{T}\right\}dq = -\frac{T}{\eta}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right]_{q'}^{q''}, \tag{5.128}$$

where the integration limits $q'$ and $q''$ (Fig. 10) are selected so that

$$T \ll U(q') - U(q_1),\ U(q_2) - U(q'') \ll U_0. \tag{5.129}$$
(Evidently, such a selection is only possible if condition (127) is satisfied.) In this limit the contribution to the right-hand part from point $q''$ is negligible, because the probability density behind the barrier is exponentially small. On the other hand, the probability at point $q'$ is close to its stationary, Boltzmann value (124), so that

$$w(q')\exp\left\{\frac{U(q')}{T}\right\} \approx w(q_1)\exp\left\{\frac{U(q_1)}{T}\right\}, \tag{5.130}$$

and Eq. (128) yields

$$I_w = \frac{T}{\eta}\,w(q_1)\bigg/\int_{q'}^{q''}\exp\left\{\frac{U(q) - U(q_1)}{T}\right\}dq. \tag{5.131}$$



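We are almost done. The probability density $w(q_1)$ at the well's bottom may be expressed in terms of the total probability $W$ of the particle being in the well by using the normalization condition

$$W = \int_{\text{well's bottom}} w(q)\,dq \approx w(q_1)\int_{\text{well's bottom}}\exp\left\{-\frac{U(q) - U(q_1)}{T}\right\}dq; \tag{5.132}$$

the integration here may be limited by points where the difference $U(q) - U(q_1)$ is already much larger than $T$, but still much smaller than $U_0$ - cf. Eq. (129). According to the Taylor expansion, the shape of any smooth potential well near its bottom may be well approximated by a quadratic parabola:

$$U(q \approx q_1) - U(q_1) \approx \frac{\kappa_1}{2}(q - q_1)^2, \quad\text{where}\quad \kappa_1 \equiv \left.\frac{d^2U}{dq^2}\right|_{q=q_1} > 0. \tag{5.133}$$

With this approximation, Eq. (132) is reduced to the standard Gaussian integral: 55

$$W = w(q_1)\int_{\text{well's bottom}}\exp\left\{-\frac{\kappa_1(q - q_1)^2}{2T}\right\}dq \approx w(q_1)\int_{-\infty}^{+\infty}\exp\left\{-\frac{\kappa_1\tilde q^2}{2T}\right\}d\tilde q = w(q_1)\left(\frac{2\pi T}{\kappa_1}\right)^{1/2}. \tag{5.134}$$

To complete the calculation, we may use the similar approximation,

$$U(q \approx q_2) - U(q_1) \approx \left[U(q_2) - \frac{\kappa_2}{2}(q - q_2)^2\right] - U(q_1) = U_0 - \frac{\kappa_2}{2}(q - q_2)^2, \quad\text{where}\quad \kappa_2 \equiv -\left.\frac{d^2U}{dq^2}\right|_{q=q_2} > 0, \tag{5.135}$$

to work out the remaining integral in Eq. (131), because in the limit (129) this integral is dominated by the contribution from a region very close to the barrier top, where approximation (135) is asymptotically exact. As a result, we get

$$\int_{q'}^{q''}\exp\left\{\frac{U(q) - U(q_1)}{T}\right\}dq \approx \exp\left\{\frac{U_0}{T}\right\}\left(\frac{2\pi T}{\kappa_2}\right)^{1/2}. \tag{5.136}$$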



55 If necessary, see MA Eq. (6.9b) again. 
Plugging Eq. (136), and $w(q_1)$ expressed from Eq. (134), into Eq. (131), we finally get

$$I_w = W\,\frac{(\kappa_1\kappa_2)^{1/2}}{2\pi\eta}\exp\left\{-\frac{U_0}{T}\right\}. \tag{5.137}$$



This expression should be compared with the 1D version of Eq. (117b) for the segment $[-\infty, q']$. Since this interval covers the region near $q_1$ where most of the probability density resides, and $I_w(-\infty) = 0$, the result is merely

$$\frac{dW}{dt} + I_w(q') = 0. \tag{5.138}$$



In our approximation, $I_w(q')$ does not depend on the exact position of point $q'$, and is given by Eq. (137), so that plugging it into Eq. (138), we recover the exponential decay law (111), with the lifetime

$$\tau = \frac{2\pi\eta}{(\kappa_1\kappa_2)^{1/2}}\exp\left\{\frac{U_0}{T}\right\} = 2\pi(\tau_1\tau_2)^{1/2}\exp\left\{\frac{U_0}{T}\right\}, \quad\text{where}\quad \tau_{1,2} \equiv \frac{\eta}{\kappa_{1,2}}. \tag{5.139}$$



Thus the metastable state's lifetime is indeed described by the Arrhenius law, with the attempt time scaling as the geometric mean of the system's "relaxation times" near the potential well bottom ($\tau_1$) and the potential barrier top ($\tau_2$). 56 Let me leave it for the reader's exercise to prove that if the potential profile near the well's bottom and/or top is sharp, the pre-exponential factor in Eq. (139) should be modified, but the Arrhenius exponent is not affected.
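Eq. (139) is an asymptotic result, valid for $T \ll U_0$. One way to probe it numerically is to evaluate the exact double integral for the overdamped mean first-passage time and divide it by Eq. (139). This uses the standard mean-first-passage-time formula for overdamped 1D motion, which is a different route from the derivation above.

The sketch assumes a hypothetical test potential $U(q) = q - q^3/3$, which has:

* its well bottom at $q_1 = -1$ and its barrier top at $q_2 = +1$;
* $\kappa_1 = \kappa_2 = 2$;
* barrier height $U_0 = 4/3$.

The ratio should approach 1 as $T/U_0$ decreases, with corrections of the order of $T/U_0$.

```python
import math

def kramers_ratio(T, q_lo=-4.0, q_abs=2.0, n=30_000):
    """Exact overdamped mean first-passage time divided by Eq. (5.139).

    Hypothetical test potential U(q) = q - q**3/3 (q1 = -1, q2 = +1,
    kappa_1 = kappa_2 = 2, U0 = 4/3). The passage time from q1 to an absorbing
    point q_abs beyond the barrier is
        tau = (eta/T) * int_{q1}^{q_abs} dy exp(U(y)/T) int_{-inf}^{y} dz exp(-U(z)/T);
    eta cancels in the ratio, and q_lo stands in for -infinity.
    """
    U = lambda q: q - q ** 3 / 3.0
    q1, q2, kappa = -1.0, 1.0, 2.0
    U1, U2 = U(q1), U(q2)
    h = (q_abs - q_lo) / n
    inner = 0.0          # running trapezoid: int_{q_lo}^{y} exp(-(U(z) - U1)/T) dz
    outer = 0.0          # trapezoid: int_{q1}^{q_abs} exp((U(y) - U2)/T) * inner(y) dy
    f_prev = math.exp(-(U(q_lo) - U1) / T)
    g_prev = None
    for k in range(1, n + 1):
        y = q_lo + k * h
        f = math.exp(-(U(y) - U1) / T)
        inner += 0.5 * h * (f + f_prev)
        f_prev = f
        if y < q1 - 0.5 * h:
            continue                       # the outer integral starts at the well bottom
        g = math.exp((U(y) - U2) / T) * inner
        if g_prev is not None:
            outer += 0.5 * h * (g + g_prev)
        g_prev = g
    # tau_exact / tau_Kramers = [(eta/T) e^{U0/T} outer] / [2 pi eta e^{U0/T} / kappa]
    return kappa * outer / (2.0 * math.pi * T)

U0 = 4.0 / 3.0
for ratio in (10.0, 20.0, 40.0):
    print(U0 / ratio, kramers_ratio(U0 / ratio))
```

The absorbing point is placed well beyond the barrier top on purpose. Stopping the integration exactly at $q_2$ would capture only half of the Gaussian peak of the outer integrand.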



5.7. The Fokker-Planck equation 

Expression (139) is just a particular, high-damping limit of a more general result obtained by Kramers. In order to recover all of it, we need to generalize the Smoluchowski equation to arbitrary values of damping $\eta$. In this case, the probability density $w$ is a function of not only the particle's position $\mathbf q$ (and time $t$), but also of its momentum $\mathbf p$ - see Eq. (2.11). Thus the continuity equation (117a) needs to be generalized to the 6D phase space. Such a generalization is natural:

$$\frac{\partial w}{\partial t} + \nabla_q\cdot\mathbf j_q + \nabla_p\cdot\mathbf j_p = 0, \tag{5.140}$$

where $\mathbf j_q$ (which was called $\mathbf j_w$ in the last section) is the probability current density in the coordinate space, while $\mathbf j_p$ is the current density in the momentum space, and $\nabla_p$ is the gradient operator in that space (with $\mathbf n_j$ the unit vectors along the Cartesian axes),

$$\nabla_p \equiv \sum_{j=1}^{3}\mathbf n_j\frac{\partial}{\partial p_j}, \tag{5.141}$$

while $\nabla_q$ is the usual gradient operator in the coordinate space, that was denoted as $\nabla$ in the previous section - with index $q$ added here just for additional clarity. At negligible fluctuations ($T \to 0$), $\mathbf j_p$ may be evaluated using the natural analogy with $\mathbf j_q$ - see Eq. (119). In our new notation, that relation takes the following form,



56 Actually, $\tau_2$ describes the characteristic time of the exponential growth of small deviations from the unstable fixed point $q_2$ at the barrier top, rather than their decay, as near point $q_1$.
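$$\mathbf j_q = w\mathbf v = w\dot{\mathbf q} = w\frac{\mathbf p}{m}, \tag{5.142}$$

so it is natural to take

$$\mathbf j_p = w\dot{\mathbf p} = w\langle\mathbf F\rangle = w\left(-\nabla_q U - \eta\mathbf v\right) = w\left(-\nabla_q U - \eta\frac{\mathbf p}{m}\right). \tag{5.143}$$

As a sanity check, it is straightforward to verify that the diffusion-free equation resulting from the combination of Eqs. (140), (142), and (143),

$$\left.\frac{\partial w}{\partial t}\right|_{\text{drift}} = -\nabla_q\cdot\left(w\frac{\mathbf p}{m}\right) + \nabla_p\cdot\left[w\left(\nabla_q U + \eta\frac{\mathbf p}{m}\right)\right], \tag{5.144}$$

allows the following particular solution

$$w(\mathbf q,\mathbf p,t) = \delta\big(\mathbf q - \langle\mathbf q\rangle(t)\big)\,\delta\big(\mathbf p - \langle\mathbf p\rangle(t)\big), \tag{5.145}$$

where the statistical-average coordinate and momentum satisfy the deterministic equations of motion,

$$\frac{d\langle\mathbf q\rangle}{dt} = \frac{\langle\mathbf p\rangle}{m}, \qquad \frac{d\langle\mathbf p\rangle}{dt} = -\nabla_q U - \eta\frac{\langle\mathbf p\rangle}{m}, \tag{5.146}$$

describing the particle's drift, with the appropriate deterministic initial conditions.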

In order to understand how the diffusion may be accounted for, let us consider a statistical ensemble of free ($\nabla_q U = 0$, $\eta \to 0$) particles that are uniformly distributed in the direct space (so that $\nabla_q w = 0$), but possibly localized in the momentum space. For this case, the right-hand part of Eq. (144) vanishes, i.e. the time evolution of the probability density $w$ may be only due to diffusion. In the corresponding limit $\langle\mathbf F\rangle \to 0$, the Langevin equation (107) for each Cartesian coordinate is reduced to

$$m\ddot q_j = \tilde F_j(t), \quad\text{i.e.}\quad \dot p_j = \tilde F_j(t). \tag{5.147}$$

This equation is similar to the high-damping 1D equation (74) (with $F_{det} = 0$), with the replacement $q \to p_j/\eta$, and hence the corresponding contribution to $\partial w/\partial t$ may be described by the second term of Eq. (122) with that replacement:

$$\left.\frac{\partial w}{\partial t}\right|_{\text{diffusion}} = D\eta^2\nabla_p^2 w = \frac{T}{\eta}\eta^2\nabla_p^2 w = \eta T\nabla_p^2 w. \tag{5.148}$$


Now the reasonable assumption that in the arbitrary case the drift and diffusion contributions to $\partial w/\partial t$ just add up immediately leads us to the full Fokker-Planck equation: 57

$$\frac{\partial w}{\partial t} = -\nabla_q\cdot\left(w\frac{\mathbf p}{m}\right) + \nabla_p\cdot\left[w\left(\nabla_q U + \eta\frac{\mathbf p}{m}\right)\right] + \eta T\nabla_p^2 w. \tag{5.149}$$



As a sanity check, let us use this equation to find the stationary probability distribution of the momentum of free particles, at arbitrary damping $\eta$, assuming their uniform distribution in the direct space, $\nabla_q w = 0$. In the stationary case $\partial w/\partial t = 0$, so that Eq. (149) is reduced to

57 It was derived in 1913 in A. Fokker's PhD thesis work; M. Planck was his thesis adviser.
$$\nabla_p\cdot\left(\eta w\frac{\mathbf p}{m}\right) + \eta T\nabla_p^2 w = 0. \tag{5.150}$$

The damping coefficient $\eta$ cancels, and the first integration over momentum yields

$$\frac{\mathbf p}{m}w + T\nabla_p w = \mathbf j, \tag{5.152}$$



where $\mathbf j$ is a vector constant describing a possible motion of the system as a whole. In the absence of such motion, $\mathbf j = 0$, the second integration over momentum gives

$$w = \text{const}\times\exp\left\{-\frac{p^2}{2mT}\right\}, \tag{5.153}$$

i.e. the Maxwell distribution (3.5). However, result (153) is more general than that obtained in Sec. 3.1, because it shows that the distribution stays the same even at nonvanishing damping.
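The statement that Eq. (153) does not depend on $\eta$ can be probed by a direct Langevin simulation of a free particle's momentum. This is a minimal Euler-Maruyama sketch. It assumes a white-noise force with $\langle\tilde F(t)\tilde F(t')\rangle = 2\eta T\delta(t - t')$, the normalization that produces the momentum-diffusion term $\eta T\nabla_p^2 w$ of Eq. (149).

The units $m = T = 1$ and the random seeds are illustrative. The averages carry a statistical scatter of about 3%, plus a discretization bias of about 0.5%.

```python
import math
import random

def mean_p_squared(eta, m=1.0, T=1.0, seed=1):
    """Time-averaged p**2 for a free particle, dp/dt = -(eta/m) p + F~(t).

    Euler-Maruyama sketch, assuming white noise <F~(t) F~(t')> = 2 eta T delta(t - t').
    Step and run length are scaled with the momentum relaxation time m/eta.
    """
    rng = random.Random(seed)
    gamma = eta / m
    dt = 0.01 / gamma                      # dt << m/eta; bias of <p^2> ~ gamma*dt/2
    kick = math.sqrt(2.0 * eta * T * dt)   # r.m.s. momentum kick per step
    n_burn = int(10.0 / (gamma * dt))      # ~10 relaxation times to forget p(0) = 0
    n_avg = int(2000.0 / (gamma * dt))     # average over ~2000 relaxation times
    p, acc = 0.0, 0.0
    for k in range(n_burn + n_avg):
        p += -gamma * p * dt + kick * rng.gauss(0.0, 1.0)
        if k >= n_burn:
            acc += p * p
    return acc / n_avg

# m = T = 1: Eq. (5.153) predicts <p^2> = m*T = 1 for either damping
results = {eta: mean_p_squared(eta, seed=s) for eta, s in ((0.5, 1), (4.0, 2))}
print(results)
```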

It is also easy to show that if the damping is large (in the sense assumed in the last section), the solution of the Fokker-Planck equation tends to the following product

$$w(\mathbf q,\mathbf p,t) \to \text{const}\times\exp\left\{-\frac{p^2}{2mT}\right\}\times w(\mathbf q,t), \tag{5.154}$$

where the direct-space distribution $w(\mathbf q,t)$ obeys the Smoluchowski equation (122). However, in the general case, solutions of Eq. (149) may be rather complex, 58 so I would mention (rather than derive) only one of them, that of the Kramers problem (Fig. 10). Acting virtually exactly as in Sec. 6, one can show that at arbitrary damping (but still in the limit (127), $T \ll U_0$, with the additional restriction $\tau \gg m/\eta$), the metastable state's lifetime is again given by the Arrhenius formula (139), with the same exponent $\exp\{U_0/T\}$, but with the reciprocal time constants $1/\tau_{1,2}$ replaced with

$$\omega'_{1,2} = \left[\omega_{1,2}^2 + \left(\frac{\eta}{2m}\right)^2\right]^{1/2} - \frac{\eta}{2m} \to \begin{cases}\omega_{1,2}, & \text{for } \eta \ll m\omega_{1,2},\\ 1/\tau_{1,2}, & \text{for } m\omega_{1,2} \ll \eta,\end{cases} \tag{5.155}$$

where $\omega_{1,2} \equiv (\kappa_{1,2}/m)^{1/2}$, while $\kappa_{1,2}$ are the effective spring constants defined by Eqs. (133) and (135). Thus, in the most important particular limit of low damping, Eq. (139) is replaced with the famous formula

$$\tau = \frac{2\pi}{(\omega_1\omega_2)^{1/2}}\exp\left\{\frac{U_0}{T}\right\}. \tag{5.156}$$
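The interpolation (155), in the form reconstructed here, is easy to evaluate. The sketch below computes the lifetime (139) with $1/\tau_{1,2}$ replaced by $\omega'_{1,2}$, and checks that it reproduces Eq. (156) at low damping and Eq. (139) at high damping. The parameter values are arbitrary illustrative numbers.

Rewriting $\omega'$ as $\omega^2/([\omega^2 + (\eta/2m)^2]^{1/2} + \eta/2m)$ changes nothing algebraically. It is only a numerical precaution against cancellation at large $\eta$.

```python
import math

def omega_prime(omega, eta, m):
    """Eq. (5.155): [omega^2 + (eta/2m)^2]^(1/2) - eta/2m, written without cancellation."""
    g = eta / (2.0 * m)
    return omega ** 2 / (math.sqrt(omega ** 2 + g ** 2) + g)

def kramers_lifetime(U0, T, kappa1, kappa2, eta, m):
    """Eq. (5.139) with 1/tau_{1,2} replaced by omega'_{1,2} of Eq. (5.155)."""
    w1 = omega_prime(math.sqrt(kappa1 / m), eta, m)
    w2 = omega_prime(math.sqrt(kappa2 / m), eta, m)
    return 2.0 * math.pi / math.sqrt(w1 * w2) * math.exp(U0 / T)

m, kappa1, kappa2, U0, T = 1.0, 4.0, 1.0, 5.0, 0.5
for eta in (1e-6, 1.0, 1e4):
    print(eta, kramers_lifetime(U0, T, kappa1, kappa2, eta, m))
```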



Kramers' result (156) for the classical thermal activation of the virtually-Hamiltonian system over the potential barrier may be compared with that for its quantum-mechanical tunneling through the barrier. 59 Even the simplest, WKB approximation for the latter time,



58 The reader should remember that these solutions embody, as the particular case $T = 0$, all classical dynamics of a particle.

59 See, e.g., QM Secs. 2.3-2.4.
$$\tau_Q = \tau_A\exp\left\{2\int_{\kappa^2(q) > 0}\kappa(q)\,dq\right\}, \quad\text{with}\quad \frac{\hbar^2\kappa^2(q)}{2m} = U(q) - E, \tag{5.157}$$



shows that generally those two lifetimes have different dependences on the barrier shape. For example, for a nearly-rectangular potential barrier, the exponent that determines the classical lifetime (156) depends (linearly) only on the barrier height $U_0$, while that defining the quantum lifetime is proportional to the barrier width and scales as the square root of $U_0$. However, in the important case of "soft" potential profiles, which are typical for barely emerging (or nearly disappearing) potential wells (Fig. 11), the classical and quantum results may be simply related.



Fig. 11. Cubic-parabolic potential profile and its parameters.



Indeed, such a potential profile $U(q)$ may be well approximated by the 4 leading terms of its Taylor expansion, with the highest term proportional to $(q - q_0)^3$, near some point $q_0$ in the vicinity of the well. In this approximation, the second derivative $d^2U/dq^2$ vanishes at the point $q_0 = (q_1 + q_2)/2$, exactly between the well's bottom and the barrier's top (in Fig. 11, $q_1$ and $q_2$). Selecting the origin at this point, we may reduce the approximation to just two terms: 60

$$U(q) = aq - \frac{b}{3}q^3, \tag{5.158}$$



with $ab > 0$. Using straightforward calculus, we can find all the important parameters of this cubic parabola: the positions of its minimum and maximum:

$$q_2 = -q_1 = \left(\frac{a}{b}\right)^{1/2}, \tag{5.159}$$

the barrier height over the well's bottom:

$$U_0 \equiv U(q_2) - U(q_1) = \frac{4}{3}\left(\frac{a^3}{b}\right)^{1/2}, \tag{5.160}$$

and the effective spring constants:

$$\kappa_1 = \kappa_2 \equiv \left|\frac{d^2U}{dq^2}\right|_{q=q_{1,2}} = 2(ab)^{1/2}. \tag{5.161}$$



The last expression shows that for this potential profile, the frequencies $\omega_{1,2}$ participating in Eq. (156) are equal to each other, so that this result may be rewritten as



60 As a reminder, an absolutely similar approximation is used in Exercise Problem 4.3 for the P(V) function, in 
order to analyze properties of the van der Waals model near the critical temperature. 
$$\tau = \frac{2\pi}{\omega_0}\exp\left\{\frac{U_0}{T}\right\}, \quad\text{with}\quad \omega_0^2 = \frac{2(ab)^{1/2}}{m}. \tag{5.162}$$

On the other hand, for the same profile, the WKB approximation (157) (which is accurate when the height of the metastable state's energy over the well's bottom, $E - U(q_1) \approx \hbar\omega_0/2$, is much less than the barrier height $U_0$) yields 61

$$\tau_Q = \frac{2\pi}{\omega_0}\left(\frac{\hbar\omega_0}{864\pi U_0}\right)^{1/2}\exp\left\{\frac{36}{5}\frac{U_0}{\hbar\omega_0}\right\}. \tag{5.163}$$



Comparison of the dominating, exponential factors in these two results shows that thermal activation yields a shorter lifetime (i.e., dominates the metastable state decay) if the temperature is above the crossover value

$$T_c = \frac{5}{36}\hbar\omega_0 \equiv \frac{\hbar\omega_0}{7.2}. \tag{5.164}$$



This expression for the cubic-parabolic barrier may be compared with the similar crossover for a quadratic-parabolic barrier,$^{62}$ for which $T_c = \hbar\omega_0/2\pi \approx \hbar\omega_0/6.28$. We see that although the numerical factors for these two different soft potential profiles are substantial, they are rather close to each other.
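As a quick numerical cross-check of Eqs. (158)-(161) and (164), the Python sketch below evaluates the cubic-parabolic profile for illustrative (assumed) values $a = 2$, $b = 0.5$, extracts $U_0$ and $\kappa$ directly from $U(q)$, and, as the text does, compares only the exponents of the thermal (162) and quantum (163) lifetimes. The function names are hypothetical, and $\hbar\omega_0 = 1$ is an assumed energy unit.

```python
import math

def U(q, a, b):
    """Cubic-parabolic potential of Eq. (5.158): U(q) = a*q - (b/3)*q**3."""
    return a * q - b * q**3 / 3.0

def profile_parameters(a, b, h=1e-4):
    """Return q1, q2, U0 and kappa for a, b > 0, computed directly from U(q)."""
    q2 = math.sqrt(a / b)   # barrier top: U'(q) = a - b*q**2 = 0 with U''(q) < 0
    q1 = -q2                # well bottom: U'(q) = 0 with U''(q) > 0
    U0 = U(q2, a, b) - U(q1, a, b)
    # Central finite-difference estimate of U''(q1), i.e. of the spring constant kappa_1
    kappa = (U(q1 + h, a, b) - 2.0 * U(q1, a, b) + U(q1 - h, a, b)) / h**2
    return q1, q2, U0, kappa

def dominant_mechanism(U0, hbar_omega0, T):
    """Compare the exponents of Eqs. (5.162) and (5.163); the smaller exponent wins."""
    thermal = U0 / T
    quantum = 36.0 * U0 / (5.0 * hbar_omega0)
    return "thermal" if thermal < quantum else "quantum"

a, b = 2.0, 0.5   # illustrative values (assumed)
q1, q2, U0, kappa = profile_parameters(a, b)
print(U0, 4.0 / 3.0 * math.sqrt(a**3 / b))   # Eq. (5.160): both close to 16/3
print(kappa, 2.0 * math.sqrt(a * b))         # Eq. (5.161): both close to 2

hbar_omega0 = 1.0                 # energy unit (assumed)
T_c = 5.0 * hbar_omega0 / 36.0    # Eq. (5.164)
for T in (0.9 * T_c, 1.1 * T_c):
    print(T / T_c, dominant_mechanism(U0, hbar_omega0, T))
```

Slightly below $T_c$ the quantum exponent is the smaller one, and slightly above it the thermal exponent wins, in line with Eq. (164).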



5.9. Back to the correlation function

Unfortunately, I will not have time to review solutions of other problems using the Smoluchowski and Fokker-Planck equations, but I have to mention one conceptual issue. Since it is intuitively clear that these equations provide complete statistical information about the system under analysis, one may wonder whether they may be used to find the temporal characteristics of the system, which were discussed in Secs. 4-5 using the Langevin formalism. For any statistical average of a function taken at the same time instant, the answer is evidently yes - cf. Eq. (2.11):

$$\langle f(\mathbf{q}(t), \mathbf{p}(t))\rangle = \int f(\mathbf{q}, \mathbf{p})\, w(\mathbf{q}, \mathbf{p}, t)\, d^3q\, d^3p, \tag{5.165}$$

but what if the function depends on variables taken at different times, for example, the components of the correlation function $K_f(\tau)$ defined by Eq. (49)?

To answer this question, let us start from the discrete-variable case, when Eq. (165) takes the form (2.7), which, for our current purposes, may be rewritten as

$$\langle f(t)\rangle = \sum_m f_m W_m(t). \tag{5.166}$$

In plain English, this is a sum of all possible values of the function, each multiplied by its probability as a function of time. But this means that the average $\langle f(t)f(t')\rangle$ may be calculated as the sum of all possible products $f_m f_{m'}$, multiplied by the joint probability of the measurement outcome $m$ at moment $t$, and the outcome



61 The main, exponential factor in this result may be obtained simply by ignoring the difference between $E$ and $U(q_1)$, but the correct calculation of the pre-exponent requires taking this difference, $\hbar\omega_0/2$, into account - see K. Likharev, Physica B 108, 1079 (1981).

62 See, e.g., QM Sec. 2.4. 
$m'$ at moment $t'$. The joint probability may be presented as the product of $W_m(t)$ and the conditional probability $W(m', t' | m, t)$. Since the correlation function is well defined only for stationary systems, in the last expression we may take $t = 0$, i.e. find the conditional probability as the solution, $W_{m'}(\tau)$, of the equation describing the system's probability evolution, taken at time $\tau = t' - t$ (rather than $t'$), with the special initial condition



$$W_{m'}(0) = \delta_{m', m}. \tag{5.167}$$



On the other hand, since the average $\langle f(t)f(t+\tau)\rangle$ of a stationary process should not depend on $t$, instead of $W_m(t)$ we may take the stationary probability distribution $W_m(\infty)$, which is independent of the initial conditions and may be found as the same special solution, but at time $\tau \to \infty$. As a result, we may write

$$\langle f(t)f(t+\tau)\rangle = \sum_{m, m'} f_m W_m(\infty)\, f_{m'} W_{m'}(\tau). \tag{5.168}$$



This expression looks simple, but note that this recipe requires solving the equations for each $W_{m'}(\tau)$ for all possible initial conditions (167). To see how this recipe works in practice, let us revisit the simplest two-level system (see, e.g., Fig. 4.13, reproduced in Fig. 12 below in a notation more convenient for our current purposes), and calculate the correlation function of its energy fluctuations.



Fig. 5.12. Dynamics of a two-level system (energy levels $E_0 = 0$ and $E_1 = \Delta$).



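The stationary probabilities for this system (i.e. the probabilities for $\tau \to \infty$) have been calculated in Chapter 2, and then again in Sec. 4.4. In our current notation (Fig. 12),

$$W_0(\infty) = \frac{1}{1 + e^{-\Delta/T}} \equiv \frac{e^{\Delta/T}}{e^{\Delta/T} + 1}, \qquad W_1(\infty) = \frac{e^{-\Delta/T}}{1 + e^{-\Delta/T}} \equiv \frac{1}{e^{\Delta/T} + 1}. \tag{5.169}$$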



In order to calculate the conditional probabilities $W_{m'}(\tau)$ with the initial conditions (167) (according to Eq. (168), we need all 4 of them, for $m, m' = 0, 1$), we may use the master equations (4.100), in our current notation reading

$$\frac{dW_1}{d\tau} = -\frac{dW_0}{d\tau} = \Gamma_\uparrow W_0 - \Gamma_\downarrow W_1. \tag{5.170}$$



Since Eq. (170) conserves the total probability, $W_0 + W_1 = 1$, only one probability (say, $W_1$) is an independent variable, and for it, Eq. (170) gives a simple, linear differential equation,

$$\frac{dW_1}{d\tau} = \Gamma_\uparrow - \Gamma W_1, \qquad \text{where } \Gamma \equiv \Gamma_\uparrow + \Gamma_\downarrow. \tag{5.171}$$

This equation may be readily integrated for an arbitrary initial condition:

$$W_1(\tau) = W_1(0)\, e^{-\Gamma\tau} + W_1(\infty)\left(1 - e^{-\Gamma\tau}\right), \tag{5.172}$$
where $W_1(\infty)$ is given by the second of Eqs. (169). (It is straightforward to check that the solution for $W_0(\tau)$ may be presented in a similar form, with the corresponding change of the state index.) Now everything is ready to calculate the average $\langle E(t)E(t+\tau)\rangle$ using Eq. (168), with $f_{m,m'} = E_{0,1}$. Thanks to our (smart :-) choice of the energy origin, of the 4 terms in the double sum (168), all 3 terms that include at least one factor $E_0 = 0$ vanish, and we have only one term left:



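$$\begin{aligned}
\langle E(t)E(t+\tau)\rangle &= E_1 W_1(\infty)\, E_1 W_1(\tau)\big|_{W_1(0)=1} = \Delta^2 W_1(\infty)\left[e^{-\Gamma\tau} + W_1(\infty)\left(1 - e^{-\Gamma\tau}\right)\right] \\
&= \frac{\Delta^2}{e^{\Delta/T}+1}\left[e^{-\Gamma\tau} + \frac{1 - e^{-\Gamma\tau}}{e^{\Delta/T}+1}\right] = \frac{\Delta^2}{\left(e^{\Delta/T}+1\right)^2}\left(1 + e^{\Delta/T}e^{-\Gamma\tau}\right).
\end{aligned} \tag{5.173}$$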



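From here and the last of Eqs. (169), the correlation function of energy fluctuations is$^{63}$

$$\begin{aligned}
K_E(\tau) &\equiv \langle\tilde{E}(t)\tilde{E}(t+\tau)\rangle = \left\langle\left(E(t) - \langle E(t)\rangle\right)\left(E(t+\tau) - \langle E(t+\tau)\rangle\right)\right\rangle = \langle E(t)E(t+\tau)\rangle - \langle E(t)\rangle\langle E(t+\tau)\rangle \\
&= \langle E(t)E(t+\tau)\rangle - \langle E\rangle^2 = \Delta^2\frac{e^{\Delta/T}}{\left(e^{\Delta/T}+1\right)^2}\, e^{-\Gamma\tau}.
\end{aligned} \tag{5.174}$$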



Since the transition rates $\Gamma_\uparrow$ and $\Gamma_\downarrow$ have to obey the detailed balance relation (4.103), $\Gamma_\downarrow/\Gamma_\uparrow = \exp\{\Delta/T\}$, and hence



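$$\frac{e^{\Delta/T}}{\left(e^{\Delta/T}+1\right)^2} = \frac{\Gamma_\downarrow/\Gamma_\uparrow}{\left(\Gamma_\downarrow/\Gamma_\uparrow + 1\right)^2} = \frac{\Gamma_\uparrow\Gamma_\downarrow}{\left(\Gamma_\uparrow + \Gamma_\downarrow\right)^2} \equiv \frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma^2}, \tag{5.175}$$

expression (174) may also be presented in a simpler form:

$$K_E(\tau) = \Delta^2\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma^2}\, e^{-\Gamma\tau}. \tag{5.176}$$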
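The recipe (168) is easy to automate for any discrete-state system. The Python sketch below (its helper names are hypothetical) integrates a master equation with an arbitrary table of rates by the classical Runge-Kutta method, applies the special initial conditions (167), and assembles $K_f(\tau)$ from the double sum (168). For the two-level system of Fig. 12, with illustrative (assumed) values of $\Delta$, $T$, and $\Gamma_\uparrow$, and $\Gamma_\downarrow$ fixed by detailed balance, it should reproduce Eq. (176).

```python
import math

def master_rhs(rates, W):
    """Right-hand side of the master equation; rates[i][j] is the rate of jumps i -> j."""
    n = len(W)
    return [sum(rates[k][m] * W[k] - rates[m][k] * W[m] for k in range(n) if k != m)
            for m in range(n)]

def evolve(rates, W0, tau, steps=2000):
    """Integrate the master equation from W0 over the time interval tau (classical RK4)."""
    W = list(W0)
    h = tau / steps
    for _ in range(steps):
        s1 = master_rhs(rates, W)
        s2 = master_rhs(rates, [w + 0.5 * h * s for w, s in zip(W, s1)])
        s3 = master_rhs(rates, [w + 0.5 * h * s for w, s in zip(W, s2)])
        s4 = master_rhs(rates, [w + h * s for w, s in zip(W, s3)])
        W = [w + h / 6.0 * (x1 + 2 * x2 + 2 * x3 + x4)
             for w, x1, x2, x3, x4 in zip(W, s1, s2, s3, s4)]
    return W

def correlation(f, rates, W_inf, tau):
    """K_f(tau) = <f(t) f(t+tau)> - <f>^2, with the first average taken by recipe (5.168)."""
    n = len(f)
    total = 0.0
    for m in range(n):
        start = [1.0 if k == m else 0.0 for k in range(n)]   # initial condition (5.167)
        W_tau = evolve(rates, start, tau)
        total += f[m] * W_inf[m] * sum(f[k] * W_tau[k] for k in range(n))
    mean = sum(fm * wm for fm, wm in zip(f, W_inf))
    return total - mean**2

# Two-level system of Fig. 5.12 (illustrative parameter values, assumed):
Delta, T, G_up = 1.0, 0.5, 0.3
G_down = G_up * math.exp(Delta / T)      # detailed balance: G_down / G_up = exp(Delta/T)
G = G_up + G_down
rates = [[0.0, G_up], [G_down, 0.0]]     # 0 -> 1 at rate G_up, 1 -> 0 at rate G_down
W_inf = [G_down / G, G_up / G]           # stationary probabilities, Eq. (5.169)
f = [0.0, Delta]                         # E_0 = 0, E_1 = Delta

for tau in (0.0, 0.5, 1.0):
    exact = Delta**2 * G_up * G_down / G**2 * math.exp(-G * tau)   # Eq. (5.176)
    print(tau, correlation(f, rates, W_inf, tau), exact)
```

The same helpers work unchanged for any number of states; the cost grows with that number because recipe (168) needs a separate master-equation solution for every initial state $m$, which is the practical drawback noted right after Eq. (168).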



We see that the correlation function of energy fluctuations decays exponentially with time, with the net rate $\Gamma$, while its variance, equal to $K_E(0)$, does not depend on the transition rates (only on $\Delta$ and $T$). Now using the Wiener-Khinchin theorem (58) to calculate its spectral density, we get



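$$S_E(\omega) = \frac{1}{\pi}\int_0^\infty \Delta^2\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma^2}\, e^{-\Gamma\tau}\cos\omega\tau\, d\tau = \frac{\Delta^2}{\pi}\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma}\frac{1}{\Gamma^2 + \omega^2}. \tag{5.177}$$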



Such a dependence on frequency$^{64}$ is very typical for discrete-state systems described by master equations. It is interesting that the most widely accepted explanation of the $1/f$ noise (also called the "flicker" or "excess" noise), which was mentioned in Sec. 5, is that it is a result of thermally-activated jumps between metastable states of a statistical ensemble of such two-level systems, with an exponentially broad statistical distribution of the transition rates $\Gamma_{\uparrow,\downarrow}$. Such a broad distribution follows



63 At the transition from the first line to the second one, I am using the fact that the system is stationary, so that $\langle E(t+\tau)\rangle = \langle E(t)\rangle = \langle E\rangle = \text{const}$.

64 Regardless of the physical sense of such a function of $\omega$, and of whether its maximum is situated at zero, as in Eq. (177), or at a finite frequency $\omega_0$, as in Eq. (68), it is often referred to as the Lorentzian (or "Breit-Wigner") line.
from the Kramers formula (156), which is approximately valid for the lifetimes of states of systems with double-well potential profiles (Fig. 13), for a statistical ensemble with a smooth statistical distribution of the energy gaps $\Delta$. Such profiles are typical, in particular, for electrons in disordered (amorphous) solid-state materials that, indeed, feature high $1/f$ noise.
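This mechanism can be illustrated numerically. The Python sketch below rests on two simplifying assumptions that are not taken from the text: the weights $\Gamma_\uparrow\Gamma_\downarrow/\Gamma^2$ of the Lorentzians (177) are the same for all fluctuators, and $\ln\Gamma$ is uniformly distributed over a wide interval (which is what the exponential factor of the Kramers formula gives for a uniform distribution of activation energies). Averaging the Lorentzians over this distribution should give $\omega S(\omega) \approx \text{const}$, i.e. $S(\omega) \propto 1/\omega$, inside the interval of rates.

```python
import math

def lorentzian(omega, gamma):
    """Frequency dependence of one fluctuator's spectral density, Eq. (5.177), up to a constant."""
    return gamma / (gamma**2 + omega**2)

def ensemble_spectrum(omega, ln_gamma_min, ln_gamma_max, n=4000):
    """Average of Lorentzians over ln(Gamma) distributed uniformly (midpoint rule)."""
    d = (ln_gamma_max - ln_gamma_min) / n
    total = 0.0
    for i in range(n):
        gamma = math.exp(ln_gamma_min + (i + 0.5) * d)
        total += lorentzian(omega, gamma) * d
    return total / (ln_gamma_max - ln_gamma_min)

ln_min, ln_max = math.log(1e-6), math.log(1e6)   # twelve decades of rates (assumed)
for omega in (1e-2, 1e-1, 1.0, 1e1, 1e2):
    S = ensemble_spectrum(omega, ln_min, ln_max)
    print(f"omega = {omega:g}:  omega * S(omega) = {omega * S:.5f}")
```

In this sketch $\omega S(\omega)$ should stay close to $\pi/[2\ln(\Gamma_{\max}/\Gamma_{\min})]$ for $\Gamma_{\min} \ll \omega \ll \Gamma_{\max}$; outside that interval, the averaged spectrum returns to the flat ($\omega \to 0$) and $1/\omega^2$ (large $\omega$) asymptotes of a single Lorentzian.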





Fig. 5.13. Typical double-well potential profile.



Returning to the Fokker-Planck equation, we may use the evident generalization of Eq. (168) to the continuous-variable case:

$$\langle f(t)f(t+\tau)\rangle = \int d^3q\, d^3p \int d^3q'\, d^3p'\; f(\mathbf{q}, \mathbf{p})\, w(\mathbf{q}, \mathbf{p}, \infty)\, f(\mathbf{q}', \mathbf{p}')\, w(\mathbf{q}', \mathbf{p}', \tau), \tag{5.178}$$



where both probability density distributions are solutions of the equation with the delta-functional initial condition

$$w(\mathbf{q}', \mathbf{p}', 0) = \delta(\mathbf{q}' - \mathbf{q})\,\delta(\mathbf{p}' - \mathbf{p}). \tag{5.179}$$



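For the Smoluchowski equation, valid in the high-damping limit, the expressions are similar, albeit with a lower dimensionality:

$$\langle f(t)f(t+\tau)\rangle = \int d^3q \int d^3q'\; f(\mathbf{q})\, w(\mathbf{q}, \infty)\, f(\mathbf{q}')\, w(\mathbf{q}', \tau), \tag{5.180}$$

$$w(\mathbf{q}', 0) = \delta(\mathbf{q}' - \mathbf{q}). \tag{5.181}$$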



To see this formalism in action, let us use it to find the correlation function $K_q(\tau)$ of a linear relaxator, i.e. an overdamped 1D harmonic oscillator with $m\omega_0 \ll \eta$. In this limit, the coordinate averaged over the heat bath obeys a linear equation,

$$\eta\langle\dot{q}\rangle + \kappa\langle q\rangle = 0, \tag{5.182}$$

which describes its exponential relaxation from a certain initial condition $q_0$ to the equilibrium position $q = 0$, with the reciprocal time constant $\Gamma = \kappa/\eta$:

$$\langle q\rangle(t) = q_0\, e^{-\Gamma t}. \tag{5.183}$$

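The deterministic equation (182) corresponds to the quadratic potential energy $U(q) = \kappa q^2/2$, so that the 1D version of the Smoluchowski equation (122) takes the following form:

$$\eta\frac{\partial w}{\partial t} = \kappa\frac{\partial}{\partial q}(wq) + T\frac{\partial^2 w}{\partial q^2}. \tag{5.184}$$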



It is straightforward to check, by substitution, that this equation, rewritten for the function $w(q', \tau)$, with the delta-functional initial condition (181), $w(q', 0) = \delta(q' - q)$, is satisfied by a Gaussian function,

$$w(q', \tau) = \frac{1}{(2\pi)^{1/2}\delta q(\tau)}\exp\left\{-\frac{\left(q' - \langle q\rangle(\tau)\right)^2}{2\delta q^2(\tau)}\right\}, \tag{5.185}$$

with its center, $\langle q\rangle(\tau)$, moving in accordance with Eq. (183), and the time-dependent variance

$$\delta q^2(\tau) = \delta q^2(\infty)\left(1 - e^{-2\Gamma\tau}\right), \qquad \text{where } \delta q^2(\infty) \equiv \langle q^2\rangle = \frac{T}{\kappa}. \tag{5.186}$$



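(As a sanity check, the last equality coincides with the equipartition theorem's result.) Finally, the first probability under the integral in Eq. (180) may be found from Eq. (185) in the limit $\tau \to \infty$ (in which $\langle q\rangle(\tau) \to 0$), by replacing $q'$ with $q$:

$$w(q, \infty) = \frac{1}{(2\pi)^{1/2}\delta q(\infty)}\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\}. \tag{5.187}$$

Now all the components of the recipe (180) are ready, and we can write it, for $f(q) = q$, as

$$\langle q(t)q(t+\tau)\rangle = \frac{1}{2\pi\,\delta q(\tau)\,\delta q(\infty)}\int_{-\infty}^{+\infty} dq \int_{-\infty}^{+\infty} dq'\; q\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\} q'\exp\left\{-\frac{\left(q' - q e^{-\Gamma\tau}\right)^2}{2\delta q^2(\tau)}\right\}. \tag{5.188}$$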



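The integral over $q'$ may be worked out first, by replacing that integration variable with $(q'' + q e^{-\Gamma\tau})$, and hence $dq'$ with $dq''$:

$$\langle q(t)q(t+\tau)\rangle = \frac{1}{2\pi\,\delta q(\tau)\,\delta q(\infty)}\int_{-\infty}^{+\infty} q\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\} dq \int_{-\infty}^{+\infty}\left(q'' + q e^{-\Gamma\tau}\right)\exp\left\{-\frac{q''^2}{2\delta q^2(\tau)}\right\} dq''. \tag{5.189}$$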



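The integral of the first term in the parentheses $(q'' + q e^{-\Gamma\tau})$ equals zero (as that of an odd function in symmetric integration limits), while that with the second term is the standard Gaussian integral, giving

$$\langle q(t)q(t+\tau)\rangle = \frac{1}{(2\pi)^{1/2}\delta q(\infty)}\int_{-\infty}^{+\infty} q^2\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\} e^{-\Gamma\tau}\, dq = \frac{2T}{\pi^{1/2}\kappa}\, e^{-\Gamma\tau}\int_{-\infty}^{+\infty}\xi^2\exp\left\{-\xi^2\right\} d\xi. \tag{5.190}$$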

The last integral$^{65}$ is just $\pi^{1/2}/2$, so that, taking into account that for this stationary system centered at the coordinate origin the ensemble average $\langle q\rangle = 0$,$^{66}$ we finally get a very simple result,

$$K_q(\tau) \equiv \langle\tilde{q}(t)\tilde{q}(t+\tau)\rangle = \langle q(t)q(t+\tau)\rangle - \langle q\rangle^2 = \langle q(t)q(t+\tau)\rangle = \frac{T}{\kappa}\, e^{-\Gamma\tau}. \tag{5.191}$$



As a sanity check, for $\tau = 0$ it yields $K_q(0) = \langle q^2\rangle = T/\kappa$, in accordance with Eq. (186). As $\tau$ is increased, the correlation function decreases monotonically - see the solid-line sketch in Fig. 8.
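The Gaussian integrals (188)-(190) can also be checked by brute force. The Python sketch below evaluates the double integral of recipe (180) on a uniform grid, using the propagator (185)-(186) and the stationary distribution (187), for illustrative (assumed) parameters $T = 1$, $\kappa = 2$, $\eta = 4$ (so that $\Gamma = 0.5$), and compares the result with Eq. (191). The grid size and integration span are ad-hoc choices.

```python
import math

T, kappa, eta = 1.0, 2.0, 4.0    # illustrative values (assumed)
Gamma = kappa / eta              # reciprocal relaxation time, Eq. (5.183)
var_inf = T / kappa              # delta q^2(infinity), Eq. (5.186)

def gauss(x, center, var):
    """Normalized Gaussian with the given center and variance."""
    return math.exp(-(x - center)**2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def correlation_by_recipe(tau, n=300, span=9.0):
    """<q(t) q(t+tau)> from the double integral (5.188), by the midpoint rule on [-span, span]."""
    var_tau = var_inf * (1.0 - math.exp(-2.0 * Gamma * tau))   # Eq. (5.186); requires tau > 0
    h = 2.0 * span / n
    grid = [-span + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for q in grid:
        w_inf = gauss(q, 0.0, var_inf)                  # stationary distribution, Eq. (5.187)
        center = q * math.exp(-Gamma * tau)             # <q>(tau) from Eq. (5.183) with q0 = q
        inner = sum(qp * gauss(qp, center, var_tau) for qp in grid) * h   # propagator (5.185)
        total += q * w_inf * inner * h
    return total

for tau in (0.5, 1.0, 2.0):
    print(tau, correlation_by_recipe(tau), var_inf * math.exp(-Gamma * tau))   # vs Eq. (5.191)
```

The inner sum plays the role of the $q''$ integral in Eq. (189); replacing it by its exact value $q e^{-\Gamma\tau}$ would reduce the sketch to the single Gaussian integral (190).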

So, the solution of this very simple problem has required straightforward but somewhat bulky 
calculations. On the other hand, the same result may be obtained literally in one line, using the Langevin 



65 See, e.g., MA Eq. (6.9c).

66 This fact is not in any contradiction with the nonvanishing result (183), which is only valid for a sub-ensemble with a certain (deterministic) initial condition $q_0$.
formalism - namely, as the Fourier transform (59) of the spectral density (68) in the corresponding limit $m\omega_0 \ll \eta$, with $S_F(\omega)$ given by Eq. (73):$^{67}$

$$K_q(\tau) = 2\int_0^\infty S_q(\omega)\cos\omega\tau\, d\omega = 2\int_0^\infty \frac{S_F(0)}{\kappa^2 + \eta^2\omega^2}\cos\omega\tau\, d\omega = 2\frac{\eta T}{\pi}\int_0^\infty \frac{\cos\omega\tau}{\kappa^2 + \eta^2\omega^2}\, d\omega = \frac{T}{\kappa}\, e^{-\Gamma\tau}. \tag{5.192}$$

This example is a good illustration of the fact that for linear systems (and small fluctuations in nonlinear systems) the Langevin approach is usually much simpler than the one based on the Fokker-Planck or Smoluchowski equations. However, again, the latter approach is indispensable for the analysis of fluctuations of arbitrary intensity in nonlinear systems.
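The Fourier pair behind Eq. (192) can be checked numerically as well. Since the $\omega$-integral in Eq. (192) converges slowly, the Python sketch below (with illustrative, assumed parameters $T = 1$, $\kappa = 2$, $\eta = 4$) goes in the opposite direction: it computes $S_q(\omega) = (1/\pi)\int_0^\infty K_q(\tau)\cos\omega\tau\, d\tau$, the inverse of the transform used in Eq. (192), from the Smoluchowski result (191), and compares it with the Langevin spectral density $S_F/(\kappa^2 + \eta^2\omega^2)$, $S_F = \eta T/\pi$.

```python
import math

T, kappa, eta = 1.0, 2.0, 4.0     # illustrative values (assumed)
Gamma = kappa / eta               # Gamma = kappa/eta, Eq. (5.183)

def S_q_langevin(omega):
    """Langevin result in the overdamped limit: S_F / (kappa^2 + eta^2 omega^2), S_F = eta*T/pi."""
    return (eta * T / math.pi) / (kappa**2 + (eta * omega)**2)

def K_q(tau):
    """Correlation function obtained from the Smoluchowski equation, Eq. (5.191)."""
    return (T / kappa) * math.exp(-Gamma * tau)

def S_q_from_K(omega, tau_max=60.0, n=60000):
    """S(omega) = (1/pi) * integral_0^inf K(tau) cos(omega tau) dtau, by the midpoint rule."""
    h = tau_max / n
    total = sum(K_q((i + 0.5) * h) * math.cos(omega * (i + 0.5) * h) for i in range(n))
    return total * h / math.pi

for omega in (0.0, 0.5, 2.0):
    print(omega, S_q_from_K(omega), S_q_langevin(omega))
```

The choice of direction is a numerical convenience only: the integrand decays exponentially in $\tau$, but only as $1/\omega^2$ in $\omega$.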

To conclude this chapter, I have to emphasize again that the Fokker-Planck and Smoluchowski equations give a quantitative description of the time evolution of nonlinear Brownian systems with finite dissipation in the classical limit. The description of quantum properties of such dissipative ("open") and nonlinear quantum systems is more complex,$^{68}$ and only a few simple problems of such theory have been solved so far,$^{69}$ typically using a particular model of the environment, e.g., as a large set of harmonic oscillators with different statistical distributions of their parameters, leading to different frequency dependences of the susceptibility $\chi(\omega)$.



5.10. Exercise problems 

5.1. Considering the first 30 digits of the number $\pi = 3.1415...$ as a statistical ensemble of integers $k$ (equal to 3, 1, 4, 1, 5, ...), calculate (i) the average $\langle k\rangle$, and (ii) the r.m.s. fluctuation $\delta k$. Compare the results with those for an ensemble of completely random integers 0, 1, ..., 9, and comment.
Hint: You may find MA Eqs. (2.5) and (2.6a) useful.



5.2. For a field-free, two-site Ising system with energy values $E_m = -Js_1s_2$, in thermal equilibrium at temperature $T$, find the variance of energy fluctuations. Explore the low-temperature and high-temperature limits of the result.



5.3. Within the framework of Weiss' molecular-field theory, calculate the variance of spin fluctuations in the $d$-dimensional Ising model. Use the result to derive the conditions of quantitative validity of the theory.



5.4. Starting from the Maxwell distribution of velocities, calculate the constant $C$ in the (approximate) expression $K_P(\tau) = C\delta(\tau)$ for the correlation function of fluctuations of the pressure $P(t)$ of



67 The involved table integral may be found, e.g., in MA Eq. (6.11).

68 See, e.g., QM Sec. 7.6. 

69 See, e.g., the solutions of the 1D Kramers problem for quantum systems with low damping by A. Caldeira and A. Leggett, Phys. Rev. Lett. 46, 211 (1981), and with high damping by A. Larkin and Yu. Ovchinnikov, JETP Lett. 37, 382 (1983).
an ideal gas of $N$ classical particles. Compare the result with that of Problem 3.2, and estimate the pressure fluctuation variance.



Hint: You may like to consider a cylindrically-shaped container of volume $V = LA$ (see Fig. on the right) to calculate fluctuations of the force acting on its plane lid of area $A$, and then recalculate them into fluctuations of the pressure $P$.



5.5. Calculate the low-frequency spectral density of the current $I(t)$ due to random passage of charged particles between two conducting electrodes (see Fig. on the right). Assume that the particles are emitted by one of the electrodes at random times, and are fully absorbed by the counterpart electrode.



5.6. Calculate the correlation function of the coordinate of a 1D harmonic oscillator with small Ohmic damping at thermal equilibrium.



5.7. Consider a very long, uniform, two-wire transmission line (see Fig. on the right) that allows the propagation of TEM waves with negligible attenuation, in thermal equilibrium with the environment at temperature $T$. Find the variance $\langle V^2\rangle_{\Delta\nu}$ of electromagnetic fluctuations of the voltage $V$ between the wires within a small frequency interval $\Delta\nu$.

E&M reminder. Electromagnetic waves in such a line propagate with a frequency-independent velocity (equal to $c$ if the wires are in vacuum), with the voltage $V$ and current $I$ (see Fig. above) related as $V(x,t)/I(x,t) = \pm Z$, where $Z$ is a frequency-independent constant ("wave impedance").




5.8. Now consider a similar line terminated, at one end, with an impedance-matching resistor $R = Z$. Find the variance $\langle V^2\rangle_{\Delta\nu}$ of the voltage across the resistor, and discuss the relation between the result and the Nyquist theorem (5.81).

Hint: Take into account the fact that the resistor $R = Z$ absorbs incident TEM waves without reflection.

5.9. An overdamped classical 1D particle escapes from a potential well with a smooth bottom but a sharp edge - see Fig. on the right. Find the appropriate modification of the Kramers formula (139).

5.10. A particle may occupy any of $N$ similar sites, and jump from that site to any other one classically (i.e., without quantum-mechanical coherence between the jumps), with the same rate $\Gamma$. Find the correlation function and spectral density of fluctuations of the instant occupancy $n(t)$ (equal to either 1 or 0) of any particular site.
Chapter 6. Elements of Kinetics

This chapter gives a brief introduction to the basic notions of physical kinetics. Its main focus is on the 
Boltzmann equation, especially within the relaxation-time approximation, which allows, in particular, 
an approximate but reasonable and simple description of transport phenomena (such as the electric 
current and thermoelectric effects) in gases, including electron gases in metals and semiconductors. 

6.1. The Liouville theorem 

Physical kinetics is the branch of statistical physics that deals with systems out of 
thermodynamic equilibrium. Major tasks of kinetics include: 

(i) for autonomous systems (those not subjected to external fields): transient processes (relaxation) leading from an arbitrary initial state of a system to thermodynamic equilibrium;

(ii) for systems in time-dependent external fields (say, in a sinusoidal "ac" field): the periodic oscillations of the system's parameters; and

(iii) for systems in time-independent ("dc") external fields: dc transport effects. 

In the last case, we are dealing with stationary (d/dt = 0 everywhere), but non-equilibrium 
situations, in which the effect of an external field, continuously driving the system out of the 
equilibrium, is balanced by the simultaneous relaxation - the trend toward the equilibrium. Perhaps the 
most important effect of this class is the dc current in conductors, which alone justifies the inclusion of 
the basic notions of kinetics into any set of core physics courses. 

Actually, the reader who has reached this point of the notes already has a good taste of physical kinetics, because the subject of the last part of Chapter 5 was the kinetics of a "Brownian particle", i.e. of a "heavy" system interacting with an environment consisting of many "lighter" components. Indeed, the equations discussed in that part - whether the Smoluchowski equation (5.122) or the Fokker-Planck equation (5.149) - are valid if the environment is in thermodynamic equilibrium, but the system of our interest is not necessarily so. As a result, we could use those equations to discuss such non-equilibrium phenomena as the Kramers problem for the metastable state lifetime.

This chapter is devoted to the more traditional subject of kinetics: a system of very many similar 
particles - generally, interacting with each other, but not too strongly, so that the energy of the system 
still may be partitioned into a sum of the components, with the component interactions considered as a 
weak perturbation. Actually, we have already started the job of describing such a system in Sec. 5.8, in 
the course of deriving the Fokker-Planck equation for a single classical particle. Indeed, in the absence 
of particle interactions (i.e. when it is unimportant whether the particle is light or heavy), the probability 
current densities in the coordinate and momentum spaces are given, respectively, by Eqs. (5.142) and 
(5.143), so that the continuity equation (5.140) takes the form 

$$\frac{\partial w}{\partial t} + \nabla_q\cdot(w\dot{\mathbf{q}}) + \nabla_p\cdot(w\dot{\mathbf{p}}) = 0. \tag{6.1}$$

If similar particles do not interact, this equation for single-particle probability density w(q, p, t) is valid 
for each of them, and the result of its solution may be used to calculate any average of the system as a 
whole. 
Let us rewrite Eq. (1) in the Cartesian component form,

$$\frac{\partial w}{\partial t} + \sum_j\left[\frac{\partial}{\partial q_j}(w\dot{q}_j) + \frac{\partial}{\partial p_j}(w\dot{p}_j)\right] = 0, \tag{6.2}$$



where the index $j$ lists all degrees of freedom of the particle, and assume that its motion in an external field may be described by a Hamiltonian function $H(q_j, p_j, t)$. Plugging into Eq. (2) the Hamiltonian equations of motion:$^{1}$



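$$\dot{q}_j = \frac{\partial H}{\partial p_j}, \qquad \dot{p}_j = -\frac{\partial H}{\partial q_j}, \tag{6.3}$$

we get

$$\frac{\partial w}{\partial t} + \sum_j\left[\frac{\partial}{\partial q_j}\left(w\frac{\partial H}{\partial p_j}\right) - \frac{\partial}{\partial p_j}\left(w\frac{\partial H}{\partial q_j}\right)\right] = 0. \tag{6.4}$$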



At the differentiation of the parentheses, the mixed terms $w\,\partial^2H/\partial q_j\partial p_j$ and $w\,\partial^2H/\partial p_j\partial q_j$ cancel, and using Eq. (3) again, we get the so-called Liouville theorem$^{2}$




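$$\frac{\partial w}{\partial t} + \sum_j\left(\frac{\partial w}{\partial q_j}\dot{q}_j + \frac{\partial w}{\partial p_j}\dot{p}_j\right) = 0. \tag{6.5}$$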



Since the left-hand side of this equation is just the full derivative of the probability density, considered as a function of the generalized coordinates $q_j(t)$ of a particle, its generalized momentum components $p_j(t)$, and (possibly) time $t$, the Liouville theorem (5) may be presented in a surprisingly simple form:



$$\frac{dw(\mathbf{q}, \mathbf{p}, t)}{dt} = 0. \tag{6.6}$$



Physically, it means that the probability $dW = w\,d^3q\,d^3p$ to find a Hamiltonian particle in a small volume of the coordinate-momentum space $[\mathbf{q}, \mathbf{p}]$, with its center moving in accordance with the deterministic law (3), does not change with time - see Fig. 1.




Fig. 6.1. Cartoon representation of the 
Liouville theorem in the 6D space [q, p]. 



1 See, e.g., CM Sec. 10.1. 

2 Actually, this is just one of several theorems bearing the name of J. Liouville (1809-1882). 
At first glance, this may not look surprising, because according to the fundamental Einstein relation (5.78), one needs non-Hamiltonian forces (such as viscosity) to have diffusion. On the other hand, it is striking that the Liouville theorem is valid even for (Hamiltonian) systems with deterministic chaos,$^{3}$ in which the deterministic trajectories corresponding to slightly different initial conditions become increasingly mixed with time.
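A closely related statement of the Liouville theorem is that a Hamiltonian flow preserves the phase-space volume: the Jacobian determinant of the map $(q(0), p(0)) \to (q(t), p(t))$ stays equal to 1. The Python sketch below checks this for a pendulum with $m = l = g = 1$ (an assumption made purely for illustration), estimating the determinant by finite differences. Adding a viscous force $-\gamma p$, a non-Hamiltonian term, makes the divergence of the phase-space flow equal to $-\gamma$, so the volume should shrink as $e^{-\gamma t}$.

```python
import math

def rhs(q, p, gamma):
    """Pendulum with m = l = g = 1: dq/dt = dH/dp, dp/dt = -dH/dq - gamma*p."""
    return p, -math.sin(q) - gamma * p

def flow(q, p, t, gamma, steps=4000):
    """Advance (q, p) over time t by the classical fourth-order Runge-Kutta method."""
    h = t / steps
    for _ in range(steps):
        k1q, k1p = rhs(q, p, gamma)
        k2q, k2p = rhs(q + 0.5 * h * k1q, p + 0.5 * h * k1p, gamma)
        k3q, k3p = rhs(q + 0.5 * h * k2q, p + 0.5 * h * k2p, gamma)
        k4q, k4p = rhs(q + h * k3q, p + h * k3p, gamma)
        q += h / 6.0 * (k1q + 2.0 * k2q + 2.0 * k3q + k4q)
        p += h / 6.0 * (k1p + 2.0 * k2p + 2.0 * k3p + k4p)
    return q, p

def phase_volume_ratio(q0, p0, t, gamma, eps=1e-5):
    """Jacobian determinant of (q0, p0) -> (q(t), p(t)), estimated by central differences."""
    qa, pa = flow(q0 + eps, p0, t, gamma)
    qb, pb = flow(q0 - eps, p0, t, gamma)
    qc, pc = flow(q0, p0 + eps, t, gamma)
    qd, pd = flow(q0, p0 - eps, t, gamma)
    dq_dq0, dp_dq0 = (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    dq_dp0, dp_dp0 = (qc - qd) / (2 * eps), (pc - pd) / (2 * eps)
    return dq_dq0 * dp_dp0 - dq_dp0 * dp_dq0

t = 10.0
print(phase_volume_ratio(2.0, 0.0, t, gamma=0.0))                       # Hamiltonian: close to 1
print(phase_volume_ratio(2.0, 0.0, t, gamma=0.1), math.exp(-0.1 * t))  # viscous: shrinks
```

In the undamped case the ratio stays at 1 even though the pendulum is nonlinear and a small phase-space element gets strongly sheared, which is the content of Fig. 1.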

For an ideal gas of 3D particles, we may select the usual Cartesian coordinates $r_j$ (with $j = 1, 2, 3$) for the generalized coordinates $q_j$, so that $p_j$ become the Cartesian components $mv_j$ of the usual (linear) momentum, and the elementary volume is just $d^3r\,d^3p$ - see Fig. 1. In this case Eqs. (3) are just



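$$\dot{r}_j = \frac{p_j}{m} \equiv v_j, \qquad \dot{p}_j = f_j, \tag{6.7}$$

so that the Liouville theorem may be rewritten as

$$\frac{\partial w}{\partial t} + \sum_{j=1}^{3}\left(v_j\frac{\partial w}{\partial r_j} + f_j\frac{\partial w}{\partial p_j}\right) = 0, \tag{6.8}$$

and conveniently presented in the vector form$^{4}$

$$\frac{\partial w}{\partial t} + \mathbf{v}\cdot\nabla w + \mathbf{f}\cdot\nabla_p w = 0, \tag{6.9}$$

where I have returned to using the unindexed symbol $\nabla$ for the vector differentiation in the coordinate space.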



6.2. The Boltzmann equation 

The situation becomes much more complex if particles interact. Generally, a system of $N$ similar particles in 3D space has to be described by a probability density $w$ that is a function of $6N + 1$ arguments ($3N$ Cartesian coordinates, plus $3N$ momentum components, plus time). Analytical or numerical solution of any equation describing the time evolution of such a function for a typical ensemble of $N \sim 10^{23}$ particles is evidently a hopeless task. Hence, the kinetics of realistic ensembles has to rely on making reasonable approximations that simplify the situation.

One of the most useful approximations (sometimes called the Stosszahlansatz, German for the "collision number assumption") was suggested by L. Boltzmann for a gas of particles that move freely most of the time, but interact during short time intervals, when a particle comes close to either an immobile scattering center (say, an impurity in a conductor) or to another particle of the gas. Such a brief scattering event changes the particle's momentum, and may be approximately described by the addition of a special term (called the scattering integral) to the right-hand side of Eq. (9):



$$\frac{\partial w}{\partial t} + \mathbf{v}\cdot\nabla w + \mathbf{f}\cdot\nabla_p w = \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}}, \tag{6.10}$$



while still keeping w a function of only 7 arguments: 3 coordinate components of vector r and 3 
components of momentum p (all of just one particle), plus time t. This is the Boltzmann transport 
equation. 



3 See, e.g., CM Sec. 9.3. 

4 From this point on, I return to using the index-free symbol 
The concrete form of the scattering integral depends on the scattering object. If the scattering centers do not belong to the ensemble under consideration (again, for example, an impurity atom in a conductor - see Fig. 2), then the scattering integral may be obtained by an evident generalization of the master equation (4.100):$^{5}$



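$$\left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\left[\Gamma_{\mathbf{p}'\to\mathbf{p}}\, w(\mathbf{r}, \mathbf{p}', t) - \Gamma_{\mathbf{p}\to\mathbf{p}'}\, w(\mathbf{r}, \mathbf{p}, t)\right], \tag{6.11}$$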



where the physical sense of $\Gamma_{\mathbf{p}\to\mathbf{p}'}$ is the rate (i.e. the probability per unit time) for the particle to be scattered from the state with momentum $\mathbf{p}$ into the state with momentum $\mathbf{p}'$.



Fig. 6.2. Particle scattering event.



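Most elastic interactions are reciprocal, i.e. obey the following relation (closely related to the reversibility of time in Hamiltonian systems): $\Gamma_{\mathbf{p}\to\mathbf{p}'} = \Gamma_{\mathbf{p}'\to\mathbf{p}}$, so that Eq. (11) may be rewritten as$^{6}$

$$\left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\,\Gamma_{\mathbf{p}\to\mathbf{p}'}\left[w(\mathbf{r}, \mathbf{p}', t) - w(\mathbf{r}, \mathbf{p}, t)\right]. \tag{6.12}$$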

With such a scattering integral, Eq. (10) stays linear in $w$, but becomes an integro-differential equation, which is typically harder to solve than a differential equation.
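A discrete toy version of Eq. (12) shows two of its key properties. In the Python sketch below, the continuum of momenta is replaced, as a modeling assumption, by $n$ hypothetical states on one energy shell (elastic scattering), coupled by randomly chosen but reciprocal rates $G_{ij} = G_{ji}$. Integrating $\partial w/\partial t = $ (scattering integral) in time, one may see that the total probability is conserved, and that $w$ relaxes to the distribution that makes the scattering integral vanish, here a uniform one over the shell.

```python
import random

def scattering_integral(G, w):
    """Discrete analog of Eq. (6.12): (dw_i/dt)_scatt = sum_j G[i][j] * (w[j] - w[i])."""
    n = len(w)
    return [sum(G[i][j] * (w[j] - w[i]) for j in range(n)) for i in range(n)]

rng = random.Random(0)          # fixed seed, for reproducibility
n = 6                           # number of states on one energy shell (assumed)
G = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        G[i][j] = G[j][i] = rng.uniform(0.1, 1.0)   # reciprocal rates: G_ij = G_ji

w = [rng.random() for _ in range(n)]   # arbitrary initial occupancies of the shell
total0 = sum(w)
dt = 1e-3
for _ in range(20000):                 # explicit Euler steps, up to t = 20
    dw = scattering_integral(G, w)
    w = [wi + dt * dwi for wi, dwi in zip(w, dw)]

print(sum(w) - total0)                          # total probability: conserved up to rounding
print([round(wi - total0 / n, 6) for wi in w])  # deviations from the uniform distribution
```

With non-reciprocal rates, or with scattering between states of different energies, one would need the more general form (11), and $w$ would generally relax to a different stationary distribution.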

The equation becomes even more complex if the scattering is due to mutual interaction of the 
particle members of the system (Fig. 3). 



Fig. 6.3. Particle-particle scattering event.



5 Note that the master equations ignore possible quantum coherence of different scattering events, described by the off-diagonal elements of the density matrix, because $w$ represents only the diagonal elements of the matrix. However, for ensembles close to thermal equilibrium, this is a reasonable approximation - see Sec. 2.1.

6 One may wonder whether this approximation may work for Fermi particles, for whom the Pauli principle forbids scattering into an already occupied state, so that for scattering $\mathbf{p} \to \mathbf{p}'$, the factor $w(\mathbf{r}, \mathbf{p}, t)$ in Eq. (12) has to be multiplied by the probability $[1 - w(\mathbf{r}, \mathbf{p}', t)]$ that the final state is available. Generally, this is a valid argument, but one should notice that if this modification is done with both terms of Eq. (12), it yields

$$\left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\,\Gamma_{\mathbf{p}\to\mathbf{p}'}\left\{w(\mathbf{r}, \mathbf{p}', t)\left[1 - w(\mathbf{r}, \mathbf{p}, t)\right] - w(\mathbf{r}, \mathbf{p}, t)\left[1 - w(\mathbf{r}, \mathbf{p}', t)\right]\right\}.$$

Opening both square brackets, we see that the probability density products cancel, bringing us back to Eq. (12).



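In this case, the probability of the scattering event scales as the product of two single-particle probabilities, and the simplest form of the scattering integral is$^{7}$

$$\left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p' \int d^3p'_1\left[\Gamma_{\mathbf{p}'\to\mathbf{p},\,\mathbf{p}'_1\to\mathbf{p}_1}\, w(\mathbf{r}, \mathbf{p}', t)\, w(\mathbf{r}, \mathbf{p}'_1, t) - \Gamma_{\mathbf{p}\to\mathbf{p}',\,\mathbf{p}_1\to\mathbf{p}'_1}\, w(\mathbf{r}, \mathbf{p}, t)\, w(\mathbf{r}, \mathbf{p}_1, t)\right]. \tag{6.13}$$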



The integration dimensionality in Eq. (13) takes into account the fact that, due to the conservation of the total momentum at scattering,

$$\mathbf{p} + \mathbf{p}_1 = \mathbf{p}' + \mathbf{p}'_1, \tag{6.14}$$

one of the momenta is not an independent argument, so that the integration in Eq. (13) may be restricted to a 6D $p$-space rather than the 9D one. For the reciprocal interaction, Eq. (13) may also be simplified a bit, but it still keeps Eq. (10) a nonlinear integro-differential transport equation, excluding such powerful solution methods as the Fourier expansion (which hinges on the linear superposition principle).

This is why most useful results based on the Boltzmann transport equation hinge on its further simplifications, most notably the relaxation-time approximation - RTA for short. This approximation uses the fact that in the absence of spatial gradients ($\nabla w = 0$) and external forces ($\mathbf{f} = 0$), Eq. (10) yields

$$\frac{\partial w}{\partial t} = \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}}, \tag{6.15}$$



so that the thermal-equilibrium probability distribution $w_0(\mathbf{r}, \mathbf{p}, t)$ has to turn any scattering integral into zero. Hence, at small deviations from the equilibrium,

$$\tilde{w}(\mathbf{r}, \mathbf{p}, t) \equiv w(\mathbf{r}, \mathbf{p}, t) - w_0(\mathbf{r}, \mathbf{p}, t) \to 0, \tag{6.16}$$

the scattering integral should be proportional to the deviation $\tilde{w}$, and its simplest reasonable model is

$$\left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = -\frac{\tilde{w}}{\tau}, \tag{6.17}$$

where $\tau$ is a phenomenological constant (which, according to Eq. (15), has to be positive for the system's stability) called the relaxation time. Its physical meaning will become clearer in the next section.

The relaxation-time approximation is quite reasonable if the angular distribution of the scattering rate is dominated by small angles between the vectors $\mathbf{p}$ and $\mathbf{p}'$ - as it is, for example, for Rutherford scattering by a Coulomb center.$^{8}$ Indeed, in this case the two functions $w$ participating in Eq. (12) are close to each other, so that the loss of the second momentum argument ($\mathbf{p}'$) is not too essential. However, while using the Boltzmann-RTA equation, which results from combining Eqs. (10) and (17),

$$\frac{\partial w}{\partial t} + \mathbf{v}\cdot\nabla w + \mathbf{f}\cdot\nabla_p w = -\frac{\tilde{w}}{\tau}, \tag{6.18}$$



the reader should always remember that this is just an approximation, sometimes giving completely wrong results. For example, it prescribes the same time scale, $\tau$, to the relaxation of the net momentum of the



7 This was the approximation used by L. Boltzmann to prove the famous H-theorem, stating that the entropy of the gas described by Eq. (13) may only grow (or stay constant) in time, $dS/dt \geq 0$. Since the model is very approximate, that result does not seem too fundamental nowadays, despite all its historic significance.

8 See, e.g., CM Sec. 3.7. 
system, and to its energy relaxation, while in many real systems the latter process (which requires inelastic interactions) may be substantially longer. Naturally, in the following sections I will describe only those applications of the RTA that give a reasonable description of reality.



6.3. The Ohm law and the Drude formula

Despite its shortcomings, Eq. (18) is adequate for quite a few applications. Perhaps the most important of them is deriving the Ohm law for dc current in a gas of charged particles, whose only important deviation from ideality is the scattering in the form of Eq. (17), and which is hence described, in equilibrium, by the equilibrium probability $w_0$ of an ideal gas (see Sec. 3.1):

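$$w_0(\mathbf{r}, \mathbf{p}, t) = \frac{g}{(2\pi\hbar)^3}\langle N(\varepsilon)\rangle, \tag{6.19}$$

where $g$ is the degeneracy factor (say, $g = 2$ for electrons due to their spin), and $\langle N(\varepsilon)\rangle$ is the average occupancy of a quantum state with momentum $\mathbf{p}$, which obeys either the Fermi-Dirac or the Bose-Einstein distribution:

$$\langle N(\varepsilon)\rangle = \frac{1}{\exp\{(\varepsilon - \mu)/T\} \pm 1}. \tag{6.20}$$

(Up to a point, our calculations will be valid for both statistics, and hence, in the limit $\mu/T \to -\infty$, for a classical gas as well.)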

Now let a uniform, dc electric field $\mathscr{E}$ be applied to the gas, exerting the force $\mathbf{f} = q\mathscr{E}$ on each particle with electric charge $q$. Then the stationary solution of Eq. (18), with $\partial/\partial t = 0$, should also be spatially uniform ($\nabla w = 0$), so that this equation is reduced to

$$q\mathscr{E}\cdot\nabla_p w = -\frac{\tilde{w}}{\tau}. \tag{6.21}$$

Let us assume the electric field to be relatively low as well, so that the perturbation w̃ it produces is
relatively small. (I will quantify this condition later on.) Then in the left-hand side of Eq. (21) we can
neglect that perturbation, by replacing w with w_0, because that side already has a small factor (ℰ). As a
result, this equation yields

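$$\tilde w = -\tau q\boldsymbol{\mathcal E}\cdot\nabla_{\mathbf p}w_0 = -\tau q\boldsymbol{\mathcal E}\cdot(\nabla_{\mathbf p}\varepsilon)\,\frac{\partial w_0}{\partial\varepsilon}, \qquad (6.22)$$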

where the partial derivative sign marks the implied local constancy of the parameters μ and T, i.e. their
independence of the momentum p. But the gradient ∇_p ε is nothing else than the particle's velocity v - for a
quantum particle, its group velocity. 9 (This fact is easy to verify for the isotropic and parabolic
dispersion law, pertinent to classical particles moving in free space,

$$\varepsilon(\mathbf p) = \frac{p^2}{2m} = \frac{p_1^2 + p_2^2 + p_3^2}{2m}. \qquad (6.23)$$



9 See, e.g., QM Sec. 2.1. 
Indeed, in this case the Cartesian components of vector ∇_p ε are

$$\frac{\partial\varepsilon}{\partial p_j} = \frac{p_j}{m} = v_j, \qquad (6.24)$$

so that ∇_p ε = v.) Hence, Eq. (22) may be rewritten as



$$\tilde w = -\tau q\boldsymbol{\mathcal E}\cdot\mathbf v\,\frac{\partial w_0}{\partial\varepsilon}. \qquad (6.25)$$
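The statement ∇_p ε = v, used in passing from Eq. (22) to Eq. (25), is easy to check numerically for the parabolic law (23). The sketch below uses arbitrary units and hypothetical helper names; it compares a central-difference momentum gradient of ε(p) with p/m.

```python
def energy(p, m):
    """Isotropic parabolic dispersion law, Eq. (6.23): eps = p^2 / 2m."""
    return sum(pj * pj for pj in p) / (2.0 * m)


def grad_p(func, p, h=1e-6):
    """Central-difference approximation of the momentum gradient of func at p."""
    grad = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        grad.append((func(plus) - func(minus)) / (2.0 * h))
    return grad


m = 2.0
p = (1.0, -0.5, 3.0)
v_numeric = grad_p(lambda pp: energy(pp, m), p)
v_exact = [pj / m for pj in p]  # Eq. (6.24): v_j = p_j / m
print(v_numeric, v_exact)
```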



Let us use this result to calculate the electric current density j. The contribution of each quantum
state to the current density is qvw, so that the total density is



$$\mathbf j = \int q\mathbf v w\,d^3p = q\int\mathbf v(w_0 + \tilde w)\,d^3p. \qquad (6.26)$$



Since in the equilibrium state (with w = w_0) the current has to be zero, the integral of the first term in the
parentheses has to vanish. For the integral of the second term, plugging in Eq. (25), and also using Eq.
(19), we get



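$$\mathbf j = -\tau q^2\int\mathbf v(\boldsymbol{\mathcal E}\cdot\mathbf v)\frac{\partial w_0}{\partial\varepsilon}\,d^3p = \frac{gq^2\tau}{(2\pi\hbar)^3}\int\mathbf v(\boldsymbol{\mathcal E}\cdot\mathbf v)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d^2p_\perp\,dp_\parallel, \qquad (6.27)$$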



where d²p_⊥ is the elementary area of the constant energy surface in the momentum space, while dp_∥ is the
momentum differential's component normal to that surface. This result 10 is valid even for particles with
an arbitrary dispersion law ε(p) (that may be rather complicated, for example, for particles moving in
space-periodic potentials 11), and may give, in particular, a fair description of conductivity's anisotropy
in crystals.

For classical particles whose dispersion law is isotropic and parabolic, as in Eq. (23), the
constant energy surface is a sphere of radius p, so that d²p_⊥ = p²dΩ = p² sinθ dθ dφ, while dp_∥ = dp. In
spherical coordinates with the polar axis directed along the vector ℰ, we get (ℰ·v) = ℰv cosθ. Now
decomposing the vector v standing outside the parentheses into a component v cosθ directed along the vector ℰ, and two
perpendicular components, v sinθ cosφ and v sinθ sinφ, we see that the integrals of the last two
components over the angle φ give zero. Hence, as we could expect, in the isotropic case the net current is
directed along the electric field and obeys the linear Ohm law, 12

$$\mathbf j = \sigma\boldsymbol{\mathcal E}, \qquad (6.28)$$



with a field-independent electric conductivity 



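$$\sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\int_0^{2\pi}d\varphi\int_0^{\pi}\sin\theta\,d\theta\,\cos^2\theta\int_0^\infty p^2dp\,v^2\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right). \qquad (6.29)$$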



10 First obtained by A. Sommerfeld in 1927. 

11 See, e.g., QM Secs. 2.7, 2.8, and 3.4. In this case, p should be understood as the quasi-momentum rather than
the genuine momentum.

12 As Eq. (27) shows, if the dispersion law ε(p) is anisotropic, the direction of the current density may be different
from that of the electric field. In this case, conductivity should be described by a tensor σ_jj' rather than a scalar.
Using the fact that sinθ dθ is just -d(cosθ), we see that the integral over θ equals 2/3. The integral over
dφ is of course just 2π, while that over p may be readily transformed to one over the particle's energy
ε(p) = p²/2m: v dp = p dp/m = dε, so that p²dp v² = p²v dε = (2mε)(2ε/m)^{1/2} dε = (8mε³)^{1/2} dε. As a result, the
conductivity equals



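$$\sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^\infty(8m\varepsilon^3)^{1/2}\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon. \qquad (6.30)$$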
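The angular factor 4π/3 in Eq. (30) is ∫₀^{2π}dφ ∫₀^π sinθ cos²θ dθ = 2π·(2/3). A short midpoint-rule check (a sketch; the grid size is an arbitrary choice) also confirms that a transverse component, proportional to cosφ, integrates to zero over φ.

```python
import math


def midpoint(f, a, b, n=2000):
    """Midpoint-rule integral of f over [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


theta_part = midpoint(lambda t: math.sin(t) * math.cos(t) ** 2, 0.0, math.pi)  # -> 2/3
phi_part = midpoint(lambda f: 1.0, 0.0, 2.0 * math.pi)                         # -> 2*pi
angular_factor = theta_part * phi_part                                         # -> 4*pi/3
transverse = midpoint(math.cos, 0.0, 2.0 * math.pi)  # the v*sin(theta)*cos(phi) part -> 0

print(theta_part, angular_factor, 4 * math.pi / 3, transverse)
```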



Note that σ is proportional to q² and hence does not depend on the sign of the particle charge; this is why the
Hall effect in an external magnetic field, which lacks this ambivalence, is typically used to determine the
charge of current carriers (electrons or holes) in semiconductors.

So far, the calculations have been valid for any gas (Bose, Fermi, or classical) at an arbitrary
temperature. Let us work out the remaining integral over energy for the most important case of a
degenerate Fermi (say, electron) gas, with T ≪ ε_F. 13 As was discussed in Sec. 3.3, in this limit the factor
(-∂⟨N(ε)⟩/∂ε) is essentially Dirac's delta-function δ(ε - ε_F), so that the conductivity does not depend on
temperature: 14



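$$\sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}(8m\varepsilon_F^3)^{1/2} = \frac{q^2\tau}{m}\,\frac{g}{(2\pi\hbar)^3}\,\frac{4\pi}{3}(2m\varepsilon_F)^{3/2} \equiv \frac{q^2\tau}{m}\,\frac{g}{(2\pi\hbar)^3}\,\frac{4\pi p_F^3}{3}. \qquad (6.31)$$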



But the last fraction in this product is just the volume of the Fermi sphere in the momentum space, so 
that the product of the last two fractions is the total number of quantum states filled at T = 0 (per unit 
volume), i.e. the total density n = N/V of electrons in the gas. Hence, Sommerfeld's result is reduced to 
the Drude formula, 15 



$$\sigma = \frac{nq^2\tau}{m}, \qquad (6.32)$$



which should be well familiar to the reader from an undergraduate physics course, with τ being the scale
of time intervals between scattering events.
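Here is a numerical check, a sketch under stated assumptions, that the Sommerfeld integral (30) indeed reduces to the Drude formula (32) for a degenerate Fermi gas. It assumes dimensionless units g = 2, ħ = m = q = τ = 1, takes μ ≈ ε_F (accurate to second order in T/ε_F), and integrates over energy with a plain Simpson rule; the helper names are hypothetical.

```python
import math


def minus_dN_deps(eps, mu, T):
    """-d<N>/d(eps) for Fermi-Dirac statistics: a peak of unit area and width ~T."""
    x = (eps - mu) / T
    if abs(x) > 100.0:  # negligible there; also avoids overflow in cosh
        return 0.0
    return 1.0 / (4.0 * T * math.cosh(x / 2.0) ** 2)


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def sigma_sommerfeld(mu, T, g=2, q=1.0, tau=1.0, m=1.0, hbar=1.0):
    """Conductivity of Eq. (6.30), with the energy integral done numerically."""
    prefactor = g * q**2 * tau / (2.0 * math.pi * hbar) ** 3 * 4.0 * math.pi / 3.0
    integrand = lambda e: math.sqrt(8.0 * m * e**3) * minus_dN_deps(e, mu, T)
    return prefactor * simpson(integrand, 0.0, mu + 60.0 * T, 40000)


def sigma_drude(eps_F, g=2, q=1.0, tau=1.0, m=1.0, hbar=1.0):
    """Drude formula (6.32), with n = g (4 pi p_F^3 / 3) / (2 pi hbar)^3."""
    p_F = math.sqrt(2.0 * m * eps_F)
    n = g / (2.0 * math.pi * hbar) ** 3 * 4.0 * math.pi * p_F**3 / 3.0
    return n * q**2 * tau / m


T = 0.01  # in units of mu, i.e. T << eps_F
ratio = sigma_sommerfeld(1.0, T) / sigma_drude(1.0)
print(ratio, 1.0 + math.pi**2 / 8.0 * T**2)
```

The small deviation of the ratio from 1 is the leading Sommerfeld-expansion correction (π²/8)(T/μ)² for the smooth factor (8mε³)^{1/2}; it vanishes as T → 0, in agreement with the delta-function argument above.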

This calculation poses an important conceptual question. The very structure of Eq. (30)
implies that the only quantum states contributing to electric conductance are those where the derivative
(-∂⟨N(ε)⟩/∂ε) is significant. At T ≪ ε_F, these are the states at the very surface of the Fermi
sphere. On the other hand, the classical derivation of Eq. (32) involves all electrons. 16 So, which
electrons exactly are responsible for conductance: all of them, or only those on the Fermi surface?



13 Calculations for a classical gas (which are important, in particular, for most plasmas and non-degenerate 
semiconductors) are left for the reader - see the first assignment of Problem 2. 

14 At least explicitly, because in some particle collision models τ may be a function of temperature, which levels
out only at temperatures much lower than ε_F.

15 It was derived in 1900 by P. Drude. Note that Drude also used the same arguments to derive a very simple
(and very reasonable) approximation for the complex electric conductivity in the ac field of frequency ω: σ(ω) =
σ(0)/(1 - iωτ), with σ(0) given by Eq. (32); sometimes the name "Drude formula" is used for this expression
rather than for Eq. (32) - see Problem 1.

16 As a reminder, here it is (see also EM Sec. 4.2): Let τ be the average time after which scattering causes a particle
to lose all the deterministic component of its velocity, v_drift, provided by the electric field ℰ on top of the electron's
random thermal motion (which does not contribute to the net current). Using the 2nd Newton law to describe
For the resolution of this paradox, let us return to Eq. (22) and analyze the physical meaning of
that result. For that, let us compare it with the following model distribution: 

$$w_{\rm model} = w_0(\mathbf r,\mathbf p - \tilde{\mathbf p},t), \qquad (6.33)$$

where p̃ is some constant, small vector, which describes a small shift of the unperturbed distribution w_0
in the momentum space as a whole. Performing the Taylor expansion of Eq. (33) in this small parameter,
and keeping only two leading terms, we get

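$$w_{\rm model} \approx w_0(\mathbf r,\mathbf p,t) + \tilde w_{\rm model}, \qquad \tilde w_{\rm model} = -\tilde{\mathbf p}\cdot\nabla_{\mathbf p}w_0(\mathbf r,\mathbf p,t). \qquad (6.34)$$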

Comparing the model perturbation with the first form of Eq. (22), we see that they coincide, provided 
that 

$$\tilde{\mathbf p} = q\boldsymbol{\mathcal E}\tau \equiv \mathbf F\tau. \qquad (6.35)$$

This means that Eq. (22) describes a small shift of the equilibrium distribution of electrons by qℰτ (in
p-space) along the direction of the electric field, 17 and gives the picture of the electron transport in a
degenerate gas shown in Fig. 4.




Fig. 6.4. Filling of momentum states in a 
degenerate electron gas: (a) in the 
absence and (b) in the presence of 
external electric field ℰ. Arrows show
representative scattering events. 



At ℰ = 0, the system is in equilibrium, so that the quantum states inside the Fermi sphere (p <
p_F) are occupied, while those outside of it are empty (Fig. 4a). Electron scattering events happen only
between states within a very thin layer (|p²/2m - ε_F| ~ T) at the Fermi surface, because only in this layer
the states are partially occupied, so that both factors of the product w(r,p,t)[1 - w(r,p',t)],
mentioned in Sec. 1, do not vanish. These scattering events, on average, do not change the
equilibrium probability distribution, because they are uniformly spread over the Fermi surface.

As soon as the electric field is turned on, it starts to accelerate all electrons in its
direction, i.e. the whole Fermi sphere starts moving in the momentum space, along the field's direction



particle's acceleration by the field, dv_drift/dt = qℰ/m, we get ⟨v_drift⟩ = τqℰ/m. Multiplying this result by the particle
charge q and density n = N/V, we get the Ohm law j = σℰ, with σ given by Eq. (32).

17 By the way, since the fastest change of w_0 in the momentum space is described by ∂w_0/∂p =
(∂w_0/∂ε)(dε/dp) ~ (w_0/T)v_F, i.e. takes place on the momentum scale ~T/v_F, the linear approximation (34) is valid if qℰτ ≪ T/v_F, i.e. if qℰl ≪ T, where l ≡ v_Fτ is
called the mean free path. This is the promised quantitative condition of the electric field's smallness; since the left-
hand part of the last inequality is just the average energy given to the particle by the electric field between two
scattering events, the condition may be interpreted as the smallness of the electron gas' "overheating" by the applied
field. However, another condition is also necessary - see the last paragraph of this section.
in the real space. For elastic scattering events (with |p'| = |p|), this creates an addition of occupied states
at the leading front of the accelerating sphere, and an addition of free states at its trailing edge (Fig. 4b).
As a result, now there are more scattering events bringing electrons from the leading edge to the trailing
edge of the sphere than in the opposite direction. This creates an average backflow of state occupancy
in the momentum space. These two trends eventually cancel each other, and the Fermi sphere
approaches a stationary (though not equilibrium!) state, with the shift (35) relative to its thermal-equilibrium
position.
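The shift interpretation (33)-(35) can be illustrated with a one-dimensional toy version of the Fermi sphere. In this sketch the units m = μ = 1, T = 0.05 and the two trial shifts standing for qℰτ are arbitrary assumptions, and the helper names are hypothetical. It compares a rigidly shifted Fermi-Dirac distribution with its linearization (34); the residual error scales as the square of the shift, so the linear response (22) is just the first-order term of such a shift.

```python
import math


def w0(p, m=1.0, mu=1.0, T=0.05):
    """Equilibrium Fermi-Dirac occupancy versus 1D momentum p, with eps = p^2/2m."""
    x = (p * p / (2.0 * m) - mu) / T
    return 1.0 / (math.exp(min(x, 700.0)) + 1.0)  # clip to avoid overflow


def dw0_dp(p, h=1e-5):
    """Central-difference derivative of w0 with respect to p."""
    return (w0(p + h) - w0(p - h)) / (2.0 * h)


def max_linearization_error(p_shift, grid):
    """Largest deviation of the shifted distribution w0(p - p_shift), Eq. (6.33),
    from its first-order expansion w0(p) - p_shift * dw0/dp, Eq. (6.34)."""
    return max(abs(w0(p - p_shift) - (w0(p) - p_shift * dw0_dp(p))) for p in grid)


grid = [-2.0 + 4.0 * k / 4000 for k in range(4001)]
e1 = max_linearization_error(0.004, grid)  # p_shift plays the role of q*E*tau, Eq. (6.35)
e2 = max_linearization_error(0.002, grid)
print(e1, e2, e1 / e2)  # the ratio should be close to 4
```

Both trial shifts are well below the smearing scale T/v_F ≈ 0.035 of this toy model, i.e. within the validity condition discussed in footnote 17; for shifts comparable to that scale, the linearization fails.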

Thus Fig. 4b presents a clear answer to the question which of the two different interpretations of
the Drude formula is correct, and due to the electrons' indistinguishability, the answer is: either. On one
hand, we can look at the electric current as a result of the shift (35) of all electrons in the momentum space.
On the other hand, each filled quantum state deep inside the sphere gives exactly the same contribution
to the net current density as it did without the field. All these internal contributions to the net current
cancel each other, so that the applied field changes the situation only at the Fermi surface. Thus it is
equally legitimate to say that only the surface states are responsible for the nonvanishing net current. 18

Let me also mention the second paradox related to the Drude formula, which is often
misunderstood (not only by students :-). As was emphasized above, τ is finite even for elastic scattering,
which by itself does not change the total energy of the electron gas. The question is: how can such
scattering be responsible for the Ohmic resistivity ρ = 1/σ, and hence for the Joule heat production,
with the power density 𝒫/V = j·ℰ = ρj²? The answer is that the Drude and Sommerfeld formulas describe
just the "bottleneck" of the Joule heat formation. In the scattering picture (Fig. 4b), the elastically
scattered electron states are predominantly located above the (shifted) Fermi surface, and eventually
need to relax onto it via some inelastic process that releases their additional energy in the form of heat
(in solid state materials, described by phonons - see Sec. 2.6). The rate and other features of these
inelastic phenomena do not participate in the Drude formula directly, but for keeping the theory valid (in
particular, keeping the probability distribution w close to its equilibrium value w_0), their intensity has to
be sufficient to avoid gas overheating by the applied field. This gives an additional restriction on the
simple theory described above. In some semiconductors, the charge carrier overheating effects, resulting
in deviations from the Ohm law, i.e. from the linear relation (28) between j and ℰ, may be readily
observed already at rather modest applied electric fields.

6.4. Electrochemical potential and the drift-diffusion equation 

Now let us generalize our calculation to the case when transport takes place in the presence of a
time-independent spatial gradient of the probability distribution, ∇w ≠ 0, caused, for example, by that of
the particle concentration n = N/V (and hence, according to Eq. (3.40), of the chemical potential μ),
while still considering the temperature T constant. For this generalization, we should just keep the second
term in the left-hand part of Eq. (18). If the gradient of w is sufficiently small, we can repeat the arguments
of the last section and replace w with w_0 in this term as well. With the applied electric field ℰ presented
as (-∇φ), where φ is the electrostatic potential, Eq. (25) now becomes



18 So here, as frequently happens in physics, formulas (or drawings, such as Fig. 4b) give a clearer and more
unambiguous description of reality than words - the privilege lacked by many other scientific (and
"scientific") disciplines, frequently leading to unending, shallow verbal debates.
$$\tilde w = \tau\mathbf v\cdot\left[\left(-\frac{\partial w_0}{\partial\varepsilon}\right)(-q\nabla\phi) - \nabla w_0\right]. \qquad (6.36)$$



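Since in any of the distributions (20) ⟨N(ε)⟩ is a function of ε and μ only in the combination (ε - μ), it obeys the
following relation:

$$\frac{\partial\langle N(\varepsilon)\rangle}{\partial\mu} = -\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}. \qquad (6.37)$$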
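Relation (37) can be verified by finite differences for both quantum statistics. In this sketch the sample energies and temperatures are arbitrary, and the helper names are hypothetical.

```python
import math


def occupancy(eps, mu, T, stats):
    """<N(eps)> of Eq. (6.20): stats = +1 (Fermi-Dirac) or -1 (Bose-Einstein)."""
    return 1.0 / (math.exp((eps - mu) / T) + stats)


def derivative_pair(eps, mu, T, stats, h=1e-6):
    """Central differences d<N>/d(mu) and d<N>/d(eps), to compare with Eq. (6.37)."""
    dN_dmu = (occupancy(eps, mu + h, T, stats) - occupancy(eps, mu - h, T, stats)) / (2 * h)
    dN_deps = (occupancy(eps + h, mu, T, stats) - occupancy(eps - h, mu, T, stats)) / (2 * h)
    return dN_dmu, dN_deps


fd = derivative_pair(1.2, 1.0, 0.1, +1)  # Fermi-Dirac, just above the Fermi level
be = derivative_pair(0.5, 0.0, 1.0, -1)  # Bose-Einstein, with eps > mu
print(fd, be)  # in each pair, the two numbers should be equal and opposite
```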



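Using this relation, the gradient of w_0 ∝ ⟨N(ε)⟩ may be presented as 19

$$\nabla w_0 = -\frac{\partial w_0}{\partial\varepsilon}\nabla\mu, \qquad \text{for } T = \text{const}, \qquad (6.38)$$

so that Eq. (36) becomes

$$\tilde w = \tau\mathbf v\cdot\left(-\frac{\partial w_0}{\partial\varepsilon}\right)(-q\nabla\phi - \nabla\mu) \equiv -\tau q\,\mathbf v\cdot\nabla\Phi\left(-\frac{\partial w_0}{\partial\varepsilon}\right), \qquad (6.39)$$

where the following sum,

$$\Phi \equiv \phi + \frac{\mu}{q}, \qquad (6.40)$$

is called the electrochemical potential. 20 Now repeating the calculation of the electric current, carried
out in the last section, we get the following generalization of the Ohm law (28):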

$$\mathbf j = \sigma(-\nabla\Phi) \equiv \sigma\boldsymbol{\mathcal E}_{\rm ef}, \qquad (6.41)$$

where the effective electric field ℰ_ef is the (minus) gradient of the electrochemical potential, rather than of the
electrostatic potential:



$$\boldsymbol{\mathcal E}_{\rm ef} \equiv -\nabla\Phi = \boldsymbol{\mathcal E} - \frac{\nabla\mu}{q}. \qquad (6.42)$$



The physics of this extremely important result 21 may be explained in two ways. First, let us have
a look at the energy spectrum of a uniform, degenerate Fermi gas confined in a volume of finite size. In
order to ensure such a confinement, we need a piecewise-constant potential U(r) - a "hard-wall, flat-bottom
potential well" - see Fig. 5a. (In a solid conductor, such a profile is readily provided by the
lattice of positively charged ions.) The well should be of a sufficient depth U_0 >
ε_F ≡ μ|_{T=0} in order to provide the confinement of the overwhelming majority of the particles, with energies
below or slightly above the Fermi level ε_F. This means that there should be a substantial energy gap,



19 Since we consider w_0 as a function of two independent arguments r and p, taking its gradient, i.e.
differentiation of this function over r, does not involve its differentiation over the kinetic energy ε, which is a
function of p only.

20 In electronic engineering literature, the variable qΦ = μ + qφ, called the local Fermi level, is more frequently used.

21 The fact that Eq. (42) does not include the phenomenological parameter τ of the relaxation-time approximation
suggests that this relation is more general than the RTA. This is certainly true, because Eq. (42) is based on the
relation between the second and third terms in the left-hand part of the rather general Eq. (10).
$$\psi \equiv U_0 - \mu \gg T, \qquad (6.43)$$



between the Fermi energy of a particle inside the well and its potential energy outside the well. (The
latter value is usually called the vacuum level.) The difference defined by Eq. (43) is called the
workfunction; 22 for most metals, it is between 4 and 5 eV, so that the relation ψ ≫ T is well fulfilled at
room temperature (T ≈ 0.025 eV), and actually at all temperatures up to the material's evaporation
point.









Fig. 6.5. Potential profiles of (a) a single conductor and (b,c) a system 
of two closely located conductors, for two different biasing conditions: 
(b) zero electrostatic field ("flat-band"), and (c) zero voltage V ≡ ΔΦ.



Now let us consider two conductors with different values of ψ, separated by a small gap d - see
Fig. 5b,c. Panel (b) shows the case when the electric field ℰ = -∇φ in the free-space gap between the
conductors equals zero, i.e. their electrostatic potentials φ are equal. 23 If there is an opportunity for
particles to cross the gap (e.g., by either the thermally-activated hopping over the potential barrier,
discussed in Secs. 5.6-5.7, or quantum-mechanical tunneling through it), there will be an average flux of
particles from the conductor with the higher Fermi level to that with the lower Fermi level, 24 because the
chemical equilibrium requires their equality - see Secs. 1.5 and 2.7. If the particles have an electric
charge (as electrons do), the equilibrium will be automatically achieved by them recharging the effective
capacitor formed by the conductors, until the electrostatic energy difference qΔφ reaches the value
reproducing that of the workfunctions (Fig. 5c). According to Eq. (43), at the recharging, the sum (ψ + μ)
of each conductor has to stay constant, so that for the equilibrium potential difference 25 we may write

$$q\Delta\phi = \Delta\psi = -\Delta\mu. \qquad (6.44)$$



At this equilibrium, the electric field in the gap between the conductors is 

$$\mathcal E = -\frac{\Delta\phi}{d} = \frac{\Delta\mu}{qd} = \frac{\nabla\mu}{q}; \qquad (6.45)$$



22 Sometimes also called the "electron affinity", though the latter term is mostly used for atoms and molecules. 

23 In semiconductor physics and engineering, the situation shown in Fig. 5b is called the flat-band condition, 
because in semiconductors, any electric field at the surface leads to band bending - a gradual spatial change of the 
background potential U_0 and hence of all energy band/gap edges. For a discussion of the band bending and its
effects on semiconductor device operation, see, e.g., either Chapter 6 in J. Hook and H. Hall, Solid State Physics,
2nd ed., Wiley, 1991, or Chapter 3 in S. Sze, Semiconductor Devices, 2nd ed., Wiley, 2001.

24 As measured from a common reference value, for example from the vacuum level. 

25 In physics literature, it is usually called the contact potential difference, while in electrochemistry (for which it 
is one of the key notions), the term Volta potential is more common. 
in Fig. 5c the field is clearly visible as the tilt of the electric potential profile. Comparing Eq. (45) with
the definition (42) of the effective electric field ℰ_ef, we see that the transport equilibrium, i.e. the absence of
current, is achieved exactly when ℰ_ef = 0, in accordance with Eq. (41).

Another interpretation of Eq. (41) may be achieved by modifying Eq. (38) for the particular case 
of a classical gas. Indeed, the gas' local density n = N/V obeys Eq. (3.32), which may be presented as 

$$n(\mathbf r) = \text{const}\times\exp\left\{\frac{\mu(\mathbf r)}{T}\right\}. \qquad (6.46)$$

Taking the spatial gradient of both sides of this relation (at constant T), we get

$$\nabla n = \text{const}\times\frac{1}{T}\exp\left\{\frac{\mu}{T}\right\}\nabla\mu = \frac{n}{T}\nabla\mu, \qquad (6.47)$$

so that ∇μ = (T/n)∇n, and Eq. (41), with σ given by Eq. (32), may be recast as

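$$\mathbf j = \sigma(-\nabla\Phi) = \frac{nq^2\tau}{m}\left(-\nabla\phi - \frac{\nabla\mu}{q}\right) = q\,\frac{\tau}{m}\left(nq\boldsymbol{\mathcal E} - T\nabla n\right). \qquad (6.48)$$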



Hence the current may be viewed as consisting of two independent parts: one due to the "usual" electric
field ℰ = -∇φ, and another due to the particle diffusion - see Eq. (5.118) and its discussion. This is
exactly the physics of the "mysterious" term ∇μ in Eq. (42), though it may be presented in the simple
form (48) only in the classical limit.

Besides being very useful for practice, 26 Eq. (48) gives us a pleasant surprise. Namely, plugging
it into the continuity equation for electric charge,

$$\frac{\partial(qn)}{\partial t} + \nabla\cdot\mathbf j = 0, \qquad (6.49)$$

we get (after the division of all terms by qτ/m) the so-called drift-diffusion equation: 27



$$\frac{m}{\tau}\frac{\partial n}{\partial t} = \nabla\cdot(n\nabla U) + T\nabla^2 n, \qquad \text{with } U \equiv q\phi. \qquad (6.50)$$
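The drift-diffusion equation (50) is easy to integrate numerically. The sketch below uses an illustrative explicit finite-volume scheme in one dimension (a standard numerical choice, not a method from the text) and assumes a closed box with zero particle flux at both ends, a hypothetical linear potential energy U(x), and units with τ/m = T = 1. It checks that a uniform initial density relaxes to the Boltzmann distribution n ∝ exp{-U/T}, for which the electrochemical potential Φ is uniform and the current (48) vanishes.

```python
import math


def relax_drift_diffusion(U, T, dx, t_final, tau_over_m=1.0, cfl=0.4):
    """Explicit finite-volume integration of the 1D drift-diffusion equation (6.50),
    (m/tau) dn/dt = d/dx (n dU/dx) + T d^2n/dx^2,
    in a closed box (zero particle flux through both ends), starting from n = 1."""
    M = len(U)
    n = [1.0] * M
    D = tau_over_m * T                                # diffusion constant tau*T/m
    steps = math.ceil(t_final / (cfl * dx * dx / D))  # keeps D*dt/dx^2 <= cfl < 1/2
    dt = t_final / steps
    for _ in range(steps):
        flux = [0.0] * (M + 1)  # flux[i] sits on the face to the left of cell i
        for i in range(M - 1):
            n_face = 0.5 * (n[i] + n[i + 1])
            dU_dx = (U[i + 1] - U[i]) / dx
            dn_dx = (n[i + 1] - n[i]) / dx
            flux[i + 1] = -tau_over_m * (n_face * dU_dx + T * dn_dx)
        n = [n[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(M)]
    return n


M, T = 40, 1.0
dx = 1.0 / M
x = [(i + 0.5) * dx for i in range(M)]
U = [1.0 * xi for xi in x]  # hypothetical linear potential energy U = q*phi
n = relax_drift_diffusion(U, T, dx, t_final=2.0)

# Boltzmann profile normalized to the same total number of particles:
boltz = [math.exp(-Ui / T) for Ui in U]
scale = sum(n) / sum(boltz)
boltz = [scale * b for b in boltz]
print(max(abs(a - b) / b for a, b in zip(n, boltz)))
```

The flux form of the update conserves the total particle number exactly (up to rounding), mirroring the continuity equation (49).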



Comparing it with Eq. (5.122), we see that the drift-diffusion equation is identical to the Smoluchowski
equation, if we identify the ratio τ/m with the mobility μ_m = 1/η:

$$\frac{\tau}{m} \Leftrightarrow \mu_m \equiv \frac{1}{\eta}, \qquad (6.51)$$

and hence the combination τT/m with the diffusion constant D = μ_m T - see Eq. (5.78). As a result, Eq.
(48) is frequently rewritten as an expression for the particle flow density j_n = j/q:

$$\mathbf j_n = n\mu_m q\boldsymbol{\mathcal E} - D\nabla n. \qquad (6.52)$$
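The role of the relation D = μ_m T in Eq. (52) can be seen numerically. In this sketch the potential profile and parameter values are hypothetical. For the Boltzmann density n ∝ exp{-U/T}, the drift and diffusion terms of (52) cancel at every point only if D = μ_m T.

```python
import math


def particle_flux(x, U, dU_dx, T, mu_m, D, n0=1.0, h=1e-5):
    """Particle flow density j_n = n*mu_m*(qE) - D*dn/dx of Eq. (6.52), evaluated for
    the Boltzmann profile n = n0*exp(-U/T); qE = -dU/dx for the potential energy U = q*phi."""
    n = lambda s: n0 * math.exp(-U(s) / T)
    dn_dx = (n(x + h) - n(x - h)) / (2.0 * h)
    return n(x) * mu_m * (-dU_dx(x)) - D * dn_dx


# Hypothetical potential-energy profile and parameters:
U = lambda s: 0.7 * math.sin(2.0 * s) + 0.3 * s
dU_dx = lambda s: 1.4 * math.cos(2.0 * s) + 0.3
T, mu_m = 0.5, 2.0
xs = [3.0 * k / 300 for k in range(301)]

j_einstein = max(abs(particle_flux(s, U, dU_dx, T, mu_m, D=mu_m * T)) for s in xs)
j_wrong = max(abs(particle_flux(s, U, dU_dx, T, mu_m, D=2.0 * mu_m * T)) for s in xs)
print(j_einstein, j_wrong)  # ~0 only when D = mu_m * T
```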



26 In particular, in physics of semiconductor devices, where electrons in the conduction band, and holes in the 
valence band, may be frequently treated as nearly-ideal classical gases. 

27 Sometimes this term is associated with Eq. (52). 
This similarity may look surprising. Indeed, our (or rather Einstein's :-) treatment of the
Brownian motion in Chapter 5 was based on a strong hierarchy of the total system, consisting of a large
"Brownian particle" in an environment of many smaller particles - "molecules". On the other hand, in
this chapter we are considering a gas of similar particles. Nevertheless, the equations describing the
dynamics of their probability distribution are the same - at least within the framework of the Boltzmann
transport equation with the relaxation-time approximation (17) of the scattering integral.

The origin of this similarity is that Eq. (12) is applicable to Brownian particles as well, with each
"scattering" event being the particle's hit by a random molecule. Since, due to the mass hierarchy, the
particle's momentum change at each such event is small, the scattering integral has to be local, i.e. depend
only on w at the same momentum p as the left-hand part of the Boltzmann equation, so that the
relaxation-time approximation (17) is absolutely natural. But the same is true for collisions of similar
particles, if they are dominated by small-angle scattering, as is true, for example, for Coulomb scattering. 28

Returning to the electric field duality (ℰ ↔ ℰ_ef) recovered in our analysis, it raises a natural
question: which of these fields are we speaking about in the everyday and laboratory practice? Upon
some contemplation, the reader should agree that most of our electric field measurements are done
indirectly, by measuring the corresponding voltages with voltmeters. A vast majority of these instruments
belong to the electrodynamic variety, which is based on the measurement of a small current flowing
through the voltmeter. As Eq. (41) shows, electrodynamic voltmeters measure the electrochemical
potential difference ΔΦ. However, there exists a rare breed of electrostatic voltmeters (also called
"electrometers") that measure the electrostatic potential difference Δφ between two conductors. One
way to implement such an instrument is to use a usual, electrodynamic voltmeter, but with the reference
point set at the flat-band condition (Fig. 5b) between the conductors. This condition may be detected by
the vanishing electric charge on the adjacent surfaces of the conductors, and hence by the absence of its
modulation in time, caused by a specially arranged periodic variation of the distance between the
surfaces. Another (less sensitive but also less invasive) way to detect the flat-band condition is to
measure the voltage at which the force of electrostatic interaction between the two conductors, which is
proportional to ℰ² ∝ (Δφ)², vanishes.



6.5. Thermoelectric effects 



Now let us extend our analysis even further, to the effects of a finite (though small) temperature
gradient. Again, since for any of the statistics (20) the average occupancy ⟨N(ε)⟩ is a function of just one
combination of all its arguments, ξ ≡ (ε - μ)/T, its partial derivatives obey not only Eq. (37), but also the
following relation:



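$$\frac{\partial\langle N(\varepsilon)\rangle}{\partial T} = -\frac{\varepsilon-\mu}{T}\,\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}. \qquad (6.53)$$

As a result, Eq. (38) is generalized as

$$\nabla w_0 = -\frac{\partial w_0}{\partial\varepsilon}\left(\nabla\mu + \frac{\varepsilon-\mu}{T}\nabla T\right), \qquad (6.54)$$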



28 See, e.g., CM Sec. 3.7. 
giving the following generalization of Eq. (39):

$$\tilde w = \tau\mathbf v\cdot\left(-\frac{\partial w_0}{\partial\varepsilon}\right)\left(-q\nabla\Phi - \frac{\varepsilon-\mu}{T}\nabla T\right). \qquad (6.55)$$
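Eq. (54) can be checked with finite differences. In the sketch below, μ and T are hypothetical smooth functions of one coordinate x, and the gradient of w_0 taken at fixed energy ε (i.e. at fixed momentum, see footnote 19) is compared with the right-hand side of Eq. (54). The constant prefactor g/(2πħ)³ of Eq. (19) is dropped, since it cancels on both sides.

```python
import math


def w0(eps, mu, T):
    """Fermi-Dirac occupancy (the constant prefactor g/(2 pi hbar)^3 is dropped)."""
    return 1.0 / (math.exp((eps - mu) / T) + 1.0)


# Hypothetical smooth 1D profiles of the chemical potential and the temperature:
mu_of_x = lambda x: 1.0 + 0.05 * math.sin(x)
T_of_x = lambda x: 0.1 + 0.02 * x


def grad_w0_numeric(eps, x, h=1e-6):
    """d w0 / dx at fixed eps (i.e. fixed momentum), by central differences."""
    plus = w0(eps, mu_of_x(x + h), T_of_x(x + h))
    minus = w0(eps, mu_of_x(x - h), T_of_x(x - h))
    return (plus - minus) / (2 * h)


def grad_w0_formula(eps, x, h=1e-6):
    """Right-hand side of Eq. (6.54): -(dw0/deps) * (grad mu + (eps - mu)/T * grad T)."""
    mu, T = mu_of_x(x), T_of_x(x)
    dw0_deps = (w0(eps + h, mu, T) - w0(eps - h, mu, T)) / (2 * h)
    dmu_dx = 0.05 * math.cos(x)
    dT_dx = 0.02
    return -dw0_deps * (dmu_dx + (eps - mu) / T * dT_dx)


eps, x = 1.1, 0.7
print(grad_w0_numeric(eps, x), grad_w0_formula(eps, x))
```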



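Now, repeating the current density calculation, we get a result which is traditionally presented as

$$\mathbf j = \sigma(-\nabla\Phi) + \sigma\mathcal S(-\nabla T), \qquad (6.56)$$

where the constant 𝒮, called the Seebeck coefficient 29 (or the "thermoelectric power", or just
"thermopower"), is defined by the following relation:

$$\sigma\mathcal S = \frac{gq\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^\infty(8m\varepsilon^3)^{1/2}\,\frac{\varepsilon-\mu}{T}\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon. \qquad (6.57)$$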

Working out this integral for the most important case of a degenerate Fermi gas, with T ≪ ε_F,
we have to be careful, because the center of the sharp peak of the last factor under the integral coincides
with the zero point of the previous factor, (ε - μ)/T. This uncertainty may be resolved using the
Sommerfeld expansion formula (3.59). Indeed, for a smooth function f(ε) defined by Eq. (3.60), so that
f(0) = 0, we may use (3.61) to rewrite the formula as




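$$\int_0^\infty f(\varepsilon)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon \approx f(\mu) + \frac{\pi^2}{6}T^2\left.\frac{d^2f}{d\varepsilon^2}\right|_{\varepsilon=\mu}. \qquad (6.58)$$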



In particular, for integral (57), we may take f(ε) = (8mε³)^{1/2}(ε - μ)/T. (Evidently, for this function, the
condition f(0) = 0 is satisfied.) Then f(μ) = 0, d²f/dε²|_{ε=μ} = 3(8mμ)^{1/2}/T ≈ 3(8mε_F)^{1/2}/T, and Eq. (57) yields



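$$\sigma\mathcal S = \frac{gq\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\,\frac{\pi^2}{6}T^2\,\frac{3(8m\varepsilon_F)^{1/2}}{T}. \qquad (6.59)$$

Comparing the result with Eq. (31), for the Seebeck coefficient we get a simple expression independent of τ: 30

$$\mathcal S = \frac{\pi^2}{2q}\,\frac{T}{\varepsilon_F} \equiv \frac{c_V}{q}, \qquad (6.60)$$

where c_V ≡ C_V/N is the heat capacity of the gas per particle, given by Eq. (3.70).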
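Here is a numerical check of Eq. (60). This sketch uses dimensionless units q = m = 1, μ = 1 ≈ ε_F, and T = 0.01, and its helper names are hypothetical. Since σ𝒮 in (57) and σ in (30) share the prefactor gτ(4π/3)/(2πħ)³ (up to one power of q), 𝒮 reduces to a ratio of two energy integrals, computed here with a Simpson rule and without the Sommerfeld expansion.

```python
import math


def minus_dN_deps(eps, mu, T):
    """-d<N>/d(eps) for Fermi-Dirac statistics."""
    x = (eps - mu) / T
    if abs(x) > 100.0:  # negligible there; also avoids overflow in cosh
        return 0.0
    return 1.0 / (4.0 * T * math.cosh(x / 2.0) ** 2)


def simpson(f, a, b, n=40000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def seebeck_numeric(mu, T, q=1.0, m=1.0):
    """S = (sigma*S)/sigma from Eqs. (6.57) and (6.30); the common prefactor
    g*tau*(4*pi/3)/(2*pi*hbar)^3 cancels, leaving a ratio of two energy integrals."""
    states = lambda e: math.sqrt(8.0 * m * e**3)
    top = mu + 60.0 * T
    num = simpson(lambda e: states(e) * (e - mu) / T * minus_dN_deps(e, mu, T), 0.0, top)
    den = simpson(lambda e: states(e) * minus_dN_deps(e, mu, T), 0.0, top)
    return num / (q * den)


mu, T = 1.0, 0.01
S_num = seebeck_numeric(mu, T)
S_formula = math.pi**2 / 2.0 * T / mu  # Eq. (6.60) with q = 1 and eps_F ~ mu
print(S_num, S_formula)
```

The two values agree to within a relative difference of order (T/μ)² ~ 10⁻⁴, the order of the terms neglected in the expansion (58).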

In order to understand the physical meaning of the Seebeck coefficient, it is sufficient to consider 
a conductor carrying no current. For this case, Eq. (56) yields 



$$\nabla(\Phi + \mathcal S T) = 0. \qquad (6.61)$$



29 Named after T. Seebeck who experimentally discovered, in 1821 (independently of J. Peltier), the effect 
expressed by Eq. (62). 

30 Again, such independence suggests that Eq. (60) should have a broader validity than in our simple model of an
isotropic gas. This is indeed the case: at T ≪ ε_F, this result turns out to be valid for any form of the Fermi surface,
and for any dispersion law ε(p).
Thus, the temperature gradient creates the oppositely directed gradient of the electrochemical
potential Φ, i.e. the effective field ℰ_ef defined by Eq. (42). This is the Seebeck effect. Figure 6 shows the
standard way of its measurement, using a usual (electrodynamic) voltmeter that measures the difference
of potentials Φ, and a connection (in this context, called a thermocouple) of two different materials, with
different coefficients 𝒮_1 and 𝒮_2. Integrating Eq. (61) around the loop from point A to point B, and neglecting the
temperature drop across the voltmeter, we get the following simple expression for the thermally-induced
difference of the electrochemical potential, frequently also called either the thermoelectric power or the
"thermo e.m.f.":



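$$\mathcal V \equiv \Phi_B - \Phi_A = \int_A^B\nabla\Phi\cdot d\mathbf r = -\int_A^B\mathcal S\,\nabla T\cdot d\mathbf r = -\mathcal S_1\int_A^{A'}\nabla T\cdot d\mathbf r - \mathcal S_2\int_{A'}^{A''}\nabla T\cdot d\mathbf r - \mathcal S_1\int_{A''}^{B}\nabla T\cdot d\mathbf r$$

$$= -\mathcal S_1(T' - T'') - \mathcal S_2(T'' - T') = (\mathcal S_1 - \mathcal S_2)(T'' - T'). \qquad (6.62)$$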



(Note that according to Eq. (62), any attempt to measure such voltage across any two points of a uniform 
conductor would give results depending on the voltmeter lead materials, due to the unintentional 
gradient of temperature in them.) 
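The loop integration in Eq. (62) is simple bookkeeping. It is sketched below for a chain of uniform wires, with T' and T'' taken as the temperatures of the junctions A' and A''. The Seebeck coefficients and temperatures are hypothetical values, picked so that 𝒮_1 - 𝒮_2 equals the ~70 μV/°C responsivity of a chromel-constantan pair quoted in the next paragraph. The same block also evaluates the magnitude of Eq. (60) in SI units for a single metal, assuming ε_F ≈ 7 eV (roughly the free-electron value for copper) and T = 300 K, to show the microvolt-per-kelvin scale of 𝒮.

```python
import math


def loop_emf(segments):
    """Thermo e.m.f. V = Phi_B - Phi_A of Eq. (6.62) for a chain of uniform wires.

    Each segment is (S, T_start, T_end); inside a wire, Eq. (6.61) gives
    grad(Phi) = -S * grad(T), so the segment contributes -S * (T_end - T_start).
    """
    return -sum(S * (T_end - T_start) for S, T_start, T_end in segments)


# Hypothetical Seebeck coefficients (V/K), chosen so that S1 - S2 = 70 uV/K:
S1, S2 = 40e-6, -30e-6
T_prime, T_double_prime = 20.0, 120.0  # junction temperatures T', T'' (only differences matter)


def thermocouple(T0):
    """Path A -> A' (material 1) -> A'' (material 2) -> B (material 1); A and B at T0."""
    return loop_emf([(S1, T0, T_prime), (S2, T_prime, T_double_prime), (S1, T_double_prime, T0)])


print(thermocouple(25.0), thermocouple(-10.0), (S1 - S2) * (T_double_prime - T_prime))
# all three are 7e-3 V (up to rounding), independent of the voltmeter temperature T0

# Magnitude of Eq. (6.60) for one metal: |S| = (pi^2/2)(k_B/e)(k_B*T/eps_F)
k_B_over_e = 1.380649e-23 / 1.602176634e-19  # V/K (equivalently, eV/K)
eps_F_eV = 7.0                               # assumed Fermi energy, in eV
T_K = 300.0
S_metal = math.pi**2 / 2.0 * k_B_over_e * (k_B_over_e * T_K) / eps_F_eV
print(S_metal)  # about 1.6e-6 V/K, i.e. a few microvolts per kelvin
```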




Using thermocouples is a popular, inexpensive method of temperature measurement - especially
in the few-hundred-°C range (where gas- and fluid-based thermometers are not very practical), if a
1°C-scale accuracy is sufficient. The "responsivity" (𝒮_1 - 𝒮_2) of a typical popular thermocouple,
chromel-constantan, 31 is about 70 μV/°C. In order to understand why typical values of 𝒮 are so small, let
us discuss the physics of the Seebeck effect. Superficially, it is very simple: particles, heated by an external
source, diffuse from it toward the colder parts of the conductor, carrying electrical current with them if
they are charged. However, this naive argument neglects the fact that at j = 0 there should be no total
flow of particles. For a more accurate interpretation, note that the Seebeck effect is described by the
factor (ε - μ)/T in integral (57), which changes sign at the Fermi surface, i.e. at the same energy where
the term (-∂⟨N(ε)⟩/∂ε), describing the state availability for transport (due to their intermediate occupancy
0 < ⟨N(ε)⟩ < 1), reaches its peak. The only reason why that integral does not vanish completely, and
hence 𝒮 ≠ 0, is the growth of the first factor under the integral (which describes the number of available
quantum states) with ε, so the hotter particles (with ε > μ) are more numerous and carry more heat than
the colder ones.



31 Both these materials are alloys, i.e. solid solutions: chromel is 10% chromium in 90% nickel, while constantan 
is 45% nickel and 55% copper. 
The Seebeck effect is of course not the only result of a temperature gradient; the same diffusion of
hotter particles also causes a flow of heat from regions of higher T to those with lower T, i.e. the
effect of thermal conductivity, well known from our everyday practice. The heat (i.e. thermal energy)
flow density may be calculated similarly to that of the electric current - see Eq. (26), with the natural
replacement of the electric charge q of each particle with the thermal energy (ε - μ) of its state: 32



$$\mathbf j_h = \int\mathbf v(\varepsilon-\mu)\,w\,d^3p. \qquad (6.63)$$



Again, at equilibrium (w = w_0) the heat flow vanishes, so that w may be replaced with its perturbation
w̃, which has already been calculated - see Eq. (55). The substitution of that expression into Eq. (63),
and its transformation exactly similar to the one performed above for the electric current j, yields



$$\mathbf j_h = \sigma\Pi(-\nabla\Phi) + \kappa(-\nabla T), \qquad (6.64)$$



with the coefficients Π and κ defined by the equalities



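$$\sigma\Pi = \frac{gq\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^\infty(8m\varepsilon^3)^{1/2}(\varepsilon-\mu)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon, \qquad (6.65)$$

$$\kappa = \frac{g\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^\infty(8m\varepsilon^3)^{1/2}\,\frac{(\varepsilon-\mu)^2}{T}\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon. \qquad (6.66)$$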



Besides the missing factor T in the denominator, the integral in Eq. (65) is the same as in Eq. (57), so that
the constant Π (called the Peltier coefficient) is simply and fundamentally related to the Seebeck
coefficient: 33



$$\Pi = \mathcal S T. \qquad (6.67)$$



On the other hand, the integral (66) may be readily calculated for the most important case of a
degenerate Fermi gas, using the Sommerfeld expansion (58) with f(ε) = (8mε³)^{1/2}(ε - μ)²/T, for which
f(μ) = 0 and d²f/dε²|_{ε=μ} = 2(8mμ³)^{1/2}/T ≈ 2(8mε_F³)^{1/2}/T, so that



$$\kappa = \frac{g\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\,\frac{\pi^2}{6}T^2\,\frac{2(8m\varepsilon_F^3)^{1/2}}{T}. \qquad (6.68)$$

Comparing the result with the first form of Eq. (31), we get the so-called Wiedemann-Franz law 34



32 One more way to look at Eq. (63) is as at the difference between the total energy flow density, j_ε = ∫εvw d³p, and
the product of a constant (μ) by the particle flow density, j_n = ∫vw d³p = j/q.

33 The simplicity of this relation (first discovered experimentally in 1854 by W. Thomson, a.k.a. Lord Kelvin) is
not accidental. This is one of the fundamental Onsager reciprocal relations between kinetic coefficients (L. Onsager,
1931), which are model-independent, i.e. valid within very general assumptions. Unfortunately, I have no time
left for a discussion of this interesting topic, and have to refer the interested reader, for example, to Sec. 120 in L.
Landau and E. Lifshitz, Statistical Physics, 3rd ed., Pergamon, 1980. Note, however, that the range of validity of
the Onsager relations is still debated - see, e.g., K.-T. Chen and P. Lee, Phys. Rev. B19, 18 (2009).

34 It was named after G. Wiedemann and R. Franz, who noticed as early as 1853 that the ratio κ/σ is constant for various materials at the same temperature. The direct proportionality of the ratio to the absolute temperature was noticed by L. Lorenz in 1872. Due to this contribution, the Wiedemann-Franz law is frequently presented as κ/σ =
$$\frac{\kappa}{\sigma} = \frac{\pi^2}{3}\frac{T}{q^2}. \qquad (6.69)$$



This relation between the electric conductivity σ and the thermal conductivity κ is more general than our formal derivation might imply. Indeed, it is straightforward to show that the Wiedemann-Franz law is also valid for an arbitrary anisotropy of the dispersion law (i.e. an arbitrary Fermi surface shape) and, moreover, well beyond the relaxation-time approximation. (For example, it is also valid for the scattering integral (12) with an arbitrary angular dependence of the scattering rate, provided that the scattering is elastic.) Experiments show that most metals obey the law well, but only at relatively low temperatures, T ≪ T_D, when the thermal conductance due to electrons is well above that due to lattice vibrations, i.e. phonons - see Sec. 2.6. (Note also that Eq. (69) is not valid for classical gases - see Problem 2.)
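The key step behind Eqs. (68) and (69) is the Sommerfeld-type integral of (ε − μ)²(−∂w₀/∂ε), which equals (π²/3)T² for a degenerate Fermi gas. The following Python sketch checks this numerically in units T = 1, extending the lower integration limit to −∞ (an assumption valid for μ ≫ T). It then evaluates the Lorenz number L = (π²/3)(k_B/e)², i.e. Eq. (69) in SI units with T in kelvins. The constants used are the exact 2019 SI values, which is an assumption of this sketch rather than the values quoted in these notes.

```python
import math

def neg_dw0(x):
    """-dw0/dx for the Fermi-Dirac distribution w0 = 1/(exp(x) + 1), x = (eps - mu)/T.
    Written as 1/(4 cosh^2(x/2)) to avoid overflow at large |x|."""
    return 0.25 / math.cosh(0.5 * x) ** 2

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(f(a + k * h) for k in range(2, n, 2))
    return s * h / 3

# Sommerfeld integral of (eps - mu)^2 (-dw0/deps) in units T = 1; expected pi^2/3
I2 = simpson(lambda x: x * x * neg_dw0(x), -40.0, 40.0)

# Lorenz number kappa/(sigma*T) = (pi^2/3)(k_B/e)^2, SI units
k_B = 1.380649e-23   # J/K (exact in the 2019 SI; assumption of this sketch)
e = 1.602176634e-19  # C   (exact in the 2019 SI; assumption of this sketch)
L = math.pi ** 2 / 3 * (k_B / e) ** 2

print(f"I2 = {I2:.10f}, pi^2/3 = {math.pi ** 2 / 3:.10f}")
print(f"Lorenz number L = {L:.4e} W*Ohm/K^2")
```

The result, L ≈ 2.44×10⁻⁸ W·Ω/K², agrees with the value quoted in footnote 34.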

Now let us discuss the less evident first term of Eq. (64). It describes the so-called Peltier effect, which may be measured in a geometry similar to that shown in Fig. 6, but driven by an external voltage source - see Fig. 7. The voltage drives a dc current I = jA (where A is the conductor's cross-section area), which is necessarily the same in the whole loop. However, according to Eq. (64), if materials 1 and 2 are different, the power 𝒫 = j_hA of the heat flow is different in the two parts of the loop. Indeed, if the whole system is kept at the same temperature (∇T = 0), integrating the equation over the cross-section yields

$$\mathcal{P}_{1,2} = (\Pi\sigma)_{1,2}(-\nabla\phi')_{1,2}A = \Pi_{1,2}\,jA \equiv \Pi_{1,2}I. \qquad (6.70)$$

This means that in order to sustain the constant temperature, the following power difference,

$$\Delta\mathcal{P} = (\Pi_1 - \Pi_2)I, \qquad (6.71)$$



has to be extracted from one junction of the two materials and inserted into the other junction. If a constant temperature is not maintained, the former junction is heated, while the latter one is cooled (on top of the bulk Joule heating). The structure thus implements a thermoelectric heat pump / refrigerator.
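Eqs. (67) and (71) combine into a one-line estimate of the heat pumped between the junctions. In the sketch below, the Seebeck coefficients, temperature, and current are hypothetical round numbers chosen only for illustration. T is taken in kelvins, so 𝒮 is in V/K and Π = 𝒮T is in volts.

```python
def peltier_power_difference(S1, S2, T, I):
    """Heat power removed from one junction and released at the other,
    Eq. (71): dP = (Pi1 - Pi2) * I, with Pi = S * T from Eq. (67).
    The sign of the result depends on the sign conventions for S and I."""
    Pi1, Pi2 = S1 * T, S2 * T
    return (Pi1 - Pi2) * I

# Hypothetical illustrative values, not taken from the notes:
S1, S2 = 200e-6, -200e-6   # Seebeck coefficients of materials 1 and 2, V/K
T = 300.0                  # temperature, K
I = 1.0                    # current, A
dP = peltier_power_difference(S1, S2, T, I)
print(f"Pi1 - Pi2 = {(S1 - S2) * T:.3f} V, Delta P = {dP:.3f} W")
```

Because ΔP is linear in I, reversing the current swaps the heated and cooled junctions. The bulk Joule heating, which scales as I², is not included in this estimate.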




Fig. 7. The Peltier effect.



LT, where the constant L, called the Lorenz number, is close to 2.45×10⁻⁸ W·Ω/K² in SI units. Theoretically, Eq. (69) was derived in 1928 by A. Sommerfeld.
Such refrigerators, which have no moving parts and no gas or fluid working materials, are very convenient for modest (by a few tens of °C) cooling of relatively small components of various systems, from sensitive radiation detectors in spacecraft all the way to cold drinks in vending machines. It is straightforward to use the above formulas to show that the efficiency of the active materials used in such thermoelectric refrigerators may be characterized by the following dimensionless figure-of-merit,

$$ZT = \frac{\sigma\mathcal{S}^2}{\kappa}T. \qquad (6.72)$$

For the best thermoelectric materials found so far, ZT is in the range from 2 to 3. This provides a coefficient of performance, defined by Eq. (1.69), of the order of 0.5, a few times lower than that of traditional mechanical refrigerators. The search for composite materials (including those with nanoparticles) with higher values of ZT is one of the very active fields of applied solid state physics. 35

Let me finish this chapter (and this lecture note series) by emphasizing again that, due to time restrictions, I was able only to scratch the surface of physical kinetics. A much more detailed coverage of this important part of physics may be found, for example, in the textbook by L. Pitaevskii and E. Lifshitz, Physical Kinetics, Butterworth-Heinemann, 1981.

6.6. Exercises

6.1. Use the relaxation-time approximation of the Boltzmann equation to prove the Drude formula for the complex conductivity at frequency ω,

$$\sigma(\omega) = \frac{\sigma(0)}{1 - i\omega\tau},$$

where σ(0) is the dc conductivity given by Eq. (6.30) of the lecture notes, and give a physical interpretation of the formula.

6.2. Calculate the electric conductivity σ, the thermal conductivity κ, and the thermoelectric coefficients 𝒮 and Π for a classical, ideal gas of electrically charged particles. Compare the results with those for the degenerate Fermi gas derived in the lecture notes.

35 See, e.g., D. Rowe (ed.), Thermoelectrics Handbook: Macro to Nano, CRC Press, 2005.
Appendix MA

Selected Mathematical Formulas

that are used in this lecture course series,
but not always remembered by students (and some instructors :-)



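1. Constants

Euclidean circle's length-to-diameter ratio:

$$\pi = 3.141\,592\,653\ldots; \qquad \sqrt{\pi} \approx 1.77. \qquad (1.1)$$

Natural logarithm base:

$$e \equiv \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n = 2.718\,281\,828\ldots; \qquad (1.2a)$$

from that value, the logarithm base conversion factors are as follows:

$$\frac{\ln x}{\log_{10}x} = \ln 10 \approx 2.303, \qquad \frac{\log_{10}x}{\ln x} = \frac{1}{\ln 10} \approx 0.434. \qquad (1.2b)$$

The Euler (or "Euler-Mascheroni") constant:

$$\gamma \equiv \lim_{n\to\infty}\left(1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} - \ln n\right) = 0.577\,215\,664\,9\ldots; \qquad e^\gamma \approx 1.781. \qquad (1.3)$$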
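The constants (1.1)-(1.3) can be reproduced from their definitions. This short sketch evaluates the limits (1.2a) and (1.3) at a large but finite n = 10⁶, an arbitrary choice. Both estimates therefore differ from the exact values by terms of order 1/n.

```python
import math

print(f"pi = {math.pi:.9f}, sqrt(pi) = {math.sqrt(math.pi):.3f}")         # Eq. (1.1)
print(f"ln 10 = {math.log(10):.3f}, 1/ln 10 = {1 / math.log(10):.3f}")     # Eq. (1.2b)

n = 10**6   # finite stand-in for n -> infinity (arbitrary choice)
e_approx = (1 + 1 / n) ** n                                               # Eq. (1.2a), error ~ e/(2n)
gamma_approx = math.fsum(1 / k for k in range(1, n + 1)) - math.log(n)    # Eq. (1.3), error ~ 1/(2n)

print(f"e ~ {e_approx:.6f} (exact {math.e:.6f})")
print(f"gamma ~ {gamma_approx:.6f}, exp(gamma) ~ {math.exp(gamma_approx):.3f}")
```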



2. Combinatorics, sums, and series 

(i) Combinatorics 

- The number of different permutations, i.e. ordered sequences of k elements selected from a set of n distinct elements (n ≥ k), is

$${}^nP_k \equiv n\cdot(n-1)\cdot\ldots\cdot(n-k+1) = \frac{n!}{(n-k)!}; \qquad (2.1a)$$

in particular, the number of different permutations of all elements of the set (n = k) is

$$P_k \equiv {}^kP_k = k\cdot(k-1)\cdot\ldots\cdot 2\cdot 1 = k!\,. \qquad (2.1b)$$
- The number of different combinations, i.e. unordered sequences of k elements from a set of n ≥ k distinct elements, is equal to the binomial coefficient

$${}^nC_k \equiv \binom{n}{k} \equiv \frac{{}^nP_k}{P_k} = \frac{n!}{k!\,(n-k)!}. \qquad (2.2)$$

In an alternative, very popular "ball/box language", ^nC_k is the number of different ways to put in a box, in an arbitrary order, k balls selected from n distinct balls.

- A generalization of the binomial coefficient notion is the multinomial coefficient,

$${}^nC_{k_1,k_2,\ldots,k_l} \equiv \frac{n!}{k_1!\,k_2!\ldots k_l!}, \qquad \text{with } n = \sum_{j=1}^{l}k_j, \qquad (2.3)$$

In standard mathematical language, this is the number of different permutations in a multiset of l distinct element types from an n-element set that contains k_j (j = 1, 2, ..., l) elements of each type. In the "ball/box language", coefficient (2.3) is the number of different ways to distribute n balls between l different boxes, keeping the number k_j of balls in the j-th box fixed each time, but ignoring their order inside the box. The binomial coefficient ^nC_k (2.2) is a particular case of the multinomial coefficient (2.3) for l = 2, counting the explicit box as the first box and the remaining space as the second one, so that if k₁ = k, then k₂ = n − k.

- One more important combinatorial quantity is the number M_n^(k) of ways to place n indistinguishable balls into k distinct boxes. It may be readily calculated from Eq. (2.2) as the number of different ways to select (k − 1) partitions between the boxes in an imagined linear row of (k − 1 + n) "objects" (balls in the boxes and partitions between them):

$$M_n^{(k)} = {}^{(k-1+n)}C_{k-1} = \frac{(k-1+n)!}{(k-1)!\,n!}. \qquad (2.4)$$
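The counting formulas (2.1)-(2.4) can be checked against brute-force enumeration with Python's itertools. The sizes n = 6, k = 3 and the multiset "aabbbc" are arbitrary illustrative choices. The standard functions math.perm and math.comb (Python 3.8+) implement Eqs. (2.1a) and (2.2) directly.

```python
import math
from itertools import combinations, combinations_with_replacement, permutations

n, k = 6, 3   # arbitrary illustrative sizes

# Eq. (2.1a): ordered selections of k out of n distinct elements
nPk = math.factorial(n) // math.factorial(n - k)
assert nPk == math.perm(n, k) == len(list(permutations(range(n), k)))

# Eq. (2.2): unordered selections, i.e. the binomial coefficient nPk / Pk
nCk = nPk // math.factorial(k)
assert nCk == math.comb(n, k) == len(list(combinations(range(n), k)))

def multinomial(*ks):
    """Eq. (2.3): n! / (k1! k2! ... kl!), with n = k1 + k2 + ... + kl."""
    result = math.factorial(sum(ks))
    for kj in ks:
        result //= math.factorial(kj)
    return result

# distinct orderings of a multiset with 2 + 3 + 1 elements of three types
assert multinomial(2, 3, 1) == len(set(permutations("aabbbc")))

def balls_in_boxes(n_balls, k_boxes):
    """Eq. (2.4): ways to put n indistinguishable balls into k distinct boxes."""
    return math.comb(k_boxes - 1 + n_balls, k_boxes - 1)

# brute force: each placement is a multiset of n box labels drawn from k labels
assert balls_in_boxes(n, k) == len(list(combinations_with_replacement(range(k), n)))
print(nPk, nCk, multinomial(2, 3, 1), balls_in_boxes(n, k))
```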

(ii) Sums and series 

- Arithmetic progression:

$$r + 2r + \ldots + nr \equiv \sum_{k=1}^{n}kr = \frac{n(r + nr)}{2}; \qquad (2.5a)$$

in particular, at r = 1 it is reduced to the sum of n first natural numbers:

$$1 + 2 + \ldots + n \equiv \sum_{k=1}^{n}k = \frac{n(n+1)}{2}. \qquad (2.5b)$$

Sums of squares and cubes of n first natural numbers:

$$1^2 + 2^2 + \ldots + n^2 \equiv \sum_{k=1}^{n}k^2 = \frac{n(n+1)(2n+1)}{6}, \qquad (2.6a)$$

$$1^3 + 2^3 + \ldots + n^3 \equiv \sum_{k=1}^{n}k^3 = \frac{n^2(n+1)^2}{4}. \qquad (2.6b)$$

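- The Riemann zeta function:

$$\zeta(s) \equiv 1 + \frac{1}{2^s} + \frac{1}{3^s} + \ldots \equiv \sum_{k=1}^{\infty}\frac{1}{k^s}; \qquad (2.7a)$$

the particular values frequently met in applications are

$$\zeta\left(\tfrac{3}{2}\right) \approx 2.612, \quad \zeta(2) = \frac{\pi^2}{6}, \quad \zeta\left(\tfrac{5}{2}\right) \approx 1.341, \quad \zeta(3) \approx 1.202, \quad \zeta(4) = \frac{\pi^4}{90}, \quad \zeta(5) \approx 1.037. \qquad (2.7b)$$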



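- Finite geometric progression (for real λ ≠ 1):

$$1 + \lambda + \lambda^2 + \ldots + \lambda^{n-1} \equiv \sum_{k=0}^{n-1}\lambda^k = \frac{1 - \lambda^n}{1 - \lambda}; \qquad (2.8a)$$

in particular, if |λ| < 1, the progression has a finite limit at n → ∞ (called the geometric series):

$$\lim_{n\to\infty}\sum_{k=0}^{n-1}\lambda^k \equiv \sum_{k=0}^{\infty}\lambda^k = \frac{1}{1 - \lambda}. \qquad (2.8b)$$

- Binomial sum (or the "binomial theorem"):

$$(1 + a)^n = \sum_{k=0}^{n}{}^nC_k\,a^k, \qquad (2.9)$$

where ^nC_k are the binomial coefficients defined by Eq. (2.2).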
- The Stirling formula:

$$\lim_{n\to\infty}\ln(n!) = n(\ln n - 1) + \frac{1}{2}\ln(2\pi n) + \frac{1}{12n} - \frac{1}{360n^3} + \ldots; \qquad (2.10)$$

for most applications in physics, the first term 1 is sufficient.

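- The Taylor (or "Taylor-Maclaurin") series: for any infinitely differentiable function f(x):

$$f(\bar{x} + \tilde{x}) = f(\bar{x}) + \frac{df}{dx}(\bar{x})\,\tilde{x} + \frac{1}{2!}\frac{d^2f}{dx^2}(\bar{x})\,\tilde{x}^2 + \ldots \equiv \sum_{k=0}^{\infty}\frac{1}{k!}\frac{d^kf}{dx^k}(\bar{x})\,\tilde{x}^k; \qquad (2.11a)$$

note that for many functions this series converges only within a limited, sometimes small, range of deviations x̃. For a function of several arguments, f(x₁, x₂, ..., x_n), the first terms of the Taylor series are

$$f(\bar{x}_1 + \tilde{x}_1, \bar{x}_2 + \tilde{x}_2, \ldots) = f(\bar{x}_1, \bar{x}_2, \ldots) + \sum_{k=1}^{n}\frac{\partial f}{\partial x_k}(\bar{x}_1, \bar{x}_2, \ldots)\,\tilde{x}_k + \frac{1}{2!}\sum_{k,k'=1}^{n}\frac{\partial^2 f}{\partial x_k\partial x_{k'}}(\bar{x}_1, \bar{x}_2, \ldots)\,\tilde{x}_k\tilde{x}_{k'} + \ldots \qquad (2.11b)$$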

- The Euler-Maclaurin formula, valid for any infinitely differentiable function f(ξ) that vanishes at ξ → ∞ together with all its derivatives:

$$\sum_{n=1}^{\infty}f(n) = \int_0^\infty f(\xi)\,d\xi - \frac{1}{2}f(0) - \frac{1}{6}\frac{1}{2!}\frac{df}{d\xi}(0) + \frac{1}{30}\frac{1}{4!}\frac{d^3f}{d\xi^3}(0) + \ldots; \qquad (2.12a)$$

the coefficients participating in this formula are the so-called Bernoulli numbers: 2

$$B_1 = \frac{1}{2}, \quad B_2 = \frac{1}{6}, \quad B_3 = 0, \quad B_4 = -\frac{1}{30}, \quad B_5 = 0, \quad B_6 = \frac{1}{42}, \quad B_7 = 0, \quad B_8 = -\frac{1}{30}, \quad \ldots \qquad (2.12b)$$

1 Actually, this leading term was derived by A. de Moivre in 1733, before J. Stirling's work, but it is nevertheless commonly called the Stirling approximation.
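The sums and series above can be exercised in a few lines of Python. The sketch below does three things. It checks Eqs. (2.5b), (2.6), (2.8a), and (2.9) against direct summation at sample values. It evaluates the zeta-function values (2.7b) by summing the first N − 1 terms directly and estimating the remaining tail with the Euler-Maclaurin formula (2.12a), shifted to start at k = N. Finally, it compares the Stirling series (2.10) with the standard function math.lgamma. The cutoff N = 100 and the sample values are arbitrary choices.

```python
import math

# Eqs. (2.5b), (2.6a), (2.6b): closed forms versus direct summation
for n in range(1, 50):
    ks = range(1, n + 1)
    assert sum(ks) == n * (n + 1) // 2
    assert sum(k**2 for k in ks) == n * (n + 1) * (2 * n + 1) // 6
    assert sum(k**3 for k in ks) == (n * (n + 1) // 2) ** 2

# Eqs. (2.8a) and (2.9) at sample values
lam, a, n = 0.3, 0.7, 12
assert math.isclose(sum(lam**k for k in range(n)), (1 - lam**n) / (1 - lam))
assert math.isclose((1 + a) ** n, sum(math.comb(n, k) * a**k for k in range(n + 1)))

def zeta(s, N=100):
    """Riemann zeta function (2.7a) for real s > 1: direct sum of the first N - 1 terms,
    plus the tail sum over k >= N from the Euler-Maclaurin formula (2.12a) shifted to
    start at N:  tail ~ int_N^inf x**(-s) dx + f(N)/2 - f'(N)/12,  f(x) = x**(-s)."""
    head = math.fsum(k ** -s for k in range(1, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s / 12 * N ** (-s - 1)
    return head + tail

def ln_factorial(n):
    """Stirling series (2.10), keeping the four terms shown there."""
    return (n * (math.log(n) - 1) + 0.5 * math.log(2 * math.pi * n)
            + 1 / (12 * n) - 1 / (360 * n**3))

for s in (1.5, 2, 2.5, 3, 4, 5):
    print(f"zeta({s}) = {zeta(s):.6f}")
print(zeta(2) - math.pi**2 / 6, zeta(4) - math.pi**4 / 90)   # both tiny
print(ln_factorial(10), math.lgamma(11))                     # lgamma(n + 1) = ln(n!)
```

Even with only 100 directly summed terms, the two leading Euler-Maclaurin corrections bring ζ(2) and ζ(4) within ~10⁻¹¹ of their exact values.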

3. Trigonometric functions 

- Sums of two functions of arbitrary arguments:

$$\cos A + \cos B = 2\cos\frac{A+B}{2}\cos\frac{B-A}{2}, \qquad (3.1a)$$

$$\cos A - \cos B = 2\sin\frac{A+B}{2}\sin\frac{B-A}{2}, \qquad (3.1b)$$

$$\sin A \pm \sin B = 2\sin\frac{A \pm B}{2}\cos\frac{B \mp A}{2}. \qquad (3.1c)$$

- Trigonometric function products:

$$2\cos A\cos B = \cos(A + B) + \cos(A - B), \qquad (3.2a)$$

$$2\sin A\cos B = \sin(A + B) + \sin(A - B), \qquad (3.2b)$$

$$2\sin A\sin B = \cos(A - B) - \cos(A + B); \qquad (3.2c)$$

for the particular case of equal arguments, B = A, these formulas yield expressions for the squares of trigonometric functions, and their product:

$$\cos^2 A = \frac{1}{2}(1 + \cos 2A), \qquad \sin^2 A = \frac{1}{2}(1 - \cos 2A), \qquad \sin A\cos A = \frac{1}{2}\sin 2A. \qquad (3.2d)$$

- Cubes of trigonometric functions:

$$\cos^3 A = \frac{3}{4}\cos A + \frac{1}{4}\cos 3A, \qquad \sin^3 A = \frac{3}{4}\sin A - \frac{1}{4}\sin 3A. \qquad (3.3)$$

- Trigonometric functions of a complex argument:

$$\sin(A + iB) = \sin A\cosh B + i\cos A\sinh B,$$
$$\cos(A + iB) = \cos A\cosh B - i\sin A\sinh B. \qquad (3.4)$$
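The identities (3.1)-(3.4) can be spot-checked at random arguments. This sketch is an illustration rather than a proof. It uses Python's math and cmath modules with a relative and absolute tolerance of 10⁻¹², and the argument ranges are arbitrary choices.

```python
import cmath
import math
import random

def close(u, v):
    return math.isclose(u, v, rel_tol=1e-12, abs_tol=1e-12)

random.seed(1)
for _ in range(1000):
    A, B = random.uniform(-10, 10), random.uniform(-10, 10)
    s, d = (A + B) / 2, (B - A) / 2
    assert close(math.cos(A) + math.cos(B), 2 * math.cos(s) * math.cos(d))      # (3.1a)
    assert close(math.cos(A) - math.cos(B), 2 * math.sin(s) * math.sin(d))      # (3.1b)
    assert close(math.sin(A) + math.sin(B), 2 * math.sin(s) * math.cos(d))      # (3.1c), upper signs
    assert close(math.sin(A) - math.sin(B),
                 2 * math.sin((A - B) / 2) * math.cos((B + A) / 2))             # (3.1c), lower signs
    assert close(2 * math.sin(A) * math.sin(B), math.cos(A - B) - math.cos(A + B))  # (3.2c)
    assert close(math.cos(A) ** 3, 0.75 * math.cos(A) + 0.25 * math.cos(3 * A))     # (3.3)
    assert close(math.sin(A) ** 3, 0.75 * math.sin(A) - 0.25 * math.sin(3 * A))     # (3.3)
    b = B / 2   # keep cosh(b) moderate for the complex-argument check (3.4)
    assert cmath.isclose(cmath.sin(complex(A, b)),
                         math.sin(A) * math.cosh(b) + 1j * math.cos(A) * math.sinh(b),
                         rel_tol=1e-12, abs_tol=1e-12)
    assert cmath.isclose(cmath.cos(complex(A, b)),
                         math.cos(A) * math.cosh(b) - 1j * math.sin(A) * math.sinh(b),
                         rel_tol=1e-12, abs_tol=1e-12)
print("identities (3.1)-(3.4) hold at 1000 random points")
```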



4. General differentiation 

Full differential of a product of two functions:

$$d(fg) = (df)g + f(dg). \qquad (4.1)$$

Full differential of a function of several independent arguments, f(x₁, x₂, ..., x_n):

$$df = \sum_{k=1}^{n}\frac{\partial f}{\partial x_k}dx_k. \qquad (4.2)$$

Curvature of a Cartesian plot of a 1D function f(x):

$$\frac{1}{R} = \frac{d^2f/dx^2}{\left[1 + (df/dx)^2\right]^{3/2}}. \qquad (4.3)$$

2 Note that definitions of B_k (or rather their signs and indices) differ even in the most popular handbooks.
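Formula (4.3) can be illustrated on a circle of radius R, whose curvature must be 1/R everywhere. In the sketch below, the derivatives are estimated by central finite differences, and the step h = 10⁻⁴ is an arbitrary choice. The upper half of the circle bends downward, so its signed curvature comes out as −1/R.

```python
import math

def curvature(f, x, h=1e-4):
    """Signed curvature 1/R of the plot y = f(x), Eq. (4.3),
    with df/dx and d2f/dx2 estimated by central finite differences."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    return d2 / (1 + d1**2) ** 1.5

R = 2.0
def upper_circle(x):
    return math.sqrt(R**2 - x**2)

for x in (-1.5, 0.0, 0.7):
    print(x, curvature(upper_circle, x))   # close to -1/R = -0.5 at every point
```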



5. General integration 

Integration by parts - immediately follows from Eq. (4.1):

$$\int_{g(A)}^{g(B)}f\,dg = fg\Big|_A^B - \int_{f(A)}^{f(B)}g\,df. \qquad (5.1)$$

Numerical (approximate) integration of 1D functions: the simplest (rectangle, or midpoint) rule,

$$\int_a^b f(x)\,dx \approx h\left[f\left(a + \frac{h}{2}\right) + f\left(a + \frac{3h}{2}\right) + \ldots + f\left(b - \frac{h}{2}\right)\right] \equiv h\sum_{n=1}^{N}f\left(a - \frac{h}{2} + nh\right), \qquad h = \frac{b-a}{N}, \qquad (5.2)$$

has relatively low accuracy (an error of the order of (h³/24)d²f/dx² per step). Therefore the following Simpson formula,

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(a) + 4f(a+h) + 2f(a+2h) + \ldots + 4f(b-h) + f(b)\right], \qquad h = \frac{b-a}{2N}, \qquad (5.3)$$

whose error per step scales as (h⁵/180)d⁴f/dx⁴, is used much more frequently. 3
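To see the different accuracies of the rules (5.2) and (5.3), the following sketch integrates sin x over [0, π] (exact value 2, an arbitrary test case) with both rules for several numbers of steps.

```python
import math

def midpoint(f, a, b, N):
    """Eq. (5.2): h times the sum of f at the centers of N equal steps."""
    h = (b - a) / N
    return h * math.fsum(f(a - h / 2 + n * h) for n in range(1, N + 1))

def simpson(f, a, b, N):
    """Eq. (5.3): Simpson formula with 2N steps of length h = (b - a)/(2N)."""
    h = (b - a) / (2 * N)
    odd = math.fsum(f(a + (2 * m - 1) * h) for m in range(1, N + 1))   # weight 4
    even = math.fsum(f(a + 2 * m * h) for m in range(1, N))            # weight 2
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)

exact = 2.0   # integral of sin(x) from 0 to pi
for N in (4, 8, 16, 32):
    err_mid = midpoint(math.sin, 0.0, math.pi, N) - exact
    err_simp = simpson(math.sin, 0.0, math.pi, N) - exact
    print(f"N = {N:2d}: midpoint error {err_mid:+.2e}, Simpson error {err_simp:+.2e}")
```

Each halving of the step cuts the midpoint error by a factor of about 4 and the Simpson error by a factor of about 16. This matches the h³ and h⁵ per-step scalings quoted above, summed over a number of steps proportional to 1/h.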

6. A few 1D integrals of elementary functions 4



(i) Indefinite integrals 



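- Integrals with (1 + ξ²)^m:

$$\int\frac{d\xi}{(1 + \xi^2)^{1/2}} = \ln\left|\xi + (1 + \xi^2)^{1/2}\right|, \qquad (6.1)$$

$$\int\frac{d\xi}{(1 + \xi^2)^{3/2}} = \frac{\xi}{(1 + \xi^2)^{1/2}}. \qquad (6.2)$$

- Integrals with (ξ² + 2aξ − 1)^m:

$$\int\frac{d\xi}{\xi(\xi^2 + 2a\xi - 1)^{1/2}} = \arccos\frac{1 - a\xi}{|\xi|(a^2 + 1)^{1/2}}. \qquad (6.3)$$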



3 Higher-order formulas (e.g., the Bode rule) and other guidance, including ready-for-use codes for computer calculations, may be found, for example, in the popular reference texts by W. H. Press et al. cited in Sec. 16 below. Besides that, some advanced codes are used as subroutines in the software packages listed in the same section. In some cases, the Euler-Maclaurin formula (2.12) may also be useful for numerical integration.

4 A powerful (and free :-) interactive online tool for working out indefinite 1D integrals in symbolic form is available at http://integrals.wolfram.com/index.jsp .
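(ii) Semi-definite integrals:

- Integrals with 1/(e^ξ ± 1):

$$\int_a^\infty\frac{d\xi}{e^\xi + 1} = \ln\left(1 + e^{-a}\right), \qquad (6.4a)$$

$$\int_a^\infty\frac{d\xi}{e^\xi - 1} = \ln\frac{1}{1 - e^{-a}}, \qquad \text{for } a > 0. \qquad (6.4b)$$

(iii) Definite integrals:

- Integrals with 1/(1 + ξ²):

$$\int_0^\infty\frac{d\xi}{1 + \xi^2} = \frac{\pi}{2}. \qquad (6.5a)$$

- Integrals with (1 − ξ^{2n})^m:

$$\int_0^1\frac{d\xi}{(1 - \xi^{2n})^{1/2}} = \frac{\pi^{1/2}}{2n}\,\frac{\Gamma(1/2n)}{\Gamma\left((n+1)/2n\right)}, \qquad (6.5b)$$

where Γ(s) is the gamma-function 5, whose most important property is

$$\Gamma(n) = (n-1)!, \qquad \text{for } n = 1, 2, \ldots; \qquad (6.6a)$$

other particularly important values of the gamma-function are

$$\Gamma\left(\frac{1}{2}\right) = \pi^{1/2}, \qquad \Gamma\left(\frac{3}{2}\right) = \frac{1}{2}\pi^{1/2}, \qquad (6.6b)$$

$$\Gamma\left(\frac{5}{2}\right) = \frac{1\cdot 3}{2\cdot 2}\pi^{1/2}, \qquad \Gamma\left(\frac{7}{2}\right) = \frac{1\cdot 3\cdot 5}{2\cdot 2\cdot 2}\pi^{1/2}. \qquad (6.6c)$$

- Integrals with e^{−ξ}:

$$\int_0^\infty\xi^{s-1}e^{-\xi}\,d\xi = \Gamma(s), \qquad \text{for } s > 0; \qquad (6.7a)$$

in particular, for integer s = n + 1, Eq. (6.6a) reduces this integral to

$$\int_0^\infty\xi^n e^{-\xi}\,d\xi = n!\,. \qquad (6.7b)$$

- Integrals with 1/(e^ξ ± 1):

5 This function is most frequently defined by Eq. (6.7a).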
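$$\int_0^\infty\frac{\xi^{s-1}\,d\xi}{e^\xi + 1} = (1 - 2^{1-s})\,\Gamma(s)\,\zeta(s), \qquad \text{for } s > 0, \qquad (6.8a)$$

$$\int_0^\infty\frac{\xi^{s-1}\,d\xi}{e^\xi - 1} = \Gamma(s)\,\zeta(s), \qquad \text{for } s > 1, \qquad (6.8b)$$

where ζ(s) is the Riemann zeta-function - see Eq. (2.7). Particular cases: for s = 2n,

$$\int_0^\infty\frac{\xi^{2n-1}\,d\xi}{e^\xi + 1} = \frac{2^{2n-1} - 1}{2n}\,\pi^{2n}\,|B_{2n}|, \qquad (6.8c)$$

$$\int_0^\infty\frac{\xi^{2n-1}\,d\xi}{e^\xi - 1} = \frac{(2\pi)^{2n}}{4n}\,|B_{2n}|, \qquad (6.8d)$$

where B_n are the Bernoulli numbers - see Eq. (2.12). For the particular case s = 1 (when Eq. (6.8a) yields an uncertainty of the 0·∞ type),

$$\int_0^\infty\frac{d\xi}{e^\xi + 1} = \ln 2. \qquad (6.8e)$$

- Integrals with exp{−ξ²}:

$$\int_0^\infty\xi^s e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{s+1}{2}\right), \qquad \text{for } s > -1; \qquad (6.9a)$$

for applications, the most important particular even values of s are 0 and 2:

$$\int_0^\infty e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{1}{2}\right) = \frac{\pi^{1/2}}{2}, \qquad (6.9b)$$

$$\int_0^\infty\xi^2 e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{3}{2}\right) = \frac{\pi^{1/2}}{4}, \qquad (6.9c)$$

though we will also run into the cases s = 4 and s = 6:

$$\int_0^\infty\xi^4 e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{5}{2}\right) = \frac{3\pi^{1/2}}{8}, \qquad \int_0^\infty\xi^6 e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{7}{2}\right) = \frac{15\pi^{1/2}}{16}; \qquad (6.9d)$$

for odd values s = 2n + 1 (with n = 0, 1, 2, ...), Eq. (6.9a) takes a simpler form:

$$\int_0^\infty\xi^{2n+1}e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma(n+1) = \frac{n!}{2}. \qquad (6.9e)$$

- Integrals with cos and sin:

$$\int_0^\infty\cos(\xi^2)\,d\xi = \int_0^\infty\sin(\xi^2)\,d\xi = \left(\frac{\pi}{8}\right)^{1/2}, \qquad (6.10)$$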
$$\int_0^\infty\frac{\cos\xi}{a^2 + \xi^2}\,d\xi = \frac{\pi}{2a}e^{-a}, \qquad \text{for } a > 0. \qquad (6.11)$$



(6.12) 



- Integrals with ln:



+ dZ = x[a-(a 2 -l) 1,2 l fora>l. 

k l + (l-^)' /2 , g 1 

jln d £ = l. 

o S 



(6.13) 



(6.14) 



- Integral representations of the Bessel functions of integer order, and a related important Fourier series:

$$J_n(a) = \frac{1}{2\pi i^n}\int_0^{2\pi}e^{ia\cos\xi}\cos n\xi\,d\xi, \qquad (6.15a)$$

$$I_n(a) = \frac{1}{\pi}\int_0^{\pi}e^{a\cos\xi}\cos n\xi\,d\xi, \qquad (6.15b)$$

$$e^{ia\cos\xi} = \sum_{n=-\infty}^{+\infty}i^n J_n(a)\,e^{in\xi}. \qquad (6.15c)$$
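Several of the definite integrals above can be checked numerically with the midpoint rule (5.2). That rule never evaluates the integrand at the end points, which conveniently sidesteps the removable 0/0 at ξ = 0. In this sketch, the infinite upper limits are replaced by finite cutoffs: 60 for the exponentially decaying integrands and 12 for the Gaussian ones. These cutoffs and the number of steps are arbitrary choices, and math.gamma from the Python standard library supplies Γ(s).

```python
import math

def midpoint(f, a, b, N=20000):
    """Midpoint rule, Eq. (5.2)."""
    h = (b - a) / N
    return h * math.fsum(f(a + (n + 0.5) * h) for n in range(N))

XI_MAX = 60.0                                     # stands in for infinity (integrands ~ exp(-xi))
zeta = {2: math.pi**2 / 6, 4: math.pi**4 / 90}    # exact values from Eq. (2.7b)

# Eqs. (6.4a), (6.4b) for a = 0.5
a = 0.5
I4a = midpoint(lambda x: 1 / (math.exp(x) + 1), a, XI_MAX)
I4b = midpoint(lambda x: 1 / math.expm1(x), a, XI_MAX)
print(I4a, math.log(1 + math.exp(-a)))
print(I4b, math.log(1 / (1 - math.exp(-a))))

# Gamma-function values (6.6a)-(6.6c) and the integral (6.7b) for n = 3
assert math.isclose(math.gamma(5), math.factorial(4))
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))
assert math.isclose(math.gamma(3.5), 1 * 3 * 5 / (2 * 2 * 2) * math.sqrt(math.pi))
assert math.isclose(midpoint(lambda x: x**3 * math.exp(-x), 0, XI_MAX), 6, rel_tol=1e-6)

# Eqs. (6.8a), (6.8b) for s = 2 and 4, and Eq. (6.8e)
for s in (2, 4):
    fermi = midpoint(lambda x: x**(s - 1) / (math.exp(x) + 1), 0, XI_MAX)
    bose = midpoint(lambda x: x**(s - 1) / math.expm1(x), 0, XI_MAX)
    print(s, fermi, (1 - 2**(1 - s)) * math.gamma(s) * zeta[s], bose, math.gamma(s) * zeta[s])
print(midpoint(lambda x: 1 / (math.exp(x) + 1), 0, XI_MAX), math.log(2))

# Eq. (6.9a) versus the explicit values (6.9b)-(6.9d)
explicit = {0: math.sqrt(math.pi) / 2, 2: math.sqrt(math.pi) / 4,
            4: 3 * math.sqrt(math.pi) / 8, 6: 15 * math.sqrt(math.pi) / 16}
for s, value in explicit.items():
    numeric = midpoint(lambda x: x**s * math.exp(-x * x), 0, 12.0)
    print(s, numeric, 0.5 * math.gamma((s + 1) / 2), value)
```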



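7. 3D vector products

(i) Definitions:

- Scalar ("dot-") product:

$$\mathbf{a}\cdot\mathbf{b} = \sum_{j=1}^{3}a_j b_j, \qquad (7.1)$$

where a_j and b_j are vector components in any orthogonal coordinate system. In particular, the vector squared (the same as the norm squared) is

$$a^2 \equiv \mathbf{a}\cdot\mathbf{a} = \sum_{j=1}^{3}a_j^2 = \|\mathbf{a}\|^2. \qquad (7.2)$$

- Vector ("cross-") product:

$$\mathbf{a}\times\mathbf{b} \equiv \mathbf{n}_1(a_2b_3 - a_3b_2) + \mathbf{n}_2(a_3b_1 - a_1b_3) + \mathbf{n}_3(a_1b_2 - a_2b_1) = \begin{vmatrix}\mathbf{n}_1 & \mathbf{n}_2 & \mathbf{n}_3\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\end{vmatrix}, \qquad (7.3)$$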
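where {n_j} is the set of mutually perpendicular unit vectors 6 along the corresponding coordinate system directions. 7 In particular, Eq. (7.3) yields

$$\mathbf{a}\times\mathbf{a} = 0. \qquad (7.4)$$

(ii) Corollaries (readily verified by Cartesian components):

- Double vector product (the so-called bac minus cab rule):

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = \mathbf{b}(\mathbf{a}\cdot\mathbf{c}) - \mathbf{c}(\mathbf{a}\cdot\mathbf{b}). \qquad (7.5)$$

- Mixed scalar-vector product (called the operand rotation rule):

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \mathbf{b}\cdot(\mathbf{c}\times\mathbf{a}) = \mathbf{c}\cdot(\mathbf{a}\times\mathbf{b}). \qquad (7.6)$$

- Scalar product of vector products:

$$(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) = (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{d}) - (\mathbf{a}\cdot\mathbf{d})(\mathbf{b}\cdot\mathbf{c}); \qquad (7.7a)$$

in the particular case of two similar operands (say, a = c and b = d), the last formula is reduced to

$$(\mathbf{a}\times\mathbf{b})^2 = a^2b^2 - (\mathbf{a}\cdot\mathbf{b})^2. \qquad (7.7b)$$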
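A minimal numerical check of the corollaries (7.4)-(7.7b) uses plain tuples and the component definitions (7.1) and (7.3). The random sampling here only illustrates the identities; it does not prove them.

```python
import random

def dot(a, b):
    """Scalar product, Eq. (7.1)."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product, Eq. (7.3), written out by components."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lin(alpha, u, beta, v):
    """Linear combination alpha*u + beta*v of two vectors."""
    return tuple(alpha * x + beta * y for x, y in zip(u, v))

def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))

def rand_vec():
    return tuple(random.uniform(-1.0, 1.0) for _ in range(3))

random.seed(0)
for _ in range(1000):
    a, b, c, d = rand_vec(), rand_vec(), rand_vec(), rand_vec()
    assert close(cross(a, a), (0.0, 0.0, 0.0))                                   # (7.4)
    assert close(cross(a, cross(b, c)), lin(dot(a, c), b, -dot(a, b), c))       # (7.5)
    assert abs(dot(a, cross(b, c)) - dot(b, cross(c, a))) < 1e-12               # (7.6)
    assert abs(dot(a, cross(b, c)) - dot(c, cross(a, b))) < 1e-12               # (7.6)
    lhs = dot(cross(a, b), cross(c, d))
    assert abs(lhs - (dot(a, c) * dot(b, d) - dot(a, d) * dot(b, c))) < 1e-12   # (7.7a)
    ab = cross(a, b)
    assert abs(dot(ab, ab) - (dot(a, a) * dot(b, b) - dot(a, b) ** 2)) < 1e-12  # (7.7b)
print("identities (7.4)-(7.7b) hold for 1000 random vector sets")
```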



8. Differentiation in 3D Cartesian coordinates 

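- Definition of the del (or "nabla") vector-operator ∇: 8

$$\nabla \equiv \sum_{j=1}^{3}\mathbf{n}_j\frac{\partial}{\partial r_j}, \qquad (8.1)$$

where r_j is a set of linear and orthogonal (called Cartesian) coordinates along the directions n_j. In accordance with this definition, the operator ∇ acting on a scalar function of coordinates, f(r), 9 gives its gradient:

$$\nabla f \equiv \sum_{j=1}^{3}\mathbf{n}_j\frac{\partial f}{\partial r_j} \equiv \text{grad}\,f, \qquad (8.2)$$

i.e., a new vector.

- The scalar product of del by a vector function of coordinates (a vector field),

$$\mathbf{f}(\mathbf{r}) = \sum_{j=1}^{3}\mathbf{n}_j f_j(\mathbf{r}), \qquad (8.3)$$

compiled formally following Eq. (7.1), is a scalar function - the divergence of the initial function: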



6 Popular alternative notations for this vector set are {e_j} and {r̂_j}.

7 It is easy to use Eq. (7.3) to check that the direction of the product vector corresponds to the "corkscrew rule": if we rotate the first operand toward the second one, the usual corkscrew moves in the direction of the product.

8 One can run into the notation ∇ ≡ ∂/∂r, which is convenient in many cases but may be misleading in a few others, so it will not be used in these notes.

9 In this and the next 4 sections, all scalar and vector functions are assumed to be differentiable.
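$$\nabla\cdot\mathbf{f} \equiv \sum_{j=1}^{3}\frac{\partial f_j}{\partial r_j} \equiv \text{div}\,\mathbf{f}, \qquad (8.4)$$

while the vector product of ∇ and f, formed in formal accordance with Eq. (7.3), is a new vector - the curl 10 of f:

$$\nabla\times\mathbf{f} \equiv \begin{vmatrix}\mathbf{n}_1 & \mathbf{n}_2 & \mathbf{n}_3\\ \partial/\partial r_1 & \partial/\partial r_2 & \partial/\partial r_3\\ f_1 & f_2 & f_3\end{vmatrix} = \mathbf{n}_1\left(\frac{\partial f_3}{\partial r_2} - \frac{\partial f_2}{\partial r_3}\right) + \mathbf{n}_2\left(\frac{\partial f_1}{\partial r_3} - \frac{\partial f_3}{\partial r_1}\right) + \mathbf{n}_3\left(\frac{\partial f_2}{\partial r_1} - \frac{\partial f_1}{\partial r_2}\right) \equiv \text{curl}\,\mathbf{f}. \qquad (8.5)$$

- One more frequently met 11 "product" is (f·∇)g, where f and g are two arbitrary vector functions of r. This product should also be understood in the sense implied by Eq. (7.1), i.e. as a vector whose j-th Cartesian component is

$$\left[(\mathbf{f}\cdot\nabla)\mathbf{g}\right]_j = \sum_{j'=1}^{3}f_{j'}\frac{\partial g_j}{\partial r_{j'}}. \qquad (8.6)$$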



9. The Laplace operator 

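- Definition in Cartesian coordinates - in formal accordance with Eq. (7.2):

$$\nabla^2 \equiv \nabla\cdot\nabla = \sum_{j=1}^{3}\frac{\partial^2}{\partial r_j^2}. \qquad (9.1)$$

- According to the definition, the Laplace operator acting on a scalar function of coordinates gives a new scalar function:

$$\nabla^2 f \equiv \nabla\cdot(\nabla f) = \text{div}(\text{grad}\,f) = \sum_{j=1}^{3}\frac{\partial^2 f}{\partial r_j^2}. \qquad (9.2)$$

- On the other hand, acting on a vector function (8.3), the operator ∇² returns another vector: 12

$$\nabla^2\mathbf{f} = \sum_{j=1}^{3}\mathbf{n}_j\nabla^2 f_j. \qquad (9.3)$$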



10. Operators ∇ and ∇² in the most important systems of orthogonal coordinates 13

(i) Cylindrical 14 coordinates {ρ, φ, z} (see Fig. below) may be defined by their relations with the Cartesian coordinates:



10 In the European tradition, this operator is called rotor and denoted as rot. 

11 See, e.g., Eqs. (11.5) and (11.6) below. 

12 Note that Eq. (9.3) is only valid in Cartesian (i.e. orthogonal and linear) coordinates, but generally not in other 
orthogonal coordinates - see, e.g., Eqs. (10.6) and (10.12). 

13 Some other orthogonal coordinate systems are discussed in EM Sec. 2.3. 

14 In 2D geometry with fixed coordinate z, these coordinates are called polar. 
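$$r_1 = \rho\cos\varphi, \qquad r_2 = \rho\sin\varphi, \qquad r_3 = z. \qquad (10.1)$$

- Gradient of a scalar function:

$$\nabla f = \mathbf{n}_\rho\frac{\partial f}{\partial\rho} + \mathbf{n}_\varphi\frac{1}{\rho}\frac{\partial f}{\partial\varphi} + \mathbf{n}_z\frac{\partial f}{\partial z}. \qquad (10.2)$$

- The Laplace operator of a scalar function:

$$\nabla^2 f = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial f}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 f}{\partial\varphi^2} + \frac{\partial^2 f}{\partial z^2}. \qquad (10.3)$$

- Divergence of a vector function of coordinates (f = n_ρ f_ρ + n_φ f_φ + n_z f_z):

$$\nabla\cdot\mathbf{f} = \frac{1}{\rho}\frac{\partial(\rho f_\rho)}{\partial\rho} + \frac{1}{\rho}\frac{\partial f_\varphi}{\partial\varphi} + \frac{\partial f_z}{\partial z}. \qquad (10.4)$$

- Curl of a vector function:

$$\nabla\times\mathbf{f} = \mathbf{n}_\rho\left(\frac{1}{\rho}\frac{\partial f_z}{\partial\varphi} - \frac{\partial f_\varphi}{\partial z}\right) + \mathbf{n}_\varphi\left(\frac{\partial f_\rho}{\partial z} - \frac{\partial f_z}{\partial\rho}\right) + \frac{\mathbf{n}_z}{\rho}\left(\frac{\partial(\rho f_\varphi)}{\partial\rho} - \frac{\partial f_\rho}{\partial\varphi}\right). \qquad (10.5)$$

- The Laplace operator of a vector function:

$$\nabla^2\mathbf{f} = \mathbf{n}_\rho\left(\nabla^2 f_\rho - \frac{1}{\rho^2}f_\rho - \frac{2}{\rho^2}\frac{\partial f_\varphi}{\partial\varphi}\right) + \mathbf{n}_\varphi\left(\nabla^2 f_\varphi - \frac{1}{\rho^2}f_\varphi + \frac{2}{\rho^2}\frac{\partial f_\rho}{\partial\varphi}\right) + \mathbf{n}_z\nabla^2 f_z. \qquad (10.6)$$

(ii) Spherical coordinates {r, θ, φ} (see Fig. below) may be defined as:

$$r_1 = r\sin\theta\cos\varphi, \qquad r_2 = r\sin\theta\sin\varphi, \qquad r_3 = r\cos\theta. \qquad (10.7)$$

- Gradient of a scalar function of coordinates:

$$\nabla f = \mathbf{n}_r\frac{\partial f}{\partial r} + \mathbf{n}_\theta\frac{1}{r}\frac{\partial f}{\partial\theta} + \mathbf{n}_\varphi\frac{1}{r\sin\theta}\frac{\partial f}{\partial\varphi}. \qquad (10.8)$$

- The Laplace operator of a scalar function:

$$\nabla^2 f = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial f}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 f}{\partial\varphi^2}. \qquad (10.9)$$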
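- Divergence of a vector function f = n_r f_r + n_θ f_θ + n_φ f_φ:

$$\nabla\cdot\mathbf{f} = \frac{1}{r^2}\frac{\partial(r^2 f_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial(f_\theta\sin\theta)}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial f_\varphi}{\partial\varphi}. \qquad (10.10)$$

- Curl of a similar vector function:

$$\nabla\times\mathbf{f} = \mathbf{n}_r\frac{1}{r\sin\theta}\left[\frac{\partial(f_\varphi\sin\theta)}{\partial\theta} - \frac{\partial f_\theta}{\partial\varphi}\right] + \mathbf{n}_\theta\frac{1}{r}\left[\frac{1}{\sin\theta}\frac{\partial f_r}{\partial\varphi} - \frac{\partial(r f_\varphi)}{\partial r}\right] + \mathbf{n}_\varphi\frac{1}{r}\left[\frac{\partial(r f_\theta)}{\partial r} - \frac{\partial f_r}{\partial\theta}\right]. \qquad (10.11)$$

- The Laplace operator of a vector function:

$$\nabla^2\mathbf{f} = \mathbf{n}_r\left[\nabla^2 f_r - \frac{2}{r^2}f_r - \frac{2}{r^2\sin\theta}\frac{\partial(f_\theta\sin\theta)}{\partial\theta} - \frac{2}{r^2\sin\theta}\frac{\partial f_\varphi}{\partial\varphi}\right] + \mathbf{n}_\theta\left[\nabla^2 f_\theta - \frac{1}{r^2\sin^2\theta}f_\theta + \frac{2}{r^2}\frac{\partial f_r}{\partial\theta} - \frac{2\cos\theta}{r^2\sin^2\theta}\frac{\partial f_\varphi}{\partial\varphi}\right] + \mathbf{n}_\varphi\left[\nabla^2 f_\varphi - \frac{1}{r^2\sin^2\theta}f_\varphi + \frac{2}{r^2\sin\theta}\frac{\partial f_r}{\partial\varphi} + \frac{2\cos\theta}{r^2\sin^2\theta}\frac{\partial f_\theta}{\partial\varphi}\right]. \qquad (10.12)$$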
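The spherical-coordinate Laplacian (10.9) can be cross-checked against its Cartesian form (9.2). The sketch below assumes an arbitrary smooth test function, f = x²y + x sin z + yz³, and central finite differences with step 10⁻³. It evaluates both forms at one point and compares them with the exact value 2y − x sin z + 6yz, obtained by hand. In the spherical form, (1/r²)∂/∂r(r²∂F/∂r) is expanded as F_rr + (2/r)F_r, and the polar term as F_θθ + cot θ F_θ.

```python
import math

def f_cart(x, y, z):
    # arbitrary smooth test function (illustrative choice)
    return x * x * y + math.sin(z) * x + y * z**3

def lap_cart(f, x, y, z, h=1e-3):
    """Eq. (9.2): sum of the three second derivatives, by central differences."""
    c = 2 * f(x, y, z)
    return ((f(x + h, y, z) - c + f(x - h, y, z)) +
            (f(x, y + h, z) - c + f(x, y - h, z)) +
            (f(x, y, z + h) - c + f(x, y, z - h))) / h**2

def f_sph(r, th, ph):
    # the same function expressed through Eq. (10.7)
    return f_cart(r * math.sin(th) * math.cos(ph), r * math.sin(th) * math.sin(ph), r * math.cos(th))

def lap_sph(F, r, th, ph, h=1e-3):
    """Eq. (10.9), with all derivatives estimated by central differences."""
    d1 = lambda g: (g(h) - g(-h)) / (2 * h)
    d2 = lambda g: (g(h) - 2 * g(0.0) + g(-h)) / h**2
    Fr = lambda t: F(r + t, th, ph)
    Ft = lambda t: F(r, th + t, ph)
    Fp = lambda t: F(r, th, ph + t)
    radial = d2(Fr) + 2 / r * d1(Fr)
    polar = (d2(Ft) + math.cos(th) / math.sin(th) * d1(Ft)) / r**2
    azimuthal = d2(Fp) / (r * math.sin(th)) ** 2
    return radial + polar + azimuthal

r, th, ph = 1.3, 0.9, 2.1
x, y, z = r * math.sin(th) * math.cos(ph), r * math.sin(th) * math.sin(ph), r * math.cos(th)
print(lap_cart(f_cart, x, y, z), lap_sph(f_sph, r, th, ph), 2 * y - x * math.sin(z) + 6 * y * z)
```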



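11. Products involving vector ∇

(i) Useful zeros:

- For any scalar function f(r),

$$\nabla\times(\nabla f) \equiv \text{curl}(\text{grad}\,f) = 0. \qquad (11.1)$$

- For any vector function f(r),

$$\nabla\cdot(\nabla\times\mathbf{f}) \equiv \text{div}(\text{curl}\,\mathbf{f}) = 0. \qquad (11.2)$$

(ii) Laplace operator expressed via the curl of a curl:

$$\nabla^2\mathbf{f} = \nabla(\nabla\cdot\mathbf{f}) - \nabla\times(\nabla\times\mathbf{f}). \qquad (11.3)$$

(iii) Spatial differentiation of a product of a scalar function by a vector function:

- The scalar 3D generalization of Eq. (4.1) is

$$\nabla\cdot(f\mathbf{g}) = (\nabla f)\cdot\mathbf{g} + f(\nabla\cdot\mathbf{g}), \qquad (11.4a)$$

and its vector generalization is similar:

$$\nabla\times(f\mathbf{g}) = (\nabla f)\times\mathbf{g} + f(\nabla\times\mathbf{g}). \qquad (11.4b)$$

(iv) 3D spatial differentiation of products of two vector functions:

$$\nabla\times(\mathbf{f}\times\mathbf{g}) = \mathbf{f}(\nabla\cdot\mathbf{g}) - (\mathbf{f}\cdot\nabla)\mathbf{g} - (\nabla\cdot\mathbf{f})\mathbf{g} + (\mathbf{g}\cdot\nabla)\mathbf{f}, \qquad (11.5)$$

$$\nabla(\mathbf{f}\cdot\mathbf{g}) = (\mathbf{f}\cdot\nabla)\mathbf{g} + (\mathbf{g}\cdot\nabla)\mathbf{f} + \mathbf{f}\times(\nabla\times\mathbf{g}) + \mathbf{g}\times(\nabla\times\mathbf{f}), \qquad (11.6)$$

$$\nabla\cdot(\mathbf{f}\times\mathbf{g}) = \mathbf{g}\cdot(\nabla\times\mathbf{f}) - \mathbf{f}\cdot(\nabla\times\mathbf{g}). \qquad (11.7)$$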
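The identities (11.1) and (11.2) can be spot-checked numerically. The sketch below builds grad, curl, and div from Eqs. (8.2), (8.4), and (8.5) and evaluates curl(grad φ) and div(curl f) at one point. It assumes central finite differences with a fixed step h = 10⁻⁴ and arbitrary smooth test fields, all illustrative choices. Both results come out as zero up to rounding errors of the order of 10⁻⁸.

```python
import math

h = 1e-4   # finite-difference step (arbitrary choice)

def partial(F, p, j):
    """Central-difference estimate of dF/dr_j at point p; F maps a 3-list to a float."""
    q_plus, q_minus = list(p), list(p)
    q_plus[j] += h
    q_minus[j] -= h
    return (F(q_plus) - F(q_minus)) / (2 * h)

def grad(phi):
    """Eq. (8.2): returns a function p -> list of the 3 Cartesian components."""
    return lambda p: [partial(phi, p, j) for j in range(3)]

def curl(f):
    """Eq. (8.5) for a vector field f: p -> list of 3 components."""
    def g(p):
        d = lambda i, j: partial(lambda q: f(q)[i], p, j)   # d f_i / d r_j
        return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]
    return g

def div(f):
    """Eq. (8.4)."""
    return lambda p: sum(partial(lambda q, i=i: f(q)[i], p, i) for i in range(3))

phi = lambda p: math.exp(p[0]) * math.sin(p[1]) + p[1] * p[2] ** 2        # test scalar field
field = lambda p: [p[1] * p[2], math.cos(p[0] * p[2]), p[0] ** 2 * p[1]]   # test vector field

p0 = [0.3, -0.7, 1.1]
print(curl(grad(phi))(p0))   # Eq. (11.1): all components ~ 0
print(div(curl(field))(p0))  # Eq. (11.2): ~ 0
```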
12. Integro-differential relations

(i) For an arbitrary surface S limited by closed contour C:

- The Stokes theorem, valid for any differentiable vector field f(r):

$$\int_S(\nabla\times\mathbf{f})\cdot d^2\mathbf{r} \equiv \int_S(\nabla\times\mathbf{f})_n\,d^2r = \oint_C\mathbf{f}\cdot d\mathbf{r} \equiv \oint_C f_\tau\,dr, \qquad (12.1)$$

where d²r ≡ n d²r is the elementary area vector (normal to the surface), and dr is the elementary contour length vector (tangential to the contour line).

(ii) For an arbitrary volume V limited by closed surface S:

- Divergence (or "Gauss") theorem, valid for any differentiable vector field f(r):

$$\int_V(\nabla\cdot\mathbf{f})\,d^3r = \oint_S\mathbf{f}\cdot d^2\mathbf{r} \equiv \oint_S f_n\,d^2r. \qquad (12.2)$$

- Green's theorem, valid for two differentiable scalar functions f(r) and g(r):

$$\int_V\left(f\nabla^2 g - g\nabla^2 f\right)d^3r = \oint_S\left(f\nabla g - g\nabla f\right)_n d^2r. \qquad (12.3)$$

- An identity valid for any two scalar functions f and g, and a vector field j with ∇·j = 0 (all differentiable):

$$\int_V\left[f(\mathbf{j}\cdot\nabla g) + g(\mathbf{j}\cdot\nabla f)\right]d^3r = \oint_S fg\,j_n\,d^2r. \qquad (12.4)$$



13. The Kronecker delta and Levi-Civita permutation symbols

- The Kronecker delta symbol (defined for integer indices):

$$\delta_{jj'} \equiv \begin{cases}1, & \text{if } j' = j,\\ 0, & \text{otherwise.}\end{cases} \qquad (13.1)$$

- The Levi-Civita permutation symbol (most frequently used for 3 integer indices, each taking one of the values 1, 2, or 3):

$$\varepsilon_{jj'j''} \equiv \begin{cases}+1, & \text{if all 3 indices are different and follow in a "correct" order: 1,2,3, or 2,3,1, or 3,1,2,}\\ -1, & \text{if all 3 indices are different and follow in an "incorrect" order,}\\ 0, & \text{if any two indices coincide.}\end{cases} \qquad (13.2)$$



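14. Dirac's delta function, sign function, and theta function

- Definition of the 1D delta-function (for real a < b):

$$\int_a^b f(\xi)\,\delta(\xi)\,d\xi \equiv \begin{cases}f(0), & \text{if } a < 0 < b,\\ 0, & \text{otherwise,}\end{cases} \qquad (14.1)$$

where f(ξ) is any function continuous near ξ = 0. In particular (if f(ξ) = 1 near ξ = 0), the definition yields

$$\int_a^b\delta(\xi)\,d\xi = \begin{cases}1, & \text{if } a < 0 < b,\\ 0, & \text{otherwise.}\end{cases} \qquad (14.2)$$

- Relation to the theta-function θ(ξ) and the sign function sgn(ξ):

$$\delta(\xi) = \frac{d}{d\xi}\theta(\xi) = \frac{1}{2}\frac{d}{d\xi}\text{sgn}(\xi), \qquad (14.3a)$$

where

$$\theta(\xi) \equiv \frac{\text{sgn}(\xi) + 1}{2} = \begin{cases}0, & \text{if } \xi < 0,\\ 1, & \text{if } \xi > 0,\end{cases} \qquad \text{sgn}(\xi) \equiv \frac{\xi}{|\xi|} = \begin{cases}-1, & \text{if } \xi < 0,\\ +1, & \text{if } \xi > 0.\end{cases} \qquad (14.3b)$$

- An important integral:

$$\int_{-\infty}^{+\infty}e^{is\xi}\,ds = 2\pi\delta(\xi). \qquad (14.4a)$$

The coefficient in this equation may be readily verified (or recalled :-) by treating it as the Fourier-integral presentation of f(s) = 1, and applying Eq. (14.1) to the reciprocal Fourier transform:

$$f(s) \equiv 1 = \frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-is\xi}\left[2\pi\delta(\xi)\right]d\xi. \qquad (14.4b)$$

- 3D generalization of the delta-function of the radius-vector (the 2D generalization is similar):

$$\int_V f(\mathbf{r})\,\delta(\mathbf{r})\,d^3r \equiv \begin{cases}f(0), & \text{if } 0 \in V,\\ 0, & \text{otherwise;}\end{cases} \qquad (14.5)$$

it may be presented as a product of 1D delta-functions of Cartesian coordinates:

$$\delta(\mathbf{r}) = \delta(r_1)\,\delta(r_2)\,\delta(r_3). \qquad (14.6)$$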



15. The Cauchy theorem and integral

Let a complex function f(z) be analytic within a part of the complex plane that is limited by a closed contour C and includes point z'. Then

$$\oint_C f(z)\,dz = 0, \qquad (15.1)$$

$$\oint_C f(z)\frac{dz}{z - z'} = 2\pi i\,f(z'), \qquad (15.2)$$

where the contour C is circled counter-clockwise. The first of these relations is usually called the Cauchy integral theorem (or the "Cauchy-Goursat theorem"), and the second one the Cauchy integral (or the "Cauchy integral formula").
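The Cauchy relations (15.1) and (15.2) are easy to verify numerically on a circular contour. In this sketch, the unit-circle contour, the test function exp z, and the point z' are arbitrary choices. The contour integral is taken by the trapezoidal rule in the polar angle, which converges very fast for smooth periodic integrands.

```python
import cmath
import math

def contour_integral(g, center=0j, radius=1.0, M=256):
    """Integral of g(z) dz over the circle |z - center| = radius, counter-clockwise,
    by the trapezoidal rule in the angle t, with z = center + radius * exp(i t)."""
    total = 0j
    for m in range(M):
        t = 2 * math.pi * m / M
        z = center + radius * cmath.exp(1j * t)
        dz_dt = 1j * radius * cmath.exp(1j * t)
        total += g(z) * dz_dt
    return total * (2 * math.pi / M)

f = cmath.exp            # analytic everywhere (illustrative choice)
z_prime = 0.3 - 0.2j     # a point inside the unit circle
I1 = contour_integral(f)                                  # Eq. (15.1): ~ 0
I2 = contour_integral(lambda z: f(z) / (z - z_prime))     # Eq. (15.2): ~ 2*pi*i*f(z')
print(abs(I1), I2, 2j * math.pi * f(z_prime))
```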

16. References 

(i) Properties of some special functions are briefly discussed at the relevant points of the lecture notes; 
in the alphabetical order: 

- Airy functions: QM Sec. 2.4; 

- Bessel functions: EM Sec. 2.4; 

- Fresnel integrals: EM Sec. 8.6; 

- Hermite polynomials: QM Sec. 2.6; 

- Laguerre polynomials (both simple and associated): QM Sec. 3.5; 

- Legendre polynomials, associated Legendre functions, and spherical harmonics: EM Sec. 2.4 
and QM Sec. 3.5. 

(ii) For more formulas, and their discussions, I can recommend the following handbooks (in the 
alphabetical order): 15 

- M. Abramowitz and I. Stegun (eds.), Handbook of Mathematical Functions, Dover, 1965 (and numerous later printings); 16

- I. Gradshteyn and I. Ryzhik, Table of Integrals, Series, and Products, 5th ed., Acad. Press, 1980;

- G. Korn and T. Korn, Mathematical Handbook for Scientists and Engineers, 2nd ed., Dover, 2000;

- A. Prudnikov et al., Integrals and Series, vols. 1 and 2, CRC Press, 1986.

A popular textbook,

- G. Arfken et al., Mathematical Methods for Physicists, 7th ed., Acad. Press, 2012,

may be also used as a formula manual. 

Many formulas are also available from the symbolic calculation parts of commercially available 
software packages listed in Sec. (iv) below. 

(iii) Perhaps the most popular collection of numerical calculation codes are the twin manuals 

- W. Press et al., Numerical Recipes in Fortran 77, 2nd ed., Cambridge U. Press, 1992;

- W. Press et al., Numerical Recipes [in C++ - KKL], 3rd ed., Cambridge U. Press, 2007.

My lecture notes include very brief introductions into numerical methods of differential equation 
solution: 

- ordinary differential equations: CM Sec. 3.9, and 

- equations with partial derivatives: CM Sec. 8.5 and EM Sec. 2.8, 



15 On a personal note, perhaps 90% of all my formula needs throughout my research career were satisfied by a small, wonderfully compiled old book: H. Dwight, Tables of Integrals and Other Mathematical Data, 4th ed., MacMillan, 1961, whose used copies, amazingly, are still available on the Web.

16 An updated version of this collection is now available online at http://dlmf.nist.gov/ . 
which include references to literature for further reading.

(iv) The most popular software packages for numerical and symbolic calculations, with function plotting 
capabilities (in the alphabetic order): 

- Maple (official Web site: http://www.maplesoft.com/ ); 

- Mathcad ( http://www.ptc.com/products/mathcad/ ); 

- Mathematica ( http://www.wolfram.com/products/mathematica/index.html) ; 

- MATLAB ( http://www.mathworks.com/products/matlab/ ).
Appendix CA

Selected Physical Constants 1

| Symbol | Quantity | SI value and unit | Gaussian value and unit | Relative r.m.s. uncertainty |
|---|---|---|---|---|
| c | speed of light in free space | 2.99 792 458×10⁸ m/s | 2.99 792 458×10¹⁰ cm/s | 0 (defined value) |
| G | gravitation constant | 6.67 4×10⁻¹¹ m³/kg·s² | 6.67 4×10⁻⁸ cm³/g·s² | ~1.5×10⁻⁴ |
| ħ | Planck constant | 1.05 457 16×10⁻³⁴ J·s | 1.05 457 16×10⁻²⁷ erg·s | ~5×10⁻⁸ |
| e | elementary electric charge | 1.60 217 64×10⁻¹⁹ C | 4.80 320 4×10⁻¹⁰ statcoulomb | ~3×10⁻⁸ |
| m_e | electron's rest mass | 0.91 093 82×10⁻³⁰ kg | 0.91 093 82×10⁻²⁷ g | ~5×10⁻⁸ |
| m_p | proton's rest mass | 1.67 262 16×10⁻²⁷ kg | 1.67 262 16×10⁻²⁴ g | ~5×10⁻⁸ |
| μ₀ | magnetic constant | 4π×10⁻⁷ N/A² | - | 0 (defined value) |
| ε₀ | electric constant | 8.85 418 781 7×10⁻¹² F/m | - | 0 (defined value) |
| k_B | Boltzmann constant | 1.38 065 5×10⁻²³ J/K | 1.38 065 5×10⁻¹⁶ erg/K | ~5×10⁻⁶ |
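As a consistency check of the table, the following sketch evaluates the fine structure constant α = e²/4πε₀ħc and the product ε₀μ₀c² from the listed SI values. The digits are copied from the table; nothing else is assumed.

```python
import math

# SI values as listed in the table above
e = 1.6021764e-19          # C
hbar = 1.0545716e-34       # J*s
c = 2.99792458e8           # m/s
mu0 = 4 * math.pi * 1e-7   # N/A^2
eps0 = 8.854187817e-12     # F/m

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
print(f"alpha = {alpha:.6e}, 1/alpha = {1 / alpha:.4f}")   # about 7.2974e-3 and 137.036
print(f"eps0*mu0*c^2 = {eps0 * mu0 * c**2:.10f}")          # the relation eps0*mu0 = 1/c^2
```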



Comments: 

1. The fixed value of c was defined by an international convention in 1983, in order to extend the official definition of a second (as "the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium-133 atom") to that of a meter. The values are back-compatible, within the experimental errors of those measures, with the legacy definitions of the meter (initially, as the 1/40,000,000-th part of the Earth meridian's length) and the second (for a long time, as the 1/(24×60×60)-th part of the Earth rotation period).

1 The listed numerical values of the constants are from the most recent NIST review: P. J. Mohr et al., Rev. Mod. Phys. 80, 633 (2008), except for the newer results for k_B - see B. Fellmuth et al., Metrologia 48, 382 (2011), and for α - see R. Bouchendira et al., Phys. Rev. Lett. 106, 080801 (2011).

2. ε₀ and μ₀ are not really fundamental constants. In the SI system of units, one of them (say, μ₀) is selected arbitrarily, 2 while the other one is defined via the relation ε₀μ₀ = 1/c².

3. The Boltzmann constant k_B is also not quite fundamental, because its only role is to comply with the independent definition of the kelvin (K) as the temperature unit in which the triple point of water is exactly 273.16 K. If temperature is expressed in energy units k_BT (as is done, for example, in the SM part of this lecture note series), this constant disappears altogether.

4. The dimensionless fine structure constant α is numerically the same in any system of units:

$$\alpha = \begin{cases}e^2/4\pi\varepsilon_0\hbar c & \text{in SI units}\\ e^2/\hbar c & \text{in Gaussian units}\end{cases} \approx 7.29\,735\,257\times10^{-3} \approx \frac{1}{137.035\,999\,0} \approx \frac{1}{137},$$

and has been measured with a much smaller r.m.s. uncertainty (~5×10⁻¹⁰) than the component constants



2 Note that the selected value of μ₀ may be changed (a bit) in a few years - see, e.g., D. Newell, Phys. Today 67, No. 7, pp. 35-41 (2014).